GROQ-seq Datasets Across Transcription Factors (LacI, RamR, VanR), T7 RNA Polymerase and TEV Protease

Spinner, A.; Sreenivasan, S.; McLellan, J. R.; Ikonomova, S. P.; Cortade, D. L.; d'Oelsnitz, S.; Sheldon, K.; Vasilyeva, O. B.; Alperovich, N. Y.; Chadha, A.; Nematollahi, L.; Dhroso, A.; Sisson, Z.; Hudson, C. M.; DeBenedictis, E.; Kelly, P. J.; Reider Apel, A.; Ross, D.; Baranowski, C.

2026-04-18 bioengineering

10.64898/2026.04.15.718744 bioRxiv

Predicting any protein's function from its sequence alone would be a significant breakthrough in molecular biology. Although machine learning approaches have sought to tackle this problem, their limited generalizability reflects the absence of sufficiently large, open, diverse, and unified datasets. To address this data gap, we developed a high-throughput experimental platform called GROQ-seq (Growth-based Quantitative Sequencing). In GROQ-seq, a protein's function can be linked to a sequencing-based readout that enables scalable characterization of large variant libraries in Escherichia coli. Here, we present pilot datasets demonstrating its performance across three distinct protein function classes: transcription factors, polymerases, and proteases. The objective of this report is to present the datasets and to provide users with a clear and transparent characterization of their properties, including both their strengths and limitations.
