Universal Baseline for in vitro Selection of Genetically Encoded Libraries
Yan, K.; Lima, G. M.; Bahadur, T.; Albert, V.; O'Gara, Z.; Bao, G.; Kossmann, C.; Kirby, W.; Mejia, F. B.; Michnik, M. L.; Maiorana, K.; Derda, R.
Genetically encoded (GE) libraries enable identification of high-affinity ligands for diverse molecular targets through iterative in vitro selection followed by DNA sequencing or next-generation sequencing (NGS). Despite the impact of GE libraries on therapeutic development, systematic frameworks for evaluating the reproducibility of GE-molecule discovery remain limited. To aid such analysis, we introduce the concept of a baseline response, which reproducibly partitions active and inactive members of an in vitro selection. The baseline response is provided by spiking a random DNA-barcoded population into the library. We calibrated the baseline concept using Bioconductor edgeR differential enrichment (DE) analysis of NGS data from phage-display selections on oligosaccharide chitin and hepatitis virus NS3a* protease as model targets. We further show that mixing discovery campaigns also offers an effective baseline: chitin-enriched peptides serve as a baseline for DE analysis of the NS3a* selection, and NS3a*-enriched peptides serve as a baseline for chitin binders. We applied baseline-stratified DE analysis to 66 parallel selections performed in 3-5 replicates across 22 extracellular targets, including HER1-3, EpCAM, CAIX, PD-L1, and eight integrin receptors. Automated DE analysis across hundreds of NGS files produced hits that were validated in a secondary screen and yielded synthetic macrocyclic ligands with mid-nanomolar affinity, confirmed in 2-3 biophysical assays. For PD-L1, we further demonstrate how baseline-calibrated NGS data provide decision-enabling information for optimizing peptide macrocycles into potent single-digit nanomolar ligands for the cell-surface receptor. We anticipate that baseline-based analysis of NGS data will offer a scalable framework for reproducible hit discovery and standardized analysis across diverse in vitro selection campaigns.
Significance Statement
Genetically encoded selection technologies such as phage, mRNA, and ribosome display have produced FDA-approved therapeutics and numerous clinical candidates. Yet reproducibility in such in vitro discovery systems is rarely evaluated against a defined experimental baseline. Here, we establish a universal baseline by spiking unrelated, DNA-barcoded peptide sequences into selection libraries and quantifying their binding alongside target-enriched populations. This composition-agnostic strategy enables rigorous normalization, confidence assessment, and cross-target comparison of molecular discovery outcomes. Our framework introduces practical standards for reproducibility and statistical benchmarking across genetically encoded display platforms.
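The baseline idea can be illustrated with a minimal sketch. It is not the edgeR workflow used in the study: it replaces the statistical model with a plain log2 fold change of counts per million, and it assumes that a sequence counts as a hit when its fold change exceeds the mean of the spiked baseline population by a chosen number of standard deviations. The function names, the `n_sd` cutoff, the pseudocount, and all read counts below are hypothetical and chosen only for illustration.

```python
import math
import statistics


def log2_fold_changes(counts, pseudocount=0.5):
    """Return log2(CPM_selected / CPM_input) for every sequence.

    `counts` maps a sequence ID to a tuple (input_reads, selected_reads).
    The pseudocount (an assumption) avoids taking the log of zero.
    """
    total_in = sum(c_in for c_in, _ in counts.values())
    total_sel = sum(c_sel for _, c_sel in counts.values())
    fold_changes = {}
    for seq, (c_in, c_sel) in counts.items():
        cpm_in = (c_in + pseudocount) / total_in * 1e6
        cpm_sel = (c_sel + pseudocount) / total_sel * 1e6
        fold_changes[seq] = math.log2(cpm_sel / cpm_in)
    return fold_changes


def call_hits(counts, baseline_ids, n_sd=3.0):
    """Flag non-baseline sequences enriched beyond the baseline response.

    The threshold is the baseline mean plus n_sd baseline standard
    deviations; this rule is a simplification, not the published method.
    """
    fc = log2_fold_changes(counts)
    baseline_fc = [fc[seq] for seq in baseline_ids]
    threshold = statistics.mean(baseline_fc) + n_sd * statistics.stdev(baseline_fc)
    hits = sorted(
        seq for seq, value in fc.items()
        if seq not in baseline_ids and value > threshold
    )
    return hits, threshold


# Hypothetical read counts: (input library, after selection).
counts = {
    "BASE1": (100, 90),    # spiked baseline sequences
    "BASE2": (100, 110),
    "BASE3": (100, 100),
    "BASE4": (100, 95),
    "PEP_A": (100, 1600),  # strongly enriched library member
    "PEP_B": (100, 105),   # behaves like the baseline
}
baseline_ids = {"BASE1", "BASE2", "BASE3", "BASE4"}

hits, threshold = call_hits(counts, baseline_ids)
print(hits, round(threshold, 2))
```

Because the baseline sequences pass through the same selection and sequencing steps as the library, the threshold moves with any experiment-wide shift in recovery. That property is what makes a spiked population useful for comparing selections against different targets.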