Universal Baseline for in vitro Selection of Genetically Encoded Libraries

Yan, K.; Lima, G. M.; Bahadur, T.; Albert, V.; O'Gara, Z.; Bao, G.; Kossmann, C.; Kirby, W.; Mejia, F. B.; Michnik, M. L.; Maiorana, K.; Derda, R.

2026-02-15 biochemistry
10.64898/2026.02.14.705946 bioRxiv
Genetically encoded (GE) libraries enable identification of high-affinity ligands for diverse molecular targets through iterative in vitro selection and DNA sequencing or next-generation sequencing (NGS). Despite their impact in therapeutic development, a systematic framework for evaluating reproducibility in GE-molecular discoveries is still lacking. To aid such analysis, we introduce the concept of baseline response, which reproducibly partitions active and inactive members of in vitro selection. The baseline response is provided by spiking a random DNA-barcoded population. We calibrated the baseline concept using Bioconductor edgeR differential enrichment (DE) analysis of NGS of phage-displayed selection on oligosaccharide chitin and hepatitis virus NS3a* protease as model targets. We further show that mixing discovery campaigns also offers an effective baseline: chitin-enriched peptides serve as a baseline for DE-analysis of NS3a* selection and NS3a*-enriched peptides serve as a baseline for chitin binders. We applied baseline-stratified DE-analysis to 66 parallel selections performed in 3-5 replicates across 22 extracellular targets, including HER1-3, EpCAM, CAIX, PD-L1, and eight integrin receptors. Automated DE-analysis across hundreds of NGS files produced hits validated in a secondary screen and yielded synthetic macrocyclic ligands with mid-nanomolar affinity confirmed in 2-3 biophysical assays. For PD-L1, we further demonstrated how baseline-calibrated NGS data provide decision-enabling information for optimization of peptide macrocycles to yield potent single-digit nanomolar ligands for the cell-surface receptor. We anticipate that baseline-based analyses of NGS data from in vitro selection procedures will offer a scalable framework for reproducible hit discovery and standardized analysis across diverse in vitro selection campaigns.

Significance Statement

Genetically encoded selection technologies such as phage, mRNA, and ribosome display have produced FDA-approved therapeutics and numerous clinical candidates. Yet reproducibility in such in vitro discovery systems is rarely evaluated against a defined experimental baseline. Here, we establish a universal baseline by spiking unrelated, DNA-barcoded peptide sequences into selection libraries and quantifying their binding alongside target-enriched populations. This composition-agnostic strategy enables rigorous normalization, confidence assessment, and cross-target comparison of molecular discovery outcomes. Our framework introduces practical standards for reproducibility and statistical benchmarking across genetically encoded display platforms.
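
To make the idea of normalizing against a spiked-in baseline concrete, the following is a minimal Python sketch. It is not the authors' pipeline, which uses Bioconductor edgeR for differential enrichment; it only illustrates a simple ratio-based normalization under stated assumptions. The function name `baseline_log2_enrichment`, the `pseudocount` value, and the toy read counts are all hypothetical.

```python
import math

def baseline_log2_enrichment(target_counts, control_counts, baseline_ids, pseudocount=0.5):
    """Log2 enrichment of each non-baseline sequence, scaled by baseline reads.

    Assumption (illustrative only): the spiked-in baseline sequences do not
    bind the target, so their total read count serves as the scaling factor
    for each sample.
    """
    base_target = sum(target_counts.get(s, 0) for s in baseline_ids)
    base_control = sum(control_counts.get(s, 0) for s in baseline_ids)
    if base_target == 0 or base_control == 0:
        raise ValueError("baseline sequences must be observed in both samples")

    enrichment = {}
    for seq in set(target_counts) | set(control_counts):
        if seq in baseline_ids:
            continue  # baseline members define the zero point, skip them
        # Pseudocount avoids log(0) for sequences missing from one sample.
        t = (target_counts.get(seq, 0) + pseudocount) / base_target
        c = (control_counts.get(seq, 0) + pseudocount) / base_control
        enrichment[seq] = math.log2(t / c)
    return enrichment

# Toy counts (made up for illustration, not data from the paper).
baseline_ids = {"BASE1", "BASE2"}
target = {"BASE1": 100, "BASE2": 100, "HIT": 800, "NEUTRAL": 50}
control = {"BASE1": 200, "BASE2": 200, "HIT": 100, "NEUTRAL": 100}

scores = baseline_log2_enrichment(target, control, baseline_ids)
for seq in sorted(scores):
    print(seq, round(scores[seq], 2))
```

In this toy example the sequence labelled `HIT` scores about 4 log2 units above the baseline while `NEUTRAL` sits near zero. A real analysis would also need replicate-aware statistics, which is what a tool such as edgeR provides.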

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 3% | 14.6%
2 | Nature Biotechnology | 147 papers in training set | Top 0.9% | 8.3%
3 | Cell Systems | 167 papers in training set | Top 2% | 6.3%
4 | Cell Reports Methods | 141 papers in training set | Top 0.4% | 6.3%
5 | Nature Communications | 4913 papers in training set | Top 33% | 4.8%
6 | Molecular & Cellular Proteomics | 158 papers in training set | Top 0.6% | 4.3%
7 | mAbs | 28 papers in training set | Top 0.1% | 4.1%
8 | Nucleic Acids Research | 1128 papers in training set | Top 5% | 3.9%
50% of probability mass above
9 | PLOS Computational Biology | 1633 papers in training set | Top 9% | 3.6%
10 | eLife | 5422 papers in training set | Top 29% | 3.0%
11 | Science | 429 papers in training set | Top 13% | 1.9%
12 | Journal of the American Chemical Society | 199 papers in training set | Top 3% | 1.9%
13 | Cell Chemical Biology | 81 papers in training set | Top 1% | 1.8%
14 | Molecular Systems Biology | 142 papers in training set | Top 0.6% | 1.7%
15 | Nature Methods | 336 papers in training set | Top 4% | 1.7%
16 | Cell Genomics | 162 papers in training set | Top 3% | 1.7%
17 | ACS Chemical Biology | 150 papers in training set | Top 1% | 1.7%
18 | BMC Genomics | 328 papers in training set | Top 3% | 1.6%
19 | ACS Central Science | 66 papers in training set | Top 1% | 1.2%
20 | Science Advances | 1098 papers in training set | Top 23% | 1.2%
21 | Nature Chemical Biology | 104 papers in training set | Top 2% | 1.2%
22 | PLOS ONE | 4510 papers in training set | Top 62% | 1.1%
23 | Scientific Reports | 3102 papers in training set | Top 71% | 0.9%
24 | Bioinformatics | 1061 papers in training set | Top 9% | 0.9%
25 | Communications Biology | 886 papers in training set | Top 19% | 0.9%
26 | Nature | 575 papers in training set | Top 15% | 0.8%
27 | Protein Science | 221 papers in training set | Top 2% | 0.8%
28 | Journal of Cell Biology | 333 papers in training set | Top 4% | 0.7%
29 | Nature Protocols | 30 papers in training set | Top 0.2% | 0.7%
30 | Cell Reports | 1338 papers in training set | Top 34% | 0.7%
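
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The short Python sketch below does this for the first ten entries; the helper name `rank_reaching_mass` is hypothetical, and the input values are the percentages shown in the list.

```python
from itertools import accumulate

def rank_reaching_mass(percentages, threshold=50.0):
    """Return the 1-based rank at which the running total first reaches threshold."""
    for rank, running_total in enumerate(accumulate(percentages), start=1):
        if running_total >= threshold:
            return rank
    return None  # threshold never reached with the values provided

# Predicted probabilities (%) for ranks 1-10, copied from the list.
top_ten = [14.6, 8.3, 6.3, 6.3, 4.8, 4.3, 4.1, 3.9, 3.6, 3.0]

cutoff_rank = rank_reaching_mass(top_ten)
print(cutoff_rank)
```

With these values the running total is roughly 48.7% after seven journals and 52.6% after eight, so the cutoff falls at rank 8, matching the divider in the list.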