
Aiki-GeNano: Multi-Stage Preference Optimization for Generative Design of Developable Nanobodies

Meda, R. S.; Doshi, J.; Iyer, E.; Shastry, S.; Mysore, V.

2026-05-01 bioinformatics
10.64898/2026.04.28.721526 bioRxiv

Therapeutic nanobodies must combine target binding with biophysical and chemical properties that determine manufacturability, stability, and clinical viability, collectively termed developability, yet most computational design pipelines still treat developability as a post-hoc filter rather than an integrated training objective. We present Aiki-GeNano, a three-stage language-model alignment pipeline for epitope-conditioned nanobody generation that integrates multiple developability signals directly into training, using only sequence information and previously published predictors. Across 65 target epitopes and relative to the supervised baseline, the combined pipeline raised predicted mean melting temperature by 6.6 °C, halved isomerization-motif severity, reduced deamidation, N-glycosylation sequons and CDR methionine-oxidation motifs, and preserved predicted humanness and solubility. On a shared 10-target GPCR benchmark, Aiki-GeNano achieved the highest predicted melting temperature and the lowest isomerization severity among five contemporary VHH generators. Starting from ProtGPT2 and a 1.35-million-pair binder dataset generated on an mRNA-display platform, the pipeline applies supervised fine-tuning, Direct Preference Optimization on 522,800 pairs ranked by a composite of selectivity, predicted thermal stability, solubility, and humanness, and Group Reward-Decoupled Policy Optimization against six sequence-based rewards (FR2 hydrophobicity, hydrophobic-patch coverage, chemical-liability motifs, Wilkinson-Harrison expression probability, VHH hallmark residues, scaffold integrity). Generated sequences differ from the nearest training sequence by a mean of 8.1-9.0 amino acids out of 126, and two alternative training trajectories converge to distinct amino-acid-composition strategies with similar liability outcomes but different thermal-stability gains, indicating initialization-dependent convergence of the reward-optimized policy.
Predicted humanness was preserved at the level of the camelid VHH scaffold of the training library -- a data-side limitation rather than a methodological one, since the framework was effectively constant across all preference pairs. Applicability to the drug discovery and development pipeline, limitations of predicted-property evaluation, and future work are discussed.
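
The chemical-liability motifs named in the abstract (isomerization, deamidation, N-glycosylation sequons, methionine oxidation) are typically detected by simple sequence pattern matching. The following is a minimal sketch of such a scan, not the authors' implementation: the pattern definitions are commonly used heuristics assumed here for illustration, the names `LIABILITY_PATTERNS` and `count_liabilities` are hypothetical, and the abstract's severity scoring and CDR restriction are not reproduced.

```python
import re

# Assumed, commonly used liability heuristics (not confirmed as the paper's definitions).
LIABILITY_PATTERNS = {
    "n_glycosylation": r"N[^P][ST]",  # sequon N-X-S/T where X is not proline
    "deamidation": r"N[GS]",          # Asn followed by Gly or Ser
    "isomerization": r"D[GS]",        # Asp followed by Gly or Ser
    "met_oxidation": r"M",            # any methionine in the given segment
}


def count_liabilities(seq: str) -> dict:
    """Count occurrences of each liability motif in an amino-acid sequence.

    A zero-width lookahead is used so that overlapping motifs are all counted.
    """
    seq = seq.upper()
    return {
        name: len(re.findall(f"(?={pattern})", seq))
        for name, pattern in LIABILITY_PATTERNS.items()
    }


if __name__ == "__main__":
    # Toy peptide, not a real nanobody sequence.
    print(count_liabilities("ANGSDGM"))
```

To restrict methionine counting to CDRs, as the abstract describes, one would pass only the CDR subsequences to the function; CDR boundary annotation is outside the scope of this sketch.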

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank. Journal | papers in training set | percentile | predicted probability

1. Cell Systems | 167 papers in training set | Top 0.5% | 17.1%
2. Nature Biotechnology | 147 papers in training set | Top 1% | 8.2%
3. Nature Methods | 336 papers in training set | Top 2% | 6.2%
4. mAbs | 28 papers in training set | Top 0.1% | 6.2%
5. Bioinformatics | 1061 papers in training set | Top 4% | 6.2%
6. Nature Communications | 4913 papers in training set | Top 34% | 4.7%
7. Nature Machine Intelligence | 61 papers in training set | Top 0.7% | 4.2%
50% of probability mass above
8. Journal of Chemical Information and Modeling | 207 papers in training set | Top 1% | 3.5%
9. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 3% | 2.5%
10. Advanced Science | 249 papers in training set | Top 8% | 2.5%
11. Frontiers in Immunology | 586 papers in training set | Top 3% | 2.0%
12. Science | 429 papers in training set | Top 13% | 1.8%
13. PLOS Computational Biology | 1633 papers in training set | Top 15% | 1.7%
14. Cell Genomics | 162 papers in training set | Top 4% | 1.7%
15. Cell Reports Methods | 141 papers in training set | Top 3% | 1.7%
16. Patterns | 70 papers in training set | Top 1% | 1.7%
17. Nucleic Acids Research | 1128 papers in training set | Top 12% | 1.4%
18. Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.4%
19. Bioinformatics Advances | 184 papers in training set | Top 4% | 1.2%
20. Cell Reports Medicine | 140 papers in training set | Top 6% | 1.2%
21. Chemical Science | 71 papers in training set | Top 2% | 0.9%
22. Genome Medicine | 154 papers in training set | Top 7% | 0.9%
23. Structure | 175 papers in training set | Top 3% | 0.9%
24. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 42% | 0.9%
25. Communications Biology | 886 papers in training set | Top 20% | 0.9%
26. eLife | 5422 papers in training set | Top 54% | 0.9%
27. Science Advances | 1098 papers in training set | Top 29% | 0.8%
28. Journal of Molecular Biology | 217 papers in training set | Top 3% | 0.8%
29. PLOS ONE | 4510 papers in training set | Top 69% | 0.7%
30. ACS Synthetic Biology | 256 papers in training set | Top 3% | 0.7%
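
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. This is a small sketch using the first eight values from the ranking above; the function name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (percent) for ranks 1-8, copied from the ranking.
probs = [17.1, 8.2, 6.2, 6.2, 6.2, 4.7, 4.2, 3.5]


def journals_to_reach(probabilities, threshold=50.0):
    """Return (rank, cumulative_percent) at which the running sum first
    reaches the threshold, or None if it is never reached."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


if __name__ == "__main__":
    print(journals_to_reach(probs))
```

The running sum after six journals is about 48.6%, and the seventh entry brings it to about 52.8%, which is consistent with the cutoff marker placed after rank 7.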