
Integrating Diffusion and Liquid AI Models for Predicting Peptide Affinity from mRNA Display Selections

Leaf, C. M.; Qi, P.; Gandhi, Y. P.; Jalali-Yazdi, F.; Ong, J. N.; Takahashi, T. T.; Kalia, R.; Roberts, R. W.

2026-05-11 bioengineering
10.64898/2026.05.05.723033 bioRxiv

In vitro selection and directed evolution technologies such as mRNA display explore large libraries (≥10¹⁴ variants) and generate thousands to millions of functional polypeptide ligands to a variety of targets. Denoising diffusion implicit machine learning models (DDIMs) trained on display-derived deep sequencing data can greatly expand these functional sequences beyond what is accessible experimentally. However, methods are needed to predict peptide properties such as binding free energies (ΔG°). Here, we applied machine learning methods to predict the binding free energies of both experimental and DDIM-generated peptide ligands against a target of interest, the oncogenic protein Bcl-xL. To do this, we trained a Closed-form Continuous (CfC) neural network on a dataset of 15,700 peptide ligands, using pairs of sequences and their corresponding binding free energies (ΔG°) as inputs. This type of model was chosen for its ability to represent irregular series. The resulting CfC model accurately predicts the rank order, within error, and the binding free energies (ΔG°) of both experimental and DDIM-generated peptides, identifying five DDIM-generated peptides with single-digit picomolar affinities. Combining trained DDIM and CfC models offers a unified route to expand the scope of experimental ligand discovery and to predict the molecular properties of both experimental and generated ligands, and it highlights the utility of large quantitative datasets for making accurate in silico predictions of high-affinity peptide candidates.

Statement

High-throughput sequencing analysis of mRNA display libraries enables generating novel peptide ligands and expands the scope of functional sequences beyond what is accessible experimentally. Closed-form Continuous neural networks trained using sequences and their corresponding free energies accurately predict the binding free energies of both experimental and machine learning-generated peptides, enabling a route to quantitatively predict peptide properties using directed evolution data.
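
To make the link between binding free energy and affinity concrete, the sketch below converts between ΔG° and the dissociation constant Kd using the standard relation ΔG° = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The temperature of 298.15 K and the kcal/mol units are assumptions made for illustration; the abstract does not state the conditions or units used. The function names are hypothetical and are not taken from the paper.

```python
import math

# Gas constant in kcal/(mol*K); the choice of kcal/mol is an assumption.
R_KCAL = 1.987204e-3


def kd_to_delta_g(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) for a dissociation constant in M."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_g_to_kd(delta_g: float, temp_k: float = 298.15) -> float:
    """Dissociation constant (M) for a standard binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temp_k))


if __name__ == "__main__":
    # Bounds of the single-digit picomolar range: 1 pM and 9 pM.
    for kd in (1e-12, 9e-12):
        print(f"Kd = {kd:.0e} M  ->  dG = {kd_to_delta_g(kd):.2f} kcal/mol")
```

Under these assumptions, single-digit picomolar affinities correspond to ΔG° values of roughly -15 to -16 kcal/mol, so differences of a few tenths of a kcal/mol separate neighbouring candidates in that range.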

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Cell Systems (167 papers in training set): Top 0.2%, 23.0%
2. Proceedings of the National Academy of Sciences (2130 papers in training set): Top 5%, 10.7%
3. PLOS Computational Biology (1633 papers in training set): Top 5%, 7.0%
4. Nature Machine Intelligence (61 papers in training set): Top 0.4%, 6.5%
5. Computational and Structural Biotechnology Journal (216 papers in training set): Top 1.0%, 4.4%

50% of probability mass above

6. eLife (5422 papers in training set): Top 21%, 4.1%
7. Journal of Chemical Information and Modeling (207 papers in training set): Top 1%, 4.1%
8. Cell Genomics (162 papers in training set): Top 1%, 3.7%
9. Scientific Reports (3102 papers in training set): Top 49%, 2.1%
10. ACS Synthetic Biology (256 papers in training set): Top 1%, 1.8%
11. Nature Communications (4913 papers in training set): Top 50%, 1.8%
12. Bioinformatics (1061 papers in training set): Top 7%, 1.7%
13. Cell Reports (1338 papers in training set): Top 23%, 1.7%
14. Journal of The Royal Society Interface (189 papers in training set): Top 2%, 1.7%
15. iScience (1063 papers in training set): Top 14%, 1.7%
16. Advanced Science (249 papers in training set): Top 14%, 1.3%
17. Biophysical Journal (545 papers in training set): Top 4%, 1.0%
18. Angewandte Chemie International Edition (81 papers in training set): Top 3%, 1.0%
19. Evolutionary Applications (91 papers in training set): Top 0.9%, 1.0%
20. Protein Engineering, Design and Selection (14 papers in training set): Top 0.1%, 0.9%
21. Science Advances (1098 papers in training set): Top 31%, 0.7%
22. mAbs (28 papers in training set): Top 0.4%, 0.7%
23. Proteins: Structure, Function, and Bioinformatics (82 papers in training set): Top 1%, 0.7%
24. Journal of the American Chemical Society (199 papers in training set): Top 5%, 0.7%
25. International Journal of Molecular Sciences (453 papers in training set): Top 16%, 0.7%
26. Protein Science (221 papers in training set): Top 2%, 0.7%
27. Nucleic Acids Research (1128 papers in training set): Top 19%, 0.7%
28. Cell Reports Methods (141 papers in training set): Top 7%, 0.5%
29. Communications Biology (886 papers in training set): Top 31%, 0.5%
30. Briefings in Bioinformatics (326 papers in training set): Top 8%, 0.5%
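
The statement under "Matching journals" that the top 5 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration of that arithmetic: the helper name is hypothetical, and reading the final percentage on each entry as the predicted probability is an assumption based on the wording of that statement.

```python
from typing import Optional, Sequence, Tuple

# Final percentage of the six highest-ranked journals, copied from the list.
TOP_PROBABILITIES = [23.0, 10.7, 7.0, 6.5, 4.4, 4.1]


def ranks_to_reach(
    probabilities: Sequence[float], threshold: float
) -> Optional[Tuple[int, float]]:
    """Return (rank, cumulative mass) at the first rank whose running sum
    reaches the threshold, or None if the threshold is never reached."""
    total = 0.0
    for rank, probability in enumerate(probabilities, start=1):
        total += probability
        if total >= threshold:
            return rank, total
    return None


if __name__ == "__main__":
    result = ranks_to_reach(TOP_PROBABILITIES, 50.0)
    if result is not None:
        rank, mass = result
        print(f"Threshold reached at rank {rank} with {mass:.1f}% cumulative mass")
```

With the listed values, the first four journals sum to 47.2% and the fifth brings the total to 51.6%, which is consistent with the cutoff being drawn after rank 5.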