How AlphaFold and related models predict protein-peptide complex structures

Guan, L.; Keating, A. E.

2025-06-24 bioinformatics
10.1101/2025.06.18.660495 bioRxiv

Protein-peptide interactions mediate many biological processes, and access to accurate structural models, through experimental determination or reliable computational prediction, is essential for understanding protein function and designing novel protein-protein interactions. AlphaFold2-Multimer (AF2-Multimer), AlphaFold3 (AF3), and related models such as Boltz-1 and Chai-1 are state-of-the-art protein structure predictors that successfully predict protein-peptide complex structures. Using a dataset of experimentally resolved protein-peptide structures, we analyzed the performance of these four structure prediction models to understand how they work. We found evidence of bias for previously seen structures, suggesting that models may struggle to generalize to novel target proteins or binding sites. We probed how models use the protein and peptide multiple sequence alignments (MSAs), which are often shallow or of poor quality for peptide sequences. We found weak evidence that models use coevolutionary information from paired MSAs and found that both the target and peptide unpaired MSAs contribute to performance. Our work highlights the promise of deep learning for peptide docking and the importance of diverse representation of interface geometries in the training data for optimal prediction performance.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.1% | 22.5%
2 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 0.3% | 18.6%
3 | PLOS Computational Biology | 1633 papers in training set | Top 4% | 8.4%
4 | Structure | 175 papers in training set | Top 0.4% | 6.4%
50% of probability mass above
5 | Bioinformatics | 1061 papers in training set | Top 4% | 6.4%
6 | Protein Science | 221 papers in training set | Top 0.3% | 4.3%
7 | Bioinformatics Advances | 184 papers in training set | Top 1% | 4.0%
8 | Journal of Chemical Theory and Computation | 126 papers in training set | Top 0.3% | 3.6%
9 | The Journal of Physical Chemistry B | 158 papers in training set | Top 0.6% | 3.6%
10 | Biophysical Journal | 545 papers in training set | Top 3% | 1.9%
11 | Journal of Cheminformatics | 25 papers in training set | Top 0.4% | 1.3%
12 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 6% | 1.3%
13 | Scientific Reports | 3102 papers in training set | Top 66% | 1.2%
14 | Cell Systems | 167 papers in training set | Top 10% | 0.9%
15 | Journal of Molecular Biology | 217 papers in training set | Top 3% | 0.9%
16 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 41% | 0.9%
17 | Nature Communications | 4913 papers in training set | Top 60% | 0.9%
18 | PLOS ONE | 4510 papers in training set | Top 66% | 0.8%
19 | eLife | 5422 papers in training set | Top 56% | 0.8%
20 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.9% | 0.7%
21 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 6% | 0.6%
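
The "top 4 journals account for 50%" statement can be checked with a running total over the listed probabilities. The sketch below is only an illustration, not the site's actual method: the helper name journals_needed is hypothetical, and it assumes the cutoff is the smallest number of top-ranked journals whose cumulative probability reaches the threshold.

```python
from itertools import accumulate

# Predicted probabilities (percent) copied from the ranked list above, in rank order.
probabilities = [
    22.5, 18.6, 8.4, 6.4, 6.4, 4.3, 4.0, 3.6, 3.6, 1.9, 1.3,
    1.3, 1.2, 0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.7, 0.6,
]


def journals_needed(probs, threshold=50.0):
    """Hypothetical helper: smallest number of top-ranked entries
    whose cumulative probability reaches the threshold (in percent)."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None  # the listed entries never reach the threshold


print(journals_needed(probabilities))
```

With the values listed here, the first three journals sum to 49.5% and the first four to 55.9%, so the threshold is crossed at rank 4, matching the marker in the list.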