
Deep Learning Structural Ensembles as Proxies for Protein Flexibility

Tunc, M. T.; Dizkirici Tekpinar, A.; Tekpinar, M.

2026-05-18 bioinformatics
10.64898/2026.05.16.725658 bioRxiv

Protein dynamics are essential to biological function, yet whether deep learning models contain information about these dynamics remains an open question. In this study, we quantitatively investigate the capacity of deep learning structure generation methods to predict protein flexibility by directly comparing residue-level mean squared fluctuation (MSF) profiles derived from structural ensembles with experimental or simulation-informed flexibility profiles. We assembled four diverse benchmark datasets representing different types of structural information: 70 NMR ensembles, 43 X-ray crystallographic protein pairs in two distinct conformational states, 82 high-resolution cryo-EM structures, and molecular dynamics simulations of 10 proteins. Using AlphaFold3, AlphaFold2, and RoseTTAFold to generate multiple structural models, we applied ranksort normalization to place the profiles on a comparable scale and quantified similarity primarily with cosine similarity and Pearson correlation. Our results show that flexibility predictions from deep learning-generated models agree well with experimental data, suggesting that fluctuations in these predicted ensembles can serve as effective proxies for protein flexibility. Notably, AlphaFold3 consistently produced the best results across the datasets. We also observed that flexibility prediction accuracy generally improves as the number of models increases up to 15, and our findings remain robust when terminal residues are excluded from the analysis. To facilitate broader application, we provide three publicly accessible Jupyter Notebooks to calculate MSF from deep learning outputs.
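The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' notebook code: the function names are hypothetical, the ensemble is synthetic, the models are assumed to be already superposed on a common frame, and "ranksort normalization" is assumed here to mean replacing each value by its rank scaled to [0, 1], which the abstract does not spell out.

```python
import numpy as np

def msf_per_residue(coords):
    """Mean squared fluctuation per residue.

    coords: array of shape (n_models, n_residues, 3), e.g. C-alpha positions,
    assumed to be superposed already (no alignment is done here).
    """
    coords = np.asarray(coords, dtype=float)
    mean_structure = coords.mean(axis=0)                    # (n_residues, 3)
    sq_dev = ((coords - mean_structure) ** 2).sum(axis=2)   # (n_models, n_residues)
    return sq_dev.mean(axis=0)                              # (n_residues,)

def rank_normalize(profile):
    """Assumed form of ranksort normalization: ranks scaled to [0, 1].

    Ties are broken by position rather than averaged; needs >= 2 values.
    """
    profile = np.asarray(profile, dtype=float)
    order = np.argsort(profile, kind="stable")
    ranks = np.empty(len(profile), dtype=float)
    ranks[order] = np.arange(len(profile))
    return ranks / (len(profile) - 1)

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_similarity(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic ensemble: 15 models of a 50-residue chain, noisier near the termini.
rng = np.random.default_rng(0)
n_models, n_residues = 15, 50
base = rng.normal(size=(n_residues, 3))
scale = 0.2 + np.abs(np.linspace(-1.0, 1.0, n_residues))
noise = rng.normal(size=(n_models, n_residues, 3)) * scale[None, :, None]
ensemble = base + noise

predicted = rank_normalize(msf_per_residue(ensemble))
reference = rank_normalize(scale ** 2)  # stand-in for an experimental profile
print(cosine_similarity(predicted, reference))
print(pearson_similarity(predicted, reference))
```

In practice the coordinates would come from the predicted model files and the reference profile from NMR, crystallographic, cryo-EM, or simulation data; both steps are outside this sketch.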

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Journal of Chemical Information and Modeling | 207 | Top 0.3% | 16.9%
2 | Journal of Chemical Theory and Computation | 126 | Top 0.1% | 11.9%
3 | The Journal of Physical Chemistry B | 158 | Top 0.1% | 9.7%
4 | PLOS Computational Biology | 1633 | Top 4% | 8.1%
5 | Briefings in Bioinformatics | 326 | Top 1% | 4.7%
(50% of probability mass above)
6 | Protein Science | 221 | Top 0.4% | 3.5%
7 | Bioinformatics | 1061 | Top 6% | 3.5%
8 | Biophysical Journal | 545 | Top 2% | 3.5%
9 | Scientific Reports | 3102 | Top 42% | 3.0%
10 | Nature Communications | 4913 | Top 43% | 3.0%
11 | Computational and Structural Biotechnology Journal | 216 | Top 3% | 2.5%
12 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 0.4% | 1.7%
13 | Proceedings of the National Academy of Sciences | 2130 | Top 34% | 1.6%
14 | Cell Systems | 167 | Top 8% | 1.6%
15 | Communications Chemistry | 39 | Top 0.5% | 1.3%
16 | Journal of Molecular Biology | 217 | Top 3% | 1.1%
17 | NAR Genomics and Bioinformatics | 214 | Top 3% | 1.1%
18 | Journal of Cheminformatics | 25 | Top 0.5% | 0.9%
19 | International Journal of Molecular Sciences | 453 | Top 12% | 0.9%
20 | Frontiers in Molecular Biosciences | 100 | Top 3% | 0.9%
21 | Structure | 175 | Top 3% | 0.9%
22 | Nature Machine Intelligence | 61 | Top 4% | 0.7%
23 | Advanced Science | 249 | Top 20% | 0.7%
24 | Communications Biology | 886 | Top 26% | 0.7%
25 | eLife | 5422 | Top 60% | 0.7%
26 | Bioinformatics Advances | 184 | Top 5% | 0.7%
27 | Journal of Structural Biology | 58 | Top 2% | 0.6%
28 | Nature Computational Science | 50 | Top 2% | 0.6%
29 | Physical Biology | 43 | Top 3% | 0.6%
30 | Journal of Computational Chemistry | 11 | Top 0.3% | 0.6%
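
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The short sketch below uses only the first eight values from the list; the variable names are illustrative.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1 to 8, as listed above.
probabilities = [16.9, 11.9, 9.7, 8.1, 4.7, 3.5, 3.5, 3.5]

# Running total after each rank.
cumulative = list(accumulate(probabilities))

# Smallest number of top-ranked journals whose total reaches 50%.
k = next(i + 1 for i, total in enumerate(cumulative) if total >= 50.0)
print(k, round(cumulative[k - 1], 1))
```

The top four journals sum to 46.6%, so the fifth is the one that carries the running total past the 50% mark.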