Do AI Models for Protein Structure Prediction Get Electrostatics Right?

Makhatadze, G. I.

2026-03-13 biophysics
10.64898/2026.03.11.711144 bioRxiv
A variant of the U1A protein containing four substitutions to ionizable residues was generated serendipitously due to a miscommunication. Biophysical measurements show that this variant has at least twice as much helical structure as the wild-type U1A and is trimeric in solution, in contrast to the monomeric wild type. In sharp contrast, structures predicted by deep-learning AI tools (AlphaFold2 and RoseTTAFold2) and transformer-based tools (OmegaFold and ESMFold) are all highly similar to the wild-type U1A (backbone RMSD < 1 Å). Even more surprising, two of the substituted ionizable residues are predicted to be fully buried in the non-polar core of the protein, an outcome that contradicts well-established physico-chemical principles, as ionizable residues are normally located on the protein surface. To explore this effect further, we generated sequences containing up to all twelve residues that make up the non-polar core of U1A. Across thousands of sequences, and depending on the AI model used, the majority of predicted structures contained fully buried ionizable residues while still maintaining the overall U1A fold. We then examined two additional proteins of comparable size, acylphosphatase and the de novo-designed TOP7 fold, and observed the same phenomenon: AI models frequently predicted structures with buried ionizable residues that nevertheless retained the parent fold. When these AI-predicted structures were subjected to short (50 ns) molecular dynamics simulations using physics-based force fields such as CHARMM or AMBER, the structures rapidly relaxed into ensembles that exposed ionizable residues. We conclude that while AI-based structure prediction tools perform extremely well on naturally occurring sequences, they do not reliably encode the physico-chemical principles governing the placement of ionizable residues. A straightforward remedy is to include a brief molecular dynamics simulation as a final validation step for AI-generated structures.
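The abstract's central check, whether an ionizable residue ends up buried in a predicted structure, can be illustrated with a minimal sketch. It is not the author's method: the function name `buried_ionizable`, the residue set (Asp, Glu, Lys, Arg, His), and the 5% relative-accessibility cutoff are all assumptions chosen for illustration, and the per-residue relative solvent accessibility values are assumed to have been computed beforehand with a separate tool.

```python
# Hypothetical helper: flag ionizable residues that look buried.
# Assumptions (not from the preprint): the residue set below and the
# 0.05 relative-SASA cutoff used as a stand-in for "fully buried".
IONIZABLE = {"D", "E", "K", "R", "H"}


def buried_ionizable(sequence, rel_sasa, cutoff=0.05):
    """Return (1-based position, residue) pairs for ionizable residues
    whose relative solvent accessibility is at or below the cutoff."""
    if len(sequence) != len(rel_sasa):
        raise ValueError("sequence and rel_sasa must have the same length")
    return [
        (i + 1, aa)
        for i, (aa, rsa) in enumerate(zip(sequence, rel_sasa))
        if aa in IONIZABLE and rsa <= cutoff
    ]


# Toy input: four residues with made-up accessibility values.
flagged = buried_ionizable("AKDE", [0.0, 0.02, 0.5, 0.05])
print(flagged)
```

A non-empty result for a predicted structure would be a reason to run the brief molecular dynamics relaxation the abstract recommends before trusting the model.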

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Theory and Computation: 126 papers in training set, Top 0.1%, 18.3%
2. Protein Science: 221 papers in training set, Top 0.1%, 12.1%
3. Proteins: Structure, Function, and Bioinformatics: 82 papers in training set, Top 0.1%, 10.3%
4. Structure: 175 papers in training set, Top 0.2%, 9.9%

50% of probability mass above

5. Biophysical Journal: 545 papers in training set, Top 1.0%, 6.3%
6. PLOS Computational Biology: 1633 papers in training set, Top 7%, 4.8%
7. Journal of Chemical Information and Modeling: 207 papers in training set, Top 1%, 4.8%
8. Frontiers in Molecular Biosciences: 100 papers in training set, Top 0.5%, 3.5%
9. The Journal of Physical Chemistry B: 158 papers in training set, Top 0.7%, 3.0%
10. Scientific Reports: 3102 papers in training set, Top 44%, 2.7%
11. Journal of Molecular Biology: 217 papers in training set, Top 1%, 2.0%
12. Acta Crystallographica Section D Structural Biology: 54 papers in training set, Top 0.2%, 1.9%
13. eLife: 5422 papers in training set, Top 40%, 1.8%
14. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 34%, 1.6%
15. Physical Biology: 43 papers in training set, Top 1%, 1.5%
16. IUCrJ: 29 papers in training set, Top 0.2%, 1.5%
17. PLOS ONE: 4510 papers in training set, Top 67%, 0.8%
18. Journal of General Physiology: 56 papers in training set, Top 0.1%, 0.8%
19. Journal of Structural Biology: 58 papers in training set, Top 2%, 0.7%
20. The Journal of Physical Chemistry Letters: 58 papers in training set, Top 2%, 0.7%
21. Chemical Science: 71 papers in training set, Top 2%, 0.6%
22. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 11%, 0.6%
23. Nature Communications: 4913 papers in training set, Top 66%, 0.6%
24. Biochemistry: 130 papers in training set, Top 2%, 0.6%
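
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below is only an illustration of that arithmetic, not the site's actual code; the function name `journals_to_reach` is hypothetical, and the probabilities are the first five percentages from the list above.

```python
# Hypothetical helper: count how many top-ranked entries are needed
# for the cumulative probability to reach a target percentage.
def journals_to_reach(probabilities, target=50.0):
    """Return the number of leading entries whose sum first reaches
    the target, or None if the target is never reached."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= target:
            return count
    return None


# First five listed probabilities, in percent.
top_probs = [18.3, 12.1, 10.3, 9.9, 6.3]
print(journals_to_reach(top_probs))
```

The first three entries sum to 40.7%, and adding the fourth brings the total to 50.6%, which is where the 50% threshold is crossed.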