Influence of molecular representation and charge on protein-ligand structural predictions by popular co-folding methods

Bugrova, A.; Orekhov, P.; Gushchin, I.

2026-02-18 bioinformatics
10.64898/2026.02.18.706547 bioRxiv
Recently developed deep learning-based tools can effectively generate structural models of complexes of proteins and non-proteinaceous compounds. While some of their predictive capabilities are truly exciting, others remain to be thoroughly tested. Here, we probe whether the ligand input format (Chemical Component Dictionary, CCD, or Simplified Molecular Input Line Entry System, SMILES) and charge (which depends on protonation) affect the predictions of four popular algorithms: AlphaFold 3, Boltz-2, Chai-1, and Protenix-v1. We chose methylamine and acetic acid as two of the simplest titratable chemicals; they are omnipresent in proteins as amino and carboxy moieties and are consequently ubiquitous in the Protein Data Bank models that are most commonly used for training. Unexpectedly, we found that for both molecules the input format affected the prediction results in many cases, and did so much more strongly than protonation, whereas changes in the formally specified charge of the molecules did not lead to the changes in binding expected from experiments. We conclude that (i) ensuring identical results irrespective of input format and (ii) including protonation-related steps in training and prediction pipelines are two available paths for improving protein-ligand structure prediction algorithms.
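
To make the "formally specified charge" concrete, here is a minimal sketch of how the neutral and charged forms of the two test molecules differ at the SMILES level. The SMILES strings are standard ones for these species; whether the preprint used exactly these strings is an assumption, and `net_formal_charge` is a hypothetical helper that reads charges only from bracket atoms, not a full SMILES parser.

```python
import re

# Standard SMILES for the two test molecules in both protonation states.
# Assumption: the preprint's actual input strings may differ.
LIGANDS = {
    "methylamine": "CN",
    "methylammonium": "C[NH3+]",
    "acetic acid": "CC(=O)O",
    "acetate": "CC(=O)[O-]",
}


def net_formal_charge(smiles: str) -> int:
    """Sum the formal charges written on bracket atoms, e.g. [NH3+] or [O-].

    Hypothetical helper for illustration only: it handles '+', '-', '++',
    '--', '+2' and '-2' inside brackets and ignores everything else.
    """
    total = 0
    for body in re.findall(r"\[([^\]]*)\]", smiles):
        match = re.search(r"(\++|-+)(\d*)", body)
        if match is None:
            continue  # bracket atom without a charge, e.g. [nH]
        signs, digits = match.groups()
        magnitude = int(digits) if digits else len(signs)
        total += magnitude if signs[0] == "+" else -magnitude
    return total


if __name__ == "__main__":
    for name, smiles in LIGANDS.items():
        print(f"{name:15s} {smiles:12s} charge = {net_formal_charge(smiles):+d}")
```

In SMILES the protonation state is part of the string itself, so switching between the neutral and charged form means changing the input, not setting a separate flag.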

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Journal of Chemical Information and Modeling | 207 | Top 0.2% | 19.6% |
| 2 | PLOS Computational Biology | 1633 | Top 4% | 7.2% |
| 3 | Computational and Structural Biotechnology Journal | 216 | Top 0.8% | 4.9% |
| 4 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 0.1% | 4.9% |
| 5 | Journal of Cheminformatics | 25 | Top 0.1% | 4.3% |
| 6 | Protein Science | 221 | Top 0.3% | 4.3% |
| 7 | Molecules | 37 | Top 0.2% | 3.6% |
| 8 | International Journal of Molecular Sciences | 453 | Top 3% | 3.6% |

50% of probability mass above

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 9 | Computational Biology and Chemistry | 23 | Top 0.1% | 2.4% |
| 10 | Journal of Chemical Theory and Computation | 126 | Top 0.4% | 2.4% |
| 11 | Scientific Reports | 3102 | Top 47% | 2.4% |
| 12 | The Journal of Physical Chemistry Letters | 58 | Top 0.6% | 2.1% |
| 13 | Briefings in Bioinformatics | 326 | Top 3% | 2.1% |
| 14 | The Journal of Physical Chemistry B | 158 | Top 0.9% | 1.9% |
| 15 | Journal of Molecular Biology | 217 | Top 1% | 1.9% |
| 16 | Chemical Science | 71 | Top 0.8% | 1.9% |
| 17 | Bioinformatics Advances | 184 | Top 2% | 1.9% |
| 18 | PLOS ONE | 4510 | Top 52% | 1.8% |
| 19 | Bioinformatics | 1061 | Top 7% | 1.8% |
| 20 | Biomolecules | 95 | Top 0.5% | 1.7% |
| 21 | Structure | 175 | Top 2% | 1.7% |
| 22 | Frontiers in Molecular Biosciences | 100 | Top 2% | 1.7% |
| 23 | Communications Chemistry | 39 | Top 0.6% | 1.1% |
| 24 | ACS Omega | 90 | Top 3% | 1.0% |
| 25 | Frontiers in Bioinformatics | 45 | Top 0.6% | 0.9% |
| 26 | Journal of Computational Chemistry | 11 | Top 0.2% | 0.8% |
| 27 | PeerJ | 261 | Top 15% | 0.8% |
| 28 | Artificial Intelligence in the Life Sciences | 11 | Top 0.2% | 0.8% |
| 29 | Nature Communications | 4913 | Top 64% | 0.7% |
| 30 | Journal of Structural Biology | 58 | Top 2% | 0.7% |
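
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked from the listed probabilities with a running sum. The sketch below is only an arithmetic check on the first ten listed values; `journals_needed` is a hypothetical helper name, and how the page itself computes the cutoff is not stated.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1-10, copied from the list above.
TOP_PROBABILITIES = [19.6, 7.2, 4.9, 4.9, 4.3, 4.3, 3.6, 3.6, 2.4, 2.4]


def journals_needed(probabilities, threshold=50.0):
    """Return the smallest rank whose cumulative probability reaches the threshold.

    Returns None if the threshold is never reached by the given values.
    """
    for rank, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return rank
    return None


if __name__ == "__main__":
    print(journals_needed(TOP_PROBABILITIES))
```

The first seven journals sum to about 48.8% and the first eight to about 52.4%, so rank 8 is where the running total first crosses 50%.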