AI-derived Protein Structures Validation: AlphaFold2 Models in the Twilight Zone

Griffin, P.; Deganutti, G.; Jadeja, K.; Idigbe, C.; Pipito', L.; Mejuto, L.; Ng, C. P.; Peck, S.; Greaves, J.; Reynolds, C. A.

2026-05-12 bioinformatics
10.64898/2026.05.12.724499 bioRxiv

In any field, unquestioningly accepting artificial intelligence (AI) results should be considered bad practice. Here, we devised a comparative modelling-based strategy for validating protein structures that exploits the well-known observation that protein folds are far more conserved than protein sequences. We identify proteins with a similar fold to the AlphaFold-generated query protein and determine their structural alignment to the query. The hypothesis is that if the sequence alignment coincides with the structural alignment, then the structure is validated. The strategy is implemented on a helix-by-helix and strand-by-strand basis using a multi-template pairwise local profile alignment method that works well into the twilight zone. The method is illustrated by application to the transmembrane transporter PEPT1, for which the structure is known, and the S-deacylases ABHD13 and ABHD16A, for which only AI-generated models exist. ABHD16A is particularly challenging because a sequence alignment search with BLASTp does not reveal any structural homologues, so its validation requires working with extremely remote homologues; nevertheless, both models are validated through this strategy and are stable during classical molecular dynamics simulations. The ability of the strategy to identify errors is assessed with reference to misaligned ABHD13 models and misfolded decoy proteins.
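The core hypothesis, that a model is supported when the sequence alignment coincides with the structural alignment within each helix and strand, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `element_agreement`, the representation of an alignment as a mapping from query residue index to template residue index, and the toy residue ranges are all assumptions made for illustration, and the multi-template profile alignment step described in the abstract is not reproduced here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: query residue index -> template residue index.
Alignment = Dict[int, int]


def element_agreement(
    seq_aln: Alignment,
    struct_aln: Alignment,
    elements: Dict[str, Tuple[int, int]],
) -> Dict[str, Optional[float]]:
    """For each secondary-structure element (inclusive query residue range),
    return the fraction of structurally aligned residues that the sequence
    alignment pairs with the same template residue. None means the element
    has no structurally aligned residues to compare."""
    result: Dict[str, Optional[float]] = {}
    for name, (start, end) in elements.items():
        residues = [i for i in range(start, end + 1) if i in struct_aln]
        if not residues:
            result[name] = None
            continue
        matches = sum(1 for i in residues if seq_aln.get(i) == struct_aln[i])
        result[name] = matches / len(residues)
    return result


# Toy data (invented): the structural alignment maps query residue i to
# template residue i + 100 across two helices.
struct_aln = {i: i + 100 for i in range(10, 30)}

# The sequence alignment agrees over the first helix but is shifted by four
# residues over the second, mimicking a register error.
seq_aln = {i: i + 100 for i in range(10, 20)}
seq_aln.update({i: i + 104 for i in range(20, 30)})

elements = {"helix_1": (10, 19), "helix_2": (20, 29)}
scores = element_agreement(seq_aln, struct_aln, elements)
print(scores)
```

In this toy case the first helix scores 1.0 and the second 0.0, so the per-element view localises the disagreement to one helix rather than averaging it away over the whole chain.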

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 0.3% | 18.7%
2 | PLOS Computational Biology | 1633 papers in training set | Top 1% | 18.6%
3 | Bioinformatics | 1061 papers in training set | Top 4% | 6.8%
4 | BMC Bioinformatics | 383 papers in training set | Top 2% | 6.3%
50% of probability mass above
5 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 0.3% | 4.0%
6 | Journal of Cheminformatics | 25 papers in training set | Top 0.1% | 3.9%
7 | PLOS ONE | 4510 papers in training set | Top 39% | 3.6%
8 | Scientific Reports | 3102 papers in training set | Top 37% | 3.6%
9 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 3.6%
10 | Bioinformatics Advances | 184 papers in training set | Top 2% | 3.1%
11 | Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.3% | 2.6%
12 | Briefings in Bioinformatics | 326 papers in training set | Top 3% | 2.1%
13 | Journal of Molecular Biology | 217 papers in training set | Top 1% | 2.1%
14 | Journal of Chemical Theory and Computation | 126 papers in training set | Top 0.4% | 2.1%
15 | International Journal of Molecular Sciences | 453 papers in training set | Top 9% | 1.3%
16 | Protein Science | 221 papers in training set | Top 1% | 1.1%
17 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 0.9%
18 | Communications Biology | 886 papers in training set | Top 17% | 0.9%
19 | Artificial Intelligence in the Life Sciences | 11 papers in training set | Top 0.2% | 0.8%
20 | Molecules | 37 papers in training set | Top 2% | 0.7%
21 | Journal of Computational Chemistry | 11 papers in training set | Top 0.2% | 0.7%