
On the necessity to include multiple types of evidence when predicting molecular function of proteins

de Crecy-Lagard, V.; Swairjo, M.

2023-12-19 biochemistry
10.1101/2023.12.18.571875 bioRxiv

Machine learning-based platforms are currently revolutionizing many fields of molecular biology, including structure prediction for monomers or complexes, prediction of the consequences of mutations, and prediction of protein function. However, these platforms use training sets based on currently available knowledge and, in essence, are not built to discover novelty. Hence, claims of discovering novel functions for protein families using artificial intelligence should be carefully dissected: the danger of overprediction is real, as we show in a detailed analysis of the prediction made by Kim et al. [1] on the function of the YciO protein in the model organism Escherichia coli.

Matching journals

The top 12 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|---|---|---|---|---|
| 1 | Biochemical and Biophysical Research Communications | 78 | Top 0.1% | 8.3% |
| 2 | Biochemistry | 130 | Top 0.1% | 6.4% |
| 3 | PLOS ONE | 4510 | Top 31% | 4.9% |
| 4 | Protein Science | 221 | Top 0.3% | 4.3% |
| 5 | Scientific Reports | 3102 | Top 31% | 4.0% |
| 6 | Heliyon | 146 | Top 0.3% | 4.0% |
| 7 | F1000Research | 79 | Top 0.4% | 4.0% |
| 8 | Frontiers in Molecular Biosciences | 100 | Top 0.3% | 3.9% |
| 9 | Biomolecules | 95 | Top 0.1% | 3.6% |
| 10 | ACS Omega | 90 | Top 0.5% | 3.6% |
| 11 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 0.3% | 2.8% |
| 12 | International Journal of Molecular Sciences | 453 | Top 4% | 2.8% |

50% of probability mass above

| Rank | Journal | Papers in training set | Percentile | Probability |
|---|---|---|---|---|
| 13 | Journal of Molecular Biology | 217 | Top 1% | 2.1% |
| 14 | Computational and Structural Biotechnology Journal | 216 | Top 3% | 2.1% |
| 15 | The Journal of Physical Chemistry Letters | 58 | Top 0.6% | 2.1% |
| 16 | Chemical Communications | 24 | Top 0.4% | 1.9% |
| 17 | Cells | 232 | Top 2% | 1.9% |
| 18 | Journal of Proteome Research | 215 | Top 1% | 1.8% |
| 19 | Biochemistry and Biophysics Reports | 28 | Top 0.5% | 1.7% |
| 20 | Bioscience Reports | 25 | Top 0.6% | 1.7% |
| 21 | Molecules | 37 | Top 0.8% | 1.7% |
| 22 | PeerJ | 261 | Top 7% | 1.7% |
| 23 | Journal of Biomolecular Structure and Dynamics | 43 | Top 0.6% | 1.7% |
| 24 | Physical Biology | 43 | Top 1% | 1.7% |
| 25 | Journal of Molecular Evolution | 21 | Top 0.2% | 1.2% |
| 26 | FEBS Open Bio | 29 | Top 0.3% | 1.2% |
| 27 | Access Microbiology | 22 | Top 0.4% | 1.0% |
| 28 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.9% |
| 29 | BioMed Research International | 25 | Top 3% | 0.9% |
| 30 | Open Biology | 95 | Top 0.8% | 0.8% |
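
The statement that the top 12 journals account for 50% of the predicted probability mass can be checked against the listed percentages with a running sum. The sketch below is only an illustration: the values are copied from the listing above and are rounded to one decimal place, so the page's underlying figures may differ slightly, and the names `top_probs` and `journals_to_reach` are hypothetical, not part of the site.

```python
from itertools import accumulate

# Listed probabilities (percent) for ranks 1-12, copied from the listing.
# These are rounded display values, not the model's exact outputs.
top_probs = [8.3, 6.4, 4.9, 4.3, 4.0, 4.0, 4.0, 3.9, 3.6, 3.6, 2.8, 2.8]


def journals_to_reach(probs, threshold):
    """Return the smallest number of leading entries whose running sum
    reaches `threshold`, or None if the threshold is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        # Small tolerance guards against floating-point rounding in the sum.
        if total >= threshold - 1e-9:
            return count
    return None


# Running totals after each rank.
cumulative = list(accumulate(top_probs))

if __name__ == "__main__":
    for rank, total in enumerate(cumulative, start=1):
        print(f"rank {rank:2d}: cumulative {total:.1f}%")
    print("ranks needed for 50%:", journals_to_reach(top_probs, 50.0))
```

With the rounded values, the running total is about 49.8% after rank 11 and about 52.6% after rank 12, so rank 12 is the first at which the 50% mark is crossed.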