Advances in protein function prediction from the fifth CAFA challenge

De Paolis Kaluza, M. C.; Ramola, R.; Joshi, P.; Piovesan, D.; Reade, W.; Orchard, S.; Martin, M. J.; Ignatchenko, A.; Kaggle Competition Participants; Rost, B.; Orengo, C. A.; Robinson-Rechavi, M.; Durand, D.; Brenner, S. E.; Greene, C. S.; Mooney, S. D.; Friedberg, I.; Radivojac, P.

2026-04-30 bioinformatics
10.64898/2026.04.27.716980 bioRxiv
The Critical Assessment of Functional Annotation (CAFA) is a long-standing community effort to independently assess computational methods for protein function prediction, to highlight well-performing methodologies, to identify bottlenecks in the field, and to provide a forum for the dissemination of results and exchange of ideas. In its fifth round (CAFA5) of triennial challenges, a partnership with Kaggle Inc. facilitated participation from a large community of data scientists and computational biologists through a competitive prospective challenge on the crowdsourcing platform. In this work, we present an in-depth analysis of the submitted predictions and report improvements in accuracy over all methods from the previous CAFA challenges. We further introduce a new evaluation setting for proteins with pre-existing (incomplete) annotations and identify the need for methods that better leverage existing annotations to predict those that will be discovered later. Finally, we characterize the prospective evaluation framework by examining performance on a strict set of unpublished annotations and across intermediate database releases. Our results indicate that recent developments in the field, such as the availability of protein language models and accurately predicted 3D structures, as well as the growth of experimental annotations through biocuration, have all contributed to performance improvements.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Bioinformatics | 1061 | Top 1% | 22.1%
2 | Bioinformatics Advances | 184 | Top 0.2% | 9.0%
3 | BMC Bioinformatics | 383 | Top 1% | 8.3%
4 | PLOS Computational Biology | 1633 | Top 6% | 6.3%
5 | Protein Science | 221 | Top 0.3% | 4.8%
(50% of probability mass above)
6 | Journal of Molecular Biology | 217 | Top 0.3% | 4.8%
7 | Cell Systems | 167 | Top 3% | 4.2%
8 | Proceedings of the National Academy of Sciences | 2130 | Top 21% | 3.5%
9 | NAR Genomics and Bioinformatics | 214 | Top 1% | 2.3%
10 | Nucleic Acids Research | 1128 | Top 9% | 2.0%
11 | Nature Communications | 4913 | Top 49% | 1.9%
12 | Nature Methods | 336 | Top 4% | 1.7%
13 | Journal of Proteome Research | 215 | Top 1% | 1.5%
14 | Frontiers in Bioinformatics | 45 | Top 0.4% | 1.3%
15 | Computational and Structural Biotechnology Journal | 216 | Top 6% | 1.3%
16 | Journal of Cheminformatics | 25 | Top 0.4% | 1.2%
17 | PLOS ONE | 4510 | Top 60% | 1.2%
18 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 0.7% | 1.2%
19 | International Journal of Molecular Sciences | 453 | Top 12% | 0.9%
20 | Database | 51 | Top 0.7% | 0.9%
21 | Briefings in Bioinformatics | 326 | Top 6% | 0.9%
22 | Structure | 175 | Top 3% | 0.9%
23 | Genome Research | 409 | Top 4% | 0.7%
24 | Nature Computational Science | 50 | Top 2% | 0.7%
25 | Nature Machine Intelligence | 61 | Top 4% | 0.7%
26 | Protein Engineering, Design and Selection | 14 | Top 0.1% | 0.7%
27 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.7%
28 | Nature Biotechnology | 147 | Top 9% | 0.6%
29 | BioData Mining | 15 | Top 1% | 0.6%
30 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.6%