
From expansion to consolidation: two decades of Gene Ontology evolution

Pitarch, B.; Pazos, F.; Chagoyen, M.

2026-03-06 bioinformatics
10.64898/2026.03.04.709507 bioRxiv

The Gene Ontology (GO) is a long-standing, community-maintained knowledge resource that underpins the functional annotation of gene products across numerous biological databases. Released regularly, GO and its associated annotations form a large, continuously evolving dataset whose temporal dynamics have direct consequences for data reuse, versioning, and reproducibility. Because analytical results derived from GO are inherently tied to specific ontology and annotation releases, a systematic understanding of how GO changes over time is essential for transparent interpretation and long-term reuse of GO-based analyses. Here, we present a comprehensive temporal characterization of the Gene Ontology and its annotations spanning 21 years of publicly available releases. Treating successive ontology and annotation versions as longitudinal research data, we quantify changes in ontology structure, term composition, relationships, and annotation content across time and across three representative annotation resources. Our analysis reveals sustained growth of GO over its lifetime, accompanied by marked structural reorganization, particularly affecting high-level, general ontology terms. Notably, across multiple structural and annotation metrics, we identify a transition toward increased stability beginning around 2017, consistent with a maturation phase of the resource. This work provides a reference framework for researchers who rely on GO releases for data integration, benchmarking, and reproducible functional analysis.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Nucleic Acids Research | 1128 papers in training set | Top 1.0% | 14.1%
2 | Bioinformatics | 1061 papers in training set | Top 2% | 12.3%
3 | GigaScience | 172 papers in training set | Top 0.1% | 9.9%
4 | Scientific Data | 174 papers in training set | Top 0.3% | 6.3%
5 | Database | 51 papers in training set | Top 0.1% | 4.8%
6 | Nature Communications | 4913 papers in training set | Top 35% | 4.3%
50% of probability mass above
7 | Cell Systems | 167 papers in training set | Top 3% | 4.1%
8 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 0.7% | 3.6%
9 | BMC Bioinformatics | 383 papers in training set | Top 3% | 3.2%
10 | PLOS Computational Biology | 1633 papers in training set | Top 11% | 3.2%
11 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 3.0%
12 | Scientific Reports | 3102 papers in training set | Top 43% | 2.8%
13 | PLOS ONE | 4510 papers in training set | Top 47% | 2.3%
14 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 3% | 1.8%
15 | Journal of Molecular Biology | 217 papers in training set | Top 2% | 1.7%
16 | iScience | 1063 papers in training set | Top 20% | 1.3%
17 | Nature Biotechnology | 147 papers in training set | Top 5% | 1.3%
18 | Journal of Proteome Research | 215 papers in training set | Top 1% | 1.3%
19 | Genome Biology | 555 papers in training set | Top 5% | 1.3%
20 | Nature Methods | 336 papers in training set | Top 5% | 1.2%
21 | Bioinformatics Advances | 184 papers in training set | Top 4% | 0.9%
22 | Cell Genomics | 162 papers in training set | Top 7% | 0.7%
23 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 45% | 0.7%
24 | Patterns | 70 papers in training set | Top 3% | 0.7%
25 | International Journal of Molecular Sciences | 453 papers in training set | Top 18% | 0.6%
26 | npj Systems Biology and Applications | 99 papers in training set | Top 3% | 0.6%
27 | Genetics | 225 papers in training set | Top 5% | 0.6%
28 | Molecular Systems Biology | 142 papers in training set | Top 2% | 0.6%
29 | Genome Medicine | 154 papers in training set | Top 9% | 0.6%
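
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the ranked probabilities. The sketch below is only an illustration: the function name `journals_needed` and the 50% threshold parameter are assumptions introduced here, not part of the page, and the probabilities are the first eight values copied from the ranking.

```python
from itertools import accumulate

# Predicted probabilities (%) for the eight top-ranked journals,
# copied from the ranking (Nucleic Acids Research first).
probabilities = [14.1, 12.3, 9.9, 6.3, 4.8, 4.3, 4.1, 3.6]


def journals_needed(probs, threshold=50.0):
    """Return the smallest number of top-ranked entries whose summed
    probability reaches the threshold, or None if it is never reached.

    Hypothetical helper for illustration only.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_needed(probabilities))  # prints 6
```

The running sum is about 47.4% after five journals and about 51.7% after six, which is why the 50% marker falls between ranks 6 and 7.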