
TaxonMatch: taxonomic integration and tree construction from heterogeneous biological databases

Leone, M.; Rech De Laval, V.; Drage, H. B.; Waterhouse, R. M.; Robinson-Rechavi, M.

2026-03-20 evolutionary biology
10.64898/2026.03.18.712418 bioRxiv

Integrating taxonomic data from various sources is a significant challenge in biodiversity research, due to non-standardized nomenclature and evolving species classifications. Discrepancies between major repositories such as the Global Biodiversity Information Facility (GBIF) and the National Center for Biotechnology Information (NCBI), as well as citizen science platforms such as iNaturalist, lead to fragmented and sometimes inaccurate biological data. We present TaxonMatch, a tool designed to address these challenges. TaxonMatch aligns taxonomic names, resolves synonymy, and corrects typographical and structural inconsistencies across databases. We show how it can be used to build a common backbone arthropod taxonomy over NCBI, GBIF and iNaturalist, to find the closest molecular data to a given fossil, and to identify IUCN-listed endangered species that have molecular data. TaxonMatch provides a cohesive framework and a consistent taxonomic backbone, and it can be applied to any taxonomic source. The tool is available at https://github.com/MoultDB/TaxonMatch.
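The abstract does not specify how TaxonMatch compares names, so the following is only a minimal sketch of the general idea behind typo-tolerant name matching, not TaxonMatch's method or API. It assumes two plain lists of scientific names (for example, one exported from iNaturalist and one from GBIF), normalizes case, whitespace, and diacritics, and then uses the standard-library `difflib.get_close_matches` to pick the closest reference name above a similarity cutoff. The helper names `normalize` and `match_names`, the 0.9 cutoff, and the example species are hypothetical choices for illustration.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip diacritics, and collapse runs of whitespace (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def match_names(query_names, reference_names, cutoff=0.9):
    """Map each query name to its closest reference name, or None if nothing
    reaches the similarity cutoff (hypothetical helper, not TaxonMatch's API)."""
    # Index reference names by normalized form so matches map back to the original spelling
    ref_by_norm = {normalize(r): r for r in reference_names}
    candidates = list(ref_by_norm)
    matches = {}
    for q in query_names:
        # get_close_matches ranks candidates by difflib.SequenceMatcher similarity ratio
        best = difflib.get_close_matches(normalize(q), candidates, n=1, cutoff=cutoff)
        matches[q] = ref_by_norm[best[0]] if best else None
    return matches


if __name__ == "__main__":
    # Illustrative names only: two spelling variants and one name absent from the reference list
    inat_names = ["Apis melifera", "Bombus  terrestris", "Vespa crabro"]
    gbif_names = ["Apis mellifera", "Bombus terrestris"]
    print(match_names(inat_names, gbif_names))
```

With these inputs, the misspelled "Apis melifera" maps to "Apis mellifera", the double-spaced "Bombus  terrestris" maps to "Bombus terrestris", and "Vespa crabro" has no match. String similarity alone only catches spelling variants; resolving true synonyms (different names for the same taxon) would additionally require synonym records, such as those in the GBIF backbone or NCBI's names.dmp, which this sketch does not model.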

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Methods in Ecology and Evolution: 22.4% (160 papers in training set; top 0.2%)
2. PLOS ONE: 12.3% (4510 papers in training set; top 16%)
3. Scientific Data: 6.4% (174 papers in training set; top 0.3%)
4. Ecological Informatics: 4.0% (29 papers in training set; top 0.1%)
5. Bioinformatics: 3.6% (1061 papers in training set; top 5%)
6. Applications in Plant Sciences: 3.6% (21 papers in training set; top 0.1%)
50% of probability mass above
7. Ecology and Evolution: 3.6% (232 papers in training set; top 1%)
8. Nature Communications: 3.6% (4913 papers in training set; top 40%)
9. Systematic Biology: 2.6% (121 papers in training set; top 0.2%)
10. Bioinformatics Advances: 2.4% (184 papers in training set; top 2%)
11. Science: 2.1% (429 papers in training set; top 12%)
12. Molecular Ecology Resources: 2.1% (161 papers in training set; top 0.5%)
13. Scientific Reports: 1.8% (3102 papers in training set; top 56%)
14. Peer Community Journal: 1.8% (254 papers in training set; top 2%)
15. eLife: 1.5% (5422 papers in training set; top 45%)
16. Data in Brief: 1.3% (13 papers in training set; top 0.1%)
17. Molecular Biology and Evolution: 1.3% (488 papers in training set; top 3%)
18. Systematic Entomology: 1.2% (11 papers in training set; top 0.1%)
19. PeerJ: 1.2% (261 papers in training set; top 10%)
20. PLOS Computational Biology: 1.2% (1633 papers in training set; top 20%)
21. Communications Biology: 0.9% (886 papers in training set; top 17%)
22. Ecography: 0.9% (50 papers in training set; top 1%)
23. Proceedings of the National Academy of Sciences: 0.9% (2130 papers in training set; top 41%)
24. PLOS Biology: 0.9% (408 papers in training set; top 17%)
25. NAR Genomics and Bioinformatics: 0.8% (214 papers in training set; top 3%)
26. Journal of Systematics and Evolution: 0.7% (11 papers in training set; top 0.3%)
27. BMC Biology: 0.6% (248 papers in training set; top 6%)
28. Global Ecology and Biogeography: 0.6% (41 papers in training set; top 0.7%)
29. Biology Open: 0.6% (130 papers in training set; top 3%)
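As a quick arithmetic check of the statement that the top 6 journals account for 50% of the predicted probability mass, the short Python sketch below accumulates the listed percentages in rank order and reports where the running total first reaches 50%. The function name `journals_needed` is a hypothetical helper, and because the displayed percentages are rounded to one decimal place, the sums are approximate.

```python
# Predicted probabilities (%) for the seven highest-ranked journals, copied from the list above
top_journals = [
    ("Methods in Ecology and Evolution", 22.4),
    ("PLOS ONE", 12.3),
    ("Scientific Data", 6.4),
    ("Ecological Informatics", 4.0),
    ("Bioinformatics", 3.6),
    ("Applications in Plant Sciences", 3.6),
    ("Ecology and Evolution", 3.6),
]


def journals_needed(probs, threshold_pct):
    """Return (k, cumulative %) for the smallest k whose top-k probabilities reach threshold_pct."""
    cumulative = 0.0
    for k, (_, p) in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= threshold_pct:
            return k, cumulative
    return None, cumulative  # threshold not reached within the supplied entries


k, mass = journals_needed(top_journals, 50.0)
print(f"top {k} journals: {mass:.1f}%")
```

The running total is 48.7% after five journals and 52.3% after six, so six is the smallest number of journals whose combined predicted probability reaches half of the mass, consistent with the divider placed after rank 6.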