Harmonising digitised herbarium data to enhance biodiversity knowledge: creating an updated checklist for the flora of Greenland.

Whitley, B. S.; Abermann, J.; Alsos, I. G.; Biersma, E. M.; Gardman, V.; Hoye, T. T.; Jones, L.; Khelidj, N. M.; Li, Z.; Losapio, G.; Pape, T.; Raundrup, K.; Schmitz, P.; Silva, T.; Wirta, H.; Roslin, T.; Ahlstrand, N. I.; de Vere, N.

2024-12-05 ecology
10.1101/2024.12.01.626242 bioRxiv

International efforts to digitise herbarium specimens provide the building blocks for a global digital herbarium. However, taxonomic changes and errors can introduce inconsistencies when specimen metadata are amalgamated, compromising the assignment of occurrence records to the correct taxa and the subsequent interpretation of patterns in biodiversity. We present a novel workflow to mass-curate digital specimens. By employing existing digital taxonomic backbones, we aggregate specimen names by their accepted name and flag remaining cases for manual review. We then validate names using site-specific floras, balancing automation with taxonomic expert-based curation. Applying our workflow to the vascular plants of Greenland, we harmonised 175,266 digitised herbarium specimens and observations from 92 data providers from the Global Biodiversity Information Facility (GBIF). The harmonised metacollection for the Greenland flora contains 780 plant species. Our workflow increases the number of species known from Greenland compared to other currently available species checklists and increases the mean number of occurrences per species by 42.6. Our workflow illustrates the integration required to create a global, universally accessible digital herbarium, and shows how previous obstacles to database curation can be overcome through a combination of automation and expert curation. From the specific perspective of the Greenland flora, our approach arrives at a new checklist of taxa, a new curated metacollection of occurrence data, and revised estimates of plant richness. The list of taxa and their prevalence provide a new basis for biodiversity assessment and conservation planning.

Societal Impact Statement

Digitising plant collections has allowed data to be aggregated across multiple collections, forming a single harmonised resource of unprecedented scale. This resource is only accurate once the database names are assigned to one accepted name per species. We established a semi-automated workflow for processing plant name data, leveraging taxonomic backbones and employing taxonomic expertise at key stages. Applying our workflow to the flora of Greenland, we developed a curated checklist of 780 species, capturing greater species richness than previously published, while also curating 175,266 plant records. Our findings redefine our knowledge of Greenlandic plant diversity, while harmonising a vast digital collection for further research.
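
The aggregate-then-flag idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function `harmonise`, the dictionary-based `backbone`, the `flora` set, and the placeholder names ("Aus bus", "Xus yus") are all hypothetical stand-ins. A real workflow would query an actual digital taxonomic backbone and a published regional flora rather than hand-made lookups.

```python
from collections import defaultdict


def harmonise(records, backbone, flora):
    """Group records by accepted name and collect cases for manual review.

    records  : list of dicts with a "name" key (the name as recorded)
    backbone : dict mapping recorded name -> accepted name (hypothetical stand-in)
    flora    : set of accepted names listed in a site-specific flora (hypothetical)
    """
    grouped = defaultdict(list)   # accepted name -> records
    unresolved = []               # names the backbone cannot resolve
    for record in records:
        name = " ".join(record["name"].split())  # collapse stray whitespace
        accepted = backbone.get(name)
        if accepted is None:
            unresolved.append(record)
        else:
            grouped[accepted].append(record)
    # Accepted names absent from the regional flora are set aside for expert review.
    to_validate = sorted(name for name in grouped if name not in flora)
    return dict(grouped), unresolved, to_validate


# Placeholder data only; none of these are real taxa or real records.
backbone = {"Aus bus": "Aus bus", "Aus cus": "Aus bus", "Xus yus": "Xus yus"}
flora = {"Aus bus"}
records = [
    {"id": 1, "name": "Aus bus"},
    {"id": 2, "name": "Aus  cus"},   # synonym with a stray double space
    {"id": 3, "name": "Xus yus"},    # resolves, but is not in the flora
    {"id": 4, "name": "Aus dus"},    # unknown to the backbone
]
grouped, unresolved, to_validate = harmonise(records, backbone, flora)
```

Here records 1 and 2 are pooled under one accepted name, record 4 is flagged because the backbone cannot resolve it, and "Xus yus" is queued for validation because it is missing from the flora. Keeping the two review queues separate mirrors the split between automated matching and expert curation that the abstract describes.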

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | Methods in Ecology and Evolution | 160 papers in training set | Top 0.2% | 18.4%
2 | Applications in Plant Sciences | 21 papers in training set | Top 0.1% | 14.2%
3 | PLANTS, PEOPLE, PLANET | 21 papers in training set | Top 0.1% | 12.4%
4 | Scientific Data | 174 papers in training set | Top 0.2% | 10.0%
50% of probability mass above
5 | PLOS ONE | 4510 papers in training set | Top 26% | 6.7%
6 | Scientific Reports | 3102 papers in training set | Top 28% | 4.3%
7 | Conservation Science and Practice | 13 papers in training set | Top 0.2% | 3.0%
8 | Science | 429 papers in training set | Top 11% | 2.7%
9 | Nature Communications | 4913 papers in training set | Top 49% | 1.9%
10 | Nature Human Behaviour | 85 papers in training set | Top 3% | 1.5%
11 | Ecological Informatics | 29 papers in training set | Top 0.4% | 1.5%
12 | Ecology and Evolution | 232 papers in training set | Top 3% | 1.3%
13 | Conservation Biology | 14 papers in training set | Top 0.2% | 1.2%
14 | Remote Sensing in Ecology and Conservation | 10 papers in training set | Top 0.2% | 1.2%
15 | Conservation Letters | 11 papers in training set | Top 0.4% | 0.9%
16 | Ecography | 50 papers in training set | Top 1.0% | 0.9%
17 | Diversity and Distributions | 26 papers in training set | Top 0.3% | 0.9%
18 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 41% | 0.9%
19 | Communications Earth & Environment | 14 papers in training set | Top 0.7% | 0.9%
20 | Ecological Indicators | 20 papers in training set | Top 0.5% | 0.8%
21 | PeerJ | 261 papers in training set | Top 15% | 0.7%
22 | New Phytologist | 309 papers in training set | Top 5% | 0.7%
23 | Journal of Applied Ecology | 35 papers in training set | Top 0.7% | 0.7%
24 | Frontiers in Plant Science | 240 papers in training set | Top 5% | 0.7%