
Automated landmark and semilandmark annotation for wing geometric morphometrics in Diptera using deep learning

Nolte, K.; Baumbach, J.; Kollmannsberger, P.; Sauer, F. G.; Luehken, R.

2026-04-21 bioinformatics
10.64898/2026.04.17.719146 bioRxiv

1. Diptera are a diverse insect order that includes vectors of human and animal pathogens. Accurate species identification remains a major bottleneck in ecological and epidemiological studies. Morphological identification requires taxonomic expertise, while molecular methods are costly and not universally reliable. Wing geometric morphometrics offers an alternative, but manual landmark annotation is time-consuming and introduces observer bias.

2. We developed ITHILDIN, an automated pipeline for landmark and semilandmark annotation of Diptera wings that combines UNet++ segmentation with an Hourglass landmark prediction model. Using mosquitoes as the primary model system, we extended an existing repository with 5,793 additional images. Models were trained on 5,991 annotations of landmarks and segmentations and then evaluated on 12,522 images across 34 taxa. We assessed landmark prediction accuracy against human observers and ML-morph, evaluated species identification using Linear Discriminant Analysis on 17 homologous landmarks and 52 semilandmarks, and tested out-of-distribution generalisation by reproducing an independent study. Transferability was demonstrated by adapting the pipeline to the dipteran families Drosophilidae and Glossinidae.

3. The Hourglass model achieved a mean landmark error of 4.5 pixels (95% CI: 4.3-4.6), within human observer variability (4.7 pixels, 95% CI: 4.4-5.0) and substantially better than ML-morph (12.7 pixels, 95% CI: 11.1-14.2). The semilandmark-based approach to species identification achieved 91% balanced accuracy across 34 taxa, comparable to CNN performance (94%). On out-of-distribution data, the landmark pipeline generalised substantially better than the CNN, and a soft-voting ensemble of the landmark and CNN classifiers achieved 88% balanced accuracy on a replicated study.

4. Combining geometric morphometrics with deep learning provides a reproducible, interpretable, and generalisable alternative to black-box CNN classifiers for Diptera wing analysis. By acting as a consistent single observer comparable to human annotation, the system eliminates inter-observer bias, enabling large-scale and cross-study morphometric analyses of dipteran wings. The system is publicly available at www.ithildin.bnitm.de and is transferable to other Diptera families with moderate retraining effort.

Data availability: Images used in this study are accessible under a CC BY 4.0 license at https://doi.org/10.6019/S-BIAD1478. A downloadable, installable Docker application is available from the application's Git page: https://anonymous.4open.science/r/ITHILDIN-4313/
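The three metrics quoted in the abstract (mean landmark error in pixels, soft voting, and balanced accuracy) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the landmark error is the mean Euclidean distance between predicted and reference coordinates, and that the ensemble is an unweighted average of class probabilities. All function names and the toy data are hypothetical.

```python
from math import hypot

def mean_landmark_error(pred, true):
    """Mean Euclidean distance (in pixels) over all landmarks of all images.

    pred, true: lists of images, each a list of (x, y) landmark coordinates.
    Assumption: error is defined as plain Euclidean pixel distance.
    """
    dists = [hypot(px - tx, py - ty)
             for p_img, t_img in zip(pred, true)
             for (px, py), (tx, ty) in zip(p_img, t_img)]
    return sum(dists) / len(dists)

def soft_vote(prob_a, prob_b):
    """Average two classifiers' class probabilities and pick the top class.

    Assumption: equal weights for both classifiers.
    """
    labels = []
    for row_a, row_b in zip(prob_a, prob_b):
        avg = [(a + b) / 2 for a, b in zip(row_a, row_b)]
        labels.append(max(range(len(avg)), key=avg.__getitem__))
    return labels

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(p == c for p in preds) / len(preds))
    return sum(recalls) / len(recalls)

# Toy data (hypothetical): one wing image with two landmarks.
true_lm = [[(0, 0), (10, 10)]]
pred_lm = [[(3, 4), (10, 10)]]
error_px = mean_landmark_error(pred_lm, true_lm)

# Toy class probabilities from a landmark-based and a CNN classifier.
prob_landmark = [[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]]
prob_cnn = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
ensemble_labels = soft_vote(prob_landmark, prob_cnn)
score = balanced_accuracy([0, 0, 1], ensemble_labels)
```

Balanced accuracy averages recall per class, so a rare taxon counts as much as a common one. That matters when the number of images per taxon is uneven.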

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Nature Communications; 4913 papers in training set; Top 0.7%; 33.1%
2. Methods in Ecology and Evolution; 160 papers in training set; Top 0.5%; 6.8%
3. Scientific Reports; 3102 papers in training set; Top 17%; 6.4%
4. Communications Biology; 886 papers in training set; Top 0.8%; 4.4%

50% of probability mass above

5. Scientific Data; 174 papers in training set; Top 0.5%; 3.6%
6. PLOS Computational Biology; 1633 papers in training set; Top 12%; 2.6%
7. Bioinformatics; 1061 papers in training set; Top 6%; 2.6%
8. BMC Biology; 248 papers in training set; Top 0.5%; 2.6%
9. Nature Methods; 336 papers in training set; Top 4%; 2.1%
10. PLOS ONE; 4510 papers in training set; Top 48%; 2.1%
11. Molecular Ecology Resources; 161 papers in training set; Top 0.5%; 1.9%
12. Peer Community Journal; 254 papers in training set; Top 2%; 1.9%
13. BMC Bioinformatics; 383 papers in training set; Top 4%; 1.7%
14. eLife; 5422 papers in training set; Top 42%; 1.7%
15. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 35%; 1.5%
16. Nature; 575 papers in training set; Top 12%; 1.5%
17. New Phytologist; 309 papers in training set; Top 4%; 1.3%
18. Ecology Letters; 121 papers in training set; Top 1.0%; 1.2%
19. Systematic Entomology; 11 papers in training set; Top 0.1%; 1.2%
20. Systematic Biology; 121 papers in training set; Top 0.3%; 1.2%
21. GigaScience; 172 papers in training set; Top 2%; 1.2%
22. Bioinformatics Advances; 184 papers in training set; Top 4%; 1.1%
23. Science Advances; 1098 papers in training set; Top 28%; 0.8%
24. Molecular Biology and Evolution; 488 papers in training set; Top 4%; 0.8%
25. BMC Ecology and Evolution; 49 papers in training set; Top 2%; 0.7%
26. Briefings in Bioinformatics; 326 papers in training set; Top 7%; 0.7%
27. PeerJ; 261 papers in training set; Top 16%; 0.7%
28. Royal Society Open Science; 193 papers in training set; Top 5%; 0.7%
29. Cell Reports Methods; 141 papers in training set; Top 6%; 0.6%
30. Ecological Informatics; 29 papers in training set; Top 0.9%; 0.6%
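
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The short Python sketch below assumes that the last percentage in each entry is the predicted probability for that journal. The function name is hypothetical.

```python
# Predicted probabilities (percent) for ranks 1-30, copied from the list above.
probabilities = [
    33.1, 6.8, 6.4, 4.4, 3.6, 2.6, 2.6, 2.6, 2.1, 2.1,
    1.9, 1.9, 1.7, 1.7, 1.5, 1.5, 1.3, 1.2, 1.2, 1.2,
    1.2, 1.1, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6,
]

def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed to reach `threshold` percent.

    Returns None if the listed entries never reach the threshold.
    """
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count
    return None

top_k = journals_to_reach(probabilities, 50.0)
top_k_mass = round(sum(probabilities[:top_k]), 1)
```

With these values, the first three journals sum to 46.3% and adding the fourth brings the total to 50.7%, which is consistent with the cut-off marker placed after rank 4.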