Modelling the niches of wild and domesticated Ungulate species using deep learning

Rademaker, M.; Hogeweg, L.; Vos, R.

2019-08-22 evolutionary biology
10.1101/744441 bioRxiv
Knowledge of global biodiversity remains limited by geographic and taxonomic sampling biases. The scarcity of species data restricts our understanding of the environmental factors that shape distributions, and our ability to draw comparisons among species. Species distribution models (SDMs) were developed in the early 2000s to address this issue. Although SDMs based on single-layer neural networks have been tried in the past, they performed poorly. However, the past two decades have seen a strong increase in the use of Deep Learning (DL) approaches, such as Deep Neural Networks (DNNs). Despite the large improvement in predictive capacity that DNNs provide over shallow networks, to our knowledge they have not yet been applied to SDM. The aim of this research was to provide a proof of concept of a DL-SDM. We used a pre-existing dataset of the world's ungulates and abiotic environmental predictors that had recently been used in a MaxEnt SDM, to allow a direct comparison of performance between the two methods. Our DL-SDM consisted of a binary classification DNN containing 4 hidden layers, with drop-out regularization between each layer. Performance of the DL-SDM was similar to MaxEnt for species with relatively large sample sizes and worse for species with relatively small sample sizes. Increasing the number of occurrences further improved DL-SDM performance for species that already had relatively large sample sizes. We then tried to improve performance further by altering the sampling procedure for negative instances and by increasing the number of environmental predictors, including species interactions. This led to a large increase in model performance across the range of sample sizes in the species datasets. We conclude that DL-SDMs provide a suitable alternative to traditional SDMs such as MaxEnt, with the advantages of being able to include species interactions directly and to handle correlated input features. Further improvements to the model would include increasing its scalability by turning it into a multi-class classification model, as well as developing a more user-friendly DL-SDM Python package.
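
To make the architecture described in the abstract concrete, the following is a minimal sketch of a forward pass through a binary classifier with 4 hidden layers and drop-out after each hidden layer. It is not the authors' implementation: the layer width (16), the ReLU activation, the drop-out rate (0.3), the weight initialisation, and all function names are assumptions chosen for illustration, and training (loss and weight updates) is left out entirely.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (He-style scaling, an assumption) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine transform: one output per row of the weight matrix."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, rate, rng, training):
    """Inverted drop-out: zero units at random, rescale the survivors."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_model(n_features, hidden_sizes, rng):
    """One layer per hidden size, plus a single-unit output layer."""
    sizes = [n_features] + list(hidden_sizes) + [1]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def predict_presence(model, x, rate=0.3, rng=None, training=False):
    """Return a presence probability for one vector of environmental predictors."""
    rng = rng or random.Random()
    h = list(x)
    for layer in model[:-1]:          # the 4 hidden layers
        h = relu(dense(h, layer))
        h = dropout(h, rate, rng, training)
    return sigmoid(dense(h, model[-1])[0])


rng = random.Random(0)
model = build_model(n_features=5, hidden_sizes=(16, 16, 16, 16), rng=rng)
env = [0.2, -1.1, 0.5, 0.0, 1.3]      # hypothetical standardised predictor values
p = predict_presence(model, env)       # untrained, so the value is not meaningful
```

With `training=False` drop-out is disabled, so repeated calls on the same input return the same probability; with `training=True` each call samples a different sub-network, which is the regularising effect the abstract refers to.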

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Ecological Informatics; 29 papers in training set; Top 0.1%; 18.0%
2. Ecography; 50 papers in training set; Top 0.1%; 16.9%
3. Methods in Ecology and Evolution; 160 papers in training set; Top 0.4%; 9.8%
4. Global Ecology and Biogeography; 41 papers in training set; Top 0.1%; 4.7%
5. Journal of Biogeography; 37 papers in training set; Top 0.1%; 3.5%

50% of probability mass above

6. Diversity and Distributions; 26 papers in training set; Top 0.1%; 3.5%
7. Ecology and Evolution; 232 papers in training set; Top 1%; 3.5%
8. PLOS Computational Biology; 1633 papers in training set; Top 12%; 2.6%
9. Bioinformatics Advances; 184 papers in training set; Top 2%; 2.6%
10. Scientific Reports; 3102 papers in training set; Top 46%; 2.5%
11. Molecular Ecology Resources; 161 papers in training set; Top 0.4%; 2.4%
12. Ecology Letters; 121 papers in training set; Top 0.6%; 2.3%
13. PeerJ; 261 papers in training set; Top 5%; 2.0%
14. Bioinformatics; 1061 papers in training set; Top 7%; 2.0%
15. PLOS ONE; 4510 papers in training set; Top 57%; 1.4%
16. Nature Communications; 4913 papers in training set; Top 54%; 1.4%
17. New Phytologist; 309 papers in training set; Top 4%; 1.2%
18. Peer Community Journal; 254 papers in training set; Top 3%; 0.9%
19. Systematic Biology; 121 papers in training set; Top 0.3%; 0.9%
20. Molecular Biology and Evolution; 488 papers in training set; Top 4%; 0.9%
21. Global Change Biology; 69 papers in training set; Top 1%; 0.9%
22. Frontiers in Ecology and Evolution; 60 papers in training set; Top 3%; 0.9%
23. eLife; 5422 papers in training set; Top 59%; 0.7%
24. Oikos; 74 papers in training set; Top 0.9%; 0.7%
25. iScience; 1063 papers in training set; Top 36%; 0.7%
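
The position of the 50% marker can be checked with a running sum over the listed probabilities. The short sketch below uses the first ten percentages exactly as displayed in the list; the function name is hypothetical, and because the displayed values are rounded the total is only approximate.

```python
from itertools import accumulate

# Predicted probabilities (in percent) of the ten top-ranked journals, as listed.
probabilities = [18.0, 16.9, 9.8, 4.7, 3.5, 3.5, 3.5, 2.6, 2.6, 2.5]


def journals_to_reach(percentages, threshold):
    """Return (rank, cumulative total) at which the running sum first reaches threshold."""
    for rank, total in enumerate(accumulate(percentages), start=1):
        if total >= threshold:
            return rank, total
    return None  # threshold never reached


rank, total = journals_to_reach(probabilities, 50.0)
print(f"{rank} journals reach {total:.1f}% of the probability mass")
```

The first four journals sum to about 49.4%, so the fifth is the one that carries the running total past 50%, in line with where the marker sits.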