RUSBoost: A suitable species distribution method for imbalanced records of presence and absence. A case study of twenty-five species of Iberian bats

Carrasco, J.; Lison, F.; Weintraub, A.

2021-10-09 ecology
10.1101/2021.10.06.463434 bioRxiv

- Traditional species distribution models (SDMs) may not be appropriate when examples of one class (e.g. absences or pseudo-absences) greatly outnumber examples of the other (e.g. presences or observations), because they tend to favor learning the more frequent class.
- We present an ensemble method called Random UnderSampling and Boosting (RUSBoost), which was designed for cases where the numbers of presence and absence records are imbalanced, and we open the "black box" of the algorithm to interpret its results and assess its applicability in ecology.
- We applied our methodology to a case study of twenty-five bat species from the Iberian Peninsula, building a RUSBoost model for each species. To build tighter models, we optimized their hyperparameters using Bayesian optimization. In particular, we implemented an objective function that represents the cross-validation loss, [Formula], with [Formula] representing the hyperparameters Maximum Number of Splits, Number of Learners and Learning Rate.
- The models reached average values of 0.84 ± 0.05 for the area under the ROC curve (AUC), and of 79.5 ± 4.87%, 74.9 ± 6.05% and 78.8 ± 5.0% for specificity, sensitivity and overall accuracy, respectively. We also obtained variable importance values and analyzed the relationships between the explanatory variables and the probability of bat presence.
- Our results show that RUSBoost could be a useful tool for building well-performing SDMs when presence/absence databases are imbalanced. Applying this algorithm could improve SDM predictions and support conservation biology and management.
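The abstract describes RUSBoost only in outline. As a minimal sketch, and not the authors' implementation, the idea can be shown in pure Python: each boosting round keeps every minority-class record, draws an equally sized random subset of the majority class, fits a weak learner on that balanced subset, and then reweights all records AdaBoost-style. The assumptions here are ours: labels are coded +1 (presence) and -1 (absence), the weak learner is a one-split decision stump (the Maximum Number of Splits = 1 case), and the weight update is the simpler AdaBoost.M1 rule rather than the AdaBoost.M2 pseudo-loss of the original RUSBoost formulation. The names train_stump and rusboost are hypothetical; n_learners and learning_rate play the roles of Number of Learners and Learning Rate.

```python
import math
import random


def train_stump(X, y, w):
    """Fit a one-split decision tree (stump) that minimises weighted error.

    X: list of feature vectors; y: labels in {-1, +1}; w: sample weights.
    """
    best = None  # (weighted error, feature index, threshold, polarity)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] >= t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    _, j, t, polarity = best
    return lambda row: polarity if row[j] >= t else -polarity


def rusboost(X, y, n_learners=50, learning_rate=1.0, seed=0):
    """Simplified RUSBoost: random undersampling + AdaBoost.M1-style reweighting."""
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n  # boosting weights over the FULL training set
    pos = [i for i in range(n) if y[i] == 1]
    neg = [i for i in range(n) if y[i] == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    ensemble = []  # list of (vote weight, weak learner)
    for _ in range(n_learners):
        # Random undersampling: all minority records plus an equally sized
        # random subset of majority records, drawn without replacement.
        idx = minority + rng.sample(majority, len(minority))
        h = train_stump([X[i] for i in idx], [y[i] for i in idx],
                        [w[i] for i in idx])
        # Weighted error is measured on the full (imbalanced) training set.
        err = sum(w[i] for i in range(n) if h(X[i]) != y[i])
        if err >= 0.5:
            continue  # no better than chance on the full set: discard it
        err = max(err, 1e-10)  # avoid division by zero for a perfect learner
        alpha = learning_rate * 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Up-weight misclassified records, down-weight correct ones, renormalise.
        w = [w[i] * math.exp(-alpha * y[i] * h(X[i])) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(row):
        # Weighted vote of all retained weak learners.
        score = sum(a * learner(row) for a, learner in ensemble)
        return 1 if score >= 0 else -1

    return predict


# Toy imbalanced data: 3 presences (x >= 17) among 20 records.
X = [[float(i)] for i in range(20)]
y = [1 if i >= 17 else -1 for i in range(20)]
model = rusboost(X, y, n_learners=10)
print([model(row) for row in X])
```

Because every round sees a balanced sample, no single weak learner can minimize its error simply by predicting absence everywhere, while the boosting weights are still updated on the full data set.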
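The evaluation measures reported above (AUC, specificity, sensitivity and overall accuracy) can be computed from a model's outputs as sketched below. This assumes labels coded 1 = presence and 0 = absence, a 0.5 threshold for turning scores into presence/absence predictions, and the rank-based (Mann-Whitney) definition of AUC. The function names and the six-record example are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and overall accuracy for 0/1 (absence/presence) labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # share of presences predicted as presences
        "specificity": tn / (tn + fp),  # share of absences predicted as absences
        "accuracy": (tp + tn) / len(pairs),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random presence outscores a random absence
    (Mann-Whitney formulation; ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))  # 0.875
```

Reporting sensitivity and specificity alongside overall accuracy matters with imbalanced data: a model that predicts absence everywhere has specificity 1 and sensitivity 0, yet its accuracy still equals the proportion of absences.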
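The Bayesian optimization step minimizes a cross-validation loss over the hyperparameters. The exact objective is not reproduced in this abstract, so the following is only a sketch, assuming the loss is the misclassification rate pooled over k stratified folds: a Bayesian optimizer would call kfold_cv_loss once per candidate hyperparameter setting. No optimizer is implemented here. train_threshold is a hypothetical one-parameter learner used only to keep the example self-contained; in practice it would be replaced by a RUSBoost trainer whose params hold the maximum number of splits, number of learners and learning rate.

```python
import random


def kfold_cv_loss(train, X, y, params, k=5, seed=0):
    """Misclassification rate pooled over k stratified folds for one
    hyperparameter setting `params` (labels coded 1 = presence, 0 = absence)."""
    rng = random.Random(seed)
    pos = [i for i in range(len(y)) if y[i] == 1]
    neg = [i for i in range(len(y)) if y[i] == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    order = pos + neg
    # Deal records round-robin so every fold gets a share of the rare class.
    folds = [order[f::k] for f in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in order if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx], **params)
        errors += sum(1 for i in fold if model(X[i]) != y[i])
    return errors / len(y)


def train_threshold(X, y, shift=0.0):
    """Hypothetical stand-in learner: a 1-D cut-off halfway between the class
    means, moved by the tunable hyperparameter `shift`."""
    pos = [x[0] for x, t in zip(X, y) if t == 1]
    neg = [x[0] for x, t in zip(X, y) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 + shift
    return lambda x: 1 if x[0] >= cut else 0


# Toy imbalanced data: 15 absences, 5 presences.
X = [[float(i)] for i in range(15)] + [[float(30 + i)] for i in range(5)]
y = [0] * 15 + [1] * 5
for shift in (0.0, 20.0):
    print(shift, kfold_cv_loss(train_threshold, X, y, {"shift": shift}))
```

Stratifying the folds is a deliberate choice for imbalanced records: with plain random folds, some folds could contain few or no presences, which makes the loss estimate for the rare class unstable.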
