RuHere (Are You Here?): An R package to obtain, validate, and clean species records using metadata and specialist range information
Ferreira Trindade, W. C.; Caron, F.
1. Species occurrence data are fundamental to understanding, predicting, and conserving global biodiversity. However, biodiversity datasets remain affected by substantial data-quality issues, particularly erroneous or imprecise geographic coordinates. Most available tools for identifying problematic records rely primarily on automated spatial or metadata-based checks and rarely integrate expert-curated species range information, which can reveal introductions or geographic errors that often escape standard validation procedures.
2. Here, we introduce RuHere, an R package designed to manage species occurrence data, flag potential errors, and support the iterative exploration of problematic records. RuHere streamlines the data-cleaning process by integrating six main steps: (1) obtaining species occurrence records; (2) merging datasets and standardizing spatial information; (3) flagging records based on metadata; (4) flagging records using expert-derived distribution data; (5) visualizing, investigating, and summarizing flagged issues in the final datasets; and (6) exploring and reducing sampling bias.
3. We demonstrate the applicability of RuHere using occurrence data for a plant species (Araucaria angustifolia) and an animal species (Cyanocorax caeruleus). Nearly 75% of records were flagged as potentially problematic, including records identified exclusively by functions relying on specialist range information.
4. The main strengths of RuHere lie in its integrated and computationally efficient workflow, its tools for exploring and evaluating flagged records, and its ability to incorporate expert-derived distribution data to identify occurrences outside a species' known natural range. By combining metadata-based checks, coordinate validation, and specialist knowledge, RuHere provides a robust and reproducible framework for improving the quality of species occurrence datasets.
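To make the difference between coordinate checks (step 3) and range-based checks (step 4) concrete, here is a minimal, language-agnostic sketch in plain Python. It does not use RuHere or reproduce its API: every name (`Record`, `flag_coordinates`, `flag_outside_range`, `flag_records`), the flag labels, the rectangular "expert range", and the toy records are hypothetical and chosen for illustration only. The point it illustrates is that a record can have perfectly valid coordinates and still be flagged once a curated range is consulted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A polygon is a list of (lon, lat) vertices; the last vertex connects to the first.
Polygon = List[Tuple[float, float]]


@dataclass
class Record:
    species: str
    lon: Optional[float]
    lat: Optional[float]
    flags: List[str] = field(default_factory=list)


def flag_coordinates(rec: Record) -> None:
    """Coordinate sanity checks (hypothetical flag names)."""
    if rec.lon is None or rec.lat is None:
        rec.flags.append("missing_coordinates")
    elif not (-180 <= rec.lon <= 180 and -90 <= rec.lat <= 90):
        rec.flags.append("invalid_coordinates")
    elif rec.lon == 0 and rec.lat == 0:
        rec.flags.append("zero_coordinates")


def point_in_polygon(lon: float, lat: float, poly: Polygon) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going east."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def flag_outside_range(rec: Record, expert_range: Polygon) -> None:
    """Flag a record that falls outside the (assumed) expert-drawn range."""
    if not point_in_polygon(rec.lon, rec.lat, expert_range):
        rec.flags.append("outside_expert_range")


def flag_records(records: List[Record], expert_range: Polygon) -> List[Record]:
    for rec in records:
        flag_coordinates(rec)
        if not rec.flags:  # only test the range when coordinates are usable
            flag_outside_range(rec, expert_range)
    return records


# Made-up rectangle standing in for an expert range; not a real distribution.
toy_range: Polygon = [(-55.0, -30.0), (-45.0, -30.0), (-45.0, -20.0), (-55.0, -20.0)]

# Made-up records for a placeholder species.
records = flag_records(
    [
        Record("sp. A", -50.0, -25.0),   # inside the toy range
        Record("sp. A", 0.0, 0.0),       # suspicious zero coordinates
        Record("sp. A", -50.0, 40.0),    # valid coordinates, outside the range
        Record("sp. A", None, -25.0),    # missing longitude
        Record("sp. A", -200.0, 10.0),   # longitude out of bounds
    ],
    toy_range,
)

for rec in records:
    print(rec.lon, rec.lat, rec.flags)
```

In this toy data the third record passes every coordinate check and is caught only by the range test, which mirrors the kind of record the abstract describes as identified exclusively through specialist range information. A real workflow would use proper geospatial polygons and projections rather than a hand-written ray-casting test.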