A multimodal learning approach for automated detection of wildlife trade on social media
Momeny, M.; Kulkarni, R.; Soriano-Redondo, A.; Rinne, J.; Di Minin, E.
Abstract
Social media data and machine learning methods for automated content analysis are increasingly being used in ecology and conservation science. A current limitation is the lack of methods for automated multimodal analysis of textual and visual content, among other data modalities. In this study, we introduce a multimodal content analysis method and apply it to the investigation of wildlife trade on YouTube. Our approach analyzes text with transformer-based neural networks and video keyframes with convolutional neural networks as part of a multimodal filtering step, followed by a classification step in which a decision fusion module identifies instances of wildlife trade. The decision fusion module achieved an F-score of 0.72 among textual classifiers for trade detection and of 0.77 among visual classifiers for species identification. This multimodal classification helped detect wildlife trade in 3,715 out of 86,321 filtered YouTube posts, featuring 226 species for sale, including 51 Critically Endangered, 62 Endangered, 60 Vulnerable, 25 Near Threatened, and 28 Least Concern species. The proposed multimodal learning methods can be used more broadly for other ecological and biodiversity conservation applications.

The bigger picture
The unsustainable trade in wildlife is a major driver of biodiversity loss, threatening thousands of species across the Tree of Life. While online platforms have become popular spaces for advertising wildlife and exotic pets for sale, monitoring these platforms remains extremely challenging. Traditional surveillance methods are not scalable, and automated tools have typically focused on either text or image analysis in isolation, limiting their effectiveness in identifying nuanced instances of wildlife trade. Our study introduces a multimodal machine learning framework that integrates textual and visual data to detect potential wildlife trade on YouTube.
By combining natural language processing with deep learning for image analysis, and filtering millions of posts down to those most relevant, our method significantly improves detection accuracy. This dual-layered approach uncovered thousands of posts featuring hundreds of species, many of which are threatened. This work demonstrates how advances in machine learning can support ecological monitoring and conservation by providing timely, data-driven insights into online trade networks. In the pursuit of reducing biodiversity loss, this study offers an approach for bridging the gap between online behavior and real-world ecological outcomes.

Highlights
- Introduces a multimodal content analysis approach for detecting wildlife trade on YouTube by integrating textual and visual data.
- A multimodal filtering technique reduces irrelevant text and video content, enhancing analytical efficiency.
- A decision fusion module then combines results from text and video filtering, improving wildlife trade detection accuracy.
- The proposed methods are applicable across multiple online platforms and suitable for diverse tasks in ecology and biodiversity conservation.
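The abstract describes a decision fusion module that combines the outputs of textual and visual classifiers, but it does not specify the fusion rule. A minimal sketch of one common approach, weighted late fusion of per-modality probabilities, is shown below; the function name, the weight, and the threshold are illustrative assumptions, not the paper's implementation.

```python
def late_fusion(p_text: float, p_image: float,
                w_text: float = 0.5, threshold: float = 0.5):
    """Hypothetical weighted late fusion of two modality scores.

    p_text and p_image are classifier probabilities in [0, 1] that a
    post depicts wildlife trade, produced upstream by (for example) a
    transformer-based text model and a CNN keyframe model. The weight
    and threshold here are placeholder values for illustration.
    """
    # Convex combination of the two modality scores.
    p_fused = w_text * p_text + (1.0 - w_text) * p_image
    # Flag the post as potential wildlife trade if the fused score
    # clears the decision threshold.
    return p_fused >= threshold, p_fused


# Example: strong text signal, weak visual signal.
flagged, score = late_fusion(p_text=0.9, p_image=0.3)
```

In practice the weight and threshold would be tuned on a labeled validation set, and a learned fusion layer (e.g. a small classifier over both modality scores) is an alternative to a fixed linear rule.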