
Can data mining from various internet platforms systematically accelerate detection of alien species invasions across the EU?

Reynaert, S.; Billiet, N.; Pipek, P.; Novoa, A.; Hulme, P.; Meeus, S.; Groom, Q.

2026-02-07 ecology
10.64898/2026.02.06.704325 bioRxiv

Invasive alien species (IAS) expansions are increasingly impacting the biodiversity and economy of Europe. To more effectively allocate the limited resources available for their management, it is pertinent to accelerate detection of IAS spread and distribution. One largely untapped secondary data source showing much potential lies in the automated tracking of internet activity such as IAS search intensity or mentions across different internet platforms. In this study, we tested if internet activity increases systematically when IAS expand into new EU countries, using the combined data of 88 invasive species from various internet platforms. In total, 14 internet platforms were screened and evaluated based on their database accessibility, mined data quality and utility for systematic IAS expansion tracking. We found that the procedure to obtain researcher access to minimal data required for IAS tracking (i.e., information about location, time and place) varies widely across platforms, and is particularly difficult without incurring significant costs for many of the larger ones (X, Google and TikTok). Among the explored species, more charismatic species (i.e., mammals) overall gained more online traction than more cryptic ones (i.e., plants), though online activity for the former proved a poorer representation of real-world occurrence patterns. Moreover, while the final five selected internet platforms showed increased activity surrounding the year of invasion in many of the explored invasion scenarios (particularly Wikipedia and Facebook), inconsistencies between species groups, differing trends per platform and the large variability in data quality still hamper systematic integration of such data into existing databases. We conclude that combining IAS activity data from various internet platforms shows potential to accelerate IAS expansion detection across the EU (especially for fish, crustaceans, reptiles, birds and plants). However, incorporation in automated early warning systems is currently hampered by variable data quality, limited researcher access to online data and the scarcity of open, accurate and generalizable species classification algorithms with API access.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS ONE, 4510 papers in training set, Top 16%, 12.2%
2. Ecological Informatics, 29 papers in training set, Top 0.1%, 10.0%
3. PeerJ, 261 papers in training set, Top 0.3%, 8.1%
4. Peer Community Journal, 254 papers in training set, Top 0.3%, 7.1%
5. Scientific Reports, 3102 papers in training set, Top 14%, 6.8%
6. Conservation Science and Practice, 13 papers in training set, Top 0.1%, 3.6%
7. GigaScience, 172 papers in training set, Top 0.7%, 3.0%

50% of probability mass above

8. BMC Biology, 248 papers in training set, Top 0.4%, 2.7%
9. Diversity and Distributions, 26 papers in training set, Top 0.1%, 2.7%
10. eLife, 5422 papers in training set, Top 31%, 2.7%
11. Conservation Letters, 11 papers in training set, Top 0.2%, 2.6%
12. Ecography, 50 papers in training set, Top 0.5%, 2.3%
13. Frontiers in Plant Science, 240 papers in training set, Top 3%, 1.8%
14. Conservation Biology, 14 papers in training set, Top 0.2%, 1.8%
15. Ecological Indicators, 20 papers in training set, Top 0.2%, 1.7%
16. Biological Conservation, 43 papers in training set, Top 0.5%, 1.6%
17. Philosophical Transactions of the Royal Society B, 51 papers in training set, Top 3%, 1.6%
18. Science of The Total Environment, 179 papers in training set, Top 3%, 1.5%
19. Global Ecology and Biogeography, 41 papers in training set, Top 0.4%, 1.2%
20. Biodiversity and Conservation, 11 papers in training set, Top 0.2%, 1.1%
21. Methods in Ecology and Evolution, 160 papers in training set, Top 2%, 1.1%
22. Global Ecology and Conservation, 25 papers in training set, Top 0.9%, 1.1%
23. Animals, 20 papers in training set, Top 0.7%, 0.9%
24. Scientific Data, 174 papers in training set, Top 2%, 0.9%
25. Frontiers in Ecology and Evolution, 60 papers in training set, Top 4%, 0.8%
26. Ecological Modelling, 24 papers in training set, Top 0.6%, 0.7%
27. PLOS Computational Biology, 1633 papers in training set, Top 25%, 0.7%
28. Ecology and Evolution, 232 papers in training set, Top 4%, 0.7%
29. PLOS Biology, 408 papers in training set, Top 22%, 0.7%
30. Patterns, 70 papers in training set, Top 3%, 0.7%
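
The 50% cutoff stated above can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch, not the site's own method: the function name `journals_to_reach` is hypothetical, and the inputs are the rounded percentages shown for the ten top-ranked journals, so the running total is only approximate.

```python
from itertools import accumulate

# Rounded predicted probabilities (percent) for ranks 1-10, copied from the list above.
top_probs = [12.2, 10.0, 8.1, 7.1, 6.8, 3.6, 3.0, 2.7, 2.7, 2.7]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running sum
    to reach the threshold, or None if the threshold is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


# Hypothetical usage: find the rank at which half of the probability mass is covered.
cutoff_rank = journals_to_reach(top_probs, 50.0)
print(cutoff_rank)
```

With these rounded values the running total first reaches 50% at the seventh journal, which agrees with the marker placed between ranks 7 and 8.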