
The impacts of transfer learning, phylogenetic distance, and sample size on big-data bioacoustics

Provost, K. L.; Yang, J.; Carstens, B. C.

2022-04-18 evolutionary biology
10.1101/2022.02.24.481827 bioRxiv

Vocalizations in animals, particularly birds, are critically important behaviors that influence their reproductive fitness. While recordings of bioacoustic data have been captured and stored in collections for decades, the automated extraction of data from these recordings has only recently been facilitated by artificial intelligence methods. These have yet to be evaluated with respect to accuracy of different automation strategies and features. Here, we use a recently published machine learning framework to extract syllables from ten bird species ranging in their phylogenetic relatedness from 1 to 85 million years, to compare how phylogenetic relatedness influences accuracy. We also evaluate the utility of applying trained models to novel species. Our results indicate that model performance is best on conspecifics, with accuracy progressively decreasing as phylogenetic distance increases between taxa. However, we also find that the application of models trained on multiple distantly related species can improve the overall accuracy to levels near that of training and analyzing a model on the same species. When planning big-data bioacoustics studies, care must be taken in sample design to maximize sample size and minimize human labor without sacrificing accuracy.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Ecological Informatics | 29 | Top 0.1% | 19.7% |
| 2 | Methods in Ecology and Evolution | 160 | Top 0.3% | 12.5% |
| 3 | Scientific Reports | 3102 | Top 6% | 10.2% |
| 4 | PLOS ONE | 4510 | Top 18% | 10.2% |
| 5 | BMC Ecology and Evolution | 49 | Top 0.2% | 4.9% |
| 6 | Philosophical Transactions of the Royal Society B | 51 | Top 0.9% | 4.4% |
| 7 | Proceedings of the Royal Society B: Biological Sciences | 341 | Top 3% | 2.1% |
| 8 | PLOS Computational Biology | 1633 | Top 14% | 1.9% |
| 9 | Communications Biology | 886 | Top 8% | 1.7% |
| 10 | Royal Society Open Science | 193 | Top 2% | 1.7% |
| 11 | BMC Biology | 248 | Top 2% | 1.5% |
| 12 | The Journal of the Acoustical Society of America | 33 | Top 0.1% | 1.5% |
| 13 | eLife | 5422 | Top 47% | 1.3% |
| 14 | Ecological Indicators | 20 | Top 0.3% | 1.2% |
| 15 | Molecular Ecology Resources | 161 | Top 0.9% | 0.9% |
| 16 | Bioinformatics Advances | 184 | Top 4% | 0.9% |
| 17 | iScience | 1063 | Top 29% | 0.8% |
| 18 | Journal of The Royal Society Interface | 189 | Top 4% | 0.8% |
| 19 | BMC Bioinformatics | 383 | Top 7% | 0.8% |
| 20 | Behavioral Ecology and Sociobiology | 34 | Top 0.6% | 0.8% |
| 21 | Data in Brief | 13 | Top 0.5% | 0.7% |
| 22 | Remote Sensing in Ecology and Conservation | 10 | Top 0.3% | 0.7% |
| 23 | Proceedings of the National Academy of Sciences | 2130 | Top 45% | 0.7% |
| 24 | Bioinformatics | 1061 | Top 10% | 0.7% |
| 25 | PLOS Genetics | 756 | Top 17% | 0.7% |
| 26 | PeerJ | 261 | Top 19% | 0.5% |
| 27 | Genes | 126 | Top 4% | 0.5% |
| 28 | Biology Methods and Protocols | 53 | Top 4% | 0.5% |
| 29 | Nature Machine Intelligence | 61 | Top 4% | 0.5% |
| 30 | Computers in Biology and Medicine | 120 | Top 6% | 0.5% |

Ranks 1 to 4 lie above the 50% probability-mass cutoff.
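
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a running sum over the ranked probabilities. The sketch below assumes the cutoff is the smallest number of top-ranked journals whose cumulative probability reaches the threshold. The page does not state how it computes the cutoff, and the function name `journals_to_reach` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-10, copied from the journal list above.
probabilities = [19.7, 12.5, 10.2, 10.2, 4.9, 4.4, 2.1, 1.9, 1.7, 1.7]


def journals_to_reach(probs, threshold):
    """Return the smallest number of top-ranked entries whose summed
    probability reaches `threshold`, or None if it is never reached.

    Assumption: the cutoff is defined by a cumulative sum in rank order.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


cutoff = journals_to_reach(probabilities, 50.0)
print(cutoff)
```

With the listed values, the running sum passes 50% at the fourth journal (19.7 + 12.5 + 10.2 + 10.2 = 52.6), which matches the cutoff reported on the page.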