
An Open Reproducible Framework for CNN-Based Cetacean Vocalization Detection in Passive Acoustic Monitoring

De Marco, R.

2026-05-06 · Animal Behavior and Cognition
bioRxiv · doi:10.64898/2026.05.01.721665
Abstract

This paper presents a six-stage methodological framework for Convolutional Neural Network (CNN)-based cetacean vocalization detection and classification in Passive Acoustic Monitoring (PAM), implemented as the open-source toolkit ai-pam-pipeline. The framework is generalizable across species and fully parameterised through a single configuration file, which guarantees exact experimental reproducibility. Two experiments are reported. Experiment A examines the effect of the FFT window length Nfft ∈ {256, 512, 1024} on binary bottlenose dolphin (Tursiops truncatus) whistle detection, using stratified 10-fold cross-validation on an in-domain dataset (Oltremare, 192 kHz) and a cross-domain benchmark (DCLDE 2022). In-domain performance is uniformly high (macro F1 ≈ 0.98), with no significant differences between window lengths (Wilcoxon, all p > 0.05). Cross-domain results diverge substantially: Nfft = 256 is significantly superior (p = 0.006, rank-biserial r = 0.89). The mechanism is an upsampling amplification effect: coarser spectral bins produce wider, higher-contrast FM traces after bilinear resampling to fixed image dimensions. This superiority is threshold-invariant: precision equals 1.000 across all configurations and thresholds θ ∈ [0.1, 0.9], confirming that the advantage is not an artifact of threshold choice. These findings demonstrate that preprocessing choices, often treated as secondary implementation details, can significantly affect cross-domain generalisation. While Nfft serves here as a controlled case study, the framework is designed to enable systematic, reproducible evaluation of arbitrary preprocessing parameters within a unified experimental protocol. Experiment B demonstrates multiclass capability on five T. truncatus vocalization categories (macro F1 = 0.843). Inter-class confusion between click trains and burst-pulse sounds reflects biological signal overlap rather than classifier failure.
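A single configuration file only delivers exact reproducibility if every random choice, including the stratified 10-fold split, is derived from it. The following is a minimal sketch of that idea. It assumes a JSON config with hypothetical keys `nfft`, `n_folds` and `seed`; the abstract gives neither the toolkit's real file format nor its key names. It is not code from ai-pam-pipeline.

```python
import json
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical subset of a run configuration; field names are illustrative only."""
    nfft: int
    n_folds: int
    seed: int


def load_config(text: str) -> ExperimentConfig:
    # Unknown or missing keys raise TypeError, so a typo in the file is not silently ignored.
    return ExperimentConfig(**json.loads(text))


def stratified_folds(labels, n_folds: int, seed: int) -> np.ndarray:
    """Return one fold index per sample, spreading every class evenly over the folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)  # all randomness flows from the configured seed
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)  # random order within the class
        fold_of[idx] = np.arange(len(idx)) % n_folds  # deal samples round-robin into folds
    return fold_of


cfg = load_config('{"nfft": 256, "n_folds": 10, "seed": 42}')
labels = np.array([0] * 50 + [1] * 20)  # toy labels: 50 background windows, 20 whistles
folds = stratified_folds(labels, cfg.n_folds, cfg.seed)
```

Because the split is a pure function of the labels and the configured seed, rerunning with the same file reproduces the same folds. Each fold here receives 5 background windows and 2 whistles.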
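The upsampling amplification effect can be illustrated with a toy experiment. This sketch is not code from ai-pam-pipeline and rests on several assumptions:

- a pure 9 kHz tone instead of an FM whistle;
- a periodic Hann window with 50% overlap;
- a hypothetical fixed image size of 257 × 64;
- corner-aligned bilinear resizing, written as two separable linear interpolations.

It counts how many image rows the trace occupies within 10 dB of its peak.

```python
import numpy as np

FS = 192_000            # sample rate of the in-domain recordings (192 kHz)
TONE_HZ = 9_000         # hypothetical tone: 12, 24 and 48 cycles per 256-, 512- and 1024-sample frame
IMG_H, IMG_W = 257, 64  # hypothetical fixed CNN input size (rows = frequency, columns = time)


def power_spectrogram(x: np.ndarray, nfft: int) -> np.ndarray:
    """|STFT|^2 with a periodic Hann window and 50% overlap, shape (nfft // 2 + 1, n_frames)."""
    hop = nfft // 2
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(nfft) / nfft)  # periodic Hann
    n_frames = (len(x) - nfft) // hop + 1
    frames = np.stack([x[i * hop:i * hop + nfft] * window for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2


def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Corner-aligned bilinear resize: linear interpolation along rows, then along columns."""
    in_h, in_w = img.shape
    rows = np.linspace(0, in_h - 1, out_h)
    cols = np.linspace(0, in_w - 1, out_w)
    # Frequency axis first, one time frame at a time.
    tmp = np.stack([np.interp(rows, np.arange(in_h), img[:, j]) for j in range(in_w)], axis=1)
    # Then the time axis, one output row at a time.
    return np.stack([np.interp(cols, np.arange(in_w), tmp[i, :]) for i in range(out_h)], axis=0)


def trace_width(img: np.ndarray, rel_level: float = 0.1) -> int:
    """Rows of the middle column whose power is within 10 dB of that column's peak."""
    col = img[:, img.shape[1] // 2]
    return int(np.sum(col >= rel_level * col.max()))


t = np.arange(FS // 20) / FS  # 50 ms of signal
x = np.cos(2 * np.pi * TONE_HZ * t)

widths = {}
for nfft in (256, 512, 1024):
    img = resize_bilinear(power_spectrogram(x, nfft), IMG_H, IMG_W)
    widths[nfft] = trace_width(img)

print(widths)  # → {256: 7, 512: 3, 1024: 1}
```

Under these assumptions, Nfft = 256 yields 129 frequency bins, which are stretched by a factor of two, so the Hann main lobe spreads over seven rows. Nfft = 1024 yields 513 bins, which are decimated, and the trace collapses to a single row. Real whistles, background noise and the toolkit's actual image size will change the numbers. Only the direction of the effect is meant to carry over.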
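The Nfft comparison in Experiment A rests on a Wilcoxon signed-rank test with a rank-biserial effect size. This sketch assumes the test is applied to paired per-fold scores of two configurations. It uses the matched-pairs rank-biserial correlation r = (T+ - T-) / (T+ + T-), one common definition; the abstract does not state which variant the authors used. `scipy.stats.wilcoxon` supplies the p-value, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon


def rank_biserial(scores_a, scores_b) -> float:
    """Matched-pairs rank-biserial correlation; positive when scores_a tends to be higher."""
    d = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    d = d[d != 0]                 # drop zero differences, as the default Wilcoxon test does
    ranks = rankdata(np.abs(d))   # average ranks for tied |differences|
    t_plus = ranks[d > 0].sum()   # rank sum of folds where a wins
    t_minus = ranks[d < 0].sum()  # rank sum of folds where b wins
    return float((t_plus - t_minus) / (t_plus + t_minus))


def compare_configs(scores_a, scores_b) -> tuple[float, float]:
    """Two-sided Wilcoxon signed-rank p-value and rank-biserial r for paired fold scores."""
    result = wilcoxon(scores_a, scores_b)
    return float(result.pvalue), rank_biserial(scores_a, scores_b)
```

As an example, pass the ten per-fold cross-domain scores of Nfft = 256 as `scores_a` and those of another setting as `scores_b`. A positive r indicates that 256 tends to win, and r = 1 means it won on every fold.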
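Threshold invariance can be checked by sweeping the decision threshold θ over [0.1, 0.9] and recomputing precision at each step. This minimal sketch assumes binary labels (1 = whistle) and per-window detector scores in [0, 1]. The toy arrays are invented for illustration and are not data from the paper.

```python
import numpy as np


def precision_at(scores: np.ndarray, labels: np.ndarray, theta: float) -> float:
    """Precision of the rule `score >= theta`; NaN when nothing is predicted positive."""
    pred = scores >= theta
    tp = int(np.sum(pred & (labels == 1)))
    fp = int(np.sum(pred & (labels == 0)))
    return tp / (tp + fp) if tp + fp > 0 else float("nan")


def precision_sweep(scores, labels, thetas=np.linspace(0.1, 0.9, 9)) -> dict:
    """Map each threshold (rounded to one decimal) to the precision obtained at it."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    return {round(float(t), 1): precision_at(scores, labels, t) for t in thetas}


# Toy example: both negatives score below 0.1, so no threshold in [0.1, 0.9] admits a false positive.
toy_scores = np.array([0.95, 0.85, 0.60, 0.30, 0.05, 0.02])
toy_labels = np.array([1, 1, 1, 1, 0, 0])
sweep = precision_sweep(toy_scores, toy_labels)
```

Precision is undefined when no window crosses θ, so the sweep returns NaN at such thresholds rather than silently reporting 0 or 1.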

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Methods in Ecology and Evolution (160 papers in training set; Top 0.1%; predicted probability 28.2%)
2. PLOS ONE (4510 papers in training set; Top 11%; predicted probability 15.0%)
3. Ecological Informatics (29 papers in training set; Top 0.1%; predicted probability 7.0%)
50% of probability mass above
4. PLOS Computational Biology (1633 papers in training set; Top 5%; predicted probability 6.5%)
5. Scientific Reports (3102 papers in training set; Top 16%; predicted probability 6.5%)
6. SoftwareX (15 papers in training set; Top 0.1%; predicted probability 3.7%)
7. Open Research Europe (14 papers in training set; Top 0.1%; predicted probability 3.7%)
8. Sensors (39 papers in training set; Top 0.5%; predicted probability 3.7%)
9. eLife (5422 papers in training set; Top 37%; predicted probability 1.9%)
10. Journal of The Royal Society Interface (189 papers in training set; Top 3%; predicted probability 1.5%)
11. The Journal of the Acoustical Society of America (33 papers in training set; Top 0.1%; predicted probability 1.5%)
12. Behavior Research Methods (25 papers in training set; Top 0.1%; predicted probability 1.5%)
13. Journal of Neuroscience Methods (106 papers in training set; Top 1%; predicted probability 1.4%)
14. Biology (43 papers in training set; Top 1%; predicted probability 1.3%)
15. Royal Society Open Science (193 papers in training set; Top 4%; predicted probability 0.9%)
16. Communications Biology (886 papers in training set; Top 23%; predicted probability 0.8%)
17. Nature Communications (4913 papers in training set; Top 65%; predicted probability 0.7%)
18. Scientific Data (174 papers in training set; Top 3%; predicted probability 0.7%)
19. Peer Community Journal (254 papers in training set; Top 5%; predicted probability 0.5%)
20. Journal of Animal Ecology (63 papers in training set; Top 1%; predicted probability 0.5%)
21. Ecological Indicators (20 papers in training set; Top 0.8%; predicted probability 0.5%)
22. Hearing Research (49 papers in training set; Top 0.4%; predicted probability 0.5%)
23. BMC Ecology and Evolution (49 papers in training set; Top 2%; predicted probability 0.5%)