Bridging Acoustic and Semantic Spaces for Interpretable Voice Scoring via Zero-Shot Semantic Expansion

Hsiao, C.; Cheng, Y.-R.; Yang, C.-Y.; Hsu, F.-S.

2026-06-01 health informatics
10.64898/2026.05.29.26354442 medRxiv
Subjective auditory-perceptual evaluation and uninterpretable deep learning models limit the clinical assessment of voice disorders. This study proposes a two-phase zero-shot framework to evaluate voice pathology. First, an Audio Spectrogram Transformer is fine-tuned on the Perceptual Voice Quality Database to generate an acoustic latent space. Second, Orthogonal Procrustes analysis maps these acoustic embeddings directly onto the semantic space of a pre-trained Sentence Transformer. The geometric alignment produced continuous semantic axes that outperformed a supervised machine learning baseline in regressing clinician-rated GRBAS (Grade, Roughness, Breathiness, Asthenia, and Strain) severity scales. Furthermore, these axes correlated with traditional acoustic measures, including Harmonics-to-Noise Ratio and local jitter, and remained robust on aperiodic signals because they do not require fundamental frequency extraction. Most importantly, the model achieved zero-shot semantic expansion, successfully evaluating voices with natural clinical vocabulary beyond the GRBAS scale that it was never trained on. External validation on the Voice ICarus Database confirmed cross-corpus stability and demonstrated the capacity for zero-shot differential phenotyping of specific etiologies, such as hypokinetic dysphonia and reflux laryngitis. By bridging acoustic and semantic latent spaces, this framework offers an objective, continuous, and transparent metric for evaluating voice quality using descriptive voice vocabulary.
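
The alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the names (`fit_orthogonal_procrustes`, `acoustic_anchors`, `semantic_anchors`, `descriptor_vec`) are hypothetical, and it assumes the acoustic and semantic embeddings share the same dimensionality and that paired anchor points are available. The orthogonal map is obtained from the singular value decomposition of the cross-covariance matrix, which is the standard closed-form solution of the Orthogonal Procrustes problem.

```python
import numpy as np

def fit_orthogonal_procrustes(A, B):
    """Return the orthogonal R minimizing ||A @ R - B||_F (closed form via SVD)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
dim, n_anchors = 8, 20  # toy sizes; real embeddings are far larger (assumption)

# Synthetic stand-ins: a hidden rotation relates the two spaces exactly.
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
acoustic_anchors = rng.normal(size=(n_anchors, dim))   # e.g. audio-model embeddings
semantic_anchors = acoustic_anchors @ true_rotation    # e.g. text-model embeddings

R = fit_orthogonal_procrustes(acoustic_anchors, semantic_anchors)

# Zero-shot style scoring: project an unseen voice embedding into the semantic
# space and compare it with the embedding of a descriptor term.
new_voice = rng.normal(size=dim)
descriptor_vec = new_voice @ true_rotation  # hypothetical descriptor embedding
score = cosine(new_voice @ R, descriptor_vec)
print(round(score, 6))
```

Because `R` is orthogonal, it preserves distances and angles within the acoustic space, so any structure learned there carries over unchanged after projection. In this toy setup the two spaces are related by an exact rotation, which real acoustic and text embeddings would only approximate.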

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Computers in Biology and Medicine | 120 | Top 0.1% | 18.6%
2 | Scientific Reports | 3102 | Top 0.9% | 18.6%
3 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 0.2% | 6.4%
4 | PLOS ONE | 4510 | Top 35% | 4.0%
5 | Communications Biology | 886 | Top 2% | 3.6%
(50% of probability mass above)
6 | Advanced Science | 249 | Top 7% | 2.9%
7 | Nature Communications | 4913 | Top 44% | 2.7%
8 | Frontiers in Digital Health | 20 | Top 0.5% | 2.1%
9 | NeuroImage: Clinical | 132 | Top 2% | 1.9%
10 | npj Digital Medicine | 97 | Top 2% | 1.9%
11 | IEEE Transactions on Biomedical Engineering | 38 | Top 0.5% | 1.8%
12 | Science Advances | 1098 | Top 17% | 1.7%
13 | eLife | 5422 | Top 42% | 1.7%
14 | Journal of Neural Engineering | 197 | Top 1% | 1.7%
15 | eBioMedicine | 130 | Top 1% | 1.7%
16 | Scientific Data | 174 | Top 1% | 1.5%
17 | Journal of Medical Internet Research | 85 | Top 3% | 1.3%
18 | Nature Machine Intelligence | 61 | Top 3% | 1.2%
19 | NeuroImage | 813 | Top 5% | 1.1%
20 | Sensors | 39 | Top 1% | 0.9%
21 | Biomedical Signal Processing and Control | 18 | Top 0.4% | 0.9%
22 | Journal of Personalized Medicine | 28 | Top 0.9% | 0.9%
23 | Journal of Biomedical Informatics | 45 | Top 1% | 0.9%
24 | European Respiratory Journal | 54 | Top 2% | 0.8%
25 | PLOS Computational Biology | 1633 | Top 23% | 0.8%
26 | Science Translational Medicine | 111 | Top 6% | 0.7%
27 | Computational and Structural Biotechnology Journal | 216 | Top 9% | 0.7%
28 | Frontiers in Neuroscience | 223 | Top 8% | 0.7%
29 | Trends in Hearing | 12 | Top 0.1% | 0.7%
30 | Frontiers in Computational Neuroscience | 53 | Top 2% | 0.6%