Scalable, non-invasive depression monitoring with smartphone speech: a multimodal benchmark and topic analysis
Emden, D.; Gutfleisch, L.; Herpertz, J.; Leenings, R.; Blitz, R.; Holstein, V. L.; Goltermann, J.; Richter, M.; Chevance, A.; Fleuchaus, A.; Winter, N. R.; Spanagel, J.; Meinert, S.; Borgers, T.; Flinkenflugel, K.; Stein, F.; Alexander, N.; Jamalabadi, H.; Leehr, E. J.; Redlich, R.; Ebner-Priemer, U.; Nenadic, I.; Kircher, T.; Dannlowski, U.; Hahn, T.; Opel, N.
Objective, scalable biomarkers are needed for continuous monitoring of major depressive disorder (MDD). Smartphone-collected speech is promising, yet extracting clinically useful signals from it remains difficult. We analysed 3,151 weekly voice diaries from 284 German-speaking adults (128 with MDD, 156 controls) and used regression models to predict Beck Depression Inventory (BDI) scores. Sentence embeddings from the open-source 8-billion-parameter Qwen3-8B model predicted BDI scores with a mean absolute error (MAE) of 4.45 and R² = 0.35, explaining 16 percentage points more variance than the best traditional feature set (TF-IDF). Adding lexical-prosodic or TF-IDF features to the embeddings yielded only a marginal improvement (best MAE = 4.39). To interpret the embeddings, we applied BERTopic and uncovered ten coherent themes; BDI scores peaked for "Persistent Low Mood" and "Pain Distress", supporting the clinical relevance of the themes. Large-language-model embeddings therefore capture the dominant signal of depression severity in everyday speech and, paired with interpretable topic analysis, offer a privacy-preserving, scalable route to digital mental-health phenotyping.
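The two metrics reported in the abstract, MAE and R², can be illustrated with a minimal sketch. The function names and the toy BDI values below are assumptions made purely for illustration; they are not data from the study, and the abstract does not state how the authors computed or cross-validated their scores.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted scores."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical BDI scores (0-63 scale), invented for this example only.
bdi_true = [4, 10, 22, 31, 15]
bdi_pred = [6, 12, 18, 27, 17]

mae = mean_absolute_error(bdi_true, bdi_pred)
r2 = r_squared(bdi_true, bdi_pred)
print(f"MAE = {mae:.2f}, R^2 = {r2:.2f}")
```

Read this way, an MAE of 4.45 means predictions deviate from the observed BDI score by about 4.5 points on average, and R² = 0.35 means the model accounts for 35% of the variance in BDI scores.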