Evaluating Voice-Enabled Generative AI for Mental Health: Real-Time Performance and Safety Analyses
Ngo, N.; Sano, A.
Show abstract
This study investigates the integration of Voice AI into a locally hosted generative AI chatbot designed to function as a mental health assistant, with the goal of enabling intuitive, voice-based therapeutic interaction. Leveraging the Llama3.1 8B language model for privacy-preserving generation, the system combines Deepgrams Speech-to-Text API and OpenAIs Text-to-Speech API within a WebRTC-based framework to support low-latency, bi-directional communication. A custom pipeline facilitates real-time voice input and output, aiming to reduce barriers to engagement and foster a more natural conversational flow. Technical evaluation focuses on latency across short, long-form, and multi-turn dialogues, revealing response times within tolerable bounds for synchronous use. Prompt engineering and system prompt customization guide empathetic, context-aware responses in standard therapeutic scenarios, though limitations persist in handling edge cases. These findings suggest that locally hosted voice-enabled LLMs can support responsive, privacy-conscious mental health applications, with future work directed toward fine-tuning for high-risk interactions.
Matching journals
The top 3 journals account for 50% of the predicted probability mass.