Large Language Models Struggle to Encode Medical Concepts - A Multilingual Benchmarking and Comparative Analysis

Rouhizadeh, H.; Yazdani, A.; Zhang, B.; Vicente Alvarez, D.; Hueser, M.; Vanobberghen, A.; Yang, R.; Li, I.; Walter, A.; Teodoro, D.

2025-01-15 health informatics

10.1101/2025.01.15.25320579 medRxiv

Interoperability in health information systems is crucial for accurate data exchange across environments such as electronic health records, clinical notes, and medical research. The main challenge arises from the wide variation in biomedical concepts, their differing representation across systems and languages, and the limited context available, all of which complicate data integration and standardization. Inspired by recent advances in large language models (LLMs), this study explores their potential role as biomedical knowledge engineers to (semi-)automate multilingual biomedical concept normalization, a key task for semantic interoperability of medical concepts. We developed a novel multilingual dataset comprising 59,104 unique terms mapped to 27,280 distinct biomedical concepts, designed to assess language model performance on this task in five European languages: English, French, German, Spanish, and Turkish. We then proposed a multi-stage pipeline based on a retrieve-then-rerank approach using sparse and dense retrievers, rerankers, and fusion approaches, leveraging discriminative and generative LLMs, with a predefined primary knowledge organization system. Our experiments show that the best discriminative model, e5, achieves an accuracy of 71%, surpassing the best generative model, Mistral, by 2% (p-value < 0.001). For semi-automated workflows, e5 maintained superior performance with 82% recall@10 versus Mistral's 78%. Our findings show how LLM-based approaches can advance the normalization of multilingual biomedical terms, and they also expose the limitations of LLMs in encoding biomedical concepts.
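To make the evaluation terms in the abstract concrete, the following is a minimal sketch of how ranked candidate lists from two retrievers might be fused and then scored with recall@k (accuracy corresponds to k = 1). The abstract does not say which fusion method the authors used; reciprocal rank fusion is assumed here purely for illustration, and the function names, the constant k=60, and the concept IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of concept IDs into one ranking.

    Assumption: RRF stands in for the unspecified fusion step; k=60 is a
    commonly used default, not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, concept_id in enumerate(ranking, start=1):
            scores[concept_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by concept ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


def recall_at_k(ranked_candidates, gold_concepts, k):
    """Fraction of terms whose gold concept appears in the top-k candidates."""
    hits = sum(
        1
        for candidates, gold in zip(ranked_candidates, gold_concepts)
        if gold in candidates[:k]
    )
    return hits / len(gold_concepts)


# Hypothetical candidate lists for a single input term.
sparse = ["C2", "C1", "C3"]   # e.g. from a lexical (sparse) retriever
dense = ["C1", "C4", "C2"]    # e.g. from an embedding (dense) retriever
fused = reciprocal_rank_fusion([sparse, dense])

print(fused)
print(recall_at_k([fused], ["C4"], k=1))  # accuracy-style, top-1 only
print(recall_at_k([fused], ["C4"], k=3))  # semi-automated, top-3 shortlist
```

In a semi-automated workflow, recall@10 measures how often the correct concept is in the shortlist shown to a human curator, which is why it is reported alongside top-1 accuracy.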
