Large Language Models Struggle to Encode Medical Concepts - A Multilingual Benchmarking and Comparative Analysis

Rouhizadeh, H.; Yazdani, A.; Zhang, B.; Vicente Alvarez, D.; Hueser, M.; Vanobberghen, A.; Yang, R.; Li, I.; Walter, A.; Teodoro, D.

2025-01-15 health informatics
10.1101/2025.01.15.25320579 medRxiv
Interoperability in health information systems is crucial for accurate data exchange across environments such as electronic health records, clinical notes, and medical research. The main challenge arises from the wide variation in biomedical concepts, their differing representation across systems and languages, and the limited context available, all of which complicate data integration and standardization. Inspired by recent advances in large language models (LLMs), this study explores their potential role as biomedical knowledge engineers to (semi-)automate multilingual biomedical concept normalization, a key task for semantic interoperability of medical concepts. We developed a novel multilingual dataset comprising 59,104 unique terms mapped to 27,280 distinct biomedical concepts, designed to assess language model performance on this task in five European languages: English, French, German, Spanish, and Turkish. We then proposed a multi-stage pipeline based on a retrieve-then-rerank approach using sparse and dense retrievers, rerankers, and fusion approaches, leveraging discriminative and generative LLMs, with a predefined primary knowledge organization system. Our experiments show that the best discriminative model, e5, achieves an accuracy of 71%, surpassing the best generative model, Mistral, by 2% (p-value < 0.001). For semi-automated workflows, e5 maintained superior performance with 82% recall@10 versus Mistral's 78%. Our findings show how LLM-based approaches can advance the normalization of multilingual biomedical terms, and they also expose the limitations of LLMs in encoding biomedical concepts.
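
To make the retrieve-then-rerank evaluation more concrete, here is a minimal Python sketch of two pieces the abstract mentions: fusing candidate lists from a sparse and a dense retriever, and scoring the result with recall@k. The abstract does not say which fusion method was used, so reciprocal rank fusion (with the commonly used constant k=60) is an assumption made here for illustration, not the authors' method. The concept IDs, example terms, and function names are likewise hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of concept IDs into one ranked list.

    Assumption: reciprocal rank fusion stands in for the unspecified
    "fusion approaches" of the abstract.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, concept_id in enumerate(ranking, start=1):
            scores[concept_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by concept ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


def recall_at_k(predictions, gold, k=10):
    """Fraction of terms whose gold concept is among the top-k candidates."""
    hits = sum(1 for term, cands in predictions.items() if gold[term] in cands[:k])
    return hits / len(predictions)


# Hypothetical candidate lists for one term from two retrievers.
sparse = ["C1", "C2", "C3"]
dense = ["C2", "C4", "C1"]
fused = reciprocal_rank_fusion([sparse, dense])

# Hypothetical predictions and gold labels for two terms.
predictions = {"infarto de miocardio": fused, "Husten": ["C9", "C7"]}
gold = {"infarto de miocardio": "C1", "Husten": "C8"}

print(fused)
print(recall_at_k(predictions, gold, k=10))  # semi-automated setting
print(recall_at_k(predictions, gold, k=1))   # top-1 accuracy
```

With k=1 the same function gives top-1 accuracy, which corresponds to the fully automated setting, while k=10 corresponds to the semi-automated setting in which a human picks from a shortlist.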

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Journal of Biomedical Informatics: 45 papers in training set; Top 0.1%; 32.0%
2. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 0.1%; 17.0%
3. Journal of the American Medical Informatics Association: 61 papers in training set; Top 0.4%; 6.6%

50% of probability mass above

4. Artificial Intelligence in Medicine: 15 papers in training set; Top 0.1%; 4.7%
5. npj Digital Medicine: 97 papers in training set; Top 1%; 3.5%
6. International Journal of Medical Informatics: 25 papers in training set; Top 0.5%; 3.2%
7. Scientific Reports: 3102 papers in training set; Top 45%; 2.7%
8. JMIR Medical Informatics: 17 papers in training set; Top 0.5%; 2.5%
9. JAMIA Open: 37 papers in training set; Top 0.7%; 2.0%
10. Journal of Medical Internet Research: 85 papers in training set; Top 2%; 2.0%
11. Computers in Biology and Medicine: 120 papers in training set; Top 2%; 1.7%
12. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 1%; 1.4%
13. BMC Bioinformatics: 383 papers in training set; Top 5%; 1.4%
14. Biology Methods and Protocols: 53 papers in training set; Top 1%; 1.3%
15. Bioinformatics: 1061 papers in training set; Top 8%; 1.3%
16. PLOS Digital Health: 91 papers in training set; Top 2%; 1.2%
17. PLOS ONE: 4510 papers in training set; Top 61%; 1.2%
18. GigaScience: 172 papers in training set; Top 2%; 1.1%
19. Frontiers in Digital Health: 20 papers in training set; Top 1%; 0.9%
20. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 8%; 0.9%
21. Database: 51 papers in training set; Top 0.8%; 0.9%
22. Patterns: 70 papers in training set; Top 3%; 0.7%
23. iScience: 1063 papers in training set; Top 38%; 0.6%
24. Scientific Data: 174 papers in training set; Top 3%; 0.6%