
Clinical Note Comparison and Data Retrieval Via Embedding Vectors: Model Selection, Metrics, and Convergence

Dahlberg, A. C. H.; Tapiola, O.; Luisto, R.; Puranen, T.; Sanmark, E.; Vartiainen, V.

2026-05-18 health informatics
10.64898/2026.05.12.26352832 medRxiv

Background: Embedding models are an integral part of generative AI architectures. They transform text into embedding vectors that represent semantic content in numerical form. Despite their central role, their performance in clinical settings remains underexplored. We evaluate embedding models on two tasks: detecting semantic differences in clinical texts, and retrieving data from patient records.
Methods: Eight models were applied to synthetic discharge summaries in English, Finnish, and Swedish. Semantic sensitivity was assessed by introducing controlled perturbations (deletion, modification, and paraphrasing) at three levels of severity. Cosine similarity, L1 distance, and Euclidean distance were then computed between the embedding vectors of each original text and its perturbed version. Partial vectors (subsets of the embedding dimensions) were also compared to explore dimensionality reduction. The two models with the largest contrast in semantic-difference detection were then evaluated on retrieving relevant information from real Finnish vascular surgery records.
Results: Embedding vectors captured semantic differences in clinical text: content deletion and modification increased vector distance more than paraphrasing did. On average, models detected the direction of semantic change correctly, but case-level performance varied considerably. Qwen3-Embedding-8B was the only model with zero directional errors, while multilingual-E5-large erred in 13.8% of cases. In data retrieval, Qwen3-Embedding-8B again outperformed multilingual-E5-large, though by a narrower margin: sufficiency scores were 3.25 vs. 3.17 out of 5 for the first query and 2.25 vs. 1.15 out of 5 for the second query. For some models, as few as 0.6-1.2% of dimensions sufficed to replicate full-vector accuracy; neither principal component analysis nor coordinate-level analysis explained this finding.
Conclusions: Our results show that the choice of embedding model is important: performance differences between models can be large enough to determine whether clinically relevant information reaches the end user, and model weaknesses can be both task-specific and context-dependent.
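The distance comparisons described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the embedding vectors are already available as lists of floats. It also assumes that a "directional error" means a meaning-preserving paraphrase lands at least as far from the original as a meaning-changing edit (deletion or modification) does; the abstract does not define the term. The helper names, the `dims` argument (which keeps only the first `dims` coordinates to mimic a partial vector), and the toy vectors are all hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity, so that larger values mean 'further apart'."""
    return 1.0 - cosine_similarity(a, b)


def l1_distance(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_directional_error(
    original: Vector,
    paraphrased: Vector,
    altered: Vector,
    metric: Callable[[Vector, Vector], float] = euclidean_distance,
    dims: Optional[int] = None,
) -> bool:
    """Hypothetical check: True when the paraphrase is at least as far from
    the original as the meaning-changing edit. If `dims` is given, only the
    first `dims` coordinates are compared (an assumed partial vector)."""
    if dims is not None:
        original, paraphrased, altered = original[:dims], paraphrased[:dims], altered[:dims]
    return metric(original, paraphrased) >= metric(original, altered)


def directional_error_rate(
    cases: Sequence[Tuple[Vector, Vector, Vector]],
    metric: Callable[[Vector, Vector], float] = euclidean_distance,
    dims: Optional[int] = None,
) -> float:
    """Fraction of (original, paraphrased, altered) triples flagged as errors."""
    errors = sum(is_directional_error(o, p, a, metric, dims) for o, p, a in cases)
    return errors / len(cases)


# Toy 4-dimensional vectors (made up for illustration only).
original = [1.0, 0.0, 0.0, 0.0]
paraphrased = [0.9, 0.1, 0.0, 0.0]   # small shift: meaning preserved
altered = [0.2, 0.8, 0.3, 0.0]       # large shift: meaning changed

print(directional_error_rate([(original, paraphrased, altered)]))  # 0.0
```

Cosine similarity grows as vectors get closer, whereas L1 and Euclidean distances shrink. The sketch therefore converts it to a distance so that all three metrics can share the same "larger means more different" comparison.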

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. npj Digital Medicine: 97 papers in training set, top 0.5%, probability 10.0%
2. Scientific Reports: 3102 papers in training set, top 10%, probability 8.3%
3. Journal of the American Medical Informatics Association: 61 papers in training set, top 0.4%, probability 8.3%
4. BMC Medical Informatics and Decision Making: 39 papers in training set, top 0.3%, probability 7.1%
5. Artificial Intelligence in Medicine: 15 papers in training set, top 0.1%, probability 7.1%
6. PLOS ONE: 4510 papers in training set, top 34%, probability 4.3%
7. BMJ Health & Care Informatics: 13 papers in training set, top 0.2%, probability 3.6%
8. Computers in Biology and Medicine: 120 papers in training set, top 0.9%, probability 3.6%
50% of probability mass above
9. International Journal of Medical Informatics: 25 papers in training set, top 0.4%, probability 3.6%
10. Journal of Medical Internet Research: 85 papers in training set, top 1%, probability 3.6%
11. Journal of Biomedical Informatics: 45 papers in training set, top 0.5%, probability 3.6%
12. PLOS Digital Health: 91 papers in training set, top 0.8%, probability 3.6%
13. Biology Methods and Protocols: 53 papers in training set, top 0.4%, probability 3.0%
14. Frontiers in Artificial Intelligence: 18 papers in training set, top 0.1%, probability 3.0%
15. Journal of NeuroEngineering and Rehabilitation: 28 papers in training set, top 0.4%, probability 2.4%
16. JMIR Medical Informatics: 17 papers in training set, top 0.5%, probability 2.3%
17. Frontiers in Digital Health: 20 papers in training set, top 0.7%, probability 1.7%
18. JCO Clinical Cancer Informatics: 18 papers in training set, top 0.5%, probability 1.7%
19. PLOS Computational Biology: 1633 papers in training set, top 18%, probability 1.5%
20. Computer Methods and Programs in Biomedicine: 27 papers in training set, top 0.6%, probability 1.2%
21. BMC Bioinformatics: 383 papers in training set, top 6%, probability 0.9%
22. Cureus: 67 papers in training set, top 4%, probability 0.9%
23. Philosophical Transactions of the Royal Society B: 51 papers in training set, top 5%, probability 0.9%
24. JAMIA Open: 37 papers in training set, top 1%, probability 0.8%
25. Journal of Personalized Medicine: 28 papers in training set, top 1%, probability 0.8%
26. Bioinformatics: 1061 papers in training set, top 10%, probability 0.7%
27. iScience: 1063 papers in training set, top 38%, probability 0.6%
28. Heliyon: 146 papers in training set, top 8%, probability 0.6%
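The statement that the top 8 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The short sketch below reads it as "the 8th journal is where the cumulative probability first reaches 50%". That reading is our assumption. The listed values are also rounded to one decimal, so the cumulative total is only approximate. The function name `journals_to_reach` is hypothetical.

```python
from typing import List, Optional, Tuple

# Predicted probabilities (%) in rank order, copied from the list above.
probabilities: List[float] = [
    10.0, 8.3, 8.3, 7.1, 7.1, 4.3, 3.6, 3.6, 3.6, 3.6,
    3.6, 3.6, 3.0, 3.0, 2.4, 2.3, 1.7, 1.7, 1.5, 1.2,
    0.9, 0.9, 0.9, 0.8, 0.8, 0.7, 0.6, 0.6,
]


def journals_to_reach(probs: List[float], threshold: float) -> Optional[Tuple[int, float]]:
    """Return (rank, cumulative %) at the first rank whose running total
    reaches `threshold`, or None if the listed entries never reach it."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


rank, mass = journals_to_reach(probabilities, 50.0)
print(rank, round(mass, 1))  # 8 52.3
```

Using the rounded values, the first seven journals sum to 48.7% and the eighth brings the total to about 52.3%, which matches the 50% cut-off placed after entry 8.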