
Enhancing Automated Medical Coding: Evaluating Embedding Models for ICD-10-CM Code Mapping

klotzman, v.

2024-07-03 health informatics
10.1101/2024.07.02.24309849 medRxiv

Purpose
The goal of this study is to enhance automated medical coding (AMC) by evaluating how well modern embedding models capture semantic similarity and improve the retrieval process for ICD-10-CM code mapping. Consistent and accurate medical coding is crucial for effective healthcare management.

Methods
We compared the performance of several embedding models (text-embedding-3-large, text-embedding-004, voyage-large-2-instruct, and mistral-embed) against ClinicalBERT. The models were assessed on two tasks: capturing semantic similarity between the long and short ICD-10-CM descriptions of each code, and improving the retrieval process that maps diagnosis strings from the eICU database to the correct ICD-10-CM codes.

Results
The text-embedding-3-large and text-embedding-004 models outperformed ClinicalBERT in capturing semantic similarity, with text-embedding-3-large achieving the highest accuracy. For ICD-10 code retrieval, the voyage-large-2-instruct model performed best. Using the 15 nearest neighbors gave the best results; increasing the number beyond 15 did not improve accuracy, because the additional neighbors contributed little meaningful information.

Conclusion
Modern embedding models significantly outperform specialized models such as ClinicalBERT in AMC tasks. These findings underscore the potential of these models to enhance medical coding practices, despite the challenges posed by ambiguous diagnosis descriptions.
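The two evaluations in the abstract can be sketched as follows, under stated assumptions. The abstract does not specify the similarity metric, how accuracy was defined, or how the 15 nearest neighbors were used, so this sketch assumes cosine similarity, counts a long description as correctly matched when its most similar short description belongs to the same code, and scores retrieval by whether the correct code appears among the top-k candidates. The three-dimensional vectors are toy placeholders, not output from text-embedding-3-large or any other model, and the helper names (`cosine_similarity`, `pairing_accuracy`, `top_k_codes`, `hit_rate_at_k`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairing_accuracy(long_vecs: Dict[str, Vector], short_vecs: Dict[str, Vector]) -> float:
    """Assumed metric: fraction of codes whose long-description embedding is
    closest to the short-description embedding of the same code."""
    correct = 0
    for code, long_vec in long_vecs.items():
        best = max(short_vecs, key=lambda c: cosine_similarity(long_vec, short_vecs[c]))
        correct += int(best == code)
    return correct / len(long_vecs)


def top_k_codes(query: Vector, code_index: Dict[str, Vector], k: int = 15) -> List[Tuple[str, float]]:
    """Return the k codes whose description embeddings are most similar to the query.
    The default k=15 mirrors the best setting reported in the abstract."""
    scored = [(code, cosine_similarity(query, vec)) for code, vec in code_index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def hit_rate_at_k(queries: List[Tuple[Vector, str]], code_index: Dict[str, Vector], k: int = 15) -> float:
    """Assumed metric: fraction of (query vector, true code) pairs whose true code is in the top-k."""
    hits = 0
    for vec, true_code in queries:
        candidates = {code for code, _ in top_k_codes(vec, code_index, k)}
        hits += int(true_code in candidates)
    return hits / len(queries)


# Toy vectors (hypothetical) standing in for embeddings of real ICD-10-CM descriptions.
short_index = {
    "E11.9": [0.9, 0.1, 0.0],   # Type 2 diabetes mellitus without complications
    "I10": [0.1, 0.9, 0.1],     # Essential (primary) hypertension
    "J18.9": [0.0, 0.2, 0.9],   # Pneumonia, unspecified organism
}
long_index = {
    "E11.9": [0.8, 0.2, 0.1],
    "I10": [0.2, 0.8, 0.0],
    "J18.9": [0.1, 0.1, 0.95],
}
query = [0.85, 0.15, 0.05]  # toy embedding of a free-text diagnosis string

if __name__ == "__main__":
    print(pairing_accuracy(long_index, short_index))               # 1.0
    print([code for code, _ in top_k_codes(query, short_index, k=2)])  # ['E11.9', 'I10']
    print(hit_rate_at_k([(query, "E11.9")], short_index, k=1))      # 1.0
```

Comparing models in this setup only changes how the vectors are produced; the scoring functions stay the same, so each embedding model (or ClinicalBERT) can be swapped in and scored on identical inputs.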

Matching journals

The top 3 journals together account for over 50% (55.1%) of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | BMC Medical Informatics and Decision Making | 39 | Top 0.1% | 32.6%
2 | JMIR Medical Informatics | 17 | Top 0.1% | 14.2%
3 | Journal of Biomedical Informatics | 45 | Top 0.2% | 8.3%
(50% of probability mass above)
4 | International Journal of Medical Informatics | 25 | Top 0.2% | 6.3%
5 | JAMIA Open | 37 | Top 0.2% | 6.2%
6 | Journal of the American Medical Informatics Association | 61 | Top 0.6% | 4.8%
7 | Journal of Medical Internet Research | 85 | Top 1% | 3.5%
8 | BMJ Health & Care Informatics | 13 | Top 0.3% | 2.4%
9 | Artificial Intelligence in Medicine | 15 | Top 0.2% | 2.1%
10 | Scientific Reports | 3102 | Top 56% | 1.8%
11 | PLOS ONE | 4510 | Top 54% | 1.7%
12 | Healthcare | 16 | Top 1.0% | 1.3%
13 | BMC Medical Research Methodology | 43 | Top 0.9% | 1.1%
14 | Frontiers in Public Health | 140 | Top 7% | 0.9%
15 | Biology Methods and Protocols | 53 | Top 2% | 0.9%
16 | npj Digital Medicine | 97 | Top 3% | 0.8%
17 | PLOS Digital Health | 91 | Top 3% | 0.7%
18 | Journal of the American Heart Association | 119 | Top 4% | 0.7%
19 | JMIR Public Health and Surveillance | 45 | Top 4% | 0.6%
20 | Frontiers in Digital Health | 20 | Top 2% | 0.6%
21 | Computers in Biology and Medicine | 120 | Top 6% | 0.6%
22 | Heliyon | 146 | Top 8% | 0.6%