Phecoder: semantic retrieval for auditing and expanding ICD-based phenotypes in EHR biobanks

2026-01-11 | health informatics | medRxiv | Title + abstract only

Background: Electronic health record (EHR)-based phenotyping underpins genome-wide association studies, yet current ICD-code phenotypes rely heavily on manually curated lists such as Phecodes. These definitions are labour-intensive to maintain, inherently subjective, and may omit clinically relevant diagnostic codes, reducing study power. Advances in text embedding models offer an opportunity to automate and standardize ICD-based phenotype construction.

Methods: We developed Phecoder, an ensemble o...

Predicted journal destinations

1. Journal of the American Medical Informatics Association | 53 training papers | Top 0.1% | 21.9%
2. npj Digital Medicine | 85 training papers | Top 2% | 7.7%
3. Journal of Biomedical Informatics | 37 training papers | Top 1% | 6.3%
4. JAMIA Open | 35 training papers | Top 2% | 6.3%
5. Scientific Reports | 701 training papers | Top 47% | 5.9%
6. PLOS Digital Health | 88 training papers | Top 7% | 3.8%
7. Journal of Medical Internet Research | 81 training papers | Top 5% | 3.8%
8. BMC Medical Informatics and Decision Making | 36 training papers | Top 4% | 3.8%
9. JAMA Network Open | 125 training papers | Top 6% | 3.0%
10. PLOS ONE | 1737 training papers | Top 89% | 3.0%
11. International Journal of Medical Informatics | 25 training papers | Top 2% | 2.7%
12. BMJ Open | 553 training papers | Top 49% | 1.9%
13. Nature Communications | 483 training papers | Top 37% | 1.9%
14. JMIR Medical Informatics | 16 training papers | Top 4% | 1.5%
15. Translational Psychiatry | 94 training papers | Top 7% | 1.5%
16. BMC Medicine | 155 training papers | Top 20% | 1.3%
17. BMC Medical Research Methodology | 41 training papers | Top 4% | 1.3%
18. Frontiers in Psychiatry | 56 training papers | Top 7% | 1.3%
19. Nature Medicine | 88 training papers | Top 10% | 1.2%
20. eBioMedicine | 82 training papers | Top 7% | 1.2%
21. Frontiers in Digital Health | 18 training papers | Top 3% | 0.9%
22. JMIR Formative Research | 31 training papers | Top 5% | 0.9%
23. Molecular Psychiatry | 84 training papers | Top 9% | 0.9%
24. Journal of Medical Genetics | 22 training papers | Top 4% | 0.7%
25. Patterns | 15 training papers | Top 4% | 0.7%
26. Psychological Medicine | 52 training papers | Top 7% | 0.7%
27. Communications Medicine | 63 training papers | Top 8% | 0.7%
28. Computers in Biology and Medicine | 39 training papers | Top 10% | 0.7%
29. BMJ | 49 training papers | Top 7% | 0.7%