
MIMIC-IV-Phenotype-Atlas (MIPA): A Publicly Available Dataset for EHR Phenotyping

Yamga, E.; Goudrar, R.; Despres, P.

2026-04-24 · health informatics
medRxiv preprint · DOI: 10.64898/2026.04.16.26350888

Introduction: Secondary use of electronic health records (EHRs) often requires transforming raw clinical information into research-grade data. A central step in this process is EHR phenotyping: the identification of patient cohorts defined by specific medical conditions. Although numerous approaches exist, from ICD-based heuristics to supervised learning and large language models (LLMs), the field lacks standardized benchmark datasets, limiting reproducibility and hindering fair comparison across methods.

Methods: We developed the MIMIC-IV Phenotype Atlas (MIPA), an adaptation of MIMIC-IV that provides expert-annotated discharge summaries across 16 phenotypes of varying prevalence and complexity. Two independent clinicians reviewed and labeled the discharge summaries, resolving disagreements by consensus. In parallel, we implemented a processing pipeline that extracts multimodal EHR features and generates training, validation, and testing datasets for supervised phenotyping. To illustrate MIPA's utility, we benchmarked four phenotyping methods on the task: ICD-based classifiers, keyword-driven Term Frequency-Inverse Document Frequency (TF-IDF) classifiers, supervised machine learning (ML) models, and LLMs.

Results: The final MIPA corpus consists of 1,388 expert-annotated discharge summaries. Annotation reliability was high (mean document-level kappa = 0.805, mean label-level kappa = 0.771), with 91% of disagreements resolved through consensus review. MIPA provides high-quality phenotype labels paired with structured EHR features and predefined train/validation/test splits for each phenotype. In the benchmarking case study, LLMs achieved the highest F1 scores on 13 of 16 phenotypes, particularly for conditions requiring contextual interpretation of the clinical narrative, while supervised ML offered moderate improvements over rule-based baselines.
Conclusion: MIPA is the first publicly available benchmark dataset dedicated to EHR phenotyping, combining expert-curated annotations, broad phenotype coverage, and a reproducible processing pipeline. By enabling standardized comparison across ICD-based heuristics, ML models, and LLMs, MIPA provides a durable reference resource to advance methodological development in automated phenotyping.
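The abstract names two measurable pieces that are easy to reproduce in principle: a keyword-driven TF-IDF classifier baseline and Cohen's kappa for annotation reliability. As a minimal sketch of what such a baseline and agreement check could look like, the snippet below uses scikit-learn; note that the discharge-summary snippets, phenotype labels, and annotator vectors are invented toy data for illustration, not MIPA content, and the paper's actual pipeline is not specified here.

```python
# Sketch of a TF-IDF phenotype classifier and an inter-annotator agreement
# check. All data below is toy/illustrative, not drawn from MIPA.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, f1_score
from sklearn.pipeline import make_pipeline

# Toy discharge-summary snippets with binary labels for one phenotype
# (e.g., heart failure: 1 = present, 0 = absent).
notes = [
    "patient admitted with acute decompensated heart failure",
    "no evidence of heart failure; routine follow-up",
    "chronic systolic heart failure, EF 25 percent",
    "elective knee arthroplasty, uneventful recovery",
]
labels = [1, 0, 1, 0]

# Keyword-driven baseline: unigram/bigram TF-IDF features feeding a
# logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(notes, labels)
preds = clf.predict(notes)
print("F1 on toy training data:", f1_score(labels, preds))

# Document-level agreement between two annotators, as reported via kappa.
annotator_a = [1, 0, 1, 0, 1, 1, 0, 0]
annotator_b = [1, 0, 1, 0, 0, 1, 0, 1]
print("Cohen's kappa:", cohen_kappa_score(annotator_a, annotator_b))
```

In a real setup, the classifier would be fit on the training split and scored on the held-out test split that MIPA predefines per phenotype, rather than on the fitting data as in this toy example.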

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Papers = number of that journal's papers in the classifier's training set.

Rank  Journal                                                   Papers  Percentile  Probability
 1    Journal of the American Medical Informatics Association       61  Top 0.1%    22.4%
 2    Journal of Biomedical Informatics                             45  Top 0.1%    17.4%
 3    JAMIA Open                                                    37  Top 0.1%     8.4%
 4    BMC Medical Informatics and Decision Making                   39  Top 0.6%     4.8%
      -- 50% of cumulative probability mass above this line --
 5    npj Digital Medicine                                          97  Top 0.9%     4.8%
 6    JMIR Medical Informatics                                      17  Top 0.3%     3.9%
 7    Journal of Medical Internet Research                          85  Top 1%       3.9%
 8    International Journal of Medical Informatics                  25  Top 0.3%     3.9%
 9    Frontiers in Digital Health                                   20  Top 0.3%     3.1%
10    JCO Clinical Cancer Informatics                               18  Top 0.3%     2.3%
11    Scientific Reports                                          3102  Top 54%      1.9%
12    PLOS Digital Health                                           91  Top 1%       1.7%
13    The Lancet Digital Health                                     25  Top 0.4%     1.7%
14    BMJ Health & Care Informatics                                 13  Top 0.5%     1.7%
15    BMC Medical Research Methodology                              43  Top 0.7%     1.5%
16    Journal of General Internal Medicine                          20  Top 1%       0.7%
17    eBioMedicine                                                 130  Top 5%       0.7%
18    Computer Methods and Programs in Biomedicine                  27  Top 1%       0.7%
19    iScience                                                    1063  Top 35%      0.7%
20    Genetics in Medicine                                          69  Top 1%       0.7%
21    European Heart Journal - Digital Health                       15  Top 0.7%     0.6%
22    PLOS ONE                                                    4510  Top 71%      0.6%