
A Systematic Process for Assessing Fitness-for-Purpose of Health Outcomes for Computable Phenotyping with Electronic Health Record Data

Gatto, N. M.; Cronkite, D. J.; Wartko, P. D.; Ball, R.; Carrell, D. S.; Eniafe, R.; Desai, R. M.; Floyd, J. S.; Lee, T.; Nelson, J. C.; Shebl, F. M.; Schoeplein, R.; Toh, S.; Zhang, M.; Dublin, S.; Hernandez-Munoz, J. J.

2025-09-04 pharmacology and therapeutics
10.1101/2025.08.29.25334394 medRxiv

Purpose: Information from electronic health records (EHRs) may be incorporated into computable phenotype algorithms in an effort to overcome the inaccuracies of algorithms based on administrative claims data alone. However, such efforts can be resource-intensive and may not succeed. Assessing the feasibility of computable phenotyping for a health outcome of interest (HOI) before proceeding is therefore recommended.
Methods: We developed a systematic fitness-for-purpose (FFP) assessment process to implement concepts outlined in a previously described general framework for computable phenotyping incorporating EHR data. Our process includes verifying that the HOI is well-defined, reviewing clinical information about the HOI, identifying existing algorithms and their performance, evaluating the HOI's clinical and data complexity, and determining an overall FFP conclusion and recommendation. We applied this process to ten HOIs that lacked high-performing claims-based algorithms, selecting HOIs of public health importance that varied in clinical and data complexity, including neutropenia, pericardial effusion, and drug-induced liver injury.
Results: HOIs assessed as having moderate (vs. easy) overall difficulty had characteristics such as the need for natural language processing, integration of multiple laboratory test results, or longitudinal EHR data. HOIs assessed as having high difficulty required using data from multiple EHR sources, ruling out many other potential causes, or relying on low-sensitivity diagnostic tests. Input from experts in EHR data and clinical care was crucial.
Conclusion: EHR data have the potential to enhance the accuracy of defining certain HOIs for research and surveillance compared with administrative claims data. The process and tools we created will support others in assessing the FFP of HOIs for computable phenotyping.
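The difficulty characteristics reported in the Results can be illustrated as a simple rule-based tiering. This is a hypothetical sketch, not the authors' assessment tool: the feature names, the `HOIFeatures` class, the `difficulty` function, and the rule that any high-difficulty feature outranks the moderate ones are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical feature flags summarizing an HOI's data requirements.
# The names are illustrative assumptions, not the authors' instrument.
@dataclass
class HOIFeatures:
    needs_nlp: bool = False                # free-text chart notes must be parsed
    multiple_lab_results: bool = False     # several laboratory results must be combined
    longitudinal_data: bool = False        # EHR data from multiple timepoints
    multiple_ehr_sources: bool = False     # data drawn from several EHR sources
    many_alternative_causes: bool = False  # many other causes must be ruled out
    low_sensitivity_tests: bool = False    # diagnosis relies on low-sensitivity tests

# Characteristics the abstract associates with each difficulty tier.
MODERATE = ("needs_nlp", "multiple_lab_results", "longitudinal_data")
HIGH = ("multiple_ehr_sources", "many_alternative_causes", "low_sensitivity_tests")

def difficulty(f: HOIFeatures) -> tuple[str, list[str]]:
    """Return an overall difficulty tier and the features that drove it.

    Assumption: any high-difficulty feature outranks the moderate ones,
    and an HOI with none of the listed features is rated "easy".
    """
    high = [name for name in HIGH if getattr(f, name)]
    if high:
        return "high", high
    moderate = [name for name in MODERATE if getattr(f, name)]
    if moderate:
        return "moderate", moderate
    return "easy", []

if __name__ == "__main__":
    # A hypothetical HOI needing NLP over notes from several timepoints.
    print(difficulty(HOIFeatures(needs_nlp=True, longitudinal_data=True)))
```

For that hypothetical HOI the call returns `('moderate', ['needs_nlp', 'longitudinal_data'])`. Returning the triggering features alongside the tier keeps the rating traceable, which matters because the abstract stresses that expert input, not a checklist alone, drove the actual assessments.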
Five key points
- Incorporating electronic health record (EHR) data into computable phenotypes could improve the accurate identification of health outcomes of interest (HOIs), but such work can be resource-intensive.
- We developed a systematic fitness-for-purpose (FFP) process and tools to assess the feasibility of computable phenotyping for HOIs.
- Steps include identifying existing algorithms and their performance, ensuring the HOI is well-defined, evaluating clinical and data complexity, and determining a feasibility recommendation.
- Difficulty increased with a need for natural language processing, multiple laboratory tests, longitudinal EHR data, multiple EHR sources, or ruling out other potential causes.
- Input from EHR data and clinical care experts was crucial to the FFP assessment process.
Plain Language Summary (PLS)
Attempts to identify diseases and health conditions by applying computer programs to information easily gleaned from the insurance claims of tens of thousands of patients, as in FDA's ongoing safety monitoring of approved drugs and medical products, are often unsuccessful because the data lack nuance. Incorporating information from electronic health records (EHR) and patient chart notes may improve the accurate identification of health outcomes. Because this can be resource-intensive, we designed a process and tools to assess the feasibility of including EHR data in computer algorithms to identify health outcomes. Steps included identifying existing algorithms and their performance, building familiarity with the outcome and making sure it is well-defined, evaluating clinical and data complexity, and determining a conclusion about feasibility. We applied our process to ten health outcomes of public health importance. Health outcomes were considered moderately difficult for computerized algorithms if they required natural language processing, integration of multiple laboratory tests, or EHR data from multiple timepoints. Health outcomes of high difficulty required using multiple EHR data types, ruling out many alternative causes of the HOI (other than medications), or relying on diagnostic tests of low accuracy. Input from EHR data and clinical care experts was crucial for the assessment process.

Published in Pharmacoepidemiology and Drug Safety (predicted rank #11) · training set

Matching journals

The cumulative predicted probability first reaches 50% within the top 4 journals.

Rank. Journal · papers in training set · Top % · predicted probability
1. Journal of Biomedical Informatics · 45 papers · Top 0.1% · 26.1%
2. BioData Mining · 15 papers · Top 0.1% · 12.4%
3. Journal of the American Medical Informatics Association · 61 papers · Top 0.3% · 10.2%
4. PLOS ONE · 4510 papers · Top 19% · 10.2%
50% of probability mass above
5. JAMIA Open · 37 papers · Top 0.1% · 7.2%
6. BMC Medical Informatics and Decision Making · 39 papers · Top 0.7% · 3.7%
7. BMJ Open · 554 papers · Top 8% · 1.9%
8. JMIRx Med · 31 papers · Top 0.5% · 1.9%
9. PeerJ · 261 papers · Top 7% · 1.7%
10. Journal of Medical Internet Research · 85 papers · Top 3% · 1.5%
11. Pharmacoepidemiology and Drug Safety (published here) · 13 papers · Top 0.2% · 1.5%
12. BMC Medical Research Methodology · 43 papers · Top 0.8% · 1.3%
13. Frontiers in Pharmacology · 100 papers · Top 3% · 1.3%
14. International Journal of Medical Informatics · 25 papers · Top 1% · 1.1%
15. BMJ Health & Care Informatics · 13 papers · Top 0.7% · 0.9%
16. F1000Research · 79 papers · Top 3% · 0.9%
17. Clinical and Translational Science · 21 papers · Top 0.8% · 0.9%
18. Computational and Structural Biotechnology Journal · 216 papers · Top 8% · 0.8%
19. Frontiers in Psychiatry · 83 papers · Top 3% · 0.8%
20. Frontiers in Medicine · 113 papers · Top 7% · 0.8%
21. JMIR Medical Informatics · 17 papers · Top 2% · 0.7%
22. Heliyon · 146 papers · Top 7% · 0.7%
23. Trials · 25 papers · Top 2% · 0.6%
24. Pilot and Feasibility Studies · 12 papers · Top 0.7% · 0.6%
25. Wellcome Open Research · 57 papers · Top 3% · 0.5%
26. International Journal of Environmental Research and Public Health · 124 papers · Top 8% · 0.5%
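The note that the top 4 journals account for 50% of the probability mass can be checked by accumulating the listed percentages in rank order. This minimal sketch copies the first five values from the list; the helper name `rank_reaching` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (%) for the top-ranked journals, copied from the list.
top = [
    ("Journal of Biomedical Informatics", 26.1),
    ("BioData Mining", 12.4),
    ("Journal of the American Medical Informatics Association", 10.2),
    ("PLOS ONE", 10.2),
    ("JAMIA Open", 7.2),
]

def rank_reaching(threshold, entries):
    """Return (rank, cumulative %) where the running total first reaches threshold.

    Ranks are 1-based; returns None if the listed entries never reach it.
    """
    running = accumulate(p for _, p in entries)
    for rank, total in enumerate(running, start=1):
        if total >= threshold:
            return rank, round(total, 1)
    return None

if __name__ == "__main__":
    print(rank_reaching(50, top))
```

With these values the cumulative share is 48.7% after rank 3 and 58.9% after rank 4, so the 50% mark is first crossed at rank 4, consistent with the divider placed after PLOS ONE.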