Electronic Health Record-Based Estimation of Kansas City Cardiomyopathy Questionnaire Scores in Heart Failure

Kim, Y. W.; Lau, W.; Patel, N.; Kendrick, K.; Wu, A.; Feldman, T.; Ahern, R.; Oka, A.

2026-04-05 health informatics
10.64898/2026.04.03.26350138 medRxiv
Background: The Kansas City Cardiomyopathy Questionnaire (KCCQ) is a validated patient-reported outcome measure for heart failure. However, its clinical utility is limited by incomplete and inconsistent data collection. We aimed to develop and validate machine learning models to estimate KCCQ overall summary scores from electronic health record (EHR) data.

Methods: We assembled a retrospective cohort of 10,889 heart failure patients with recorded KCCQ scores from the Truveta database. Predictor features were derived from structured EHR variables across 13 historical time windows (15-360 days). Multiple regression algorithms were evaluated, followed by SHapley Additive exPlanations (SHAP)-based feature reduction and nested cross-validation for hyperparameter optimization. Model performance was assessed using the coefficient of determination (R²), mean absolute error (MAE), and ordinal discrimination and calibration for categorical severity classification.

Results: Histogram-based gradient boosting (HGB) with HGB-SHAP feature selection achieved the strongest performance, reducing feature dimensionality by more than 94% while maintaining estimation accuracy. The 240-day window performed best (R²=0.522, MAE=12.485). For categorical severity classification, the model demonstrated strong ordinal discrimination (mean ordinal AUROC=0.850). Quantile-based calibration improved classification balance, increasing the F1-score for the most severe category (KCCQ<25) from 0.180 to 0.428 and the quadratic weighted kappa from 0.601 to 0.640. Longer EHR observation windows were associated with improved prediction performance.

Conclusion: Machine learning models can estimate KCCQ scores from routine EHR data with clinically meaningful accuracy and strong discriminatory performance. This approach may help extend assessment of patient-reported health status to populations in which survey-based data are incompletely captured, supporting population-level cardiovascular outcomes assessment and risk stratification in heart failure care.
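
The abstract does not spell out how the quantile-based calibration works, so the following is only a minimal sketch of one plausible reading: choose thresholds on the predicted scores so that the share of patients assigned to each severity category matches the share observed in the recorded KCCQ scores. The cut points 25/50/75, the function names, and the toy data are all assumptions for illustration; only the KCCQ<25 category is named in the abstract.

```python
from bisect import bisect_right

# Assumed severity cut points; only the <25 category appears in the abstract.
CUTS = [25, 50, 75]


def to_category(score, cuts=CUTS):
    """Map a score to an ordinal category index (0 = most severe)."""
    return bisect_right(cuts, score)


def quantile_thresholds(observed, predicted, cuts=CUTS):
    """Pick thresholds on predicted scores so that the number of predictions
    below each threshold equals the number of observed scores below each cut.
    Assumes both lists have the same length; ties in `predicted` can make the
    match approximate."""
    ranked = sorted(predicted)
    n = len(ranked)
    thresholds = []
    for cut in cuts:
        k = sum(1 for s in observed if s < cut)  # observed count below the cut
        if k <= 0:
            thresholds.append(ranked[0])         # nothing falls strictly below
        elif k >= n:
            thresholds.append(float("inf"))      # everything falls below
        else:
            thresholds.append((ranked[k - 1] + ranked[k]) / 2)
    return thresholds


# Toy data (hypothetical): predictions are compressed toward the middle,
# so no prediction falls below 25 even though two patients are that severe.
observed = [10, 20, 30, 45, 55, 60, 70, 80, 90, 95]
predicted = [30, 28, 35, 50, 52, 58, 66, 72, 78, 74]

raw = [to_category(p) for p in predicted]
thresholds = quantile_thresholds(observed, predicted)
calibrated = [to_category(p, thresholds) for p in predicted]

print(raw)         # fixed cut points: category 0 is never assigned
print(thresholds)
print(calibrated)  # category counts now match the observed distribution
```

In a real evaluation the thresholds would be fitted on training folds and applied to held-out predictions; fitting and scoring on the same data, as in this toy example, overstates the benefit.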

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Scientific Reports: 3102 papers in training set, Top 8%, 9.2%
2. European Heart Journal - Digital Health: 15 papers in training set, Top 0.1%, 6.8%
3. PLOS Digital Health: 91 papers in training set, Top 0.3%, 6.8%
4. JMIR Medical Informatics: 17 papers in training set, Top 0.1%, 6.4%
5. Journal of the American Heart Association: 119 papers in training set, Top 1%, 6.4%
6. Journal of Medical Internet Research: 85 papers in training set, Top 1%, 4.3%
7. Frontiers in Cardiovascular Medicine: 49 papers in training set, Top 1%, 3.6%
8. Journal of the American Medical Informatics Association: 61 papers in training set, Top 0.8%, 3.6%
9. PLOS ONE: 4510 papers in training set, Top 39%, 3.6%

50% of probability mass above

10. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 0.9%, 3.1%
11. npj Digital Medicine: 97 papers in training set, Top 1%, 2.7%
12. JAMIA Open: 37 papers in training set, Top 0.6%, 2.4%
13. BMC Medical Research Methodology: 43 papers in training set, Top 0.4%, 2.1%
14. Frontiers in Physiology: 93 papers in training set, Top 2%, 2.1%
15. Journal of the American College of Cardiology: 12 papers in training set, Top 0.2%, 2.1%
16. BMC Cardiovascular Disorders: 14 papers in training set, Top 0.9%, 1.9%
17. European Respiratory Journal: 54 papers in training set, Top 0.8%, 1.9%
18. JMIR Public Health and Surveillance: 45 papers in training set, Top 2%, 1.7%
19. Circulation: 66 papers in training set, Top 2%, 1.7%
20. Journal of Biomedical Informatics: 45 papers in training set, Top 0.8%, 1.7%
21. Critical Care Explorations: 15 papers in training set, Top 0.3%, 1.5%
22. Computers in Biology and Medicine: 120 papers in training set, Top 3%, 1.0%
23. The Lancet Digital Health: 25 papers in training set, Top 0.7%, 1.0%
24. Critical Care: 14 papers in training set, Top 0.5%, 0.9%
25. Frontiers in Artificial Intelligence: 18 papers in training set, Top 0.6%, 0.9%
26. Biology Methods and Protocols: 53 papers in training set, Top 2%, 0.8%
27. International Journal of Medical Informatics: 25 papers in training set, Top 1%, 0.8%
28. BMJ Health & Care Informatics: 13 papers in training set, Top 0.8%, 0.8%
29. eBioMedicine: 130 papers in training set, Top 4%, 0.7%
30. European Journal of Epidemiology: 40 papers in training set, Top 0.7%, 0.7%
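
The 50% cutoff in the ranking can be checked by accumulating the listed probabilities in rank order. The sketch below assumes the cutoff is defined as the smallest number of top-ranked journals whose probabilities sum to at least 50%; the page does not state the rule, and the function name is hypothetical. The values are the first ten probabilities from the list.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-10, copied from the list.
probabilities = [9.2, 6.8, 6.8, 6.4, 6.4, 4.3, 3.6, 3.6, 3.6, 3.1]


def journals_to_reach(probs, target=50.0):
    """Return the smallest rank at which the running total reaches `target`,
    or None if the listed probabilities never reach it."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return rank
    return None


cutoff_rank = journals_to_reach(probabilities)
print(cutoff_rank)
```

With these values the running total is about 47.1% after eight journals and about 50.7% after nine, which agrees with the statement that the top 9 journals account for 50% of the predicted probability mass.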