
Deep Learning-Based Missing Value Imputation for Heart Failure Data from MIMIC-III: A Comparative Study of DAE, SAITS, and MICE+LightGBM

Sharma, S.; Kaur, M.; Gupta, S.

2026-02-11 health systems and quality improvement
10.64898/2026.02.10.26345979 medRxiv

Background: Electronic Health Records (EHR) are crucial for Clinical Decision Support Systems and for delivering proper care to ICU heart failure patients. EHR data are often incomplete because of monitoring device errors, which creates a need for robust imputation methodologies.

Objective: To compare and evaluate three methodologies for imputing missing data for heart failure patients from the MIMIC-III database: Denoising Autoencoder (DAE), Self-Attention Imputation for Time Series (SAITS), and Multiple Imputation by Chained Equations (MICE) with LightGBM.

Methods: We analyzed 14,090 ICU admissions for patients with heart failure from the MIMIC-III database. Nineteen clinical features were selected on the basis of clinical relevance, using a combination of Random Forest analysis, correlation analysis, and Mutual Information. Artificial missing values were introduced into the data set at rates of 20%, 30%, and 50%, and the three imputation methodologies (DAE, SAITS, and MICE+LightGBM) were then evaluated. Performance was measured using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Normalized Root Mean Square Error (NRMSE).

Results: Both DAE and SAITS outperformed MICE+LightGBM at every level of missingness. At 20% missingness, DAE had mean MAE = 0.004967, RMSE = 0.005217, and NRMSE = 3.260893, while SAITS had mean MAE = 0.005461, RMSE = 0.005797, and NRMSE = 3.244695; MICE+LightGBM produced higher errors. At 50% missingness, SAITS performed best, followed by DAE, while MICE+LightGBM showed degraded performance. The deep learning methodologies maintained consistent accuracy across the clinical variables measured.

Conclusions: Our analysis indicates that deep learning-based imputation methodologies significantly outperform traditional methodologies for imputing missing values in ICU heart failure data, supporting their implementation in Clinical Decision Support Systems for heart failure patients.
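
The evaluation protocol described in the abstract (hide a known fraction of values, impute them, then score the imputations against the hidden ground truth) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' pipeline: the abstract does not say how missingness was introduced, so values are masked completely at random; a column-mean imputer stands in for DAE, SAITS, and MICE+LightGBM; NRMSE is normalized by the range of the true values, which may differ from the normalization used in the study; and the data are synthetic. All function names (`mask_at_random`, `mean_impute`, `imputation_errors`, `evaluate`) are hypothetical.

```python
import numpy as np


def mask_at_random(X, rate, rng):
    """Hide roughly a fraction `rate` of entries (assumed MCAR).

    Returns the masked copy and the boolean mask of hidden entries.
    """
    mask = rng.random(X.shape) < rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def mean_impute(X_missing):
    """Placeholder imputer: fill each column with its observed mean.

    A real comparison would swap in DAE, SAITS, or MICE+LightGBM here.
    """
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)


def imputation_errors(X_true, X_imputed, mask):
    """MAE, RMSE, and NRMSE computed only on the artificially hidden entries."""
    diff = X_imputed[mask] - X_true[mask]
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    # Assumption: normalize by the range of the true hidden values.
    value_range = float(X_true[mask].max() - X_true[mask].min())
    return {"MAE": mae, "RMSE": rmse, "NRMSE": rmse / value_range}


def evaluate(X, rates=(0.2, 0.3, 0.5), seed=0):
    """Score the placeholder imputer at each missingness rate."""
    rng = np.random.default_rng(seed)
    results = {}
    for rate in rates:
        X_missing, mask = mask_at_random(X, rate, rng)
        results[rate] = imputation_errors(X, mean_impute(X_missing), mask)
    return results


# Synthetic stand-in for a 19-feature table scaled to [0, 1].
X_demo = np.random.default_rng(42).random((500, 19))
demo_results = evaluate(X_demo)
for rate, metrics in demo_results.items():
    print(f"missingness={rate:.0%}", metrics)
```

Scoring only the masked entries matters here: including the untouched values would dilute every error metric and make the methods look more alike than they are.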

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.1% | 33.3%
2. Physiological Measurement | 12 papers in training set | Top 0.1% | 18.9%
50% of probability mass above
3. PLOS ONE | 4510 papers in training set | Top 24% | 7.2%
4. Computer Methods and Programs in Biomedicine | 27 papers in training set | Top 0.1% | 6.9%
5. Sensors | 39 papers in training set | Top 0.3% | 4.9%
6. Scientific Reports | 3102 papers in training set | Top 40% | 3.3%
7. Journal of Medical Internet Research | 85 papers in training set | Top 2% | 2.5%
8. Journal of Biomedical Informatics | 45 papers in training set | Top 0.6% | 2.1%
9. PLOS Digital Health | 91 papers in training set | Top 1% | 2.1%
10. European Heart Journal - Digital Health | 15 papers in training set | Top 0.3% | 1.7%
11. JMIRx Med | 31 papers in training set | Top 0.9% | 1.3%
12. Frontiers in Digital Health | 20 papers in training set | Top 0.8% | 1.3%
13. Journal of Personalized Medicine | 28 papers in training set | Top 0.8% | 1.0%
14. Biology Methods and Protocols | 53 papers in training set | Top 2% | 0.9%
15. Computers in Biology and Medicine | 120 papers in training set | Top 4% | 0.8%
16. Frontiers in Cardiovascular Medicine | 49 papers in training set | Top 3% | 0.8%
17. Frontiers in Physiology | 93 papers in training set | Top 6% | 0.7%
18. International Journal of Environmental Research and Public Health | 124 papers in training set | Top 8% | 0.5%
19. International Journal of Medical Informatics | 25 papers in training set | Top 2% | 0.5%
20. BMC Medical Research Methodology | 43 papers in training set | Top 2% | 0.5%
21. JMIR Medical Informatics | 17 papers in training set | Top 2% | 0.5%
22. F1000Research | 79 papers in training set | Top 7% | 0.5%