Deep Learning-Based Missing Value Imputation for Heart Failure Data from MIMIC-III: A Comparative Study of DAE, SAITS, and MICE+LightGBM

Sharma, S.; Kaur, M.; Gupta, S.

2026-02-11 health systems and quality improvement

10.64898/2026.02.10.26345979 medRxiv


Background: Electronic Health Records (EHR) are crucial to Clinical Decision Support Systems and to delivering proper care to ICU heart failure patients. However, EHR data are often incomplete because of monitoring device errors, which creates a need for robust imputation methods.

Objective: To compare and evaluate three methods for imputing missing data for heart failure patients in the MIMIC-III database: Denoising Autoencoder (DAE), Self-Attention-based Imputation for Time Series (SAITS), and Multiple Imputation by Chained Equations (MICE) with LightGBM.

Methods: We analyzed 14,090 ICU admissions for patients with heart failure from the MIMIC-III database. Nineteen clinical features were selected for clinical relevance using a combination of Random Forest analysis, correlation analysis, and Mutual Information. Artificial missing values were introduced at rates of 20%, 30%, and 50%, and the three imputation methods (DAE, SAITS, and MICE+LightGBM) were then applied. Performance was evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Normalized Root Mean Square Error (NRMSE).

Results: Both DAE and SAITS outperformed MICE+LightGBM at every missingness rate. At 20% missingness, DAE had mean MAE = 0.004967, RMSE = 0.005217, and NRMSE = 3.260893, while SAITS had mean MAE = 0.005461, RMSE = 0.005797, and NRMSE = 3.244695; MICE+LightGBM produced higher errors. At 50% missingness, SAITS performed best, followed by DAE, while MICE+LightGBM showed degraded performance. The deep learning methods maintained consistent accuracy across the clinical variables measured.

Conclusions: Our analysis indicates that deep learning-based imputation methods significantly outperform traditional methods for imputing missing values in ICU heart failure data, supporting their integration into Clinical Decision Support Systems for heart failure patients.
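
The evaluation protocol described in the abstract (hide a known fraction of values, impute them, and score only the hidden entries) can be illustrated with a minimal sketch. Several details here are assumptions, since the abstract does not state them: values are masked completely at random, NRMSE is normalized by the range of the held-out true values, and the data are synthetic rather than drawn from MIMIC-III. The column-mean imputer is only a stand-in baseline; it is not one of the three methods compared in the study, and the function names are hypothetical.

```python
import numpy as np


def mask_at_random(X, rate, rng):
    """Hide a fraction `rate` of entries completely at random (assumed mechanism)."""
    mask = rng.random(X.shape) < rate  # True where a value is artificially removed
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def mean_impute(X_missing):
    """Stand-in baseline: fill each gap with its column mean (not a method from the study)."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)


def imputation_errors(X_true, X_imputed, mask):
    """Score only the artificially hidden entries."""
    diff = X_imputed[mask] - X_true[mask]
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    # Assumption: NRMSE is RMSE divided by the range of the held-out true values.
    value_range = float(X_true[mask].max() - X_true[mask].min())
    return {"mae": mae, "rmse": rmse, "nrmse": rmse / value_range}


rng = np.random.default_rng(0)
X = rng.random((1000, 19))  # synthetic stand-in: 1000 rows, 19 features scaled to [0, 1)

results = {}
for rate in (0.2, 0.3, 0.5):  # the missingness rates named in the abstract
    X_missing, mask = mask_at_random(X, rate, rng)
    results[rate] = imputation_errors(X, mean_impute(X_missing), mask)
    print(rate, results[rate])
```

Scoring only the masked positions keeps observed values, which every imputer returns unchanged, from diluting the error. Any of the three methods could be substituted for `mean_impute` as long as it accepts an array containing NaN and returns a fully filled array of the same shape.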

