Outcome Prediction Models for Critically Ill Patients Using Small Routine Laboratory Datasets
Cao, X.; Hou, J.; Wei, X.; Wang, Q.
We present a suite of foundational outcome prediction models for critically ill patients, developed using readily available routine blood tests and advanced machine learning techniques. The model inputs include complete blood counts (CBCs), metabolic panels, and additional biomarkers that assess liver and kidney function, coagulation status, and cardiac injury. The output is the predicted outcome at a given future horizon: the horizon is set to zero for diagnoses and to a fixed time interval for prognoses. The training dataset comprises clinical data from 332 ICU patients, augmented with 200 synthetic samples generated via a conditional diffusion model. Generative machine learning-based data imputation and augmentation yielded only modest gains in predictive accuracy. Substantial performance improvements came instead from additional methods, including dimensionality and order reduction, SHAP-based feature importance analysis, and a novel time-series-to-image encoding strategy that enables the use of image-based classifiers for temporal clinical data. Principal component analysis-based order reduction produced measurable gains in outcome prediction, while the time-series-to-image encoding proved particularly effective in mitigating the small-data limitations common in clinical research. Across all evaluation metrics (accuracy, precision, recall, F1 score, and AUROC), the prognostic models achieved performance exceeding 85%, with some models attaining AUROC scores above 90%. We also introduce a new model ensemble approach that improves overall prediction, pushing all assessment metrics above 90%.
This work establishes a robust and interpretable AI-enabled diagnostic and prognostic toolkit for outcome prediction in critically ill patients and demonstrates a scalable workflow for developing high-performing models from sparse healthcare datasets. The proposed framework is readily deployable in ICU environments with routine blood testing capabilities and serves as a foundation for future integration into digital twin systems for critical care.
Matching journals
The top 4 journals account for 50% of the predicted probability mass.