
Combining multimorbidity clustering with limited demographic information enables high-precision outcome predictions

Ferreira, F. S.; Le Lannou, E.; Post, B.; Haar, S.; Kadiverlu, B.; Brett, S. J.; Faisal, A. A.

2024-05-28 health informatics
10.1101/2024.05.28.24308024 medRxiv

Multimorbidity, the coexistence of multiple health conditions in individuals, is prevalent and increasing worldwide, posing a growing challenge for patients and healthcare systems. Furthermore, multimorbidity contributes to an increased risk of hospital admission or even death. In this study, we employ a principled approach that utilises longitudinal data routinely collected in electronic health records linked to half a million people from the UK Biobank to generate digital comorbidity fingerprints (DCFs) using a topic modelling approach, Latent Dirichlet Allocation. These comorbidity fingerprints summarise a patient's full secondary care clinical history, i.e. their comorbidities and past interventions. We identified 18 clinically relevant DCFs, which captured nuanced combinations of diseases and risk factors, e.g. grouping cardiovascular disorders with common risk factors, but also novel groupings that are not obvious and differ in both their breadth and depth from existing observational disease associations. The DCFs, combined with demographic characteristics, performed on par with or outperformed traditional models of all-cause mortality or hospital admission, showcasing the potential of data-driven strategies in healthcare forecasting. The comorbidity fingerprints, together with age and number of hospital admissions, were shown to be the most important factors in the predictions. Additionally, our DCF approach showed robust and consistent performance over time. Our findings underscore the promising role of interpretable data-driven approaches in healthcare forecasting, suggesting improved risk profiling for individual clinical decisions and targeted public health interventions.
Author summary
This study addresses the global challenge of multimorbidity, the presence of multiple health conditions in individuals, which is on the rise and poses a significant burden on patients and healthcare systems. To investigate its impact on the risk of hospitalization or mortality, we employ a sophisticated approach using longitudinal data from the UK Biobank to create digital comorbidity fingerprints (DCFs) through natural language processing methods. These DCFs, which summarize a patient's complete clinical history, reveal 18 clinically relevant patterns, including unique combinations of diseases and risk factors. When combined with patient demographic and lifestyle data, the DCF approach performs similarly to traditional models in predicting all-cause mortality or hospitalization. Notably, the DCF approach demonstrates robust and consistent performance over time, highlighting its potential for enhancing healthcare forecasting. These findings emphasize the value of interpretable data-driven strategies in healthcare, offering improved risk profiling for individual clinical decisions and targeted public health interventions with enduring reliability.
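The abstract describes deriving DCFs by fitting Latent Dirichlet Allocation to each patient's clinical history and then combining the per-patient topic proportions with demographic features such as age and number of admissions. The sketch below is a minimal, illustrative version of that idea, not the authors' pipeline. It treats each patient as a "document" of diagnosis codes and runs a small collapsed Gibbs sampler in pure Python. The toy patients and codes, the demographic values, the function name `lda_gibbs`, the hyperparameters (`alpha`, `beta`, `n_iter`), and the choice of Gibbs sampling as the inference method are all assumptions for illustration. The paper's 18 DCFs were learned from a far larger UK Biobank corpus.

```python
import random

# HYPOTHETICAL toy data: each patient is a "document" of ICD-10-style codes.
# The real DCFs were learned from UK Biobank hospital records; these are made up.
docs = [
    ["I10", "E11", "I25", "I10"],   # cardiometabolic-looking history
    ["J44", "J45", "J44"],          # respiratory-looking history
    ["I10", "I25", "E11"],
    ["J45", "J44", "F32"],
]
# HYPOTHETICAL demographics per patient: (age in years, number of admissions).
demographics = [(71, 4), (58, 2), (66, 3), (49, 1)]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (an assumed inference choice).

    Returns (theta, vocab): theta[d][k] is patient d's estimated proportion
    of topic k, i.e. a per-patient "fingerprint" vector that sums to 1.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]        # patient-topic counts
    n_kw = [[0] * V for _ in range(n_topics)]    # topic-code counts
    n_k = [0] * n_topics                         # codes assigned to each topic
    z = []                                       # topic assignment of each code

    # Randomly assign every code occurrence to an initial topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove this occurrence from the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed topic proportions per patient (the "fingerprint").
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, vocab


theta, vocab = lda_gibbs(docs, n_topics=2)
# Append the demographic features to each fingerprint to form one input row per patient.
features = [fp + [age, n_adm] for fp, (age, n_adm) in zip(theta, demographics)]
for row in features:
    print([round(x, 2) for x in row])
```

Each row of `features` could then be fed to any classifier for mortality or admission prediction. The abstract does not specify which predictive model sits on top of the DCFs, so that step is left open here.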

Matching journals

The top 2 journals together account for more than 50% of the predicted probability mass (33.1% + 22.6% = 55.7%).

1. Journal of Biomedical Informatics: 33.1% (45 papers in training set, Top 0.1%)
2. npj Digital Medicine: 22.6% (97 papers in training set, Top 0.2%)
(50% of probability mass above)
3. PLOS Digital Health: 4.9% (91 papers in training set, Top 0.5%)
4. Scientific Reports: 4.3% (3102 papers in training set, Top 27%)
5. Journal of the American Medical Informatics Association: 3.6% (61 papers in training set, Top 0.8%)
6. BMC Medical Informatics and Decision Making: 3.6% (39 papers in training set, Top 0.8%)
7. Journal of Personalized Medicine: 2.6% (28 papers in training set, Top 0.1%)
8. Journal of Medical Internet Research: 1.9% (85 papers in training set, Top 2%)
9. JMIR Medical Informatics: 1.9% (17 papers in training set, Top 0.6%)
10. Communications Medicine: 1.3% (85 papers in training set, Top 0.4%)
11. International Journal of Medical Informatics: 1.2% (25 papers in training set, Top 1%)
12. IEEE Journal of Biomedical and Health Informatics: 1.1% (34 papers in training set, Top 1%)
13. Frontiers in Artificial Intelligence: 1.1% (18 papers in training set, Top 0.5%)
14. Nature Communications: 1.0% (4913 papers in training set, Top 58%)
15. Patterns: 1.0% (70 papers in training set, Top 2%)
16. iScience: 0.8% (1063 papers in training set, Top 29%)
17. PLOS Computational Biology: 0.8% (1633 papers in training set, Top 23%)
18. Computers in Biology and Medicine: 0.7% (120 papers in training set, Top 5%)
19. JAMIA Open: 0.7% (37 papers in training set, Top 1%)
20. European Journal of Epidemiology: 0.7% (40 papers in training set, Top 0.8%)
21. The Lancet Digital Health: 0.7% (25 papers in training set, Top 1%)
22. Frontiers in Digital Health: 0.7% (20 papers in training set, Top 1%)
23. BMJ Open: 0.6% (554 papers in training set, Top 13%)
24. PLOS ONE: 0.5% (4510 papers in training set, Top 73%)
25. Frontiers in Public Health: 0.5% (140 papers in training set, Top 10%)
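The statement that the top 2 journals cover half of the predicted probability mass is a cumulative sum over the ranked list. The minimal sketch below uses the first four probabilities from the list and a hypothetical helper, `journals_for_mass`, to find how many top-ranked journals are needed to reach a given share. It assumes the list is already sorted by descending probability, as it is on this page.

```python
# First four predicted probabilities (%) from the ranked list, in rank order.
probs = [
    ("Journal of Biomedical Informatics", 33.1),
    ("npj Digital Medicine", 22.6),
    ("PLOS Digital Health", 4.9),
    ("Scientific Reports", 4.3),
]


def journals_for_mass(ranked, threshold):
    """HYPOTHETICAL helper: smallest number of top-ranked entries whose
    cumulative probability reaches `threshold` (in %). Assumes `ranked`
    is sorted by descending probability."""
    total = 0.0
    for i, (_, p) in enumerate(ranked, start=1):
        total += p
        if total >= threshold:
            return i, total
    return len(ranked), total


k, mass = journals_for_mass(probs, 50.0)
print(k, round(mass, 1))  # 2 55.7
```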