Towards reproducible multimorbidity clustering in electronic health records: a transparent pipeline for aligning research aims and methodology

Romero Moreno, G.; Restocchi, V.; De Ferrari, L.; Palmer, J.; Fleuriot, J. D.; Guthrie, B.; Lone, N. I.

2026-05-26 health informatics
10.64898/2026.05.25.26353178 medRxiv

The availability of electronic health records has facilitated data-driven approaches to understanding multimorbidity, with clustering becoming a common tool for uncovering relevant groups of associated conditions. Previous studies, however, have encountered reproducibility challenges, with wide disparity in the reported clusters. At the core of this issue lies a vagueness in the definition of a cluster, which leads to a lack of standards in methods and evaluation, while implementation details are often incompletely reported or not explicit about their assumptions. We present a methodological pipeline that can be adapted to different cluster definitions (e.g. multiple cluster membership or clusters where all nodes are mutually associated) and a set of scores that can be composed into an evaluation metric that explicitly incorporates assumptions aligned with the research aims. We apply our pipeline to a healthcare dataset of over 7 million patients in England and show how clusters may differ drastically when parameter choices are varied, exposing the risks of reporting a single clustering realisation. Our methodological pipeline, evaluation framework, and tools for analysis and network visualisation serve as a reference for transparently exploring and aligning methodological decisions with the aims of multimorbidity clustering, helping to overcome the reproducibility challenges of the field.
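
To make two of the cluster definitions named in the abstract concrete, here is a minimal sketch, not the authors' pipeline. It assumes conditions are nodes of an undirected association graph and reads "clusters where all nodes are mutually associated" as maximal cliques, enumerated with the basic Bron-Kerbosch algorithm. The graph, the node labels A to E, and the function name `maximal_cliques` are hypothetical and chosen only for illustration.

```python
def maximal_cliques(adj):
    """Return all maximal cliques of an undirected graph.

    adj maps each node to the set of its neighbours (hypothetical toy input).
    Uses the basic Bron-Kerbosch recursion without pivoting.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Toy association graph: two triangles that share node "C".
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "E"},
    "E": {"C", "D"},
}

clusters = maximal_cliques(adj)
# Number of clusters each node belongs to (multiple membership is allowed).
membership = {n: sum(n in c for c in clusters) for n in adj}
print(clusters)
print(membership)
```

In this toy graph node C falls into both cliques, which is the "multiple cluster membership" case; a partition-based definition would instead force C into a single cluster, so the two definitions yield different results on the same data.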

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1 | Journal of Biomedical Informatics | 45 papers in training set | Top 0.1% | 13.8%
2 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.3% | 8.8%
3 | npj Digital Medicine | 97 papers in training set | Top 0.7% | 8.1%
4 | Journal of Medical Internet Research | 85 papers in training set | Top 0.9% | 6.1%
5 | PLOS Digital Health | 91 papers in training set | Top 0.6% | 4.1%
6 | JAMIA Open | 37 papers in training set | Top 0.3% | 4.0%
7 | European Journal of Epidemiology | 40 papers in training set | Top 0.1% | 3.8%
8 | BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.7% | 3.8%
50% of probability mass above
9 | Nature Communications | 4913 papers in training set | Top 38% | 3.8%
10 | BMC Medical Research Methodology | 43 papers in training set | Top 0.3% | 3.5%
11 | PLOS ONE | 4510 papers in training set | Top 41% | 3.5%
12 | The Lancet Digital Health | 25 papers in training set | Top 0.2% | 3.5%
13 | JMIR Medical Informatics | 17 papers in training set | Top 0.4% | 3.5%
14 | Wellcome Open Research | 57 papers in training set | Top 0.3% | 3.5%
15 | Scientific Reports | 3102 papers in training set | Top 42% | 3.0%
16 | BMJ Open | 554 papers in training set | Top 8% | 2.0%
17 | International Journal of Medical Informatics | 25 papers in training set | Top 0.9% | 1.6%
18 | Frontiers in Digital Health | 20 papers in training set | Top 1.0% | 1.2%
19 | Computer Methods and Programs in Biomedicine | 27 papers in training set | Top 0.6% | 1.2%
20 | British Journal of General Practice | 22 papers in training set | Top 0.4% | 1.1%
21 | Bioinformatics | 1061 papers in training set | Top 8% | 1.1%
22 | BMJ | 49 papers in training set | Top 1% | 0.7%
23 | GigaScience | 172 papers in training set | Top 3% | 0.7%
24 | PLOS Computational Biology | 1633 papers in training set | Top 26% | 0.7%
25 | iScience | 1063 papers in training set | Top 36% | 0.7%
26 | Pharmacoepidemiology and Drug Safety | 13 papers in training set | Top 0.5% | 0.7%
27 | BMC Medicine | 163 papers in training set | Top 8% | 0.6%
28 | BMJ Health & Care Informatics | 13 papers in training set | Top 1% | 0.6%
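
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below copies the first ten displayed values from the list above; the function name `journals_to_reach` is hypothetical, and because the displayed percentages are rounded, the totals are approximate.

```python
# Predicted probabilities (%) for ranks 1-10, as displayed in the list above.
probs = [13.8, 8.8, 8.1, 6.1, 4.1, 4.0, 3.8, 3.8, 3.8, 3.5]


def journals_to_reach(probs, threshold):
    """Return (rank, cumulative %) at which the running sum first reaches threshold."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None  # threshold not reached within the given values


rank, total = journals_to_reach(probs, 50.0)
print(rank, round(total, 1))
```

The running sum is about 48.7% after seven journals and about 52.5% after eight, which matches the stated cutoff.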