Towards reproducible multimorbidity clustering in electronic health records: a transparent pipeline for aligning research aims and methodology
Romero Moreno, G.; Restocchi, V.; De Ferrari, L.; Palmer, J.; Fleuriot, J. D.; Guthrie, B.; Lone, N. I.
The availability of electronic health records has facilitated data-driven approaches to understanding multimorbidity, with clustering becoming a common tool for uncovering relevant groups of associated conditions. Previous studies, however, have faced reproducibility challenges, with wide disparity in the reported clusters. At the core of this issue lies vagueness in the definition of a cluster, which leads to a lack of standards in methods and evaluation, while implementation details are often incompletely reported or leave their assumptions implicit. We present a methodological pipeline that can be adapted to different cluster definitions (e.g. multiple cluster membership or clusters where all nodes are mutually associated) and a set of scores that can be composed into an evaluation metric that explicitly incorporates assumptions aligned with the research aims. We apply our pipeline to a healthcare dataset of over 7 million patients in England and show how clusters may differ drastically as parameter choices vary, exposing the risks of reporting a single clustering realisation. Our methodological pipeline, evaluation framework, and tools for analysis and network visualisation serve as a reference for transparently exploring methodological decisions and aligning them with the aims of multimorbidity clustering, helping to overcome the reproducibility challenges of the field.
Matching journals
The top 8 journals account for 50% of the predicted probability mass.