Data Heterogeneity in Federated Learning with Electronic Health Records: Case Studies of Risk Prediction for Acute Kidney Injury and Sepsis Diseases in Critical Care

Rajendran, S.; Xu, Z.; Pan, W.; Ghosh, A.; Wang, F.

2022-09-01 health informatics

10.1101/2022.08.30.22279382 medRxiv

Show abstract

With the wider availability of healthcare data such as Electronic Health Records (EHR), more and more data-driven based approaches have been proposed to improve the quality of care delivery. Predictive modeling, which aims at building computational models for predicting clinical risk, is a popular research topic in healthcare analytics. However, concerns about privacy of healthcare data may hinder the development of effective predictive models that are generalizable because this often requires rich diverse data from multiple clinical institutions. Recently, federated learning (FL) has demonstrated promise in addressing this concern. However, data heterogeneity from different local participating sites may affect prediction performance. Exploring such heterogeneity of data sources would aid in building accurate risk prediction models in FL. Due to acute kidney injury (AKI) and sepsis high prevalence among patients admitted to intensive care units (ICU), the early prediction of these conditions based on AI is an important topic in critical care medicine. In this study, we take AKI and sepsis onset risk prediction in ICU as two examples to explore the impact of data heterogeneity in the FL framework for risk prediction using EHR data across multiple hospitals. In particular, we built predictive models based on local, pooled, and FL frameworks. The local framework only used data from each site itself. The pooled framework combined data from all sites. In the FL framework, each local site did not have access to other sites data. A model was trained locally and its parameters were shared to a central aggregator, which was used to update the federated models weights and then subsequently, shared with each site. We found models built within a FL framework outperformed local counterparts. Then, we analyzed variable importance discrepancies across sites and frameworks. Finally, we explored potential sources of the heterogeneity within the EHR data. The different distributions of demographic profiles, medication use, and site information contributed to data heterogeneity. Author SummaryThe availability of a large amount of healthcare data such as Electronic Health Records (EHR) and advances of artificial intelligence (AI) techniques provides opportunities to build predictive models for disease risk prediction. Due to the sensitive nature of healthcare data, it is challenging to collect the data together from different hospitals and train a unified model on the combined data. Recent federated learning (FL) demonstrates promise in addressing the fragmented healthcare data sources with privacy-preservation. However, data heterogeneity in the FL framework may influence prediction performance. Exploring the heterogeneity of data sources would contribute to building accurate disease risk prediction models in FL. In this study, we take acute kidney injury (AKI) and sepsis prediction in intensive care units (ICU) as two examples to explore the effects of data heterogeneity in the FL framework for disease risk prediction using EHR data across multiple hospital sites. In particular, multiple predictive models were built based on local, pooled, and FL frameworks. The local framework only used data from each site itself. The pooled framework combined data from all sites. In the FL framework, each local site did not have access to other sites data. We found models built within a FL framework outperformed local counterparts. Then, we analyzed variable importance discrepancies across sites and frameworks. Finally, we explored potential sources of the heterogeneity within EHR data. The different distributions of demographic profiles, medication use, site information such as the type of ICU at admission contributed to data heterogeneity.

Data Heterogeneity in Federated Learning with Electronic Health Records: Case Studies of Risk Prediction for Acute Kidney Injury and Sepsis Diseases in Critical Care

Matching journals