Distribution-Aware Federated Learning for Diabetes Prediction Using Tabular Clinical Data Under Non-IID and Class-Imbalanced Settings

Amin, R.; Rana, M. M. H.; Aktar, S.

2026-03-08 | developmental biology | bioRxiv
DOI: 10.64898/2026.03.05.709751
Federated learning (FL) enables collaborative clinical model training without centralized data sharing, yet its deployment is hindered by statistical heterogeneity (non-IID data) and inherent class imbalance across healthcare institutions. Conventional aggregation strategies such as FedAvg and FedProx weight client updates solely by dataset size, ignoring class distributions and thereby biasing the global model toward the majority class. To address this, we propose Distribution-Aware Federated Learning (DA-FL), which introduces a minority-class amplification factor φ_k, computed as the ratio of a client's local positive-class rate to the global positive-class rate. Combined with a class-weighted cross-entropy loss at the client level, DA-FL forms a two-level correction mechanism that mitigates imbalance without additional data sharing. Experiments on the CDC BRFSS 2021 diabetes dataset (236,378 records across five simulated clients under three non-IID levels) show that DA-FL improves F1-Macro by 18.2% and G-Mean by 26.7% over FedAvg under moderate non-IID conditions, while achieving 31-fold greater F1-Macro stability across 30 communication rounds. These findings demonstrate that DA-FL is an effective and practically deployable solution for federated clinical prediction under realistic non-IID and class-imbalanced settings.
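The abstract's two-level correction can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the paper's implementation: the abstract defines only φ_k (local positive rate divided by global positive rate); the inverse-frequency class weights and the multiplicative combination of φ_k with the dataset-size weight are standard choices assumed here for concreteness.

```python
import numpy as np

def class_weights(labels):
    """Client-level correction: inverse-frequency weights for a
    class-weighted cross-entropy loss (a standard scheme; the paper's
    exact weighting is not specified in the abstract)."""
    classes, counts = np.unique(labels, return_counts=True)
    w = counts.sum() / (len(classes) * counts)
    return dict(zip(classes.tolist(), w.tolist()))

def da_fl_aggregate(client_updates, client_sizes, client_pos_rates):
    """Server-level correction: FedAvg-style weighted average where each
    client's size weight is scaled by phi_k = local positive rate /
    global positive rate. Combining phi_k multiplicatively with the
    size weight is an assumption made for this sketch."""
    sizes = np.asarray(client_sizes, dtype=float)
    pos = np.asarray(client_pos_rates, dtype=float)
    global_pos = float(sizes @ pos / sizes.sum())  # size-weighted global rate
    phi = pos / global_pos                         # amplification factor phi_k
    w = sizes * phi
    w /= w.sum()                                   # normalized aggregation weights
    return np.tensordot(w, np.stack(client_updates), axes=1)
```

A client whose local positive (minority) rate exceeds the global rate gets φ_k > 1 and thus a larger share of the aggregate, counteracting the majority-class bias that pure size weighting produces under FedAvg.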

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Nature Communications | 4913 | Top 2% | 26.3%
2 | npj Digital Medicine | 97 | Top 0.3% | 14.6%
3 | PLOS ONE | 4510 | Top 31% | 4.9%
4 | Scientific Reports | 3102 | Top 22% | 4.9%
(50% of probability mass above this line)
5 | npj Systems Biology and Applications | 99 | Top 0.3% | 4.4%
6 | Proceedings of the National Academy of Sciences | 2130 | Top 16% | 4.4%
7 | Nature Computational Science | 50 | Top 0.1% | 3.7%
8 | Cell Systems | 167 | Top 4% | 3.1%
9 | eLife | 5422 | Top 33% | 2.4%
10 | Advanced Science | 249 | Top 10% | 1.7%
11 | PLOS Computational Biology | 1633 | Top 16% | 1.7%
12 | Science Advances | 1098 | Top 17% | 1.7%
13 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 | Top 0.2% | 1.7%
14 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 1% | 1.5%
15 | Nature Machine Intelligence | 61 | Top 2% | 1.4%
16 | Communications Biology | 886 | Top 12% | 1.4%
17 | iScience | 1063 | Top 21% | 1.2%
18 | Nature Medicine | 117 | Top 3% | 1.2%
19 | Journal of Biomedical Informatics | 45 | Top 1% | 1.2%
20 | Journal of Medical Internet Research | 85 | Top 4% | 0.9%
21 | Communications Medicine | 85 | Top 0.8% | 0.8%
22 | Genomics, Proteomics & Bioinformatics | 171 | Top 6% | 0.8%
23 | PLOS Digital Health | 91 | Top 3% | 0.8%
24 | Cell Reports | 1338 | Top 36% | 0.5%
25 | Expert Systems with Applications | 11 | Top 0.7% | 0.5%
26 | BMC Bioinformatics | 383 | Top 8% | 0.5%
27 | Cell Reports Medicine | 140 | Top 10% | 0.5%
28 | Nature Neuroscience | 216 | Top 7% | 0.5%
29 | JMIR Medical Informatics | 17 | Top 2% | 0.5%