Federated Learning Performance Depends on Site Variation in Global HIV Data Consortia

Jackson, N. J.; Yan, C.; Caro-Vega, Y.; Paredes, F.; Ismerio Moreira, R.; Cadet, S.; Varela, D.; Cesar, C.; Duda, S. N.; Shepherd, B. E.; Malin, B. A.

2026-03-27 health informatics
10.64898/2026.03.25.26349286 medRxiv
Digital health technologies, including machine learning (ML), are transforming infectious disease management; however, ML models for HIV care have been limited by data-sharing restrictions that prevent multi-site collaboration. Federated Learning (FL) offers a privacy-preserving solution, enabling cross-site model training without sharing patient-level data. We evaluated FL for developing clinical prediction models using data from 22,234 people living with HIV (PLWH) across six sites in five countries within the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet). Across four prediction tasks (1-year mortality, 3-year mortality, tuberculosis incidence, and AIDS-defining cancer incidence), FL algorithms achieved near-centralized performance while substantially outperforming site-specific models. Performance gains varied across sites, driven by both site size and between-site heterogeneity. Local fine-tuning often improved FL performance, though the benefits were task-dependent. These findings support FL as a scalable, privacy-preserving infrastructure for multi-site ML in international HIV research.
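
To make the idea of cross-site training without sharing patient-level data concrete, here is a minimal sketch of federated averaging (FedAvg) for a logistic regression model. The abstract does not say which FL algorithms were evaluated, so FedAvg is an assumption chosen because it is a common baseline. The sites, their sizes, and all function names (`local_update`, `fed_avg`, `make_site`) are hypothetical, and the data are synthetic.

```python
import numpy as np


def local_update(w, X, y, lr=0.1, epochs=5):
    """Train locally: a few full-batch gradient steps of logistic regression."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the mean log-loss
    return w


def fed_avg(sites, n_features, rounds=20):
    """Federated averaging: only model weights leave each site, never rows."""
    w_global = np.zeros(n_features)
    total = sum(len(y) for _, y in sites)
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        local_ws = [local_update(w_global, X, y) for X, y in sites]
        # The server averages the local models, weighted by site size.
        w_global = sum(len(y) / total * w for w, (_, y) in zip(local_ws, sites))
    return w_global


def make_site(rng, n, true_w, shift):
    """Synthetic site; `shift` mimics between-site covariate heterogeneity."""
    X = rng.normal(loc=shift, size=(n, len(true_w)))
    p = 1.0 / (1.0 + np.exp(-X @ true_w))
    y = (rng.random(n) < p).astype(float)
    return X, y


rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])  # assumed ground-truth coefficients
# Hypothetical sites of unequal size with shifted covariate distributions.
sites = [make_site(rng, n, true_w, s) for n, s in [(200, 0.0), (1000, 0.3), (50, -0.5)]]

w_fl = fed_avg(sites, n_features=3)
print("federated weights:", w_fl)
```

Because the average is weighted by site size, large sites dominate the global model, which is one plausible reason performance gains could differ between small and large sites.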

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | npj Digital Medicine | 97 | Top 0.3% | 17.1% |
| 2 | Nature Communications | 4913 | Top 12% | 14.0% |
| 3 | Patterns | 70 | Top 0.1% | 8.0% |
| 4 | Nature Medicine | 117 | Top 0.3% | 6.6% |
| 5 | PLOS Digital Health | 91 | Top 0.5% | 4.7% |
| | 50% of probability mass above | | | |
| 6 | Journal of the American Medical Informatics Association | 61 | Top 0.8% | 3.5% |
| 7 | Nature Computational Science | 50 | Top 0.2% | 3.5% |
| 8 | Scientific Reports | 3102 | Top 39% | 3.5% |
| 9 | PLOS ONE | 4510 | Top 45% | 2.5% |
| 10 | Cell Systems | 167 | Top 6% | 2.0% |
| 11 | Science Translational Medicine | 111 | Top 2% | 2.0% |
| 12 | Nature Biomedical Engineering | 42 | Top 0.6% | 2.0% |
| 13 | Science Advances | 1098 | Top 18% | 1.7% |
| 14 | PLOS Computational Biology | 1633 | Top 17% | 1.7% |
| 15 | eLife | 5422 | Top 43% | 1.7% |
| 16 | The Lancet Digital Health | 25 | Top 0.5% | 1.4% |
| 17 | International Journal of Medical Informatics | 25 | Top 1.0% | 1.4% |
| 18 | Bioinformatics | 1061 | Top 8% | 1.3% |
| 19 | Communications Medicine | 85 | Top 0.5% | 1.2% |
| 20 | Communications Biology | 886 | Top 20% | 0.9% |
| 21 | Frontiers in Digital Health | 20 | Top 1% | 0.8% |
| 22 | Med | 38 | Top 0.7% | 0.8% |
| 23 | GigaScience | 172 | Top 3% | 0.7% |
| 24 | iScience | 1063 | Top 34% | 0.7% |
| 25 | Journal of Medical Internet Research | 85 | Top 5% | 0.7% |
| 26 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 2% | 0.7% |
| 27 | Nature Machine Intelligence | 61 | Top 4% | 0.6% |
| 28 | Journal of Biomedical Informatics | 45 | Top 2% | 0.6% |
| 29 | Nature Genetics | 240 | Top 9% | 0.6% |
| 30 | Advanced Science | 249 | Top 22% | 0.6% |
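
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked directly from the listed probabilities. The short sketch below accumulates the first ten listed values until a threshold is reached; the helper name `journals_to_reach` is hypothetical and is used only for this illustration.

```python
# Predicted probabilities (in percent) for the first ten listed journals.
probs = [17.1, 14.0, 8.0, 6.6, 4.7, 3.5, 3.5, 3.5, 2.5, 2.0]


def journals_to_reach(probs, threshold):
    """Return (count, cumulative mass) once the running sum reaches threshold."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return k, total
    return None  # threshold not reached by the supplied values


k, mass = journals_to_reach(probs, 50.0)
print(k, round(mass, 1))  # prints: 5 50.4
```

The first four journals sum to 45.7%, so the fifth is the one that carries the running total past the 50% mark.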