Equitable Health Intelligence: An Open Benchmark of Multi-Population Machine Learning for Omics-Based Cancer Prognosis

Sharma, T.; Chopra, A. P.; Agrawal, L.; Verma, N. K.; Starlard-Davenport, A.; Wang, J.; Hayes, D. N.; Cui, Y.

2026-06-02 bioinformatics
10.64898/2026.05.29.728755 bioRxiv

Purpose: Machine learning (ML) models for omics-based cancer prognosis are often trained on data from predominantly European-ancestry populations. This produces biased predictions for other populations and undermines equitable genomic medicine. Existing fairness benchmarks focus mainly on outcome parity rather than on parity of predictive performance across populations. Public benchmark resources are needed to systematically detect and mitigate such performance disparities in multi-population cancer prognosis.

Methods: We developed Equitable Health Intelligence (EHI, https://ehiportal.org), an open-source benchmark of multi-population ML for omics-based cancer prognosis. EHI contains 1,475 ML tasks spanning 40 cancer/pan-cancer types, 4 omics feature sets, 4 clinical endpoints, 5 event-time thresholds, and 3 data-disadvantaged population (DDP) groups, each defined relative to a majority European-ancestry population group. Deep neural network models are trained under three multi-population ML schemes (Mixture, Independent, and Transfer Learning), with Naive Transfer included as a no-adaptation control, for a total of 10,325 ML experiments.

Results: The EHI platform provides an interactive environment with visualization and exploratory tools. Users can inspect predictive performance disparities between the majority European-ancestry group and data-disadvantaged populations, evaluate how far transfer learning mitigates these disparities, and examine the impact of feature engineering methods across cancer types, omics features, and clinical endpoints.

Conclusion: EHI is an open, interactive, and extensible benchmark for identifying and addressing performance disparities in multi-population ML for omics-based cancer prognosis. It provides a foundation for a growing ecosystem of methods that target ML performance disparities arising from biomedical data inequality and population-level distribution shifts, thereby advancing equitable AI in precision oncology.
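The counts in the Methods paragraph can be cross-checked with a little arithmetic. If every level of the five design axes were crossed, the grid would have 40 × 4 × 4 × 5 × 3 = 9,600 combinations. The 1,475 benchmarked tasks are therefore a subset of that grid, and the abstract does not say how they were selected. The ratio 10,325 / 1,475 = 7 is consistent with seven experiments per task, assuming every task received the same number of runs; the abstract does not break this down by scheme. The helper names below are hypothetical.

```python
# Counts quoted in the EHI abstract.
N_TASKS = 1_475
N_EXPERIMENTS = 10_325

# Sizes of the design axes listed in the abstract (the levels themselves are not enumerated).
AXES = {
    "cancer/pan-cancer types": 40,
    "omics feature sets": 4,
    "clinical endpoints": 4,
    "event-time thresholds": 5,
    "DDP groups": 3,
}


def full_grid_size(axes: dict[str, int]) -> int:
    """Number of combinations if every axis level were crossed with every other."""
    size = 1
    for n_levels in axes.values():
        size *= n_levels
    return size


def experiments_per_task(n_experiments: int, n_tasks: int) -> int:
    """Experiments per task, assuming (hypothetically) that every task gets the same number of runs."""
    per_task, remainder = divmod(n_experiments, n_tasks)
    if remainder:
        raise ValueError("experiment count is not a whole multiple of the task count")
    return per_task


print(f"full factorial grid: {full_grid_size(AXES)} combinations; benchmarked tasks: {N_TASKS}")
print(f"experiments per task: {experiments_per_task(N_EXPERIMENTS, N_TASKS)}")
```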

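The four training schemes named in the Methods paragraph can be illustrated with a toy two-group setup. This is a minimal sketch under stated assumptions, not EHI's implementation:

- Logistic regression trained by gradient descent stands in for the deep neural networks.
- The cohorts are synthetic, with a deliberately shifted feature-outcome relationship in the DDP group.
- "Independent" is assumed to mean a model trained on DDP data alone.
- Fine-tuning is mimicked by warm-starting from the EA weights.
- The disparity gap (EA AUROC minus DDP AUROC) is one illustrative way to quantify the disparities described in the Results paragraph.

All helper names (`train_logreg`, `predict`, `auroc`, `make_group`) are hypothetical.

```python
import numpy as np


def train_logreg(X, y, w=None, epochs=300, lr=0.1):
    """Full-batch gradient-descent logistic regression.

    Lightweight stand-in for EHI's deep neural networks (an assumption of this sketch).
    Passing `w` warm-starts from existing weights, which is how fine-tuning is mimicked.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def predict(w, X):
    """Predicted event probabilities for feature matrix X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def auroc(y, scores):
    """AUROC via the Mann-Whitney U statistic (assumes no tied scores)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def make_group(rng, n, coef):
    """Hypothetical synthetic cohort: Gaussian features, Bernoulli outcome from a logistic model."""
    X = rng.normal(size=(n, len(coef)))
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ coef))).astype(int)
    return X, y


rng = np.random.default_rng(0)
coef_ea = np.array([1.5, -1.0, 0.5, 0.0, 0.0])
coef_ddp = np.array([1.5, -1.0, 0.0, 0.8, 0.0])  # shifted feature-outcome relationship
X_ea, y_ea = make_group(rng, 2000, coef_ea)  # data-rich majority (EA) group
X_ddp, y_ddp = make_group(rng, 120, coef_ddp)  # small DDP training set
X_ea_test, y_ea_test = make_group(rng, 1000, coef_ea)
X_ddp_test, y_ddp_test = make_group(rng, 1000, coef_ddp)

w_ea = train_logreg(X_ea, y_ea)
models = {
    # Mixture: one model trained on the pooled EA + DDP data.
    "Mixture": train_logreg(np.vstack([X_ea, X_ddp]), np.concatenate([y_ea, y_ddp])),
    # Independent (assumed here to mean a DDP-only model).
    "Independent": train_logreg(X_ddp, y_ddp),
    # Naive Transfer: the EA model applied to the DDP group without adaptation.
    "Naive Transfer": w_ea,
    # Transfer Learning: start from the EA weights, then fine-tune on DDP data.
    "Transfer Learning": train_logreg(X_ddp, y_ddp, w=w_ea, epochs=100),
}

ea_auc = auroc(y_ea_test, predict(w_ea, X_ea_test))
for name, w in models.items():
    ddp_auc = auroc(y_ddp_test, predict(w, X_ddp_test))
    # Illustrative disparity gap: EA reference AUROC minus DDP AUROC.
    print(f"{name:<17} DDP AUROC {ddp_auc:.3f}  gap {ea_auc - ddp_auc:+.3f}")
```

The benchmark is built to support exactly this kind of contrast: the Transfer Learning gap against the Independent, Mixture, and Naive Transfer gaps. The numbers from this toy example depend on the random seed and the synthetic shift, and they say nothing about EHI's actual results.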
Matching journals

The top 10 journals together account for at least 50% of the predicted probability mass.

1. JCO Clinical Cancer Informatics: 18 papers in training set; Top 0.1%; predicted probability 14.6%
2. BMC Bioinformatics: 383 papers in training set; Top 1%; predicted probability 7.3%
3. npj Precision Oncology: 48 papers in training set; Top 0.1%; predicted probability 6.4%
4. Journal of Translational Medicine: 46 papers in training set; Top 0.1%; predicted probability 4.9%
5. Bioinformatics: 1061 papers in training set; Top 5%; predicted probability 4.0%
6. GigaScience: 172 papers in training set; Top 0.5%; predicted probability 3.6%
7. Genome Medicine: 154 papers in training set; Top 2%; predicted probability 3.6%
8. PLOS Computational Biology: 1633 papers in training set; Top 11%; predicted probability 3.3%
9. Scientific Reports: 3102 papers in training set; Top 49%; predicted probability 2.1%
10. PLOS ONE: 4510 papers in training set; Top 47%; predicted probability 2.1%
(50% of probability mass above)
11. Bioinformatics Advances: 184 papers in training set; Top 2%; predicted probability 2.1%
12. Database: 51 papers in training set; Top 0.3%; predicted probability 1.9%
13. eBioMedicine: 130 papers in training set; Top 1%; predicted probability 1.8%
14. Nature Communications: 4913 papers in training set; Top 51%; predicted probability 1.7%
15. Briefings in Bioinformatics: 326 papers in training set; Top 4%; predicted probability 1.7%
16. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 1%; predicted probability 1.7%
17. Nucleic Acids Research: 1128 papers in training set; Top 13%; predicted probability 1.4%
18. BMC Medical Genomics: 36 papers in training set; Top 0.6%; predicted probability 1.4%
19. npj Digital Medicine: 97 papers in training set; Top 3%; predicted probability 1.4%
20. BioData Mining: 15 papers in training set; Top 0.5%; predicted probability 1.2%
21. Computers in Biology and Medicine: 120 papers in training set; Top 3%; predicted probability 1.1%
22. The Lancet Digital Health: 25 papers in training set; Top 0.8%; predicted probability 1.0%
23. Cancer Research Communications: 46 papers in training set; Top 0.9%; predicted probability 0.9%
24. Patterns: 70 papers in training set; Top 2%; predicted probability 0.8%
25. Nature Machine Intelligence: 61 papers in training set; Top 3%; predicted probability 0.8%
26. Frontiers in Oncology: 95 papers in training set; Top 3%; predicted probability 0.8%
27. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 8%; predicted probability 0.8%
28. Artificial Intelligence in Medicine: 15 papers in training set; Top 0.6%; predicted probability 0.8%
29. JMIR Medical Informatics: 17 papers in training set; Top 1%; predicted probability 0.8%
30. Biology Methods and Protocols: 53 papers in training set; Top 2%; predicted probability 0.8%
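The 50% cut-off can be reproduced from the listed probabilities. The top 9 journals sum to 49.8% and the top 10 to 51.9%, which is consistent with the marker denoting the smallest prefix of the ranked list whose cumulative predicted probability reaches 50%. The sketch below performs that check; `journals_needed` is a hypothetical helper.

```python
# Predicted probabilities (%) of the 30 listed journals, in ranked order.
PROBS = [14.6, 7.3, 6.4, 4.9, 4.0, 3.6, 3.6, 3.3, 2.1, 2.1,
         2.1, 1.9, 1.8, 1.7, 1.7, 1.7, 1.4, 1.4, 1.4, 1.2,
         1.1, 1.0, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]


def journals_needed(probs_pct, target_pct=50.0):
    """Return (k, cumulative %) for the smallest top-k prefix whose sum reaches target_pct."""
    total = 0.0
    for k, p in enumerate(probs_pct, start=1):
        total += p
        if total >= target_pct:
            return k, total
    raise ValueError("the listed journals never reach the target mass")


k, mass = journals_needed(PROBS)
print(f"top {k} journals cover {mass:.1f}% of the predicted probability mass")
```

The listed values are rounded to one decimal place, so the cumulative figure is approximate.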