MyGESig: a population-specific gene signature improves survival prediction in Malaysian breast cancer patients

Khairi, M. H. F. B.; Wong, Z. L.; Ang, B. H.; Phipps-Tan, J.; Nur Fatin, P.; Pathmanathan, R.; Hoong, S. M.; Mohd Taib, N. A.; Yip, C.-H.; Ho, W. K.; Tai, M. C.; Teo, S.-H.; Cheong, S. C.; Jia-Wern, P.

2025-09-02 genetic and genomic medicine
10.1101/2025.08.28.25334111 medRxiv

Accurate prognostic models are essential for guiding treatment decisions and improving patient outcomes in breast cancer. To achieve this, population-specific models are needed to account for genetic, clinical, and pathological differences across populations. In this study, the widely used and freely available PREDICT v3.0 breast cancer prognostic model was first validated in the multiethnic Malaysian Breast Cancer (MyBrCa) cohort to assess its performance. Because PREDICT v3.0 performed only moderately in this population, a machine learning workflow was developed to integrate gene expression and clinical information for classifying patients by their 10-year prognosis. A 77-gene signature, termed MyGESig, was derived from the transcriptomes of 258 MyBrCa patients. Using this signature in combination with clinical variables, an ensemble-based model achieved a median area under the receiver operating characteristic curve (AUROC) of 0.92 in the hold-out testing set and 0.90 in the independent MyBrCa dataset. While the model exhibited poor generalizability in external cohorts, its discriminative performance improved when trained and tested within the same population (median AUROC: 0.71 in METABRIC; 0.84 in SCAN-B), validating the prognostic value of the gene set. Together, these findings demonstrate the value of incorporating population-specific gene expression datasets into prognosis prediction and highlight the need to develop and validate models tailored to diverse populations in breast cancer.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | Scientific Reports | 3102 papers in training set | Top 2% | 14.8%
2 | Breast Cancer Research | 32 papers in training set | Top 0.1% | 12.4%
3 | Cancers | 200 papers in training set | Top 0.6% | 7.2%
4 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.1% | 4.9%
5 | Genome Medicine | 154 papers in training set | Top 2% | 4.3%
6 | Frontiers in Genetics | 197 papers in training set | Top 2% | 3.6%
7 | iScience | 1063 papers in training set | Top 5% | 3.6%
50% of probability mass above
8 | Genomics | 60 papers in training set | Top 0.5% | 2.7%
9 | PLOS ONE | 4510 papers in training set | Top 46% | 2.4%
10 | European Journal of Cancer | 10 papers in training set | Top 0.1% | 2.4%
11 | PLOS Computational Biology | 1633 papers in training set | Top 13% | 2.1%
12 | Frontiers in Oncology | 95 papers in training set | Top 2% | 1.7%
13 | Nature Communications | 4913 papers in training set | Top 51% | 1.7%
14 | Annals of Oncology | 13 papers in training set | Top 0.5% | 1.7%
15 | Translational Oncology | 18 papers in training set | Top 0.1% | 1.7%
16 | eBioMedicine | 130 papers in training set | Top 2% | 1.5%
17 | International Journal of Molecular Sciences | 453 papers in training set | Top 10% | 1.3%
18 | Cell Genomics | 162 papers in training set | Top 4% | 1.3%
19 | Communications Biology | 886 papers in training set | Top 14% | 1.2%
20 | International Journal of Cancer | 42 papers in training set | Top 0.8% | 1.2%
21 | npj Breast Cancer | 18 papers in training set | Top 0.1% | 1.1%
22 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 3% | 1.0%
23 | Biomedicines | 66 papers in training set | Top 2% | 1.0%
24 | Heliyon | 146 papers in training set | Top 4% | 1.0%
25 | Cancer Research Communications | 46 papers in training set | Top 0.9% | 0.9%
26 | Nucleic Acids Research | 1128 papers in training set | Top 16% | 0.8%
27 | BMC Cancer | 52 papers in training set | Top 2% | 0.8%
28 | Communications Medicine | 85 papers in training set | Top 1% | 0.8%
29 | BMC Genomics | 328 papers in training set | Top 6% | 0.8%
30 | Cancer Epidemiology, Biomarkers & Prevention | 17 papers in training set | Top 0.6% | 0.8%
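
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration in Python: the function name `journals_needed` is hypothetical, and the values are the rounded percentages shown for the ten highest-ranked journals above, so the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) for the ten highest-ranked journals,
# copied from the list above (rounded values as displayed).
probabilities = [14.8, 12.4, 7.2, 4.9, 4.3, 3.6, 3.6, 2.7, 2.4, 2.4]


def journals_needed(probs, threshold):
    """Return how many top-ranked entries are needed for the
    running total to reach `threshold` (hypothetical helper)."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None  # threshold not reached with the entries given


print(journals_needed(probabilities, 50.0))  # prints 7
```

The first six entries sum to roughly 47.2%, and adding the seventh brings the total to roughly 50.8%, which is consistent with the marker placed after rank 7.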