Integrative Bioinformatics Approach to Identify Prognostic Gene Signatures for Risk Stratification in Thyroid Carcinoma

Malik, S.; Raghava, G. P. S.

2026-04-27 bioinformatics
10.64898/2026.04.23.720344 bioRxiv

Thyroid cancer is a heterogeneous malignancy with variable outcomes, highlighting the need for reliable biomarkers and effective risk stratification. In this study, we implemented a multi-step integrative framework to identify distinct prognostic biomarker sets using transcriptomic data from 572 thyroid cancer patients. Correlation analysis followed by false discovery rate (FDR) correction revealed genes significantly associated with overall survival (OS). Notably, MAFF (r = 0.25, p = 1.34x10-, FDR = 2.46x10-), NR4A3 (r = 0.24, p = 1.26x10-, FDR = 9.25x10-), and SRF showed strong positive correlations, whereas LOC728264 (r = -0.21, p = 7.39x10-, FDR = 6.36x10-) and VAMP1 (r = -0.20, p = 1.20x10-, FDR = 1.3x10-) exhibited negative correlations with OS. Univariate Cox regression identified several survival-associated genes, including TMEM90B (HR = 10.66, p = 2.88x10-) and PTH1R (HR = 9.88, p = 5.55x10-). LASSO regression further identified 31 key prognostic genes, including 13 potential drug targets predominantly functioning as inhibitors. Machine learning models based on seven independent 20-gene biomarker sets effectively predicted four survival-time classes, Class 0 (0-1 years), Class 1 (1-3 years), Class 2 (3-5 years), and Class 3 (>5 years), achieving AUC values of 0.91-0.94 and Kappa up to 0.76. An ensemble model further improved prediction (AUC = 0.95, Kappa = 0.72). Incorporating clinical variables (age, gender, stage) enhanced model performance (AUC = 0.96, Kappa = 0.80). Reduced 10- and 5-gene subsets demonstrated consistent yet slightly lower performance (AUC = 0.90 and 0.86, respectively). Collectively, the 20-gene set exhibited the strongest predictive and prognostic potential, highlighting the importance of integrating molecular and clinical features for risk stratification in thyroid cancer. All data and code are openly available (https://github.com/raghavagps/THCA_prognostic_biomarkers), supporting future research in thyroid cancer prediction.
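The abstract reports FDR values alongside raw p-values but does not name the correction procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg method (an assumption, not confirmed by the source), the adjustment can be illustrated as follows. The function name `benjamini_hochberg` and the input p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the study's FDR correction is BH; the abstract does not say.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
q_values = benjamini_hochberg(p_values)

# Genes whose adjusted value falls below a chosen FDR threshold.
significant = [i for i, q in enumerate(q_values) if q < 0.05]
print(significant)  # [0, 1]
```

In this toy input the third and fourth p-values are below 0.05 before correction but not after it, which is the effect an FDR step is meant to have when many genes are tested at once.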

Matching journals

The top 13 journals account for 50% of the predicted probability mass.

1 | Scientific Reports | 3102 papers in training set | Top 5% | 10.3%
2 | BMC Bioinformatics | 383 papers in training set | Top 1% | 6.5%
3 | PLOS ONE | 4510 papers in training set | Top 26% | 6.5%
4 | Bioinformatics | 1061 papers in training set | Top 5% | 4.4%
5 | PLOS Computational Biology | 1633 papers in training set | Top 8% | 4.1%
6 | Cancers | 200 papers in training set | Top 1% | 4.1%
7 | PeerJ | 261 papers in training set | Top 2% | 3.7%
8 | International Journal of Cancer | 42 papers in training set | Top 0.4% | 2.4%
9 | Nature Communications | 4913 papers in training set | Top 47% | 2.1%
10 | Frontiers in Oncology | 95 papers in training set | Top 2% | 1.9%
11 | Genomics | 60 papers in training set | Top 0.7% | 1.9%
12 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 4% | 1.8%
13 | Frontiers in Genetics | 197 papers in training set | Top 4% | 1.8%
50% of probability mass above
14 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.2% | 1.8%
15 | Journal of Translational Medicine | 46 papers in training set | Top 0.8% | 1.7%
16 | Molecular Therapy Nucleic Acids | 32 papers in training set | Top 0.3% | 1.7%
17 | iScience | 1063 papers in training set | Top 14% | 1.7%
18 | npj Systems Biology and Applications | 99 papers in training set | Top 1% | 1.5%
19 | Communications Biology | 886 papers in training set | Top 12% | 1.4%
20 | International Journal of Molecular Sciences | 453 papers in training set | Top 10% | 1.3%
21 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 5% | 1.1%
22 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 1.1%
23 | npj Precision Oncology | 48 papers in training set | Top 0.9% | 1.0%
24 | Translational Oncology | 18 papers in training set | Top 0.2% | 1.0%
25 | JNCI Cancer Spectrum | 10 papers in training set | Top 0.4% | 1.0%
26 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 3% | 1.0%
27 | BMC Cancer | 52 papers in training set | Top 2% | 0.9%
28 | Computers in Biology and Medicine | 120 papers in training set | Top 4% | 0.8%
29 | Frontiers in Immunology | 586 papers in training set | Top 7% | 0.8%
30 | Journal of Personalized Medicine | 28 papers in training set | Top 1.0% | 0.8%
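
The statement that the top 13 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The sketch below assumes that the final percentage in each journal entry is that journal's predicted probability (an interpretation of the listing, not stated explicitly on the page) and uses the rounded values as displayed, so the running total is approximate. The function name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (%) for the 15 top-ranked journals, as listed.
# Assumption: the last percentage of each entry is the predicted probability.
probabilities = [10.3, 6.5, 6.5, 4.4, 4.1, 4.1, 3.7, 2.4,
                 2.1, 1.9, 1.9, 1.8, 1.8, 1.8, 1.7]


def journals_to_reach(probs, threshold):
    """Return (count, running_total) once the cumulative sum meets threshold.

    Returns None if the threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None


count, total = journals_to_reach(probabilities, 50.0)
print(count, round(total, 1))  # 13 51.5
```

With the rounded values, the first 12 journals sum to about 49.7% and the 13th brings the total to about 51.5%, which is consistent with the cutoff marked in the list.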