Comparison of gene set scoring methods for reproducible evaluation of multiple tuberculosis gene signatures

Wang, X.; Vanvalkenburg, A.; Odom-Mabey, A. R.; Ellner, J. J.; Hochberg, N. S.; Salgame, P.; Patil, P.; Johnson, W. E.

2023-01-20 bioinformatics
10.1101/2023.01.19.520627 bioRxiv
Rationale: Many blood-based transcriptional gene signatures for tuberculosis (TB) have been developed with potential use to diagnose disease, predict risk of progression from infection to disease, and monitor TB treatment outcomes. However, an unresolved issue is whether gene set enrichment analysis (GSEA) of the signature transcripts alone is sufficient for prediction and differentiation, or whether it is necessary to use the original statistical model created when the signature was derived. Intra-method comparison is complicated by the unavailability of original training data, missing details about the original trained model, and inadequate publicly available software tools or source code implementing the models. To facilitate these signatures' replicability and appropriate utilization in TB research, comprehensive comparisons between gene set scoring methods with cross-data validation of original model implementations are needed.

Objectives: We compared the performance of 19 TB gene signatures across 24 transcriptomic datasets using both rebuilt original models and gene set scoring methods to evaluate whether gene set scoring is a reasonable proxy for the performance of the original trained model. We have provided an open-access software implementation of the original models for all 19 signatures for future use.

Methods: We considered existing gene set scoring and machine learning methods, including ssGSEA, GSVA, PLAGE, Singscore, and Zscore, as alternative approaches to profile gene signature performance. The sample-size-weighted mean area under the curve (AUC) value was computed to measure each signature's performance across datasets. Correlation analysis and Wilcoxon paired tests were used to compare the performance of enrichment methods with that of the original models.

Measurement and Main Results: For many signatures, the predictions from gene set scoring methods were highly correlated with, and statistically equivalent to, the results given by the original diagnostic models. PLAGE outperformed all other gene set scoring methods. In some cases, PLAGE outperformed the original models when considering signatures' weighted mean AUC values and the AUC results within individual studies.

Conclusion: Gene set enrichment scoring of existing blood-based biomarker gene sets can distinguish patients with active TB disease from latent TB infection and other clinical conditions with equivalent or improved accuracy compared to the original methods and models. These data justify applying gene set scoring methods to published TB gene signatures for predicting TB risk and treatment outcomes, especially when original models are difficult to apply or implement.
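
The sample-size-weighted mean AUC named in the Methods can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes each dataset's AUC is weighted by its number of samples; the function names `auc` and `weighted_mean_auc` and the toy data are hypothetical and are not taken from the authors' software.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_mean_auc(aucs: Sequence[float], sizes: Sequence[int]) -> float:
    """Mean AUC across datasets, weighted by sample count (assumed weighting)."""
    if len(aucs) != len(sizes) or not sizes:
        raise ValueError("aucs and sizes must be non-empty and the same length")
    return sum(a * n for a, n in zip(aucs, sizes)) / sum(sizes)


# Toy data (illustrative only): signature scores and labels (1 = active TB, 0 = control)
dataset_a = ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
dataset_b = ([0.6, 0.4, 0.5, 0.2, 0.7, 0.1], [1, 1, 0, 0, 0, 0])

per_dataset_auc = [auc(*dataset_a), auc(*dataset_b)]
sizes = [len(dataset_a[0]), len(dataset_b[0])]
summary = weighted_mean_auc(per_dataset_auc, sizes)
print(per_dataset_auc, round(summary, 3))
```

Weighting by sample size keeps small datasets, whose AUC estimates are noisier, from dominating the summary; other weightings (for example, by inverse variance) would be equally plausible readings of the abstract.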

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS ONE: 4510 papers in training set; Top 12%; 15.1%
2. Tuberculosis: 11 papers in training set; Top 0.1%; 15.1%
3. Scientific Reports: 3102 papers in training set; Top 22%; 5.0%
4. PLOS Global Public Health: 293 papers in training set; Top 2%; 5.0%
5. Bioinformatics: 1061 papers in training set; Top 4%; 5.0%
6. Clinical Infectious Diseases: 231 papers in training set; Top 1%; 3.7%
7. Journal of Clinical Microbiology: 120 papers in training set; Top 0.5%; 3.7%

50% of probability mass above

8. PLOS Computational Biology: 1633 papers in training set; Top 11%; 3.1%
9. BMC Bioinformatics: 383 papers in training set; Top 3%; 3.1%
10. PeerJ: 261 papers in training set; Top 4%; 2.4%
11. The Journal of Infectious Diseases: 182 papers in training set; Top 2%; 2.1%
12. Frontiers in Medicine: 113 papers in training set; Top 2%; 2.1%
13. Microbiology Spectrum: 435 papers in training set; Top 2%; 1.9%
14. Frontiers in Cellular and Infection Microbiology: 98 papers in training set; Top 3%; 1.8%
15. Journal of Medical Internet Research: 85 papers in training set; Top 3%; 1.7%
16. Microbial Genomics: 204 papers in training set; Top 1%; 1.5%
17. American Journal of Respiratory and Critical Care Medicine: 39 papers in training set; Top 0.5%; 1.5%
18. BMC Research Notes: 29 papers in training set; Top 0.4%; 0.8%
19. The Lancet Microbe: 43 papers in training set; Top 1%; 0.8%
20. Trials: 25 papers in training set; Top 2%; 0.8%
21. PLOS Neglected Tropical Diseases: 378 papers in training set; Top 5%; 0.8%
22. Thorax: 32 papers in training set; Top 0.8%; 0.8%
23. Diagnostics: 48 papers in training set; Top 2%; 0.8%
24. PLOS Digital Health: 91 papers in training set; Top 3%; 0.8%
25. Nature Communications: 4913 papers in training set; Top 63%; 0.7%
26. Frontiers in Microbiology: 375 papers in training set; Top 10%; 0.7%
27. Clinical Chemistry: 22 papers in training set; Top 1.0%; 0.7%
28. Frontiers in Immunology: 586 papers in training set; Top 9%; 0.7%
29. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 10%; 0.7%
30. Frontiers in Veterinary Science: 30 papers in training set; Top 1%; 0.5%