
Analysis of the Cross-Study Replicability of Tuberculosis Gene Signatures Using 49 Curated Transcriptomic Datasets

Wang, X.; Harper, K.; Sinha, P.; Johnson, W. E.; Patil, P.

2023-12-01 genomics
10.1101/2023.12.01.569442 bioRxiv

Background: Tuberculosis (TB) is the leading cause of infectious disease mortality worldwide. Numerous blood-based gene expression signatures have been proposed in the literature as alternative tools for diagnosing TB infection, and ongoing efforts are focused on developing additional signatures in other TB-related contexts. However, the generalizability of these signatures to different patient contexts is not well characterized. There is a pressing need for a well-curated database of TB gene expression studies for the systematic assessment of existing and newly developed TB gene signatures.

Results: We built curatedTBData, a manually curated database of 49 TB transcriptomic studies. This data resource is freely available through GitHub and as an R Bioconductor package that allows users to validate new and existing biomarkers without the challenges of harmonizing heterogeneous studies. We also demonstrate the use of this data resource with cross-study comparisons for 72 TB gene signatures. For distinguishing subjects with active TB from healthy controls, 19 gene signatures had a weighted mean AUC of 0.90 or greater, with the highest result of 0.94. For active TB disease versus latent TB infection, 7 gene signatures had a weighted mean AUC of 0.90 or greater, with a maximum of 0.93. We also explore ensembling methods for averaging predictions from multiple gene signatures to significantly improve diagnostic ability beyond any single signature.

Conclusions: The curatedTBData data package offers a comprehensive resource of curated gene expression and clinically annotated data. It could be used to identify robust new TB gene signatures, to perform comparative analysis of existing TB gene signatures, and to develop alternative gene set scoring or ensembling methods, among other things. This resource will also facilitate the development of new signatures that are generalizable across cohorts or more applicable to specific subsets of patients (e.g., those with rare comorbid conditions). We demonstrated that these blood-based gene signatures could distinguish patients with distinct TB outcomes; moreover, combining multiple gene signatures could improve the overall predictive accuracy in differentiating these subtypes, which highlights an important aspect of translating genomics to clinical implementation.
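The abstract reports a weighted mean AUC across studies and an ensemble that averages predictions from several signatures, but it does not spell out the weighting or the averaging scheme. The Python sketch below is only an illustration under stated assumptions: each study is weighted by its sample size, and the ensemble is a plain per-sample mean of signature scores. The function names (`auc`, `weighted_mean_auc`, `ensemble_scores`) and the toy data are hypothetical and are not part of curatedTBData.

```python
from typing import List, Sequence, Tuple

Study = Tuple[Sequence[float], Sequence[int]]  # (signature scores, 0/1 labels)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_mean_auc(studies: List[Study]) -> float:
    """Mean of per-study AUCs, weighted by study sample size (an assumption)."""
    total = sum(len(labels) for _, labels in studies)
    return sum(len(labels) * auc(scores, labels) for scores, labels in studies) / total


def ensemble_scores(per_signature: List[Sequence[float]]) -> List[float]:
    """Average each sample's score across signatures (simple mean ensemble)."""
    n_signatures = len(per_signature)
    return [sum(sample) / n_signatures for sample in zip(*per_signature)]


# Toy data: two studies scored by one signature.
study_a: Study = ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
study_b: Study = ([0.6, 0.4, 0.5, 0.1, 0.7, 0.2], [1, 1, 0, 0, 0, 0])
print(weighted_mean_auc([study_a, study_b]))

# Toy data: two signatures scored on the same four samples.
labels = [1, 1, 0, 0]
sig1 = [4.0, 1.0, 2.0, 0.0]
sig2 = [2.0, 4.0, 3.0, 0.0]
print(auc(sig1, labels), auc(sig2, labels), auc(ensemble_scores([sig1, sig2]), labels))
```

In this toy case the averaged scores rank the samples better than either signature alone, but that is a property of the example data, not a guarantee. In practice, signature scores on different scales would likely need standardizing before averaging.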

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Tuberculosis | 11 papers in training set | Top 0.1% | 19.5%
2. PLOS ONE | 4510 papers in training set | Top 27% | 6.4%
3. Frontiers in Cellular and Infection Microbiology | 98 papers in training set | Top 0.4% | 6.4%
4. Clinical Infectious Diseases | 231 papers in training set | Top 0.8% | 6.3%
5. International Journal of Infectious Diseases | 126 papers in training set | Top 0.3% | 4.9%
6. Scientific Reports | 3102 papers in training set | Top 27% | 4.3%
7. Frontiers in Public Health | 140 papers in training set | Top 2% | 3.6%
50% of probability mass above
8. Microbial Genomics | 204 papers in training set | Top 0.8% | 2.6%
9. Journal of Infection | 71 papers in training set | Top 0.9% | 2.1%
10. Clinical Chemistry | 22 papers in training set | Top 0.3% | 1.8%
11. Journal of Clinical Microbiology | 120 papers in training set | Top 0.9% | 1.8%
12. Frontiers in Medicine | 113 papers in training set | Top 3% | 1.8%
13. PLOS Computational Biology | 1633 papers in training set | Top 16% | 1.7%
14. Genomics | 60 papers in training set | Top 1% | 1.7%
15. The Journal of Infectious Diseases | 182 papers in training set | Top 3% | 1.7%
16. PLOS Neglected Tropical Diseases | 378 papers in training set | Top 3% | 1.7%
17. Epidemiology and Infection | 84 papers in training set | Top 2% | 1.5%
18. Frontiers in Microbiology | 375 papers in training set | Top 6% | 1.5%
19. PeerJ | 261 papers in training set | Top 9% | 1.3%
20. EBioMedicine | 39 papers in training set | Top 0.6% | 1.2%
21. Microbiology Spectrum | 435 papers in training set | Top 4% | 1.2%
22. Genome Medicine | 154 papers in training set | Top 6% | 1.2%
23. Open Forum Infectious Diseases | 134 papers in training set | Top 2% | 1.1%
24. Bioinformatics | 1061 papers in training set | Top 8% | 1.1%
25. PLOS Global Public Health | 293 papers in training set | Top 5% | 1.0%
26. BMC Infectious Diseases | 118 papers in training set | Top 5% | 0.9%
27. International Journal of Epidemiology | 74 papers in training set | Top 2% | 0.9%
28. BMC Genomics | 328 papers in training set | Top 5% | 0.8%
29. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 9% | 0.7%
30. BMC Medical Genomics | 36 papers in training set | Top 1% | 0.7%
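
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below assumes the trailing percentage in each entry is that journal's predicted probability; the helper name `journals_to_reach` is hypothetical, and only the first eight listed values are used.

```python
from itertools import accumulate
from typing import Optional, Sequence

# Predicted probabilities (percent) for ranks 1-8, copied from the list above.
top_probs = [19.5, 6.4, 6.4, 6.3, 4.9, 4.3, 3.6, 2.6]


def journals_to_reach(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the smallest rank whose cumulative probability meets the threshold."""
    for rank, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return rank
    return None  # threshold not reached within the supplied ranks


print(journals_to_reach(top_probs, 50.0))
```

The cumulative total after rank 6 is about 47.8% and after rank 7 about 51.4%, which agrees with the cutoff being placed after the seventh journal.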