
Copy number variation analysis of 9,482 Mycobacterium tuberculosis isolates identifies lineage-specific molecular determinants.

Bhalla, N.; Behera, A. K.; Gupta, A.; Nanda, R.

2024-10-17 genomics
10.1101/2024.10.15.618387 bioRxiv

Background: Clinical manifestations of tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) show lineage-specific differences, to which genetic polymorphisms such as phylo-single nucleotide variations (PhyloSNPs) and insertions or deletions (INDELs) contribute. Intragenomic rearrangement events, such as gene duplications and deletions, may cause gene copy number differences in Mtb and could contribute to lineage-specific phenotypic variation, but this contribution is poorly understood. Results: The relative gene copy number differences in high-quality, publicly available whole genome sequencing datasets of 9,482 clinical Mtb isolates were determined by repurposing and modifying an RNA-seq data analysis pipeline. The pipeline included alignment of reads, sorting by coordinate, GC bias correction, and variance stabilising transformation. The strategy showed maximum separation of lineage-specific clusters in two principal components, which captured ~54% of the variability. Unsupervised hierarchical clustering of the top 100 genes and pairwise comparisons between Mtb lineages revealed an overlapping subset of genes (n=42) with significantly perturbed copy numbers (Benjamini-Hochberg adjusted P-value < 0.05 and log2(drug-resistant/sensitive) > ±1). These 42 genes formed multiple tandem gene clusters and are known to be involved in virulence, pathogenicity and defence response to invading phages. A separate comparison showed a significantly higher copy number of phage genes and of a recently reported druggable target, Rv1525, in pre- and extensively drug-resistant (Pre-XDR, XDR) isolates compared to drug-sensitive clinical Mtb isolates. Conclusion: The identified gene sets in Mtb clinical isolates may be useful targets for the development of lineage-specific therapeutics and diagnostics.
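The significance filter quoted in the abstract (Benjamini-Hochberg adjusted P-value below 0.05 together with an absolute log2 ratio above 1) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the gene labels, and all numbers below are hypothetical, and the per-gene raw P-values are assumed to come from some upstream test that the abstract does not specify.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def perturbed_genes(genes, mean_a, mean_b, pvalues, alpha=0.05, lfc_cutoff=1.0):
    """Keep genes with adjusted p < alpha and |log2(mean_a / mean_b)| > cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    hits = []
    for gene, a, b, q in zip(genes, mean_a, mean_b, adjusted):
        lfc = math.log2(a / b)
        if q < alpha and abs(lfc) > lfc_cutoff:
            hits.append((gene, lfc, q))
    return hits


if __name__ == "__main__":
    # Hypothetical relative copy numbers for two groups of isolates.
    genes = ["geneA", "geneB", "geneC", "geneD"]
    group_a = [4.0, 1.1, 0.4, 2.5]
    group_b = [1.0, 1.0, 1.0, 1.0]
    raw_p = [0.001, 0.5, 0.01, 0.04]
    for gene, lfc, q in perturbed_genes(genes, group_a, group_b, raw_p):
        print(f"{gene}: log2 ratio = {lfc:.2f}, adjusted P = {q:.4f}")
```

Applying both thresholds together means a gene with a large ratio but a weak adjusted P-value (or the reverse) is excluded, which is the usual reason for pairing an effect-size cutoff with multiple-testing correction.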

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1 | Tuberculosis | 11 papers in training set | Top 0.1% | 41.1%
2 | Microbial Genomics | 204 papers in training set | Top 0.3% | 7.5%
3 | Frontiers in Cellular and Infection Microbiology | 98 papers in training set | Top 0.3% | 6.6%
50% of probability mass above
4 | PLOS ONE | 4510 papers in training set | Top 33% | 4.5%
5 | Scientific Reports | 3102 papers in training set | Top 44% | 2.7%
6 | Microbiology Spectrum | 435 papers in training set | Top 2% | 2.0%
7 | International Journal of Infectious Diseases | 126 papers in training set | Top 1% | 1.9%
8 | Genomics | 60 papers in training set | Top 0.9% | 1.8%
9 | Journal of Infection | 71 papers in training set | Top 1% | 1.8%
10 | Frontiers in Microbiology | 375 papers in training set | Top 5% | 1.8%
11 | EBioMedicine | 39 papers in training set | Top 0.3% | 1.8%
12 | Journal of Clinical Microbiology | 120 papers in training set | Top 0.9% | 1.8%
13 | mSystems | 361 papers in training set | Top 5% | 1.7%
14 | Bioinformatics | 1061 papers in training set | Top 7% | 1.5%
15 | The Journal of Infectious Diseases | 182 papers in training set | Top 3% | 1.5%
16 | Journal of Global Antimicrobial Resistance | 15 papers in training set | Top 0.4% | 1.4%
17 | BMC Genomics | 328 papers in training set | Top 3% | 1.3%
18 | BMC Infectious Diseases | 118 papers in training set | Top 4% | 1.0%
19 | Antimicrobial Agents and Chemotherapy | 167 papers in training set | Top 1% | 1.0%
20 | Clinical Infectious Diseases | 231 papers in training set | Top 4% | 1.0%
21 | BMC Microbiology | 35 papers in training set | Top 1% | 1.0%
22 | Frontiers in Medicine | 113 papers in training set | Top 6% | 0.8%
23 | Clinical Chemistry | 22 papers in training set | Top 0.8% | 0.7%
24 | PeerJ | 261 papers in training set | Top 15% | 0.7%
25 | eLife | 5422 papers in training set | Top 60% | 0.7%
26 | F1000Research | 79 papers in training set | Top 6% | 0.5%
27 | BMC Bioinformatics | 383 papers in training set | Top 8% | 0.5%
28 | The American Journal of Tropical Medicine and Hygiene | 60 papers in training set | Top 5% | 0.5%