
Pathotypr: harmonised MTBC lineage assignment and resistance-associated variant detection for genomic surveillance

Ruiz-Rodriguez, P.; Coscolla, M.

2026-03-27 genomics
10.64898/2026.03.24.714002 bioRxiv

BACKGROUND: Rapid, interoperable whole-genome tools for Mycobacterium tuberculosis complex (MTBC) surveillance remain limited for harmonised lineage assignment across recognised lineages and simultaneous resistance-associated variant detection.

AIM: To develop and validate Pathotypr, an alignment-free tool for harmonised MTBC lineage assignment and resistance genotyping from assemblies and raw reads.

METHODS: We reconstructed an MTBC phylogeny from 26,813 genomes using 609,003 polymorphic sites, derived an updated lineage marker backbone, and implemented a k-mer/Random Forest framework with marker-based lineage and WHO catalogue-based resistance calling. Performance was evaluated on 498 RefSeq assemblies, 88,071 UShER-TB typed sequencing samples, 162 clinical read sets for closest-reference matching, and 7,148 CRyPTIC isolates with phenotypic drug susceptibility data.

RESULTS: Pathotypr supported all 14 currently recognised MTBC lineages (L1-L10, A1-A4). On 498 complete genomes, marker-based and alignment-free lineage calls were 100% concordant, and prediction accuracy remained 100% on 254 independent assemblies. In 88,071 non-ambiguous UShER-TB samples, root-lineage concordance with TB-Profiler was 100%, while Pathotypr additionally identified lineage 10, A1 and A2. Resistance predictions showed 85.0% genotype-phenotype concordance overall, with high performance for rifampicin (95.8% sensitivity, 95.0% specificity) and isoniazid (93.0% sensitivity, 97.9% specificity). Runtime was about 1 second per sample, enabling analysis of 88,071 samples in approximately 24 hours on four threads. In the MDR-enriched CRyPTIC collection, Pathotypr supported reconstruction of 135 probable introduction events into Germany, Italy and Ukraine; 33.7% of introduction-associated isolates carried MDR/pre-XDR genotypes.

CONCLUSION: Pathotypr enables rapid, harmonised MTBC lineage assignment and high-confidence resistance screening, supporting near real-time and cross-border tuberculosis surveillance.
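The runtime claim in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not part of Pathotypr. It assumes the "about 1 second per sample" figure is wall-clock time on the four-thread configuration and that samples are processed one after another. The function name `total_hours` is hypothetical.

```python
# Figures taken from the abstract; the per-sample time is approximate.
SAMPLES = 88_071
SECONDS_PER_SAMPLE = 1.0  # assumption: wall-clock seconds on four threads


def total_hours(n_samples: int, seconds_per_sample: float) -> float:
    """Return wall-clock hours when samples are processed sequentially."""
    return n_samples * seconds_per_sample / 3600.0


hours = total_hours(SAMPLES, SECONDS_PER_SAMPLE)
print(f"{hours:.1f} hours")  # 24.5 hours
```

Under these assumptions the total comes to roughly 24.5 hours, consistent with the "approximately 24 hours" stated in the abstract.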

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Communications: 22.6% (4913 papers in training set, Top 3%)
2. Genome Medicine: 8.4% (154 papers in training set, Top 0.6%)
3. Thorax: 6.8% (32 papers in training set, Top 0.1%)
4. Clinical Infectious Diseases: 6.4% (231 papers in training set, Top 0.7%)
5. PLOS Medicine: 4.0% (98 papers in training set, Top 0.9%)
6. Bioinformatics: 3.6% (1061 papers in training set, Top 5%)

50% of probability mass above

7. Journal of Infection: 3.1% (71 papers in training set, Top 0.6%)
8. The Journal of Infectious Diseases: 3.1% (182 papers in training set, Top 1%)
9. EBioMedicine: 2.6% (39 papers in training set, Top 0.1%)
10. Nucleic Acids Research: 2.6% (1128 papers in training set, Top 8%)
11. The Lancet Microbe: 2.4% (43 papers in training set, Top 0.4%)
12. Scientific Reports: 2.1% (3102 papers in training set, Top 50%)
13. BMC Infectious Diseases: 1.7% (118 papers in training set, Top 3%)
14. Journal of Clinical Microbiology: 1.7% (120 papers in training set, Top 1%)
15. The Lancet Infectious Diseases: 1.5% (71 papers in training set, Top 2%)
16. European Respiratory Journal: 1.3% (54 papers in training set, Top 1%)
17. Open Forum Infectious Diseases: 1.2% (134 papers in training set, Top 2%)
18. Microbial Genomics: 1.1% (204 papers in training set, Top 2%)
19. PLOS ONE: 1.1% (4510 papers in training set, Top 61%)
20. Antimicrobial Agents and Chemotherapy: 1.0% (167 papers in training set, Top 1%)
21. The Lancet Regional Health - Americas: 0.9% (22 papers in training set, Top 0.2%)
22. The Lancet Digital Health: 0.8% (25 papers in training set, Top 1%)
23. Nature Medicine: 0.7% (117 papers in training set, Top 5%)
24. New England Journal of Medicine: 0.7% (50 papers in training set, Top 0.9%)
25. BMC Genomics: 0.7% (328 papers in training set, Top 6%)
26. The Lancet Respiratory Medicine: 0.7% (17 papers in training set, Top 0.3%)
27. Nature Microbiology: 0.7% (133 papers in training set, Top 5%)
28. American Journal of Respiratory and Critical Care Medicine: 0.7% (39 papers in training set, Top 0.9%)
29. BMC Bioinformatics: 0.6% (383 papers in training set, Top 8%)
30. Communications Biology: 0.6% (886 papers in training set, Top 29%)
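
The statement that the top 6 journals account for 50% of the predicted probability mass can be reproduced by accumulating the listed percentages in rank order. This is a minimal sketch using the first seven values from the listing above. The helper name `journals_to_reach` is hypothetical and is not taken from the site that produced the ranking.

```python
# Predicted probabilities (percent) for the first seven ranked journals,
# copied from the listing.
ranked = [
    ("Nature Communications", 22.6),
    ("Genome Medicine", 8.4),
    ("Thorax", 6.8),
    ("Clinical Infectious Diseases", 6.4),
    ("PLOS Medicine", 4.0),
    ("Bioinformatics", 3.6),
    ("Journal of Infection", 3.1),
]


def journals_to_reach(entries, threshold):
    """Return (count, cumulative) once the running total reaches threshold."""
    cumulative = 0.0
    for count, (_, pct) in enumerate(entries, start=1):
        cumulative += pct
        if cumulative >= threshold:
            return count, cumulative
    return None  # threshold not reached with the entries given


count, mass = journals_to_reach(ranked, 50.0)
print(count, round(mass, 1))  # 6 51.8
```

The running total first passes 50% at the sixth journal (about 51.8%), which matches where the listing places its "50% of probability mass above" marker.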