
A generalisable approach to drug susceptibility prediction for M. tuberculosis using machine learning and whole-genome sequencing

The CRyPTIC consortium; Lachapelle, A. S.

2021-09-16 microbiology
10.1101/2021.09.14.458035 bioRxiv

There remains a clinical need for better approaches to rapid drug susceptibility testing in view of the increasing burden of multidrug-resistant tuberculosis. Binary susceptibility phenotypes capture changes in minimum inhibitory concentration (MIC) only when these cross the critical concentration, even though other changes may be clinically relevant. We developed a machine learning system to predict MIC from unassembled whole-genome sequencing data for 13 anti-tuberculosis drugs. We trained, validated and tested the system on 10,859 isolates from the CRyPTIC dataset. Essential agreement rates (predicted MIC within one doubling dilution of observed MIC) were above 92% for first-line drugs, 91% for fluoroquinolones and aminoglycosides, and 90% for new and repurposed drugs, albeit with a significant drop in performance for the very few phenotypically resistant isolates in the latter group. To further validate the model in the absence of external MIC datasets, we predicted MICs for an external set of 15,239 isolates with binary phenotypes, converted the predicted values to binary calls, and compared the model's performance against a previously validated mutation catalogue, the expected performance of existing molecular assays, and World Health Organization Target Product Profiles. The sensitivity of the model on the external dataset was greater than 90% for all drugs except ethionamide, clofazimine and linezolid. Specificity was greater than 95% for all drugs except ethambutol, ethionamide, bedaquiline, delamanid and clofazimine. The proposed system can provide quantitative susceptibility phenotyping to help guide antimicrobial therapy, although further data collection and validation are required before machine learning can be used clinically for all drugs.
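The two evaluation steps described in the abstract (essential agreement on MICs, and sensitivity/specificity after converting predicted MICs to binary calls) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the MIC values are made-up toy numbers rather than CRyPTIC data, and the rule "resistant if MIC exceeds the critical concentration" together with the critical concentration of 1.0 mg/L are assumptions chosen for the example.

```python
import math


def within_one_dilution(pred_mic: float, obs_mic: float) -> bool:
    """Essential agreement for one isolate: predicted MIC within one doubling dilution of observed."""
    # MICs lie on a two-fold dilution series, so the comparison is done on a log2 scale.
    return abs(math.log2(pred_mic) - math.log2(obs_mic)) <= 1


def essential_agreement(pred_mics, obs_mics) -> float:
    """Fraction of isolates whose predicted MIC is within one doubling dilution of the observed MIC."""
    hits = sum(within_one_dilution(p, o) for p, o in zip(pred_mics, obs_mics))
    return hits / len(obs_mics)


def to_binary(mic: float, critical_conc: float) -> str:
    """Assumed rule (hypothetical): call an isolate resistant if its MIC exceeds the critical concentration."""
    return "R" if mic > critical_conc else "S"


def sensitivity_specificity(pred_calls, true_calls):
    """Sensitivity and specificity of predicted R/S calls against observed binary phenotypes."""
    pairs = list(zip(pred_calls, true_calls))
    tp = sum(p == "R" and t == "R" for p, t in pairs)
    fn = sum(p == "S" and t == "R" for p, t in pairs)
    tn = sum(p == "S" and t == "S" for p, t in pairs)
    fp = sum(p == "R" and t == "S" for p, t in pairs)
    # Return NaN rather than raising when a class is absent from the observed phenotypes.
    sens = tp / (tp + fn) if (tp + fn) else math.nan
    spec = tn / (tn + fp) if (tn + fp) else math.nan
    return sens, spec


if __name__ == "__main__":
    # Toy values for illustration only (mg/L); not taken from the paper.
    predicted = [0.25, 0.5, 4.0, 1.0]
    observed = [0.25, 2.0, 8.0, 0.5]
    print("essential agreement:", essential_agreement(predicted, observed))

    calls = [to_binary(m, critical_conc=1.0) for m in predicted]
    observed_phenotypes = ["S", "R", "R", "S"]
    print("sensitivity, specificity:", sensitivity_specificity(calls, observed_phenotypes))
```

Working on a log2 scale makes "one doubling dilution" a distance of exactly 1, which is why the agreement check reduces to a single absolute difference. In this toy run, three of the four predictions fall within one dilution, and the binary calls miss one of the two resistant isolates, which shows how an MIC model can score well on essential agreement yet lose sensitivity once values are thresholded.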

Matching journals

The top journal alone accounts for more than 50% of the predicted probability mass.

1. Journal of Clinical Microbiology (120 papers in training set; Top 0.1%; predicted probability 58.7%)
50% of probability mass above
2. Genome Medicine (154 papers in training set; Top 2%; predicted probability 3.6%)
3. Antimicrobial Agents and Chemotherapy (167 papers in training set; Top 0.6%; predicted probability 3.6%)
4. European Respiratory Journal (54 papers in training set; Top 0.5%; predicted probability 3.6%)
5. Nature Communications (4913 papers in training set; Top 43%; predicted probability 2.9%)
6. Scientific Reports (3102 papers in training set; Top 51%; predicted probability 2.1%)
7. JAC-Antimicrobial Resistance (13 papers in training set; Top 0.2%; predicted probability 2.1%)
8. The Journal of Infectious Diseases (182 papers in training set; Top 3%; predicted probability 1.7%)
9. American Journal of Respiratory and Critical Care Medicine (39 papers in training set; Top 0.5%; predicted probability 1.7%)
10. PLOS Computational Biology (1633 papers in training set; Top 18%; predicted probability 1.5%)
11. eLife (5422 papers in training set; Top 47%; predicted probability 1.3%)
12. Journal of Infection (71 papers in training set; Top 2%; predicted probability 1.2%)
13. Microbiology Spectrum (435 papers in training set; Top 4%; predicted probability 1.2%)
14. PLOS ONE (4510 papers in training set; Top 63%; predicted probability 0.9%)
15. Clinical Chemistry (22 papers in training set; Top 0.7%; predicted probability 0.9%)
16. mBio (750 papers in training set; Top 10%; predicted probability 0.9%)
17. Clinical Infectious Diseases (231 papers in training set; Top 4%; predicted probability 0.9%)
18. PLOS Pathogens (721 papers in training set; Top 9%; predicted probability 0.7%)
19. The Lancet Microbe (43 papers in training set; Top 1%; predicted probability 0.7%)
20. Cell Systems (167 papers in training set; Top 14%; predicted probability 0.6%)
21. ERJ Open Research (44 papers in training set; Top 1.0%; predicted probability 0.6%)
22. Genomics (60 papers in training set; Top 3%; predicted probability 0.6%)
23. BMC Genomics (328 papers in training set; Top 7%; predicted probability 0.6%)
24. ACS Infectious Diseases (74 papers in training set; Top 2%; predicted probability 0.6%)