Long-read sequencing of Mycobacterial tuberculosis is comparable to short-read sequencing for antimicrobial resistance prediction and epidemiological studies.

Colpus, M.; Baker, C. S.; Roghi, E.; Hong, H. N.; Trieu, P. P.; Thu, D. D. A.; Hall, A.; Fowler, P. W.; Walker, T. M.; Spies, R.; Webster, H.; Westhead, J.; Thai, H.; Turner, R. D.; Peto, T. E.; Quang, N. L.; Thuong, N. T. T.; Omar, S. V.; Crook, D. W.

2026-04-08 microbiology

10.64898/2026.04.08.717216 bioRxiv

Background: Short-read genetic sequencing technologies (mainly Illumina) have been used extensively for around a decade for Mycobacterium tuberculosis complex (MTBC) outbreak analysis and genomic drug susceptibility testing (gDST), with the result that Illumina has become the de facto gold standard. Long-read sequencing, as exemplified by Oxford Nanopore Technologies (ONT), offers the prospect of faster, simpler, and portable sequencing. In this work, we carry out the largest comparison to date of how well Illumina and ONT technologies sequence MTBC samples, making use of R10.4 ONT flow cells, updated basecalling models and deep-learning variant calling.
Methods: A total of 508 samples were sequenced using both short- and long-read platforms. All samples originated from South Africa or Vietnam, were over-selected for drug resistance, and included several local outbreaks and a range of lineages. The South African and Vietnamese samples had already been Illumina sequenced. Samples with an Illumina read depth of ≥50 were selected for ONT sequencing on either the GridION or PromethION platform. Bioinformatics processing used a modified online cloud platform that performed reference-based variant calling and catalogue-based gDST, and identified related samples via SNP counting to inform outbreak detection. The lineages and gDST predictions obtained by short- and long-read sequencing were compared for all samples, as were all putative clusters identified via SNP counting. For convenience, Illumina was used as the reference method.
Findings: Of the 508 samples, 425 (83.7%) had sufficient read depth to permit comparison between the two sequencing technologies. The assigned lineages were identical for 407/425 (95.8%) samples, and all discordances were due to mixed lineages being identified by one technology. Evidence of non-tuberculous mycobacterium (NTM) subpopulations was found in nine samples.
Using Illumina as the reference method, the very major error (VME) rate of ONT for predicting resistance across all 15 drugs was 1.0% (0.6-1.5%), whilst the major error (ME) rate was 1.7% (1.3-2.2%), with an unclassified rate of 6.9% (6.3-7.5%). These error rates are below the thresholds specified by the CLSI. Considering each of the 15 drugs individually, the VME and ME point estimates were ≤3% in 29/30 cases, and most (25/30) were ≤1.5%. Filtering out all samples containing mixtures left 382 isolates. By appropriately masking the reference genome, we obtained a mean SNP distance between the two platforms of 0.13 (median zero) for the same sample, and for 376/382 samples (98.4%, CI: 96.6-99.4%) the difference was ≤1 SNP. The high concordance in SNP identification meant that few differences were observed in the 43 putative clusters among 172 isolates.
Interpretation: The differences between the two sequencing platforms for the key clinical outputs are so small that they are now within the tolerances set by regulatory agencies. Provided the sequencing is of sufficient quality, we have therefore reached a threshold whereby sequencing data from long- and short-read platforms can be aggregated. This will enable large-scale analyses by national and international public health agencies whilst allowing the MTBC community to take advantage of the portability and speed of long-read sequencing.
Funding: The NIHR Health Protection Research Unit: Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford (NIHR200915), a partnership between the UK Health Security Agency (UKHSA) and the University of Oxford, the National Institute for Health and Care Research Biomedical Research Centre: Oxford (BRC) and the Ellison Institute of Technology, Oxford Ltd. The CRyPTIC project was funded by Wellcome [214560/Z/18/Z], a Wellcome Trust/Newton Fund-MRC Collaborative Award (200205/Z/15/Z), and the Bill & Melinda Gates Foundation Trust (OPP1133541).
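The VME, ME and unclassified rates reported in the Findings can be illustrated with a minimal sketch. It assumes each paired result is encoded as a (reference, test) tuple of "R" (resistant), "S" (susceptible) or "U" (unclassified), that unclassified test calls are excluded from the VME and ME denominators, and that a Wilson score interval is used for the 95% confidence interval. None of these choices is stated in the abstract, and the function names are hypothetical.

```python
from math import sqrt


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes out of n trials.

    ASSUMPTION: the abstract does not say which interval method was used.
    """
    if n == 0:
        raise ValueError("interval is undefined for n == 0")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def rate(k, n):
    """Return (point estimate, lower bound, upper bound) for k out of n."""
    low, high = wilson_interval(k, n)
    return k / n, low, high


def error_rates(pairs):
    """Summarise paired calls given as (reference, test) tuples.

    VME: reference resistant, test susceptible (missed resistance).
    ME:  reference susceptible, test resistant (false resistance).
    ASSUMPTION: a test call of "U" counts towards the unclassified rate
    only and is left out of the VME and ME denominators.
    """
    ref_r = ref_s = vme = me = unclassified = total = 0
    for ref, test in pairs:
        total += 1
        if test == "U":
            unclassified += 1
            continue
        if ref == "R":
            ref_r += 1
            vme += test == "S"
        elif ref == "S":
            ref_s += 1
            me += test == "R"
    return {
        "vme": rate(vme, ref_r),
        "me": rate(me, ref_s),
        "unclassified": rate(unclassified, total),
    }
```

Calling `error_rates` on the pooled calls for all drugs, or on the calls for one drug at a time, mirrors the two levels of reporting in the Findings. The denominators differ by design: VME is conditioned on reference-resistant samples and ME on reference-susceptible samples.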
Research in context
Evidence before this study: We conducted a PubMed Central full text search for "tuberculosis" AND ("drug resistance prediction" OR "drug susceptibility prediction") AND ("genome" OR "genomic" OR "genotypic") AND ("ont" OR "oxford nanopore") between 2022 and 2026 (conducted 1 April 2026). This returned 62 papers, of which six used both Illumina and ONT sequencing. One of these, published in 2023, directly compared the performance of the two platforms on 151 M. tuberculosis isolates oversampled for resistance. That investigation yielded comparative results for the earlier generation ONT flow cell (R9.4.1) and basecaller (guppy version 5.0.16). Another, published in 2026, investigated a targeted next-generation sequencing panel of 20 amplicons using ONT sequencing on R10.4.1 flow cells with guppy 6.4.6. The authors compared the results on 71 isolates against phenotypic data and Illumina whole genome sequencing (for 53 isolates), but resistance was rare, with every drug except isoniazid having fewer than five resistant isolates. Two other small studies (10 and 13 samples, respectively) were feasibility studies comparing ONT with Illumina, also using earlier generation flow cells and basecalling technology from ONT. Two further studies compared Illumina with ONT for direct sputum sequencing and did not investigate the comparative performance of the two platforms for variant call accuracy, resistance prediction, and outbreak detection. Illumina sequencing technology is widely used for genomic sequence analysis in research, clinical and public health contexts. Consequently, it has become the de facto reference standard for generating whole genome sequence data.
Whilst previous studies established the promise and limitations of long-read (ONT) sequencing as an alternative to short-read sequencing (mainly Illumina), the enhanced performance arising from newer flow cells (e.g. R10.4.1), V14 chemistry, and the latest basecallers (dorado v4.3.0/5.0.0) has not been analysed. Nor has any ONT analysis incorporating the new deep-learning variant callers been evaluated in a large-scale comparative study. Thus, it is currently unclear whether data generated by either platform can be used safely in aggregated analyses for research, clinical or public health services.
Added value of this study: We compared how well short-read (Illumina) and long-read (ONT) sequencing platforms identify the genetic variants in M. tuberculosis, predict antituberculous drug resistance and recognise outbreaks. The long reads were generated using the latest generation ONT R10.4.1 flow cells, V14 chemistry, super high accuracy basecalling (dorado v4.3.0/5.0.0) and a bioinformatics analysis pipeline built using the Clair3 deep-learning based variant caller. A total of 508 clinical samples were sequenced using both technologies, substantially more than in previous studies. The sampling frame was much larger than in previous investigations and included a large proportion of isolates with resistance to first-line and second-line antibiotics as well as bedaquiline, thus providing greater statistical power for resistance prediction than before. In particular, the inclusion of bedaquiline resistance provided evidence useful for predicting resistance to this newly deployed drug for treating multi-drug resistant (MDR) TB. We find that the differences between technologies are small, meaning that either technology can be used alone safely and that services using both technologies can confidently aggregate the data for analysis.
Implications of all the available evidence: The ability to use and aggregate data from either platform will benefit local, regional and international organisations, particularly public health agencies, which often have a mix of the two main sequencing technologies for characterising TB whole genome sequences. It also opens up the sequence-based diagnostic market to greater competition, particularly if the observed performance can be replicated for other pathogen species.
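The SNP-counting comparison and cluster detection described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study. It assumes each sample is reduced to a mapping from reference position to called alternate base, the mask is a set of excluded positions, and clusters are formed by single linkage at a user-supplied SNP threshold. The abstract does not give a threshold, so the value below is purely hypothetical. Positions with missing or low-confidence calls are ignored here, which a real pipeline would need to handle.

```python
from itertools import combinations


def snp_distance(a, b, masked):
    """Count unmasked positions where two samples carry different alleles.

    a, b: dict mapping reference position -> alternate base; a position
    absent from a dict is taken to match the reference (a simplification).
    masked: set of positions excluded from the comparison.
    """
    positions = (set(a) | set(b)) - masked
    return sum(1 for pos in positions if a.get(pos) != b.get(pos))


def single_linkage_clusters(samples, masked, threshold):
    """Group samples connected by chains of pairwise distances <= threshold.

    ASSUMPTION: single linkage and the threshold value are illustrative
    choices; the abstract only says clusters were found by SNP counting.
    """
    names = sorted(samples)
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(names, 2):
        if snp_distance(samples[x], samples[y], masked) <= threshold:
            parent[find(x)] = find(y)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    # Report only putative clusters, i.e. groups of two or more samples.
    return sorted(c for c in clusters.values() if len(c) > 1)


# Hypothetical toy data: one isolate sequenced on both platforms, plus an
# unrelated isolate. Position 900 stands in for a masked region.
samples = {
    "s1_illumina": {100: "A", 250: "T"},
    "s1_ont": {100: "A", 250: "T", 900: "G"},
    "s2_illumina": {100: "A", 400: "C", 500: "G", 600: "T"},
}
masked = {900}
same_sample_distance = snp_distance(
    samples["s1_illumina"], samples["s1_ont"], masked
)
putative_clusters = single_linkage_clusters(samples, masked, threshold=2)
```

In the toy data the only platform-specific call falls inside the mask, so the two sequences of the same isolate are zero SNPs apart and cluster together. Without the mask they would differ by one SNP, which shows why the choice of mask matters when data from both platforms are pooled.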
