
Evaluation of MeaSeq: comprehensive analysis and reporting of measles virus whole genome sequences.

Hole, D. T.; Abdalla, A.; Zubach, V.; Pratt, M.; Van Driel, S.; Ashfaq, S.; Hiebert, J.; Duggan, A. T.

2026-05-14 bioinformatics
10.64898/2026.05.12.724559 bioRxiv

Although vaccine-preventable, measles virus (MeV) continues to pose a significant public health challenge, with a substantial resurgence of cases worldwide. As whole-genome sequencing (WGS) becomes increasingly affordable and routinely adopted in public health laboratories, reliable and accessible analysis of next-generation sequencing (NGS) data is critical for outbreak investigation and molecular surveillance. Here, we present MeaSeq, a fast, user-friendly, open-source bioinformatics pipeline for MeV analysis using Illumina or Oxford Nanopore Technologies (ONT) NGS data. MeaSeq performs quality control assessments, consensus genome assembly and variant detection, optional genotype-specific reference selection, Distinct Sequence Identifier (DSId) assignment via user-provided databases or hashing, sub-consensus variant visualization, genome quality assessment, and standardized HTML reporting. We compared the performance of MeaSeq on NGS data generated from multiple sequencing platforms and targeted enrichment strategies against gold-standard Sanger data, reference genomes, and publicly available comparative data. This validation demonstrates that MeaSeq provides an accurate, reproducible, and accessible solution for routine MeV WGS analysis, supporting genomic surveillance and outbreak response workflows in public health and research settings.

Impact Statement

The recent surge in measles cases worldwide, which has caused several countries to lose their measles elimination status, underscores the urgent need for effective and accessible genomic surveillance. Our manuscript introduces MeaSeq, a comprehensive and open-source bioinformatics pipeline specifically designed for analyzing MeV NGS data. MeaSeq includes MeV-specific analyses such as genotype prediction from sequencing reads with optional genotype-specific reference selection; DSId assignment; quality control checks such as genome rule-of-six divisibility and gene CDS validation; sub-consensus nucleotide analysis with mixed-site highlighting; and genomic plotting. By leveraging NGS technology, our pipeline can facilitate the identification of transmission chains and may provide critical insights into the dynamics of MeV outbreaks. This information is essential for public health officials and researchers to implement targeted interventions and optimize vaccine strategies. Additionally, the open-source nature of MeaSeq fosters collaboration and innovation within the measles research community and makes the pipeline accessible to a wider range of researchers.

Data Summary

The MeaSeq pipeline code is available on GitHub (https://github.com/phac-nml/measeq). Comparative datasets of publicly available WGS data were accessed through the NCBI Sequence Read Archive under the following BioProjects:

PRJNA869081 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA869081)
PRJNA480551 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA480551)
PRJNA1017431 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1017431)
PRJNA1241325 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1241325)
PRJNA1174053 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1174053)
PRJNA1293457 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1293457)
PRJNA843031 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA843031)

Whole-genome sequences were included in the validation analysis if they consisted of paired-end Illumina data and achieved ≥95% genome completeness following trimming of the 5' and 3' untranslated regions (UTRs). This criterion ensured sufficient genome coverage for robust validation while allowing for limited missing data arising from regions of low sequencing depth or amplicon dropout. A complete list of sequences included in the validation, along with their accession numbers, is provided in Supplementary Table 1.
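
The rule-of-six check and the ≥95% completeness criterion described above can be illustrated with a minimal Python sketch. It is not taken from the MeaSeq code base: the function names, the `trim_5`/`trim_3` parameters, and the definition of completeness as the fraction of unambiguous A/C/G/T bases after UTR trimming are all assumptions made for illustration.

```python
# Hypothetical sketch of two quality checks; not MeaSeq's actual implementation.

def passes_rule_of_six(genome_length: int) -> bool:
    """Return True if the genome length is a positive multiple of six."""
    return genome_length > 0 and genome_length % 6 == 0


def completeness(seq: str, trim_5: int = 0, trim_3: int = 0) -> float:
    """Fraction of unambiguous bases (A/C/G/T) after trimming both ends.

    trim_5 and trim_3 are assumed UTR lengths supplied by the caller.
    """
    core = seq[trim_5:len(seq) - trim_3]
    if not core:
        return 0.0
    called = sum(1 for base in core.upper() if base in "ACGT")
    return called / len(core)


def include_in_validation(seq: str, trim_5: int, trim_3: int,
                          threshold: float = 0.95) -> bool:
    """Apply the >=95% completeness inclusion criterion (threshold assumed inclusive)."""
    return completeness(seq, trim_5, trim_3) >= threshold


if __name__ == "__main__":
    toy = "NNACGTACGTNN"  # toy sequence, 12 bases, masked ends
    print(passes_rule_of_six(len(toy)))
    print(completeness(toy, trim_5=2, trim_3=2))
    print(include_in_validation("ACGTNNNNACGT", trim_5=0, trim_3=0))
```

In this sketch the masked ends of the toy sequence fall inside the trimmed regions, so they do not count against completeness, which mirrors the stated intent of evaluating completeness only after UTR trimming.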

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Scientific Data; 174 papers in training set; Top 0.1%; 21.7%
2. Nature Communications; 4913 papers in training set; Top 16%; 11.9%
3. GigaScience; 172 papers in training set; Top 0.1%; 7.9%
4. BMC Bioinformatics; 383 papers in training set; Top 2%; 4.1%
5. Bioinformatics; 1061 papers in training set; Top 6%; 3.5%
6. Nucleic Acids Research; 1128 papers in training set; Top 6%; 3.5%
50% of probability mass above
7. Viruses; 318 papers in training set; Top 2%; 3.1%
8. PLOS ONE; 4510 papers in training set; Top 43%; 2.8%
9. PeerJ; 261 papers in training set; Top 4%; 2.5%
10. Briefings in Bioinformatics; 326 papers in training set; Top 3%; 2.4%
11. BMC Genomics; 328 papers in training set; Top 1%; 2.3%
12. Genome Biology; 555 papers in training set; Top 4%; 2.0%
13. Microbial Genomics; 204 papers in training set; Top 1%; 1.8%
14. mSystems; 361 papers in training set; Top 5%; 1.7%
15. Nature Methods; 336 papers in training set; Top 5%; 1.6%
16. Scientific Reports; 3102 papers in training set; Top 60%; 1.6%
17. Genome Medicine; 154 papers in training set; Top 5%; 1.6%
18. NAR Genomics and Bioinformatics; 214 papers in training set; Top 3%; 1.3%
19. Gigabyte; 60 papers in training set; Top 0.9%; 1.3%
20. Genome Research; 409 papers in training set; Top 3%; 1.3%
21. Bioinformatics Advances; 184 papers in training set; Top 4%; 1.1%
22. Journal of Clinical Microbiology; 120 papers in training set; Top 1%; 0.9%
23. iScience; 1063 papers in training set; Top 28%; 0.9%
24. Wellcome Open Research; 57 papers in training set; Top 2%; 0.9%
25. eLife; 5422 papers in training set; Top 57%; 0.8%
26. Database; 51 papers in training set; Top 1%; 0.7%
27. Nature Biotechnology; 147 papers in training set; Top 8%; 0.7%
28. Genomics; 60 papers in training set; Top 3%; 0.7%
29. Diagnostic Microbiology and Infectious Disease; 21 papers in training set; Top 0.3%; 0.6%
30. Biology Methods and Protocols; 53 papers in training set; Top 3%; 0.6%
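
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked against the listed percentages with a short Python sketch. The helper name `journals_to_reach` is hypothetical, and the list holds only the first seven probabilities shown in the ranking.

```python
from itertools import accumulate
from typing import Optional, Sequence


def journals_to_reach(probabilities: Sequence[float], target: float) -> Optional[int]:
    """Return the smallest rank whose cumulative probability reaches target.

    probabilities must already be sorted from highest to lowest rank.
    Returns None if the target is never reached.
    """
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= target:
            return rank
    return None


# Predicted probabilities (percent) for ranks 1 to 7, copied from the list above.
top_probs = [21.7, 11.9, 7.9, 4.1, 3.5, 3.5, 3.1]

if __name__ == "__main__":
    print(journals_to_reach(top_probs, 50.0))  # prints 6
```

The first five journals sum to 49.1%, just short of the threshold, and the sixth brings the cumulative total to 52.6%.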