
CMS: Achieving Uniform and High-Quality Sequencing across Challenging Non-canonical Genomic Regions

Li, Q.; Liu, L.; Lin, Q.; Dan, X.; Jiang, Y.; Wei, Y.; Yang, M.; Peng, X.; Luo, W.; Wang, W.; Xu, D.; Huang, Z.; Sun, W.; Zhao, L.; Yan, Q.; Sun, L.; Feng, B.

2026-04-28 genomics
10.64898/2026.04.24.720553 bioRxiv

High-throughput sequencing is essential in modern biological research, yet low-complexity sequences remain challenging because they form structurally complex, non-canonical (non-B) DNA conformations that impede sequencing-enzyme read-through. This creates a long-standing trade-off: maximizing coverage introduces false positives (FPs), while stringent filtering causes coverage loss and false negatives (FNs). To address this, we developed CMS (Cross Mountains and Seas) on GeneMind sequencing platforms, optimizing the chemistry and enzymatic systems to traverse these secondary structures with high fidelity. Benchmarking across whole-genome sequencing (WGS) and whole-exome sequencing (WES) shows that CMS resolves this trade-off by improving coverage uniformity and accuracy simultaneously, achieving an approximately 100-fold reduction in low-coverage bins for WGS and a 70% reduction in false-negative insertions/deletions (INDELs) within complex non-B regions. In a synthetic G-quadruplex (G4) motif sequencing experiment, CMS maintains a 1:1 strand ratio, overcoming the G4-induced biases that cause extensive depletion on the benchmarked platforms. These findings establish CMS as a reliable technology for the precise characterization of structurally challenging but functionally essential genomic regions.

Matching journals

The top two journals account for more than 50% of the predicted probability mass (54.4% combined).

1. Nature Biotechnology: 39.9% (Top 0.1%; 147 papers in training set)
2. Nature Communications: 14.5% (Top 10%; 4913 papers in training set)
3. Nature Methods: 6.4% (Top 2%; 336 papers in training set)
4. Genome Biology: 4.0% (Top 2%; 555 papers in training set)
5. Science: 3.6% (Top 9%; 429 papers in training set)
6. Nucleic Acids Research: 3.6% (Top 6%; 1128 papers in training set)
7. Genome Research: 2.1% (Top 2%; 409 papers in training set)
8. Genome Medicine: 1.9% (Top 4%; 154 papers in training set)
9. Cell: 1.7% (Top 11%; 370 papers in training set)
10. Cell Genomics: 1.7% (Top 3%; 162 papers in training set)
11. Cell Systems: 1.2% (Top 9%; 167 papers in training set)
12. Cell Reports Methods: 1.2% (Top 3%; 141 papers in training set)
13. Nature Machine Intelligence: 1.0% (Top 3%; 61 papers in training set)
14. Nature: 1.0% (Top 14%; 575 papers in training set)
15. Nature Computational Science: 0.9% (Top 1%; 50 papers in training set)
16. Genomics, Proteomics & Bioinformatics: 0.8% (Top 5%; 171 papers in training set)
17. Nature Genetics: 0.8% (Top 7%; 240 papers in training set)
18. PLOS ONE: 0.8% (Top 67%; 4510 papers in training set)
19. Proceedings of the National Academy of Sciences: 0.7% (Top 46%; 2130 papers in training set)
20. NAR Genomics and Bioinformatics: 0.7% (Top 4%; 214 papers in training set)
21. Briefings in Bioinformatics: 0.7% (Top 7%; 326 papers in training set)
22. Computational and Structural Biotechnology Journal: 0.5% (Top 12%; 216 papers in training set)
23. eLife: 0.5% (Top 64%; 5422 papers in training set)
24. Bioinformatics: 0.5% (Top 11%; 1061 papers in training set)
25. Nature Biomedical Engineering: 0.5% (Top 3%; 42 papers in training set)
26. Communications Biology: 0.5% (Top 32%; 886 papers in training set)
27. iScience: 0.5% (Top 40%; 1063 papers in training set)