
cfMIND: A read-level methylation framework for accurate non-invasive disease detection using cell-free DNA

Li, J.; Liu, Z.; Zhang, H.; Zhang, Y.; Li, W.; Li, Y.

2026-05-16 bioinformatics
10.64898/2026.05.13.725033 bioRxiv

Plasma cell-free DNA (cfDNA) has emerged as a promising non-invasive biomarker for cancers. However, reliable detection remains challenging due to the low abundance of tumor-derived cfDNA fragments and the dilution of informative methylation signals when they are aggregated into region-level features. Here, we propose cfMIND, an efficient and robust machine-learning framework that leverages stratified read-level methylation signals to preserve rare cell-type-specific information and enhance detection sensitivity. By avoiding the information loss inherent to conventional aggregation strategies, cfMIND enables more accurate and stable inference across diverse conditions. cfMIND is compatible with various cfDNA methylation sequencing technologies and cancer types. Across multiple cancer datasets (n = 868), cfMIND achieves high performance (AUROC up to 0.966) and maintains strong accuracy even at ultra-low sequencing depth (0.2x) and in early-stage cancers. Notably, cfMIND generalizes effectively across cohorts and platforms without the need for model retraining. These results highlight its potential utility in heterogeneous experimental and clinically relevant settings. Beyond cancer detection, cfMIND is readily extendable to non-malignant diseases, as demonstrated by its ability to capture disease-associated methylation alterations in amyotrophic lateral sclerosis (ALS). Functional investigations of cfMIND-identified features further reveal enrichment in key regulatory regions implicated in disease pathogenesis and recapitulate tissue- and single-cell-level methylation and transcriptional programs underlying tumor biology. Collectively, cfMIND offers a broadly applicable, functionally interpretable, and high-resolution framework for non-invasive disease detection.
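
The dilution effect the abstract refers to can be illustrated with a toy calculation. This is a minimal sketch under invented assumptions (the read counts, the four-CpG reads, and the 75% cutoff are all hypothetical); it is not cfMIND's feature definition, which the abstract does not specify.

```python
# Toy illustration only: hypothetical numbers, not cfMIND's actual method.
# Each read is a list of CpG states (1 = methylated, 0 = unmethylated).
background = [[0, 0, 0, 0]] * 995  # hypothetical unmethylated background reads
tumor = [[1, 1, 1, 1]] * 5         # hypothetical fully methylated tumor-derived reads
reads = background + tumor

# Region-level aggregation: mean methylation over every CpG in every read.
total_cpgs = sum(len(r) for r in reads)
region_beta = sum(sum(r) for r in reads) / total_cpgs

# Read-level view: per-read methylation fraction, then count highly methylated reads.
read_fractions = [sum(r) / len(r) for r in reads]
high_reads = sum(f >= 0.75 for f in read_fractions)  # 0.75 is an arbitrary cutoff

print(f"region-level mean methylation: {region_beta:.3f}")
print(f"reads with >= 75% methylated CpGs: {high_reads} of {len(reads)}")
```

In this toy region the aggregated value is close to zero, while the per-read view still exposes the handful of fully methylated reads as distinct observations.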

Matching journals

The top 4 journals together account for at least 50% of the predicted probability mass (58.8% combined).

Rank  Journal                                          Papers in training set  Percentile  Probability
1     Nature Communications                            4913                    Top 7%      18.2%
2     Genome Medicine                                  154                     Top 0.2%    17.2%
3     Advanced Science                                 249                     Top 0.6%    14.4%
4     Nature Biotechnology                             147                     Top 0.9%    9.0%
---- 50% of probability mass above ----
5     Nucleic Acids Research                           1128                    Top 3%      6.2%
6     Genome Biology                                   555                     Top 3%      3.5%
7     Bioinformatics                                   1061                    Top 6%      3.5%
8     Nature Machine Intelligence                      61                      Top 2%      2.0%
9     Nature Biomedical Engineering                    42                      Top 0.7%    1.8%
10    Briefings in Bioinformatics                      326                     Top 4%      1.7%
11    Nature Methods                                   336                     Top 5%      1.7%
12    Cell Reports Methods                             141                     Top 3%      1.5%
13    Communications Biology                           886                     Top 11%     1.5%
14    Scientific Reports                               3102                    Top 69%     0.9%
15    Cell Systems                                     167                     Top 11%     0.9%
16    Proceedings of the National Academy of Sciences  2130                    Top 42%     0.9%
17    Cell Reports Medicine                            140                     Top 7%      0.9%
18    Genome Research                                  409                     Top 4%      0.9%
19    NAR Genomics and Bioinformatics                  214                     Top 4%      0.8%
20    Science Advances                                 1098                    Top 29%     0.8%
21    PLOS ONE                                         4510                    Top 71%     0.6%
22    Cell Genomics                                    162                     Top 8%      0.6%
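
The 50% cutoff in the ranking can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch using the first five listed values; the function name `journals_to_reach` is a hypothetical helper, not part of the site that produced these predictions.

```python
# Listed probabilities (percent) for the five highest-ranked journals.
ranked = [
    ("Nature Communications", 18.2),
    ("Genome Medicine", 17.2),
    ("Advanced Science", 14.4),
    ("Nature Biotechnology", 9.0),
    ("Nucleic Acids Research", 6.2),
]

def journals_to_reach(ranked, threshold):
    """Return (count, cumulative) for the first prefix whose sum reaches threshold.

    Returns (None, cumulative) if the threshold is never reached.
    """
    cumulative = 0.0
    for count, (_, prob) in enumerate(ranked, start=1):
        cumulative += prob
        if cumulative >= threshold:
            return count, cumulative
    return None, cumulative

count, covered = journals_to_reach(ranked, 50.0)
print(f"{count} journals cover {covered:.1f}% of the probability mass")
```

The first three journals sum to 49.8%, just under the threshold, so the fourth is needed to cross 50%.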