Early Lung Cancer Detection Using Nucleotide Transition Probabilities in plasma cell-free DNA

Ji, J.; Xue, R.; Zhang, X.; Yang, M.; Li, L.; Duan, X.; Deng, W.; Yan, R.; Xu, Z.; Pian, C.; Zhao, J.

2025-09-12 oncology
10.1101/2025.09.09.25335450 medRxiv
Lung cancer, the most lethal malignancy globally, urgently requires effective early detection methods. Current non-invasive approaches based on plasma cell-free DNA (cfDNA) fragmentomics are often constrained by limited sensitivity in early-stage patients because of the low tumor DNA fraction. To overcome this, we introduce a novel computational feature, First-Order Transition Probability (FOTP), to decode nucleotide sequential dependencies within cfDNA fragments. Through systematic analysis of low-pass whole-genome sequencing data from 1,036 participants, we demonstrate that the first 10 bp at the 5' end harbor the most discriminative information for cancer detection. An SVM model leveraging FOTP achieved an AUC of 0.942, with 73.9% sensitivity for stage I and 81.8% for stage II lung cancer at 95% specificity, significantly outperforming existing fragmentomic features. Furthermore, the method generalized robustly across independent and multi-cancer validation sets, including HCC, CRC, and HNSCC, and exhibited potential for tissue-of-origin identification. These findings are supported by nucleotide frequency stability and entropy patterns beyond the initial 10 bp, reflecting underlying nuclease cleavage biases and chromatin features. This work establishes FOTP as a biologically interpretable and highly efficient feature for pan-cancer early detection, offering a scalable pathway toward population-wide screening programs.
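
The abstract does not spell out how FOTP is computed, so the following is only a minimal sketch of one plausible reading: a first-order (Markov) transition probability P(next base | current base), estimated from adjacent base pairs in the first 10 bp at the 5' end of each fragment and pooled across fragments. The function name `fotp_features`, the `prefix_len` parameter, the pooling across positions, and the handling of non-ACGT bases are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

BASES = "ACGT"


def fotp_features(fragments, prefix_len=10):
    """Estimate P(next base | current base) from the 5' prefix of each fragment.

    Hypothetical helper: returns a dict with 16 entries keyed by dinucleotide,
    e.g. features["CG"] is the estimated probability that G follows C.
    """
    pair_counts = Counter()
    for seq in fragments:
        prefix = seq[:prefix_len].upper()
        # Count each adjacent (current, next) pair inside the prefix.
        for cur, nxt in zip(prefix, prefix[1:]):
            if cur in BASES and nxt in BASES:  # assumption: skip N and other codes
                pair_counts[cur, nxt] += 1

    features = {}
    for cur in BASES:
        row_total = sum(pair_counts[cur, nxt] for nxt in BASES)
        for nxt in BASES:
            # Rows with no observations are left at 0.0 rather than guessed.
            features[cur + nxt] = pair_counts[cur, nxt] / row_total if row_total else 0.0
    return features


# Toy input, not real cfDNA data.
example = fotp_features(["ACGTACGTACGG", "AACCGGTTAACC"])
```

Under this reading, each sample yields a 16-dimensional vector that could be passed to a classifier such as an SVM; whether the study pools transitions across positions or keeps them position-specific is not stated in the abstract.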

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Nature Communications | 4913 papers in training set | Top 16% | 10.6%
2. Advanced Science | 249 papers in training set | Top 1% | 10.3%
3. Briefings in Bioinformatics | 326 papers in training set | Top 0.7% | 6.9%
4. Molecular Cancer | 14 papers in training set | Top 0.1% | 6.9%
5. eLife | 5422 papers in training set | Top 22% | 4.0%
6. ACS Nano | 99 papers in training set | Top 1% | 3.7%
7. iScience | 1063 papers in training set | Top 4% | 3.7%
8. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 2% | 3.1%
9. NAR Genomics and Bioinformatics | 214 papers in training set | Top 1% | 2.7%

50% of probability mass above

10. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 27% | 2.1%
11. Communications Biology | 886 papers in training set | Top 5% | 2.1%
12. Scientific Reports | 3102 papers in training set | Top 49% | 2.1%
13. Cell Genomics | 162 papers in training set | Top 3% | 1.9%
14. PLOS Computational Biology | 1633 papers in training set | Top 16% | 1.7%
15. Cancer Cell | 38 papers in training set | Top 0.9% | 1.7%
16. Cell Reports | 1338 papers in training set | Top 23% | 1.7%
17. National Science Review | 22 papers in training set | Top 0.9% | 1.7%
18. Cell Reports Medicine | 140 papers in training set | Top 5% | 1.4%
19. Genome Medicine | 154 papers in training set | Top 5% | 1.4%
20. Genome Biology | 555 papers in training set | Top 5% | 1.2%
21. Biosensors and Bioelectronics | 52 papers in training set | Top 1% | 1.0%
22. Nucleic Acids Research | 1128 papers in training set | Top 15% | 0.9%
23. EMBO Molecular Medicine | 85 papers in training set | Top 4% | 0.8%
24. PLOS ONE | 4510 papers in training set | Top 65% | 0.8%
25. Small Methods | 26 papers in training set | Top 0.9% | 0.8%
26. Interface Focus | 14 papers in training set | Top 0.3% | 0.8%
27. Med | 38 papers in training set | Top 0.8% | 0.8%
28. Computers in Biology and Medicine | 120 papers in training set | Top 4% | 0.8%
29. Science Advances | 1098 papers in training set | Top 29% | 0.8%
30. Cancer Research | 116 papers in training set | Top 3% | 0.7%
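
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked against the percentages listed above. The sketch below assumes the last figure in each entry is that journal's predicted probability, and it works from the rounded values as displayed, so the total is approximate. The function name `journals_to_reach` is hypothetical.

```python
def journals_to_reach(probabilities, target=50.0):
    """Return (rank, cumulative mass) at which the running total first reaches target."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= target:
            return rank, total
    return None, total  # target not reached within the given list


# Rounded percentages for ranks 1 to 10, copied from the list above.
top_listed = [10.6, 10.3, 6.9, 6.9, 4.0, 3.7, 3.7, 3.1, 2.7, 2.1]
rank, mass = journals_to_reach(top_listed)
```

With these rounded values the running total is about 49.2% after eight journals and about 51.9% after nine, which agrees with the page's cutoff at rank 9.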