Channel Capacity of Genome-Wide Cell-Free DNA Fragment Length Distribution in Colorectal Cancer

Matov, A.

2024-07-18 oncology
10.1101/2024.07.17.24310568

Introduction: Each piece of cell-free DNA (cfDNA) has a length determined by the exact metabolic conditions in the cell it belonged to at the time of cell death. Changes in cellular regulation produce a variety of patterns, based on the differing numbers of fragments with lengths up to several hundred base pairs (bp) at each of the almost three billion genomic positions. These patterns allow for the detection of disease and the precise identification of the tissue of origin.

Methods: A Kullback-Leibler (KL) divergence computation identifies the fragment lengths and areas of the human genome, depending on the stage, for which disease samples, starting from pre-clinical disease stages, diverge from samples of healthy individuals. We provide examples of genes related to colorectal cancer (CRC) that our algorithm detected in divergent genomic bins. The staging of CRC can be viewed as a Markov chain, which provides a framework for studying disease progression and the types of epigenetic changes occurring longitudinally at each stage, and which might aid the correct classification of a new hospital sample.

Results: If such data are treated as grayscale images, pattern recognition using artificial intelligence could be one approach to classification. In CRC, Stage I disease does not, for the most part, shed tumor material into circulation, making detection difficult for established machine learning (ML) methods. This leads to the deduction that early detection, where we can rely only on changes in the metabolic patterns, can be accomplished when the information is considered in its entirety, for example by applying computer vision methods.

Conclusions: Longitudinal analysis of patients' genetic datasets can detect the early stages of neoplasm better than population-based methods.
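The KL divergence comparison described in the Methods can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the 400 bp length cutoff, the pseudocount, the bin labels, and the synthetic fragment lengths are all assumptions made for illustration. It builds a normalized fragment length distribution for each genomic bin and ranks bins by how far a disease sample diverges from a healthy one.

```python
import numpy as np

MAX_LEN = 400  # assumed upper bound on fragment length in bp (hypothetical choice)


def length_distribution(lengths, max_len=MAX_LEN, pseudocount=1.0):
    """Normalized histogram over fragment lengths 1..max_len.

    A pseudocount keeps every length bin nonzero so the KL ratio is defined.
    """
    lengths = np.asarray(lengths, dtype=int)
    lengths = lengths[(lengths >= 1) & (lengths <= max_len)]
    counts = np.bincount(lengths, minlength=max_len + 1)[1:].astype(float)
    counts += pseudocount
    return counts / counts.sum()


def kl_divergence(p, q):
    """D_KL(p || q) in bits; terms with p == 0 contribute nothing."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def rank_bins(disease_bins, healthy_bins):
    """Return (bin name, KL) pairs sorted from most to least divergent."""
    scores = {}
    for name in disease_bins.keys() & healthy_bins.keys():
        p = length_distribution(disease_bins[name])
        q = length_distribution(healthy_bins[name])
        scores[name] = kl_divergence(p, q)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data only: two hypothetical genomic bins, one of which has
# shorter fragments in the "disease" sample.
rng = np.random.default_rng(0)
healthy = {
    "bin_A": rng.normal(167, 15, 5000).round(),
    "bin_B": rng.normal(167, 15, 5000).round(),
}
disease = {
    "bin_A": rng.normal(167, 15, 5000).round(),
    "bin_B": rng.normal(150, 15, 5000).round(),
}
ranking = rank_bins(disease, healthy)
print(ranking)
```

In this toy setup the bin with the shifted length distribution ranks first. A real analysis would draw fragment lengths from aligned sequencing reads and would need a principled threshold for calling a bin divergent, neither of which is specified here.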

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Scientific Reports, based on 701 papers, Top 2%, 18.2%
2. PLOS ONE, based on 1737 papers, Top 54%, 8.1%
3. Briefings in Bioinformatics, based on 11 papers, Top 0.1%, 4.8%
4. Genomics, Proteomics & Bioinformatics, based on 10 papers, Top 0.2%, 4.8%
5. PeerJ, based on 46 papers, Top 0.5%, 4.8%
6. Cancers, based on 57 papers, Top 3%, 4.8%
7. Frontiers in Oncology, based on 34 papers, Top 2%, 3.2%
8. Frontiers in Genetics, based on 32 papers, Top 0.8%, 3.0%

50% of probability mass above

9. PLOS Computational Biology, based on 141 papers, Top 4%, 3.0%
10. iScience, based on 74 papers, Top 2%, 2.5%
11. Aging, based on 18 papers, Top 1.0%, 2.5%
12. International Journal of Molecular Sciences, based on 39 papers, Top 1%, 1.9%
13. Computers in Biology and Medicine, based on 39 papers, Top 4%, 1.7%
14. npj Precision Oncology, based on 14 papers, Top 2%, 1.7%
15. Nature Communications, based on 483 papers, Top 32%, 1.4%
16. Heliyon, based on 57 papers, Top 6%, 1.4%
17. Clinical Cancer Research, based on 22 papers, Top 3%, 1.4%
18. JCO Clinical Cancer Informatics, based on 14 papers, Top 3%, 1.3%
19. eLife, based on 262 papers, Top 23%, 1.3%
20. Frontiers in Immunology, based on 140 papers, Top 6%, 1.3%
21. Diagnostics, based on 36 papers, Top 4%, 1.3%
22. EMBO Molecular Medicine, based on 15 papers, Top 1%, 1.3%
23. Biology Methods and Protocols, based on 19 papers, Top 2%, 0.9%
24. mSystems, based on 16 papers, Top 2%, 0.9%
25. British Journal of Cancer, based on 22 papers, Top 3%, 0.9%
26. Cancer Medicine, based on 17 papers, Top 3%, 0.9%
27. Mathematical Biosciences and Engineering, based on 14 papers, Top 2%, 0.7%
28. The Lancet Digital Health, based on 25 papers, Top 5%, 0.7%
29. JCO Precision Oncology, based on 11 papers, Top 3%, 0.7%
30. Communications Biology, based on 36 papers, Top 5%, 0.7%