Channel Capacity of Genome-Wide Cell-Free DNA Fragment Length Distribution in Colorectal Cancer
Matov, A.
Introduction: Each cell-free DNA (cfDNA) fragment has a length determined by the metabolic conditions in its cell of origin at the time of cell death. Changes in cellular regulation produce a variety of patterns, defined by the number of fragments of each length, up to several hundred base pairs (bp), at each of the almost three billion genomic positions. These patterns allow for the detection of disease and the precise identification of the tissue of origin.

Methods: A Kullback-Leibler (KL) divergence computation identifies, for each disease stage, the fragment lengths and regions of the human genome at which disease samples, starting from pre-clinical stages, diverge from samples of healthy individuals. We provide examples of genes related to colorectal cancer (CRC) that our algorithm found to lie in divergent genomic bins. The staging of CRC can be viewed as a Markov chain, which provides a framework for studying disease progression and the types of epigenetic changes occurring longitudinally at each stage, and which might aid the correct classification of a new hospital sample.

Results: Treating such data as grayscale images offers a new perspective in which pattern recognition using artificial intelligence could be one approach to classification. In CRC, Stage I disease does not, for the most part, shed any tumor material into circulation, making detection difficult for established machine learning (ML) methods. This leads to the deduction that early detection, where we can rely only on changes in metabolic patterns, can be accomplished when the information is considered in its entirety, for example by applying computer vision methods.

Conclusions: Longitudinal analysis of patients' genetic datasets can detect the early stages of neoplasm better than population-based methods.
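
The KL divergence step described in the Methods can be illustrated with a minimal sketch. This is not the author's implementation: the function names `kl_divergence` and `divergent_bins`, the pseudocount smoothing, the three-length histograms, and the threshold of 0.1 bits are all assumptions made for illustration. Each genomic bin is represented as a histogram of fragment counts per length, and a bin is flagged when the disease histogram diverges from the healthy one by more than the threshold.

```python
import math


def kl_divergence(p_counts, q_counts, pseudocount=1.0):
    """KL divergence D(P || Q) in bits between two fragment-length histograms.

    The pseudocount is an illustrative smoothing choice that avoids
    division by zero when a length is absent from one histogram.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must cover the same fragment lengths")
    p = [c + pseudocount for c in p_counts]
    q = [c + pseudocount for c in q_counts]
    p_total, q_total = sum(p), sum(q)
    return sum(
        (pi / p_total) * math.log2((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


def divergent_bins(disease, healthy, threshold):
    """Return indices of bins whose divergence exceeds the threshold, plus all scores.

    `disease` and `healthy` are lists of per-bin histograms in the same bin order.
    """
    scores = [kl_divergence(d, h) for d, h in zip(disease, healthy)]
    flagged = [i for i, s in enumerate(scores) if s > threshold]
    return flagged, scores


# Toy data (hypothetical): two genomic bins, three fragment-length classes each.
healthy = [[10, 80, 10], [10, 80, 10]]
disease = [[10, 80, 10], [60, 30, 10]]

# Threshold of 0.1 bits is an arbitrary value chosen for this example.
flagged, scores = divergent_bins(disease, healthy, threshold=0.1)
print(flagged)
```

In this toy input the first bin has identical histograms in both groups and is not flagged, while the second bin shifts toward shorter fragments in the disease sample and is flagged. KL divergence is asymmetric, so the choice of which group serves as the reference distribution is itself a design decision that the abstract does not specify.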
Matching journals
The top 8 journals account for 50% of the predicted probability mass.