
DNA fragment length analysis using machine learning assisted vibrational spectroscopy

Fatayer, R.; Ahmed, W.; Szeto, I.; Sammut, S.-J.; Senthil Murugan, G.

2026-03-02 bioinformatics
10.64898/2026.02.27.708538 bioRxiv

DNA length analysis is essential for genomic workflows including next-generation sequencing and fragmentomics-based diagnostics. Conventional approaches typically require large, expensive instrumentation and sample-destructive protocols with long processing times. Here we present a rapid, label-free approach integrating vibrational spectroscopy with deep learning to quantify DNA fragment length distributions. We demonstrate that ATR-FTIR and Raman spectroscopy capture length-dependent spectral features arising from phosphate backbone, nucleobase, and structural vibrations. Machine learning models trained on spectra acquired from purified monodisperse DNA (50-300 bp) predicted DNA length with high accuracy (R²=0.92-0.94), with multimodal fusion improving performance to R²=0.96. A convolutional neural network trained on 35 DNA mixtures comprising molecules of different lengths also successfully deconvoluted their fragment length profiles. Transfer learning enabled adaptation to biological samples, achieving low prediction error (RMSE=0.3-7.2%, Δ=12 bp). Importantly, the method requires only 4 µL of sample and 15 minutes of passive drying, with no consumables beyond cleaning materials, and allows full sample recovery. This establishes vibrational spectroscopy as a scalable alternative for DNA length quantification.
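
The abstract reports model quality as R² and RMSE. As a reminder of what those two figures measure, here is a minimal sketch in plain Python. The function names `r_squared` and `rmse` and the toy fragment lengths are hypothetical illustrations, not data or code from the preprint, and the toy values are in bp whereas the abstract's RMSE is given in percent.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values for illustration only (not taken from the preprint).
true_bp = [50, 100, 150, 200, 300]
pred_bp = [60, 95, 160, 190, 290]

print(f"R2   = {r_squared(true_bp, pred_bp):.3f}")
print(f"RMSE = {rmse(true_bp, pred_bp):.2f} bp")
```

An R² close to 1 means the predictions explain most of the variance in the true lengths, while RMSE expresses the typical size of the error in the units of the predicted quantity.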

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Nature Communications | 4913 | Top 7% | 17.8%
2 | Advanced Science | 249 | Top 1% | 10.3%
3 | Scientific Reports | 3102 | Top 13% | 6.9%
4 | Nano Letters | 63 | Top 0.4% | 6.5%
5 | Analytical Chemistry | 205 | Top 0.5% | 6.4%
6 | Nature Biotechnology | 147 | Top 2% | 4.9%
(50% of probability mass above)
7 | Nucleic Acids Research | 1128 | Top 5% | 4.0%
8 | PLOS ONE | 4510 | Top 44% | 2.8%
9 | ACS Nano | 99 | Top 1% | 2.8%
10 | Cell Reports Methods | 141 | Top 2% | 2.1%
11 | Bioinformatics | 1061 | Top 6% | 2.1%
12 | Genome Medicine | 154 | Top 5% | 1.4%
13 | Nature Methods | 336 | Top 5% | 1.2%
14 | NAR Genomics and Bioinformatics | 214 | Top 3% | 1.0%
15 | mSphere | 281 | Top 5% | 1.0%
16 | NAR Molecular Medicine | 18 | Top 0.2% | 0.9%
17 | Nature Biomedical Engineering | 42 | Top 2% | 0.8%
18 | Communications Biology | 886 | Top 20% | 0.8%
19 | Science Advances | 1098 | Top 29% | 0.8%
20 | Microbiome | 139 | Top 3% | 0.8%
21 | Cell | 370 | Top 17% | 0.7%
22 | Cancer Research Communications | 46 | Top 1% | 0.7%
23 | Cell Systems | 167 | Top 13% | 0.7%
24 | ACS Synthetic Biology | 256 | Top 4% | 0.5%
25 | Small Methods | 26 | Top 1% | 0.5%
26 | Briefings in Bioinformatics | 326 | Top 8% | 0.5%
27 | Chemical Communications | 24 | Top 2% | 0.5%
28 | ACS Applied Materials & Interfaces | 39 | Top 1% | 0.5%
29 | Communications Chemistry | 39 | Top 2% | 0.5%
30 | npj Digital Medicine | 97 | Top 4% | 0.5%
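
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is only an illustration: the helper name `journals_to_reach` is hypothetical, and the input is the ten highest predicted probabilities copied from the list above.

```python
# Predicted probabilities (%) for the ten highest-ranked journals listed above.
probs = [17.8, 10.3, 6.9, 6.5, 6.4, 4.9, 4.0, 2.8, 2.8, 2.1]

def journals_to_reach(probabilities, threshold):
    """Return (k, cumulative) for the smallest k whose running sum reaches threshold.

    Returns (None, total) if the threshold is never reached.
    """
    total = 0.0
    for k, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return k, total
    return None, total

k, mass = journals_to_reach(probs, 50.0)
print(f"{k} journals cover {mass:.1f}% of the predicted probability mass")
```

The first five entries sum to 47.9%, so the sixth journal (Nature Biotechnology, 4.9%) is the one that pushes the cumulative total past the 50% mark.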