
Fragment end motif analysis to distinguish pathogens from contaminants in enriched plasma microbial DNA

Zhang, H.; Dominguez, E. G.; Junak, M.; Murtaza, M.; Pepperell, C. S.; Kisat, M. T.

2025-11-07 intensive care and critical care medicine
10.1101/2025.11.06.25339688 medRxiv

Introduction: Despite its promise, the accuracy of microbial cell-free DNA (mDNA) in plasma as a diagnostic tool is hindered by its low abundance and by process contaminants. We have previously shown that combining size selection with single-stranded DNA (ssDNA) library preparation increased mDNA yield by 200-fold but also decreased sensitivity for pathogen detection due to higher background noise. A recent study showed that pathogen-derived DNA was enriched for the CC dinucleotide at 5′ ends compared to contaminants. Since ssDNA libraries preserve sequence motifs at both ends (5′ and 3′), we hypothesized that analysis of nucleotide motifs at microbial fragment ends in size-selected ssDNA libraries could help differentiate pathogen DNA from background noise.

Methods: We performed deep sequencing on size-selected ssDNA libraries (<110 bp) generated from longitudinal plasma samples of 11 critically ill patients (5 with culture-proven infections, 20 samples; 6 without infections, 18 samples) and 6 no-template controls (NTCs). For each 2-mer and 1-mer motif, we calculated the ratio between its observed frequency at 5′ and 3′ fragment ends in the sequencing data and its expected frequency in the corresponding reference genome (O/E ratio). We compared the enrichment of motifs in pathogen DNA and contaminant DNA fragments.

Results: Pathogen-derived mDNA fragments were more biased in O/E end motif ratios than contaminants across all 3 groups (NTCs, no infections, and culture-proven infections), at both 5′ and 3′ fragment ends. Notably, the GG dinucleotide was enriched at the 3′ end in pathogens compared to contaminants (P < 0.0001). Combining O/E ratios for C and G nucleotides at the 3′ end achieved areas under the receiver operating characteristic curve of >0.98 for distinguishing common contaminants from culture-proven pathogens.

Conclusions: Pathogen-derived mDNA in size-selected ssDNA libraries is biased at 5′ and 3′ fragment ends compared to contaminants. Incorporating microbial fragment end motif analysis can enhance the signal-to-noise ratio and improve pathogen detection and identification in plasma metagenomic sequencing.
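
The O/E ratio described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`kmer_frequencies`, `end_motif_frequencies`, `oe_ratios`) are hypothetical, fragments are assumed to be plain strings already oriented on the same strand as the reference, and the expected frequency is assumed to be the plain k-mer frequency on the forward strand of the reference genome. The preprint may handle strand, ambiguous bases, and normalization differently.

```python
from collections import Counter


def kmer_frequencies(reference, k):
    """Expected frequency of each k-mer, counted over every position
    of the reference sequence (forward strand only; an assumption)."""
    reference = reference.upper()
    counts = Counter(reference[i:i + k] for i in range(len(reference) - k + 1))
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}


def end_motif_frequencies(fragments, k, end):
    """Observed frequency of each k-mer at the 5' or 3' end of fragments.
    `end` is "5" for the first k bases or "3" for the last k bases."""
    counts = Counter()
    for frag in fragments:
        frag = frag.upper()
        if len(frag) < k:
            continue  # fragment too short to carry a k-mer end motif
        counts[frag[:k] if end == "5" else frag[-k:]] += 1
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}


def oe_ratios(fragments, reference, k, end):
    """Observed/expected ratio for every k-mer present in the reference.
    A ratio above 1 means the motif is enriched at that fragment end."""
    observed = end_motif_frequencies(fragments, k, end)
    expected = kmer_frequencies(reference, k)
    return {motif: observed.get(motif, 0.0) / exp for motif, exp in expected.items()}


# Toy inputs for illustration only; these are not data from the study.
toy_reference = "ACGTACGTGG"
toy_fragments = ["ACGG", "CCGG", "TTGG", "ACGT"]
ratios_3p_2mer = oe_ratios(toy_fragments, toy_reference, k=2, end="3")
ratios_5p_1mer = oe_ratios(toy_fragments, toy_reference, k=1, end="5")
```

Normalizing by the reference k-mer frequency is what makes ratios comparable between organisms with different base composition: a GC-rich genome yields more GG ends by chance alone, and dividing by the expected frequency removes that baseline.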

Matching journals

The top-ranked journal alone accounts for more than 50% of the predicted probability mass.

1. Clinical Chemistry: 22 papers in training set; Top 0.1%; 62.9%
(50% of probability mass above)
2. Microbiology Spectrum: 435 papers in training set; Top 0.7%; 3.8%
3. PLOS ONE: 4510 papers in training set; Top 37%; 3.7%
4. Scientific Reports: 3102 papers in training set; Top 47%; 2.4%
5. Bioinformatics: 1061 papers in training set; Top 7%; 1.9%
6. F1000Research: 79 papers in training set; Top 2%; 1.4%
7. The Journal of Infectious Diseases: 182 papers in training set; Top 4%; 1.1%
8. Wellcome Open Research: 57 papers in training set; Top 1%; 1.1%
9. The Journal of Molecular Diagnostics: 36 papers in training set; Top 0.4%; 0.9%
10. BMC Genomics: 328 papers in training set; Top 4%; 0.9%
11. Frontiers in Medicine: 113 papers in training set; Top 5%; 0.9%
12. Frontiers in Molecular Biosciences: 100 papers in training set; Top 4%; 0.9%
13. Journal of Clinical Virology: 62 papers in training set; Top 0.7%; 0.8%
14. NAR Genomics and Bioinformatics: 214 papers in training set; Top 4%; 0.7%
15. Diagnostics: 48 papers in training set; Top 2%; 0.7%
16. Analytical Chemistry: 205 papers in training set; Top 3%; 0.7%
17. Briefings in Bioinformatics: 326 papers in training set; Top 7%; 0.7%
18. Clinical Infectious Diseases: 231 papers in training set; Top 5%; 0.7%
19. Vaccines: 196 papers in training set; Top 3%; 0.7%
20. Respiratory Research: 19 papers in training set; Top 0.6%; 0.7%
21. Nucleic Acids Research: 1128 papers in training set; Top 19%; 0.7%
22. Oncotarget: 15 papers in training set; Top 0.5%; 0.7%
23. American Journal of Respiratory Cell and Molecular Biology: 38 papers in training set; Top 0.8%; 0.7%
24. International Journal of Molecular Sciences: 453 papers in training set; Top 17%; 0.7%
25. iScience: 1063 papers in training set; Top 39%; 0.5%