
Fragment end motif analysis to distinguish pathogens from contaminants in enriched plasma microbial DNA

Zhang, H.; Dominguez, E. G.; Junak, M.; Murtaza, M.; Pepperell, C. S.; Kisat, M. T.

2025-11-07 intensive care and critical care medicine
10.1101/2025.11.06.25339688

Introduction: Despite its promise, the accuracy of microbial cell-free DNA (mDNA) in plasma as a diagnostic tool is hindered by its low abundance and by process contaminants. We have previously shown that combining size selection with single-stranded DNA (ssDNA) library preparation increased mDNA yield 200-fold but also decreased sensitivity for pathogen detection due to higher background noise. A recent study showed that pathogen-derived DNA was enriched for the CC dinucleotide at 5′ ends compared to contaminants. Since ssDNA libraries preserve sequence motifs at both ends (5′ and 3′), we hypothesized that analysis of nucleotide motifs at microbial fragment ends in size-selected ssDNA libraries could help differentiate pathogen DNA from background noise.

Methods: We performed deep sequencing on size-selected ssDNA libraries (<110 bp) generated from longitudinal plasma samples of 11 critically ill patients (5 with culture-proven infections, 20 samples; 6 without infections, 18 samples) and 6 no-template controls (NTCs). For each 2-mer and 1-mer motif, we calculated the ratio between its frequency observed at the 5′ and 3′ fragment ends in sequencing data and its expected frequency in the corresponding reference genome (O/E ratio). We compared enrichment of motifs in pathogen DNA and contaminant DNA fragments.

Results: Pathogen-derived mDNA fragments were more biased in O/E end motif ratios than contaminants across all 3 groups (NTCs, patients without infections, and patients with culture-proven infections), at both 5′ and 3′ fragment ends. Notably, the GG dinucleotide was enriched at the 3′ end in pathogens compared to contaminants (P < 0.0001). Combining O/E ratios for C and G nucleotides at the 3′ end achieved areas under the receiver operating characteristic curve of >0.98 for distinguishing common contaminants from culture-proven pathogens.

Conclusions: Pathogen-derived mDNA in size-selected ssDNA libraries is biased at the 5′ and 3′ fragment ends compared to contaminants. Incorporating microbial fragment end motif analysis can enhance signal-to-noise ratio and improve pathogen detection and identification in plasma metagenomic sequencing.
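The O/E ratio described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes fragments are already available as plain strings in the same orientation as the reference, that the expected frequency is the frequency of overlapping k-mers on one strand of a single reference sequence, and that motifs absent from the reference are simply skipped. The function names (`kmer_frequencies`, `end_motif_frequencies`, `oe_ratios`) are hypothetical and chosen for illustration only.

```python
from collections import Counter


def kmer_frequencies(reference, k):
    """Expected frequency of each overlapping k-mer in a reference sequence.

    Assumption: one strand, one contiguous sequence, no masking of N bases.
    """
    seq = reference.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}


def end_motif_frequencies(fragments, k, end):
    """Observed frequency of k-mers at the 5' or 3' end of each fragment."""
    if end not in ("5", "3"):
        raise ValueError("end must be '5' or '3'")
    motifs = [
        (frag[:k] if end == "5" else frag[-k:]).upper()
        for frag in fragments
        if len(frag) >= k  # fragments shorter than k cannot contribute
    ]
    counts = Counter(motifs)
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}


def oe_ratios(fragments, reference, k, end):
    """Observed/expected ratio for every k-mer present in the reference."""
    observed = end_motif_frequencies(fragments, k, end)
    expected = kmer_frequencies(reference, k)
    # Motifs never seen at fragment ends get a ratio of 0.0.
    return {m: observed.get(m, 0.0) / e for m, e in expected.items()}
```

A ratio above 1 means the motif appears at fragment ends more often than the genome composition alone would predict; a ratio below 1 means it is depleted. Calling `oe_ratios(fragments, reference, k=2, end="3")` gives the dinucleotide ratios at the 3′ end, and `k=1` gives the single-nucleotide ratios.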

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Clinical Chemistry | based on 14 papers | Top 0.1% | 16.0%
2. Scientific Reports | based on 701 papers | Top 14% | 10.6%
3. iScience | based on 74 papers | Top 0.1% | 10.6%
4. PLOS ONE | based on 1737 papers | Top 65% | 5.5%
5. Science Translational Medicine | based on 40 papers | Top 0.3% | 5.5%
6. Genomics, Proteomics & Bioinformatics | based on 10 papers | Top 0.4% | 3.1%
(50% of probability mass above)
7. Frontiers in Medicine | based on 99 papers | Top 7% | 2.5%
8. Nature | based on 58 papers | Top 3% | 2.4%
9. Frontiers in Molecular Biosciences | based on 10 papers | Top 0.1% | 2.4%
10. JCI Insight | based on 63 papers | Top 3% | 2.4%
11. Nature Communications | based on 483 papers | Top 28% | 1.9%
12. Critical Care | based on 14 papers | Top 0.9% | 1.9%
13. The Journal of Infectious Diseases | based on 137 papers | Top 5% | 1.9%
14. Journal of Thrombosis and Haemostasis | based on 10 papers | Top 1.0% | 1.4%
15. American Journal of Respiratory and Critical Care Medicine | based on 23 papers | Top 1% | 1.4%
16. eLife | based on 262 papers | Top 20% | 1.4%
17. Cells | based on 14 papers | Top 0.8% | 1.4%
18. Bioinformatics | based on 24 papers | Top 1% | 1.2%
19. Critical Care Explorations | based on 15 papers | Top 2% | 1.2%
20. eBioMedicine | based on 82 papers | Top 6% | 0.8%
21. PLOS Computational Biology | based on 141 papers | Top 9% | 0.8%
22. The Journal of Molecular Diagnostics | based on 24 papers | Top 2% | 0.8%
23. European Respiratory Journal | based on 44 papers | Top 5% | 0.8%
24. Journal of Biomedical Informatics | based on 37 papers | Top 5% | 0.7%
25. Viruses | based on 79 papers | Top 7% | 0.7%
26. Frontiers in Cellular and Infection Microbiology | based on 22 papers | Top 4% | 0.7%
27. Microbiology Spectrum | based on 86 papers | Top 3% | 0.7%
28. Genome Biology | based on 14 papers | Top 2% | 0.7%