Fragment end motif analysis to distinguish pathogens from contaminants in enriched plasma microbial DNA
Zhang, H.; Dominguez, E. G.; Junak, M.; Murtaza, M.; Pepperell, C. S.; Kisat, M. T.
Introduction: Despite its promise, the accuracy of microbial cell-free DNA (mDNA) in plasma as a diagnostic tool is hindered by its low abundance and by process contaminants. We have previously shown that combining size selection with single-stranded DNA (ssDNA) library preparation increased mDNA yield 200-fold but also decreased sensitivity for pathogen detection because of higher background noise. A recent study showed that pathogen-derived DNA was enriched for the CC dinucleotide at 5' ends compared to contaminants. Since ssDNA libraries preserve sequence motifs at both ends (5' and 3'), we hypothesized that analysis of nucleotide motifs at microbial fragment ends in size-selected ssDNA libraries could help differentiate pathogen DNA from background noise.

Methods: We performed deep sequencing on size-selected ssDNA libraries (<110 bp) generated from longitudinal plasma samples of 11 critically ill patients (5 with culture-proven infections, 20 samples; 6 without infections, 18 samples) and from 6 no-template controls (NTCs). For each 2-mer and 1-mer motif, we calculated the ratio between its observed frequency at the 5' and 3' fragment ends in the sequencing data and its expected frequency in the corresponding reference genome (O/E ratio). We compared the enrichment of motifs in pathogen DNA and contaminant DNA fragments.

Results: Pathogen-derived mDNA fragments were more biased in O/E end motif ratios than contaminants across all 3 groups (NTCs, no infections, and culture-proven infections), at both the 5' and 3' fragment ends. Notably, the GG dinucleotide was enriched at the 3' end in pathogens compared to contaminants (P < 0.0001). Combining O/E ratios for C and G nucleotides at the 3' end achieved areas under the receiver operating characteristic curve of >0.98 for distinguishing common contaminants from culture-proven pathogens.

Conclusions: Pathogen-derived mDNA in size-selected ssDNA libraries is biased at the 5' and 3' fragment ends compared to contaminants. Incorporating microbial fragment end motif analysis can enhance the signal-to-noise ratio and improve pathogen detection and identification in plasma metagenomic sequencing.
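
The O/E ratio described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names (`kmer_frequencies`, `end_motif_frequencies`, `oe_ratios`) and the toy sequences are hypothetical, and the sketch assumes that fragments are already oriented 5' to 3', that the expected frequency is counted on a single strand of the reference, and that motifs containing bases other than A, C, G, or T are ignored.

```python
from collections import Counter

BASES = set("ACGT")


def kmer_frequencies(reference, k):
    """Expected frequency of each k-mer over all positions of a reference sequence.

    Assumption: only one strand is counted; k-mers with non-ACGT bases are skipped.
    """
    reference = reference.upper()
    counts = Counter(
        reference[i:i + k]
        for i in range(len(reference) - k + 1)
        if set(reference[i:i + k]) <= BASES
    )
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}


def end_motif_frequencies(fragments, k, end):
    """Observed frequency of each k-mer at the 5' or 3' end of the fragments.

    Assumption: each fragment is given 5' to 3'; fragments shorter than k are skipped.
    """
    if end not in ("5", "3"):
        raise ValueError("end must be '5' or '3'")
    motifs = []
    for fragment in fragments:
        fragment = fragment.upper()
        if len(fragment) < k:
            continue
        motif = fragment[:k] if end == "5" else fragment[-k:]
        if set(motif) <= BASES:
            motifs.append(motif)
    counts = Counter(motifs)
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}


def oe_ratios(fragments, reference, k, end):
    """Observed/expected ratio for every k-mer present in the reference."""
    observed = end_motif_frequencies(fragments, k, end)
    expected = kmer_frequencies(reference, k)
    return {motif: observed.get(motif, 0.0) / freq for motif, freq in expected.items()}


# Toy data for illustration only (not from the study).
toy_reference = "ACGTACGT"
toy_fragments = ["CCA", "CGT", "ACG", "CTG"]
ratios_5p = oe_ratios(toy_fragments, toy_reference, k=1, end="5")
ratios_3p = oe_ratios(toy_fragments, toy_reference, k=1, end="3")
```

A ratio above 1 means the motif appears at fragment ends more often than the genome composition alone would predict. Passing `k=2` gives the dinucleotide version of the same calculation.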