
Metagenomic strain-resolved DNA modification patterns link extrachromosomal genetic elements to host strains

Wang, S.; Guitor, A. K.; Valentin-Alvarado, L. E.; Garner, R.; Zhang, P.; Yan, M.; Shi, L.-D.; Schoelmerich, M. C.; Steininger, H. M.; Portik, D. M.; Zhang, S.; Wilkinson, J. E.; Lynch, S.; Morowitz, M. J.; Hess, M.; Diamond, S.; Banfield, J. F.; Sachdeva, R.

2026-03-28 microbiology
10.64898/2026.03.27.714056 bioRxiv

DNA modification is central to microbial defense against extrachromosomal genetic elements (ECEs); consequently, ECEs tend to adopt their hosts' modification patterns. Shared ECE-host modification patterns make it possible to link ECEs to their hosts, but existing modification detection tools are designed for single genomes and are ineffective at metagenome scale. Here, we present MODIFI, software for detecting DNA modifications in metagenomes. MODIFI assumes that most occurrences of each k-mer in a metagenome are unmodified and calculates background signal levels for that k-mer from PacBio HiFi reads, eliminating the need for matched control experiments. ECE-host linkages inferred by MODIFI were validated using >1,000 isolate and mock microbiome datasets. To illustrate the approach, we identified 315 strain-resolved, non-redundant ECE-host linkages in environmental and human metagenomes. In infant gut microbiomes, a chromosomal inversion in Enterococcus faecalis simultaneously alters the methylation motifs of both the host and its associated plasmid. Overall, MODIFI removes a major bottleneck in DNA modification analysis and provides a foundational tool for understanding microbial epigenomics.
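The two ideas in the abstract—estimating each k-mer's unmodified signal from the metagenome itself, and linking an ECE to the host whose modified motifs it shares—can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MODIFI's implementation. Every choice here is hypothetical: the per-site `signal` values (standing in for a PacBio kinetic measurement such as an inter-pulse-duration ratio), the median as the background estimator, the `ratio` and `min_fraction` thresholds, scoring whole k-mers instead of specific modified bases, and Jaccard similarity for linkage. The contig names, k-mers, and values are toy data.

```python
from collections import defaultdict
from statistics import median


def kmer_background(observations):
    """Median signal per k-mer, pooled over every contig in the metagenome.

    Assumes most occurrences of each k-mer are unmodified, so the median
    approximates the unmodified level without a matched control sample.
    """
    pooled = defaultdict(list)
    for _contig, kmer, signal in observations:
        pooled[kmer].append(signal)
    return {kmer: median(values) for kmer, values in pooled.items()}


def modified_kmers(observations, background, ratio=2.0, min_fraction=0.5):
    """Per contig, the k-mers whose signal is >= ratio x background in at least
    min_fraction of their occurrences (hypothetical thresholds)."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # contig -> kmer -> [elevated, total]
    for contig, kmer, signal in observations:
        tally = counts[contig][kmer]
        tally[1] += 1
        if signal >= ratio * background[kmer]:
            tally[0] += 1
    return {
        contig: {k for k, (elevated, total) in per_kmer.items() if elevated / total >= min_fraction}
        for contig, per_kmer in counts.items()
    }


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_ece(ece, hosts, profiles, min_similarity=0.5):
    """Pick the host whose modified-k-mer set best matches the ECE's.

    Returns (host, score), or (None, score) if the best score is below
    min_similarity. Ties go to the first host listed; hosts must be non-empty.
    """
    ece_kmers = profiles.get(ece, set())
    best = max(hosts, key=lambda h: jaccard(ece_kmers, profiles.get(h, set())))
    score = jaccard(ece_kmers, profiles.get(best, set()))
    return (best, score) if score >= min_similarity else (None, score)


# Toy observations: (contig, k-mer, signal); 1.0 = unmodified level, 3.0 = elevated.
def repeat(contig, kmer, signal, n):
    return [(contig, kmer, signal)] * n


observations = (
    repeat("hostA", "GATC", 3.0, 2) + repeat("hostA", "GAATTC", 1.0, 2)
    + repeat("plasmid", "GATC", 3.0, 2) + repeat("plasmid", "GAATTC", 1.0, 1)
    + repeat("hostB", "GATC", 1.0, 2) + repeat("hostB", "GAATTC", 3.0, 2)
    + repeat("hostC", "GATC", 1.0, 4) + repeat("hostC", "GAATTC", 1.0, 2)
)

background = kmer_background(observations)
profiles = modified_kmers(observations, background)
print(link_ece("plasmid", ["hostA", "hostB", "hostC"], profiles))  # → ('hostA', 1.0)
```

The median is the important design choice. It stays at the unmodified level as long as fewer than half of a k-mer's occurrences are modified, which is the "mostly unmodified" assumption stated in the abstract. A k-mer modified in most of its occurrences across the metagenome would pull the median up and defeat this simple estimator.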

Matching journals

The top 7 journals together account for just over 50% of the predicted probability mass.

1. Microbiome (139 papers in training set; Top 0.1%; 16.7%)
2. Nature Communications (4913 papers in training set; Top 24%; 8.0%)
3. Nucleic Acids Research (1128 papers in training set; Top 4%; 6.1%)
4. mSystems (361 papers in training set; Top 2%; 6.1%)
5. Nature Microbiology (133 papers in training set; Top 0.4%; 6.1%)
6. Nature Methods (336 papers in training set; Top 2%; 6.0%)
7. Nature Biotechnology (147 papers in training set; Top 2%; 6.0%)

50% of probability mass above

8. Genome Medicine (154 papers in training set; Top 2%; 4.1%)
9. Cell Systems (167 papers in training set; Top 3%; 3.8%)
10. mBio (750 papers in training set; Top 5%; 3.4%)
11. Genome Biology (555 papers in training set; Top 3%; 3.4%)
12. mSphere (281 papers in training set; Top 2%; 2.9%)
13. Cell Host & Microbe (113 papers in training set; Top 2%; 2.9%)
14. Cell (370 papers in training set; Top 10%; 1.8%)
15. PLOS Computational Biology (1633 papers in training set; Top 15%; 1.8%)
16. ISME Communications (103 papers in training set; Top 1%; 1.6%)
17. eLife (5422 papers in training set; Top 46%; 1.4%)
18. Cell Reports (1338 papers in training set; Top 28%; 1.3%)
19. Cell Reports Methods (141 papers in training set; Top 4%; 1.2%)
20. Genome Research (409 papers in training set; Top 3%; 0.9%)
21. Microbial Genomics (204 papers in training set; Top 2%; 0.9%)
22. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 40%; 0.9%)
23. Nature (575 papers in training set; Top 14%; 0.9%)
24. Advanced Science (249 papers in training set; Top 22%; 0.6%)
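The "50% of probability mass" cutoff can be understood as the smallest number of top-ranked journals whose predicted probabilities add up to at least 50%. The short Python sketch below tests this against the listed percentages. This is an interpretation of the listing, not a documented rule of the prediction tool, and `journals_to_reach` is a hypothetical helper.

```python
# Predicted probabilities (%) for the 24 ranked journals, in rank order, from the list above.
probabilities = [16.7, 8.0, 6.1, 6.1, 6.1, 6.0, 6.0, 4.1, 3.8, 3.4, 3.4, 2.9,
                 2.9, 1.8, 1.8, 1.6, 1.4, 1.3, 1.2, 0.9, 0.9, 0.9, 0.9, 0.6]


def journals_to_reach(probs, target):
    """Smallest k such that the top-k probabilities sum to >= target.

    Returns (k, cumulative_sum), or (None, total) if the target is never reached.
    """
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= target:
            return k, total
    return None, total


k, total = journals_to_reach(probabilities, 50.0)
print(k, round(total, 1))  # → 7 55.0
```

The first six journals reach only 49.0%, so the seventh is the one that pushes the total past 50% (to 55.0%). This is consistent with the statement that the top 7 journals account for the first half of the probability mass.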