
Integrating targeted genome mining and structure-guided modeling reveals unexplored 7-deazapurine-containing pathways

Cediel-Becerra, J. D. D.; Chevrette, M. G.; de Crecy-Lagard, V.; Dias, R.

2026-04-19 bioinformatics
10.64898/2026.04.15.718813 bioRxiv

7-deazapurines are nucleoside analogs that play key roles in nucleic acid modification and can serve as building blocks for diverse, bioactive secondary metabolites. Despite their biological significance, their biosynthetic diversity, distribution, and enzymatic determinants of structural diversification remain poorly understood. Here, we leverage large-scale targeted genome mining, phylogenetic, and network analysis to explore 7-deazapurine-containing pathways across ~2 million bacterial genomes. We identified over 900 candidate biosynthetic gene clusters (BGCs), grouped into more than 100 families, most of which remain uncharacterized. These GATOR-GC-predicted BGCs were predominantly found in Streptomyces. We then examined enzyme-substrate interactions in three representative pathways: (i) peptidyl-deazapurines, (ii) huimycin, and (iii) dapiramicin A. Molecular docking and molecular dynamics (MD) simulations recapitulated known enzyme-substrate interactions and highlighted candidate catalytic residues governing amide bond formation, methylation, and glycosylation. Using this genome- and structure-guided framework, we identified a candidate BGC for dapiramicin A and proposed tailoring steps, including scaffold methylation and deoxy-sugar formation. These findings expand the known diversity of 7-deazapurine-containing BGCs and demonstrate how integrating genome mining with structural modeling can link BGCs to chemical function, providing a foundation for discovering and characterizing 7-deazapurine-containing secondary metabolites.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Cell Chemical Biology | 81 papers in training set | Top 0.1% | 10.2%
2. Advanced Science | 249 papers in training set | Top 2% | 6.9%
3. Journal of the American Chemical Society | 199 papers in training set | Top 0.9% | 6.9%
4. Nature Communications | 4913 papers in training set | Top 29% | 6.4%
5. eLife | 5422 papers in training set | Top 17% | 4.9%
6. Journal of Chemical Information and Modeling | 207 papers in training set | Top 1% | 4.3%
7. Molecular Plant | 36 papers in training set | Top 0.3% | 4.0%
8. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 3.6%
9. Acta Pharmaceutica Sinica B | 11 papers in training set | Top 0.2% | 3.6%
50% of probability mass above
10. ACS Chemical Biology | 150 papers in training set | Top 0.6% | 2.9%
11. PLOS Computational Biology | 1633 papers in training set | Top 11% | 2.9%
12. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 2% | 2.6%
13. Horticulture Research | 43 papers in training set | Top 0.7% | 2.6%
14. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 27% | 2.1%
15. Nature Chemical Biology | 104 papers in training set | Top 1% | 2.1%
16. Plant Communications | 35 papers in training set | Top 0.6% | 2.1%
17. iScience | 1063 papers in training set | Top 14% | 1.7%
18. Protein & Cell | 25 papers in training set | Top 1% | 1.7%
19. Nucleic Acids Research | 1128 papers in training set | Top 12% | 1.5%
20. Cell Reports Physical Science | 18 papers in training set | Top 0.3% | 1.3%
21. JACS Au | 35 papers in training set | Top 0.9% | 0.9%
22. Chemical Science | 71 papers in training set | Top 2% | 0.8%
23. Cell Genomics | 162 papers in training set | Top 6% | 0.8%
24. Communications Chemistry | 39 papers in training set | Top 0.9% | 0.8%
25. Communications Biology | 886 papers in training set | Top 21% | 0.8%
26. Briefings in Bioinformatics | 326 papers in training set | Top 7% | 0.8%
27. Nature Biotechnology | 147 papers in training set | Top 8% | 0.8%
28. Cell Systems | 167 papers in training set | Top 12% | 0.8%
29. Cell Discovery | 54 papers in training set | Top 6% | 0.6%
30. Synthetic and Systems Biotechnology | 10 papers in training set | Top 0.6% | 0.6%
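
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked against the listed percentages with a running sum. The sketch below is only an illustration: the function name `journals_needed` is hypothetical, the values are the rounded percentages as displayed in the list, and the page does not describe how the site itself computes the cutoff.

```python
from itertools import accumulate

# Predicted probabilities (%) for the 30 listed journals, in rank order,
# copied from the list above (rounded values as displayed).
probs = [
    10.2, 6.9, 6.9, 6.4, 4.9, 4.3, 4.0, 3.6, 3.6, 2.9,
    2.9, 2.6, 2.6, 2.1, 2.1, 2.1, 1.7, 1.7, 1.5, 1.3,
    0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.6, 0.6,
]


def journals_needed(probabilities, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    sum to reach `threshold`, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None


print(journals_needed(probs))  # → 9
```

With these rounded values the running sum is about 47.2% after eight journals and about 50.8% after nine, which agrees with the cutoff marked in the list.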