LoReMINE: Long Read-based Microbial genome mining pipeline

Agrawal, A. A.; Bader, C. D.; Kalinina, O. V.

2026-02-04 bioinformatics

10.64898/2026.02.02.703231 bioRxiv


Microbial natural products represent a chemically diverse repertoire of small molecules with major pharmaceutical potential. Despite the increasing availability of microbial genome sequences, large-scale natural product discovery remains challenging because existing genome mining approaches lack integrated workflows for rapid dereplication of known compounds and prioritization of novel candidates, forcing researchers to rely on multiple tools that require extensive manual curation and expert intervention at each step. To address these limitations, we introduce LoReMINE (Long Read-based Microbial genome mining pipeline), a fully automated end-to-end pipeline that generates high-quality assemblies, performs taxonomic classification, predicts biosynthetic gene clusters (BGCs) responsible for biosynthesis of natural products, and clusters them into gene cluster families (GCFs) directly from long-read sequencing data. By integrating state-of-the-art tools into a seamless pipeline, LoReMINE enables scalable, reproducible, and comprehensive genome mining across diverse microbial taxa. The pipeline is openly available at https://github.com/kalininalab/LoReMINE and can be installed via Conda (https://anaconda.org/kalininalab/loremine), facilitating broad adoption by the natural product research community.

Author summary

For decades, microbial natural products have been a major source of medicines, and most clinically used antibiotics are derived from them. Recent advances in DNA sequencing technologies now allow the reconstruction of more complete and continuous microbial genomes, revealing a vast and largely untapped diversity of biosynthetic gene clusters responsible for natural product biosynthesis. Despite these advances, large-scale natural product discovery remains difficult because current genome mining approaches rely on many separate tools and lack an integrated workflow to dereplicate known compounds and prioritize novel biosynthetic pathways.
To address these limitations, we introduce LoReMINE, an automated pipeline designed to simplify microbial genome mining directly from long-read sequencing data. LoReMINE integrates genome assembly, taxonomic classification, identification of biosynthetic gene clusters, and their clustering into gene cluster families within a single, reproducible workflow. This streamlined approach enables scalable analysis across diverse microbial taxa and facilitates comprehensive exploration of microbial biosynthetic potential. The pipeline is designed for both experimental and computational researchers, helping to advance natural product research and contribute towards the discovery of new therapeutic drugs.

