From Circles to Signals: Representation Learning on Ultra-Long Extrachromosomal Circular DNA

Li, J.; Liu, Z.; Zhang, Z.; Zhang, J.; Singh, R.

2026-03-17 bioinformatics
10.1101/2025.11.22.689941 bioRxiv
Extrachromosomal circular DNA (eccDNA) is a covalently closed circular DNA molecule that plays an important role in cancer biology. Genomic foundation models have recently emerged as a powerful direction for DNA sequence modeling, enabling the direct prediction of biologically relevant properties from DNA sequences. Although recent genomic foundation models have shown strong performance on general DNA sequence modeling, their application to eccDNA remains limited: existing approaches either rely on computationally expensive attention mechanisms or truncate ultra-long sequences into kilobase fragments, thereby disrupting long-range continuity and ignoring the molecule's circular topology. To overcome these problems, we introduce eccDNAMamba, a bidirectional state space model (SSM) built upon the Mamba-2 framework, which scales linearly with input sequence length and enables scalable modeling of ultra-long eccDNA sequences. eccDNAMamba further incorporates a circular augmentation strategy to preserve the intrinsic circular topology of eccDNA. Comprehensive evaluations against state-of-the-art genomic foundation models demonstrate that eccDNAMamba achieves superior performance on ultra-long sequences across multiple task settings, such as cancer versus healthy eccDNA discrimination and eccDNA copy-number level prediction. Moreover, the Integrated Gradients (IG) based model explanation indicates that eccDNAMamba focuses on biologically meaningful regulatory elements and can uncover key sequence patterns in cancer-derived eccDNAs. Overall, these results demonstrate that eccDNAMamba effectively models ultra-long eccDNA sequences by leveraging their unique circular topology and regulatory architecture, bridging a critical gap in sequence analysis. Our code and datasets are available at https://github.com/zzq1zh/eccDNAMamba.
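The abstract does not spell out how the circular augmentation works; the following is a minimal hypothetical sketch of the general idea, not the authors' exact implementation. Because eccDNA is covalently closed, every rotation of the sequence denotes the same molecule, and appending a prefix of the sequence to its own end lets a linear model see across the junction where the circle was cut. The function names `circular_augment` and `random_rotation` are illustrative assumptions.

```python
# Hypothetical circular-augmentation sketch (assumed, not taken from the paper).

def circular_augment(seq: str, k: int) -> str:
    """Append the first k bases to the end to mimic circular continuity."""
    k = min(k, len(seq))
    return seq + seq[:k]

def random_rotation(seq: str, offset: int) -> str:
    """Rotate the circle to an arbitrary start point; the molecule is unchanged."""
    offset %= len(seq)
    return seq[offset:] + seq[:offset]

print(circular_augment("ACGTTG", 2))  # ACGTTGAC
print(random_rotation("ACGTTG", 2))   # GTTGAC
```

Any rotation produced this way is an equally valid linearization of the same circle, which is why such augmentations can act as a topology-preserving form of data augmentation.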

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 22.1% (1061 papers in training set, Top 1%)
2. Briefings in Bioinformatics: 8.3% (326 papers in training set, Top 0.6%)
3. Nature Communications: 6.7% (4913 papers in training set, Top 27%)
4. Nucleic Acids Research: 6.3% (1128 papers in training set, Top 3%)
5. IEEE Transactions on Computational Biology and Bioinformatics: 6.2% (17 papers in training set, Top 0.1%)
6. Bioinformatics Advances: 4.8% (184 papers in training set, Top 0.8%)
(50% of probability mass above this line)
7. Nature Machine Intelligence: 4.8% (61 papers in training set, Top 0.6%)
8. Advanced Science: 4.2% (249 papers in training set, Top 4%)
9. Cell Systems: 2.6% (167 papers in training set, Top 5%)
10. PLOS Computational Biology: 2.3% (1633 papers in training set, Top 13%)
11. Frontiers in Genetics: 2.0% (197 papers in training set, Top 4%)
12. Genome Research: 1.9% (409 papers in training set, Top 2%)
13. Genome Medicine: 1.7% (154 papers in training set, Top 5%)
14. Proceedings of the National Academy of Sciences: 1.7% (2130 papers in training set, Top 33%)
15. NAR Genomics and Bioinformatics: 1.7% (214 papers in training set, Top 2%)
16. Genome Biology: 1.5% (555 papers in training set, Top 5%)
17. BMC Bioinformatics: 1.3% (383 papers in training set, Top 5%)
18. Nature Biotechnology: 1.2% (147 papers in training set, Top 6%)
19. Scientific Reports: 1.2% (3102 papers in training set, Top 67%)
20. Nature Methods: 0.9% (336 papers in training set, Top 6%)
21. Communications Biology: 0.9% (886 papers in training set, Top 19%)
22. iScience: 0.9% (1063 papers in training set, Top 27%)
23. Computational and Structural Biotechnology Journal: 0.8% (216 papers in training set, Top 9%)
24. Journal of Chemical Information and Modeling: 0.7% (207 papers in training set, Top 3%)
25. Genomics, Proteomics & Bioinformatics: 0.7% (171 papers in training set, Top 6%)
26. Nature Computational Science: 0.7% (50 papers in training set, Top 2%)
27. GigaScience: 0.7% (172 papers in training set, Top 3%)
28. Frontiers in Molecular Biosciences: 0.6% (100 papers in training set, Top 6%)
29. Patterns: 0.6% (70 papers in training set, Top 3%)