
GANGE: Achieving Sequencing Without Sequencing With Diffusion Guided Generative Genomic Transformer

Gupta, S.; Kumar, A.; Bhati, U.; Shankar, R.

2026-04-17 bioinformatics
10.64898/2026.04.15.718133 bioRxiv

The genome of a species is its book of life, but opening that book remains costly because of the limitations of existing sequencing technologies. Short-read sequencers have high fidelity but struggle to capture long and complex genomes. To counter that, long reads from third-generation sequencers are used, but these are full of indel errors. Reads from both approaches are therefore used together at very high coverage, which makes sequencing projects unreasonably expensive and out of reach for the majority. Here we present a first-of-its-kind generative deep-learning system, GANGE, which not only recovers the correct sequence with high accuracy from indel-prone ONT reads at many-fold lower coverage but also extends it by 4 kb, achieving sequencing without sequencing, horizontally as well as vertically, while consistently maintaining >92% accuracy. This makes it possible to drastically reduce the cost of sequencing projects. GANGE was tested on the A. thaliana and O. sativa genomes and human chromosome 1, where it delivered outstanding assembly performance. It was also used to accurately generate the 2 kb upstream promoters of all genes from 12 different species, demonstrating that regulomics research can now be taken up using RNA data alone when the genome sequence is not available. With all this, GANGE brings a democratic turning point to genomics and sequencing research.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Nature Communications | 4913 papers in training set | Top 8% | 17.0%
2. Genome Biology | 555 papers in training set | Top 0.1% | 14.0%
3. Nature Methods | 336 papers in training set | Top 1% | 9.8%
4. Nature Biotechnology | 147 papers in training set | Top 1% | 7.0%
5. Genome Research | 409 papers in training set | Top 0.6% | 4.7%
(50% of probability mass above)
6. Nature Machine Intelligence | 61 papers in training set | Top 0.6% | 4.7%
7. Bioinformatics | 1061 papers in training set | Top 5% | 3.9%
8. Cell Systems | 167 papers in training set | Top 4% | 3.6%
9. Cell Genomics | 162 papers in training set | Top 2% | 3.0%
10. Nature | 575 papers in training set | Top 8% | 2.7%
11. Advanced Science | 249 papers in training set | Top 10% | 1.8%
12. Science | 429 papers in training set | Top 14% | 1.7%
13. Nucleic Acids Research | 1128 papers in training set | Top 11% | 1.6%
14. Bioinformatics Advances | 184 papers in training set | Top 3% | 1.6%
15. Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.4%
16. NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.4%
17. Nature Genetics | 240 papers in training set | Top 5% | 1.3%
18. Frontiers in Genetics | 197 papers in training set | Top 7% | 1.1%
19. PLOS ONE | 4510 papers in training set | Top 63% | 0.9%
20. Cell | 370 papers in training set | Top 16% | 0.9%
21. iScience | 1063 papers in training set | Top 28% | 0.9%
22. Nature Computational Science | 50 papers in training set | Top 2% | 0.8%
23. Horticulture Research | 43 papers in training set | Top 1% | 0.8%
24. BMC Bioinformatics | 383 papers in training set | Top 7% | 0.7%
25. Molecular Plant | 36 papers in training set | Top 2% | 0.7%
26. Science Advances | 1098 papers in training set | Top 32% | 0.7%
27. Genome Medicine | 154 papers in training set | Top 9% | 0.7%
28. Scientific Reports | 3102 papers in training set | Top 77% | 0.7%
29. GigaScience | 172 papers in training set | Top 3% | 0.7%
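
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration of that arithmetic: the function name `journals_to_reach` is hypothetical, and the values are the probabilities of the first seven journals copied from the list above. It does not reflect how the matching model itself computes its predictions.

```python
from itertools import accumulate

# Predicted probabilities (%) of the first seven journals, copied from the list above.
probabilities = [17.0, 14.0, 9.8, 7.0, 4.7, 4.7, 3.9]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running sum to reach threshold."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None  # threshold is never reached with the given entries


top_k = journals_to_reach(probabilities, 50.0)
print(top_k)
```

The first four entries sum to 47.8%, and adding the fifth brings the total to 52.5%, which is where the 50% marker falls in the list.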