A lightweight codon-based DNA Transformer for Regulatory Region Identification in the Genome
Karthik, A. S. P.; Das, A. B.
2026-05-07
bioinformatics
10.64898/2026.05.04.722647
bioRxiv
We developed a lightweight codon-based DNA Transformer with multi-head self-attention and an adaptive classifier head. We named the model ExIT (Exon-Intron Transformer), and it uses codon tokenization. It achieves high accuracy in exon-intron classification and moderate accuracy in CDS classification and splice site recognition. The model was validated on the human genome, with external validation on the chimpanzee genome. Further benchmarking suggests that our model outperforms existing models on these tasks.
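The abstract does not say how codon tokenization is implemented. As a rough illustration only, the sketch below assumes non-overlapping 3-base tokens read from a chosen frame, with incomplete trailing bases dropped and any codon containing a non-ACGT base mapped to an unknown token. The names `codon_tokenize`, `encode`, `VOCAB`, and the special tokens are hypothetical and are not taken from ExIT.

```python
from itertools import product

BASES = "ACGT"
# Hypothetical special tokens; the ids are arbitrary and for illustration only.
SPECIAL = ["[PAD]", "[UNK]"]
# 2 special tokens followed by the 64 possible codons in lexicographic order.
VOCAB = {
    tok: i
    for i, tok in enumerate(SPECIAL + ["".join(c) for c in product(BASES, repeat=3)])
}


def codon_tokenize(seq, frame=0):
    """Split a DNA string into non-overlapping 3-base tokens.

    Reading starts at offset `frame`; trailing bases that do not
    fill a complete codon are dropped (an assumption of this sketch).
    """
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def encode(seq, frame=0):
    """Map each codon to an integer id, using [UNK] for codons outside ACGT."""
    unk = VOCAB["[UNK]"]
    return [VOCAB.get(codon, unk) for codon in codon_tokenize(seq, frame)]


if __name__ == "__main__":
    print(codon_tokenize("ATGGCCTAA"))
    print(encode("ATGGCCTAA"))
```

With this scheme a sequence of n bases yields about n/3 tokens, a third as many as single-base tokenization would produce. That shorter input is one plausible reason to prefer codon tokens in a lightweight model.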
Matching journals
● Non-profit
◐ University press
○ Commercial
The top 3 journals account for 50% of the predicted probability mass.
1. Nucleic Acids Research ◐; 1128 papers in training set; Top 0.1%; 37.9%
2. Nature Communications ○; 4913 papers in training set; Top 18%; 10.1%
3. BMC Bioinformatics ○; 383 papers in training set; Top 2%; 4.3%
50% of probability mass above
4. Bioinformatics ◐; 1061 papers in training set; Top 5%; 4.0%
5. Bioinformatics Advances ◐; 184 papers in training set; Top 1%; 3.6%
6. Genomics, Proteomics & Bioinformatics ◐; 171 papers in training set; Top 2%; 3.6%
7. Briefings in Bioinformatics ◐; 326 papers in training set; Top 2%; 3.1%
8. Genome Research ●; 409 papers in training set; Top 2%; 2.5%
9. Frontiers in Genetics ○; 197 papers in training set; Top 3%; 2.1%
10. Communications Biology ○; 886 papers in training set; Top 5%; 2.1%
11. PLOS Computational Biology ●; 1633 papers in training set; Top 13%; 2.1%
12. PLOS ONE ●; 4510 papers in training set; Top 50%; 1.9%
13. NAR Genomics and Bioinformatics ◐; 214 papers in training set; Top 2%; 1.7%
14. Scientific Reports ○; 3102 papers in training set; Top 59%; 1.7%
15. IEEE Transactions on Computational Biology and Bioinformatics ●; 17 papers in training set; Top 0.3%; 1.5%
16. Genome Biology ○; 555 papers in training set; Top 5%; 1.3%
17. Database ◐; 51 papers in training set; Top 0.6%; 1.1%
18. Advanced Science ○; 249 papers in training set; Top 17%; 0.9%
19. Computational and Structural Biotechnology Journal ●; 216 papers in training set; Top 8%; 0.9%
20. GigaScience ◐; 172 papers in training set; Top 3%; 0.8%
21. PLOS Genetics ●; 756 papers in training set; Top 14%; 0.8%
22. Computers in Biology and Medicine ○; 120 papers in training set; Top 4%; 0.8%
23. IEEE/ACM Transactions on Computational Biology and Bioinformatics ●; 32 papers in training set; Top 0.6%; 0.7%
24. Journal of Molecular Biology ○; 217 papers in training set; Top 4%; 0.6%
25. Patterns ○; 70 papers in training set; Top 3%; 0.6%
26. Computer Methods and Programs in Biomedicine ○; 27 papers in training set; Top 1%; 0.6%
27. Nature Machine Intelligence ○; 61 papers in training set; Top 4%; 0.6%