
New genetic codes in bacteria and archaea identified with a fast k-mer based algorithm

Melnykov, A. V.

2026-04-06 bioinformatics
10.64898/2026.04.02.715157 bioRxiv

The genetic code is conserved across all domains of life and is often described as universal. Nevertheless, many exceptions to the "universal" code have now been documented, most of them through manual or semiautomated inspection of highly conserved genes. Modern bioinformatics tools have improved our ability to find alternative genetic codes but remain computationally expensive, preventing widespread use on the thousands of new species identified by sequencing environmental samples. Here I report a >100-fold accelerated method for inferring the genetic code directly from assembled genomes and apply it to thousands of previously uncharacterized assemblies from archaea and bacteria. I describe new candidate genetic code variations in both domains, including the first archaeal sense codon reassignment. Identifying genetic code variations is important for understanding the evolution of the standard code and for improving the accuracy of protein databases and open reading frame identification.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Genome Biology; 555 papers in training set; Top 0.1%; 14.3%
2. Nucleic Acids Research; 1128 papers in training set; Top 2%; 8.4%
3. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 9%; 7.1%
4. Nature Communications; 4913 papers in training set; Top 25%; 7.1%
5. Nature Biotechnology; 147 papers in training set; Top 1%; 7.1%
6. Molecular Biology and Evolution; 488 papers in training set; Top 0.7%; 6.3%

50% of probability mass above

7. Bioinformatics; 1061 papers in training set; Top 5%; 4.3%
8. Science; 429 papers in training set; Top 7%; 4.3%
9. Cell Systems; 167 papers in training set; Top 3%; 4.2%
10. Nature Methods; 336 papers in training set; Top 3%; 2.6%
11. PLOS Computational Biology; 1633 papers in training set; Top 12%; 2.6%
12. Cell; 370 papers in training set; Top 9%; 2.1%
13. NAR Genomics and Bioinformatics; 214 papers in training set; Top 2%; 1.9%
14. Structure; 175 papers in training set; Top 2%; 1.7%
15. Communications Biology; 886 papers in training set; Top 10%; 1.7%
16. eLife; 5422 papers in training set; Top 45%; 1.5%
17. PLOS ONE; 4510 papers in training set; Top 60%; 1.2%
18. iScience; 1063 papers in training set; Top 23%; 1.1%
19. Scientific Reports; 3102 papers in training set; Top 69%; 0.9%
20. Nature; 575 papers in training set; Top 14%; 0.9%
21. Nature Microbiology; 133 papers in training set; Top 4%; 0.8%
22. BMC Bioinformatics; 383 papers in training set; Top 7%; 0.8%
23. Science Advances; 1098 papers in training set; Top 28%; 0.8%
24. Nature Genetics; 240 papers in training set; Top 7%; 0.7%
25. PeerJ; 261 papers in training set; Top 15%; 0.7%
26. Cell Genomics; 162 papers in training set; Top 7%; 0.7%
27. PLOS Genetics; 756 papers in training set; Top 17%; 0.6%
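
The 50% cutoff stated for the list above can be checked with a running total of the listed probabilities. The following is a minimal sketch, not the site's own method: the helper name `rank_reaching` is hypothetical, and the values are simply the first ten percentages copied from the list.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-10, copied from the list above.
probs = [14.3, 8.4, 7.1, 7.1, 7.1, 6.3, 4.3, 4.3, 4.2, 2.6]


def rank_reaching(values, threshold):
    """Return the 1-based rank at which the running total first reaches
    `threshold`, or None if it is never reached. Hypothetical helper."""
    for rank, total in enumerate(accumulate(values), start=1):
        # Small tolerance guards against floating-point rounding in the sum.
        if total >= threshold - 1e-9:
            return rank
    return None


cutoff = rank_reaching(probs, 50.0)
print(cutoff)  # 6
```

The running total after rank 5 is 44.0% and after rank 6 is 50.3%, which agrees with the statement that the top 6 journals account for 50% of the predicted probability mass.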