A fast and simple approach to k-mer decomposition

Kunzmann, P.

2024-07-26 bioinformatics
10.1101/2024.07.26.605312 bioRxiv
Alignment searches are fast heuristic methods to identify similar regions between two sequences. This group of algorithms is ubiquitously used in a myriad of software to find homologous sequences or to map sequence reads to genomes. Often the first step in alignment searches is k-mer decomposition: listing all overlapping subsequences of length k. This article presents a simple integer representation of k-mers and shows how a sequence can be quickly decomposed into k-mers in constant time with respect to k.
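The abstract does not spell out the integer representation, so the following is only a minimal sketch of one common way to realize the idea, not necessarily the article's method. It assumes each k-mer is read as a base-n number over an n-symbol alphabet, and that each subsequent k-mer code is derived from the previous one with a fixed number of arithmetic operations, so the cost per k-mer does not grow with k. The function name `kmer_codes` and the default alphabet `"ACGT"` are hypothetical choices for illustration.

```python
def kmer_codes(seq, k, alphabet="ACGT"):
    """Return the integer code of every overlapping k-mer in seq.

    Assumption: a k-mer is encoded as a base-n number (n = alphabet size),
    with its first symbol as the most significant digit.
    Symbols outside the alphabet raise KeyError.
    """
    n = len(alphabet)
    symbol_code = {symbol: i for i, symbol in enumerate(alphabet)}
    if k < 1 or len(seq) < k:
        return []

    # Encode the first k-mer directly; this is the only step that costs O(k).
    code = 0
    for symbol in seq[:k]:
        code = code * n + symbol_code[symbol]
    codes = [code]

    # Weight of the most significant (leftmost) digit of a k-mer.
    high = n ** (k - 1)

    # Rolling update: remove the leftmost symbol, shift by one digit,
    # and append the new rightmost symbol. Constant work per k-mer.
    for i in range(k, len(seq)):
        leaving = symbol_code[seq[i - k]]
        entering = symbol_code[seq[i]]
        code = (code - leaving * high) * n + entering
        codes.append(code)
    return codes


print(kmer_codes("ACGT", 2))  # AC, CG, GT -> [1, 6, 11]
```

With this encoding, two k-mers are identical exactly when their codes are equal, so the codes can serve directly as indices into a lookup table of k-mer positions.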

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1 | Bioinformatics | 1061 papers in training set | Top 1% | 18.6%
2 | BMC Bioinformatics | 383 papers in training set | Top 0.3% | 18.6%
3 | PLOS ONE | 4510 papers in training set | Top 13% | 14.3%
50% of probability mass above
4 | PLOS Computational Biology | 1633 papers in training set | Top 8% | 4.3%
5 | Algorithms for Molecular Biology | 15 papers in training set | Top 0.1% | 4.3%
6 | iScience | 1063 papers in training set | Top 5% | 3.6%
7 | Scientific Reports | 3102 papers in training set | Top 40% | 3.2%
8 | Journal of Computational Biology | 37 papers in training set | Top 0.1% | 2.1%
9 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.1% | 2.1%
10 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.2% | 1.8%
11 | Bioinformatics Advances | 184 papers in training set | Top 3% | 1.3%
12 | Gigabyte | 60 papers in training set | Top 0.9% | 1.3%
13 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 0.9%
14 | PeerJ | 261 papers in training set | Top 13% | 0.9%
15 | Journal of Bioinformatics and Systems Biology | 14 papers in training set | Top 0.6% | 0.8%
16 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 5% | 0.7%
17 | F1000Research | 79 papers in training set | Top 5% | 0.7%
18 | Journal of Molecular Biology | 217 papers in training set | Top 4% | 0.7%
19 | Peer Community Journal | 254 papers in training set | Top 4% | 0.7%
20 | Genome Biology | 555 papers in training set | Top 7% | 0.7%
21 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 7% | 0.7%
22 | Frontiers in Genetics | 197 papers in training set | Top 11% | 0.7%
23 | GigaScience | 172 papers in training set | Top 3% | 0.7%
24 | Briefings in Bioinformatics | 326 papers in training set | Top 7% | 0.6%
25 | BMC Genomics | 328 papers in training set | Top 7% | 0.6%