Heuristics for the De Bruijn Graph Sequence Mapping Problem

Rocha, L. B.; Adi, S. S.; Araujo, E.

2023-02-07 bioinformatics
10.1101/2023.02.05.527069 bioRxiv

In computational biology, mapping a sequence s onto a sequence graph G is a significant challenge. One possible approach to addressing this problem is to identify a walk p in G that spells a sequence which is most similar to s. This problem is known as the Graph Sequence Mapping Problem (GSMP). In this paper, we study an alternative problem formulation, namely the De Bruijn Graph Sequence Mapping Problem (BSMP), which can be stated as follows: given a sequence s and a De Bruijn graph Gk (where k ≥ 2), find a walk p in Gk that spells a sequence which is most similar to s according to a distance metric. We present both exact algorithms and approximate distance heuristics for solving this problem, using edit distance as a criterion for measuring similarity.
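The problem statement in the abstract can be made concrete with a small sketch. It is not the authors' algorithm: it assumes a node-centric De Bruijn graph (vertices are the k-mers of a set of input sequences, with an arc from u to v when the last k-1 characters of u equal the first k-1 characters of v) and searches walks exhaustively up to a length bound. All function names, and the `max_nodes` bound, are hypothetical and chosen only for illustration.

```python
def kmers(seq, k):
    """Return all substrings of length k of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(seqs, k):
    """Assumed node-centric construction: map each k-mer to its successors."""
    nodes = {km for s in seqs for km in kmers(s, k)}
    return {u: sorted(v for v in nodes if u[1:] == v[:-1]) for u in nodes}


def spell(walk):
    """Sequence spelled by a walk: first k-mer plus the last character of each later k-mer."""
    return walk[0] + "".join(v[-1] for v in walk[1:])


def edit_distance(a, b):
    """Classic dynamic-programming edit distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def best_walk_bruteforce(arcs, s, max_nodes):
    """Try every walk with at most max_nodes vertices (exponential; illustration only)."""
    best = None
    stack = [[u] for u in arcs]
    while stack:
        walk = stack.pop()
        d = edit_distance(spell(walk), s)
        if best is None or d < best[0]:
            best = (d, walk)
        if len(walk) < max_nodes:
            stack.extend(walk + [v] for v in arcs[walk[-1]])
    return best


if __name__ == "__main__":
    arcs = build_de_bruijn(["ACGT"], 2)
    print(best_walk_bruteforce(arcs, "ACT", max_nodes=3))
```

The exhaustive search only illustrates what "a walk that spells a sequence most similar to s" means on a toy graph; it does not scale and says nothing about how the exact algorithms or heuristics in the paper work.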

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | Bioinformatics | 1061 papers in training set | Top 1.0% | 23.2%
2 | BMC Bioinformatics | 383 papers in training set | Top 0.4% | 14.8%
3 | Journal of Computational Biology | 37 papers in training set | Top 0.1% | 10.4%
4 | PLOS Computational Biology | 1633 papers in training set | Top 5% | 7.0%
50% of probability mass above
5 | PLOS ONE | 4510 papers in training set | Top 30% | 5.0%
6 | Algorithms for Molecular Biology | 15 papers in training set | Top 0.1% | 3.7%
7 | Scientific Reports | 3102 papers in training set | Top 34% | 3.7%
8 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.1% | 2.8%
9 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 1% | 1.8%
10 | Bioinformatics Advances | 184 papers in training set | Top 3% | 1.7%
11 | Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.7%
12 | Genome Research | 409 papers in training set | Top 2% | 1.7%
13 | Journal of Bioinformatics and Systems Biology | 14 papers in training set | Top 0.2% | 1.5%
14 | Frontiers in Genetics | 197 papers in training set | Top 7% | 1.0%
15 | iScience | 1063 papers in training set | Top 25% | 0.9%
16 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 4% | 0.8%
17 | BioData Mining | 15 papers in training set | Top 0.8% | 0.8%
18 | Journal of Molecular Biology | 217 papers in training set | Top 3% | 0.8%
19 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 7% | 0.7%
20 | IEEE Transactions on Computational Biology and Bioinformatics | 17 papers in training set | Top 0.8% | 0.7%
21 | Cell Systems | 167 papers in training set | Top 14% | 0.5%
22 | Physical Biology | 43 papers in training set | Top 3% | 0.5%
23 | F1000Research | 79 papers in training set | Top 6% | 0.5%
24 | PeerJ | 261 papers in training set | Top 18% | 0.5%