Back

Partial Fitch Graphs: Characterization, Satisfiability and Complexity

Korchmaros, A.; Hellmuth, M.; Ramirez-Rafael, J. A.; Schmidt, B.; Stadler, P. F.; Thekkumpadan Puthiyaveedu, S.

2024-05-03 bioinformatics
10.1101/2024.04.30.591842 bioRxiv
Show abstract

Horizontal gene transfer is an important contributor to evolution. Following Walter M. Fitch, two genes are xenologs if at least one HGT separates them. More formally, the directed Fitch graph has a set of genes as its vertices, and directed edges (x, y) for all pairs of genes x and y for which y has been horizontally transferred at least once since it diverged from the last common ancestor of x and y. Subgraphs of Fitch graphs can be inferred by comparative sequence analysis. In many cases, however, only partial knowledge about the "full" Fitch graph can be obtained. Here, we characterize Fitch-satisfiable graphs that can be extended to a biologically feasible "full" Fitch graph and derive a simple polynomial-time recognition algorithm. We then proceed to show that several versions of finding the Fitch graph with total maximum (confidence) edge-weights are NP-hard. In addition, we provide a greedy-heuristic for "optimally" recovering Fitch graphs from partial ones. Somewhat surprisingly, even if [~] 80% of information of the underlying input Fitch-graph G is lost (i.e., the partial Fitch graph contains only [~] 20% of the edges of G), it is possible to recover [~] 90% of the original edges of G on average.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1
Cell Systems
167 papers in training set
Top 0.2%
22.7%
2
Bioinformatics
1061 papers in training set
Top 2%
12.5%
3
Algorithms for Molecular Biology
15 papers in training set
Top 0.1%
10.2%
4
Genome Research
409 papers in training set
Top 0.3%
6.9%
50% of probability mass above
5
PLOS Computational Biology
1633 papers in training set
Top 6%
6.4%
6
iScience
1063 papers in training set
Top 2%
4.9%
7
Journal of Computational Biology
37 papers in training set
Top 0.1%
3.6%
8
Scientific Reports
3102 papers in training set
Top 41%
3.1%
9
PLOS ONE
4510 papers in training set
Top 44%
2.8%
10
Nature Communications
4913 papers in training set
Top 44%
2.6%
11
BMC Bioinformatics
383 papers in training set
Top 4%
1.8%
12
Proceedings of the National Academy of Sciences
2130 papers in training set
Top 30%
1.8%
13
Bioinformatics Advances
184 papers in training set
Top 3%
1.3%
14
Journal of The Royal Society Interface
189 papers in training set
Top 4%
0.9%
15
Bulletin of Mathematical Biology
84 papers in training set
Top 2%
0.9%
16
Genetics
225 papers in training set
Top 4%
0.8%
17
Peer Community Journal
254 papers in training set
Top 4%
0.8%
18
Science
429 papers in training set
Top 20%
0.7%
19
Science Advances
1098 papers in training set
Top 31%
0.7%
20
Genome Biology
555 papers in training set
Top 8%
0.7%
21
Nature
575 papers in training set
Top 17%
0.6%
22
GENETICS
189 papers in training set
Top 2%
0.5%
23
PNAS Nexus
147 papers in training set
Top 3%
0.5%