A Cophylogenetic Approach for Virus-Host Interaction Prediction

Chowdhury, M. Z. U. S.; Murali, T. M.; Sashittal, P.

2026-02-27 evolutionary biology
10.64898/2026.02.26.708038 bioRxiv
Advances in metagenomics have rapidly expanded viral discovery, revealing vast diversity across Earth's virosphere. Yet most virus-host interactions (i.e., which viruses infect which hosts) remain unrecorded. Identifying these interactions is essential for anticipating zoonotic spillover events and advancing biomedical applications such as bacteriophage therapy. However, the sheer diversity of viruses and hosts makes comprehensive experimental mapping infeasible, motivating the need for computational approaches. Most existing prediction methods rely on supervised learning strategies that use sequence-derived features, such as codon usage bias or k-mer frequencies, and do not model the coevolutionary processes that shape virus-host interactions. This limits both their ability to generalize and the evolutionary interpretability of their predictions. We introduce CoEvoLink, a framework for predicting virus-host interactions that integrates sequence-based evidence with phylogenetic signal by explicitly modeling the coevolutionary histories of viruses and hosts. CoEvoLink infers likely but unobserved interactions by minimizing the number of evolutionary events required to explain them, yielding the most parsimonious interactions under a coevolutionary model. This formulation generalizes classical maximum parsimony, typically defined on a single phylogeny, by jointly optimizing parsimony across both virus and host phylogenies. Sequence-based information is incorporated by assigning a cost to each potential interaction that reflects its likelihood based on genomic features. By drawing a connection between computing parsimony on interaction matrices and maximum parsimony on phylogenetic networks, we derive a polynomial-time algorithm that balances parsimony with sequence-derived prediction cost. We demonstrate the effectiveness of CoEvoLink on simulated data under diverse coevolutionary models.
Applying CoEvoLink, we identified putative bat hosts of betacoronaviruses that have not yet been cataloged in the VIRION database. On a benchmark derived from metagenomic sequencing data, we demonstrate that CoEvoLink improves the performance of existing phage-host prediction tools using cophylogenetic information.

Code availability: https://github.com/sashittal-group/CoEvoLink

Note: This paper is accepted at RECOMB 2026 (30th Annual International Conference on Research in Computational Molecular Biology).
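
The abstract describes CoEvoLink as a generalization of classical maximum parsimony on a single phylogeny, with a per-interaction cost derived from sequence features. As a rough illustration of that single-tree starting point only (not CoEvoLink's joint virus-host algorithm), the sketch below runs Sankoff-style small parsimony for one virus over a toy host tree. The tree, the host names `h1`..`h5`, the cost values, and the function name `sankoff_binary` are all hypothetical and chosen for illustration.

```python
from math import inf

def sankoff_binary(tree, root, leaf_cost, change_cost=1.0):
    """Sankoff small parsimony with two states (0 = no interaction, 1 = interaction).

    tree:      dict mapping each internal node to a list of its children
    leaf_cost: dict mapping each leaf to (cost of state 0, cost of state 1)
    Returns (minimum total cost, dict of chosen state per node).
    """
    cost = {}  # node -> [best subtree cost if node is 0, best if node is 1]

    def up(node):
        # Bottom-up pass: combine leaf costs with state-change penalties.
        if node not in tree:
            cost[node] = list(leaf_cost[node])
            return
        cost[node] = [0.0, 0.0]
        for child in tree[node]:
            up(child)
            for s in (0, 1):
                cost[node][s] += min(
                    cost[child][t] + (change_cost if s != t else 0.0)
                    for t in (0, 1)
                )

    def edge(parent_state, t):
        # Penalty for a state change along the edge from the parent.
        return 0.0 if parent_state is None or parent_state == t else change_cost

    states = {}

    def down(node, parent_state):
        # Top-down pass: pick the cheapest state given the parent's state.
        s = min((0, 1), key=lambda t: cost[node][t] + edge(parent_state, t))
        states[node] = s
        for child in tree.get(node, []):
            down(child, s)

    up(root)
    down(root, None)
    return min(cost[root]), states

# Toy host tree (hypothetical): one clade with h1, h2, h3 and one with h4, h5.
tree = {"root": ["X", "Y"], "X": ["h1", "h2", "h3"], "Y": ["h4", "h5"]}
leaf_cost = {
    "h1": (inf, 0.0),  # observed interaction: state 0 is forbidden
    "h2": (inf, 0.0),  # observed interaction
    "h3": (0.4, 0.6),  # unobserved; sequence score mildly favors "no interaction"
    "h4": (0.1, 0.9),  # unobserved; sequence score favors "no interaction"
    "h5": (0.2, 0.8),  # unobserved; sequence score favors "no interaction"
}
total, states = sankoff_binary(tree, "root", leaf_cost)
predicted = sorted(h for h in leaf_cost if states[h] == 1)
print(total, predicted)
```

In this toy input, `h3` is assigned an interaction even though its sequence-derived cost mildly favors none, because its two sister hosts are observed interactions and a state change on its branch would cost more. That is the kind of trade-off between parsimony and sequence-derived cost the abstract refers to, here restricted to a single tree.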

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Virus Evolution | 140 | Top 0.1% | 22.3%
2 | Proceedings of the National Academy of Sciences | 2130 | Top 6% | 10.0%
3 | Bioinformatics | 1061 | Top 4% | 6.3%
4 | Cell Systems | 167 | Top 3% | 4.8%
5 | PLOS Computational Biology | 1633 | Top 8% | 4.1%
6 | Molecular Biology and Evolution | 488 | Top 1% | 3.9%
(50% of probability mass above)
7 | Science | 429 | Top 8% | 3.6%
8 | Nature Communications | 4913 | Top 42% | 3.2%
9 | Systematic Biology | 121 | Top 0.2% | 3.0%
10 | Microbiome | 139 | Top 1% | 2.1%
11 | Genome Biology | 555 | Top 4% | 1.9%
12 | Patterns | 70 | Top 0.9% | 1.7%
13 | Genome Research | 409 | Top 2% | 1.7%
14 | Genetics | 225 | Top 2% | 1.7%
15 | Nature Ecology & Evolution | 113 | Top 3% | 1.6%
16 | Ecology Letters | 121 | Top 0.8% | 1.6%
17 | eLife | 5422 | Top 44% | 1.6%
18 | Nature | 575 | Top 12% | 1.3%
19 | Nature Computational Science | 50 | Top 0.9% | 1.3%
20 | Nature Genetics | 240 | Top 5% | 1.3%
21 | PLOS Biology | 408 | Top 14% | 1.2%
22 | Communications Biology | 886 | Top 19% | 0.9%
23 | Science Translational Medicine | 111 | Top 5% | 0.9%
24 | Science Advances | 1098 | Top 27% | 0.9%
25 | Nature Biotechnology | 147 | Top 7% | 0.8%
26 | iScience | 1063 | Top 33% | 0.7%
27 | Bioinformatics Advances | 184 | Top 5% | 0.7%
28 | Nature Microbiology | 133 | Top 5% | 0.7%
29 | PLOS Genetics | 756 | Top 16% | 0.7%
30 | Viruses | 318 | Top 6% | 0.6%