
Predicting Phage Host Interactions Across Taxonomic Levels: A Systematic Review and Meta-Analysis for Microbial Ecology

Romero-Calle, D. X.; Yucra Rojas, M.; Middelboe, M.

2026-04-30 microbiology
10.64898/2026.04.28.721508 bioRxiv

The prediction of phage-host interactions (PHIs) is key for several applications in biotechnology, medicine, and microbial ecology. Extensive work on machine learning tools has enabled the exploration of these interactions across multiple taxonomic levels. A systematic review and meta-analysis were conducted on 570 records retrieved from PubMed, Scopus, and Web of Science. Eleven studies, encompassing 61 datasets, were selected for the meta-analysis. Precision across taxonomic levels (Domain, Phylum, Class, Order, Family, Genus, Species) was evaluated for several prediction tools. Statistical tests, including the Shapiro-Wilk and ANOVA tests, were used. A mixed-effects meta-regression model was used to examine the effect of taxonomic subgroup on the proportion of correctly predicted PHIs. The results indicated significant variability in the performance of prediction tools across taxonomic levels. Domain-level predictions exhibited a near-perfect proportion of correctly predicted PHIs (0.99), whereas finer resolutions (Family and Order) showed considerable variability, with average precision values of 0.682 and 0.775, respectively. The mixed-effects meta-regression analysis revealed that the Family and Species taxonomic subgroups were associated with significant reductions in the proportion of correctly predicted PHIs, with effect sizes of -0.1464 and -0.1944, respectively. Residual heterogeneity was negligible, indicating that the moderators adequately explained the variability in prediction precision. This study highlights the importance of selecting the appropriate prediction tool based on the desired taxonomic resolution. The findings emphasize the need for further refinement of prediction algorithms, particularly at the Family and Species levels, where tools exhibit the most variability.
Figure 1. Graphical Abstract. Overview of the systematic review and meta-analysis framework evaluating ML-based phage-host interaction prediction tools across taxonomic levels.
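
The abstract describes a meta-regression with taxonomic level as a moderator of the proportion of correctly predicted PHIs. The following is a minimal sketch of that idea, not the authors' implementation. It assumes residual heterogeneity is zero (the abstract reports it as negligible), in which case a mixed-effects model reduces to inverse-variance weighted least squares. The per-dataset counts, the function name `meta_regression`, and the choice of Domain as the reference level are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical per-dataset records: (taxonomic level, correct predictions, total predictions).
# These counts are invented for illustration and are NOT taken from the study.
datasets = [
    ("Domain", 990, 1000),
    ("Domain", 985, 1000),
    ("Genus", 800, 1000),
    ("Genus", 780, 1000),
    ("Species", 600, 1000),
    ("Species", 640, 1000),
]


def meta_regression(records, reference="Domain"):
    """Inverse-variance weighted regression of proportions on taxonomic level.

    Assumes zero residual heterogeneity (tau^2 = 0), so each dataset is
    weighted by 1 / (binomial sampling variance of its proportion).
    """
    levels = [reference] + sorted({lvl for lvl, _, _ in records} - {reference})
    p = np.array([c / n for _, c, n in records])
    n = np.array([n for _, _, n in records], dtype=float)

    # Binomial sampling variance of each proportion, floored to avoid division by zero.
    v = np.maximum(p * (1.0 - p) / n, 1e-8)
    w = 1.0 / v

    # Design matrix: intercept plus one dummy column per non-reference level.
    X = np.zeros((len(records), len(levels)))
    X[:, 0] = 1.0
    for i, (lvl, _, _) in enumerate(records):
        if lvl != reference:
            X[i, levels.index(lvl)] = 1.0

    # Weighted least squares: beta = (X' W X)^-1 X' W y.
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ p)

    names = [f"intercept ({reference})"] + levels[1:]
    return dict(zip(names, beta))


if __name__ == "__main__":
    for name, value in meta_regression(datasets).items():
        print(f"{name}: {value:+.4f}")
```

In this parameterization the intercept is the weighted mean proportion at the reference level, and each remaining coefficient is the difference between that level and the reference, which is the same sense in which the abstract reports negative effect sizes for Family and Species. A full mixed-effects analysis would also estimate the between-dataset variance and add it to each sampling variance before weighting.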

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. mSystems: 361 papers in training set; Top 0.4%; 12.8%
2. Frontiers in Microbiology: 375 papers in training set; Top 1.0%; 6.9%
3. F1000Research: 79 papers in training set; Top 0.1%; 6.4%
4. PLOS ONE: 4510 papers in training set; Top 31%; 4.9%
5. PeerJ: 261 papers in training set; Top 1.0%; 4.9%
6. ISME Communications: 103 papers in training set; Top 0.5%; 4.0%
7. GigaScience: 172 papers in training set; Top 0.4%; 4.0%
8. PLOS Computational Biology: 1633 papers in training set; Top 9%; 3.6%
9. NAR Genomics and Bioinformatics: 214 papers in training set; Top 1%; 2.4%
10. PLOS Biology: 408 papers in training set; Top 6%; 2.4%

50% of probability mass above

11. BMC Genomics: 328 papers in training set; Top 1%; 2.4%
12. Microbiology: 57 papers in training set; Top 0.4%; 2.1%
13. Microbiome: 139 papers in training set; Top 2%; 1.9%
14. Access Microbiology: 22 papers in training set; Top 0.2%; 1.9%
15. Nucleic Acids Research: 1128 papers in training set; Top 9%; 1.9%
16. Frontiers in Bioinformatics: 45 papers in training set; Top 0.2%; 1.8%
17. BMC Biology: 248 papers in training set; Top 1%; 1.7%
18. Microorganisms: 101 papers in training set; Top 0.7%; 1.7%
19. Scientific Reports: 3102 papers in training set; Top 57%; 1.7%
20. Microbial Genomics: 204 papers in training set; Top 1%; 1.7%
21. mBio: 750 papers in training set; Top 8%; 1.3%
22. Microbiology Spectrum: 435 papers in training set; Top 4%; 1.2%
23. Briefings in Bioinformatics: 326 papers in training set; Top 5%; 1.1%
24. Gut Microbes: 70 papers in training set; Top 0.8%; 1.1%
25. eLife: 5422 papers in training set; Top 53%; 0.9%
26. BMC Microbiology: 35 papers in training set; Top 1%; 0.9%
27. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 8%; 0.8%
28. Viruses: 318 papers in training set; Top 5%; 0.8%
29. Environmental Microbiome: 26 papers in training set; Top 0.6%; 0.7%
30. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 7%; 0.6%