Computational prediction resolves thousands of homooligomeric phage protein structures

Grigson, S. R.; Geliashvili, N.; Schubert, T.; Bouras, G.; Mallawaarachchi, V.; Bogacz, M.; Hellmich, U.; Edwards, R. A.; Dutilh, B. E.

2026-05-25 microbiology
10.64898/2026.05.24.727406 bioRxiv
Bacteriophages (phages) play essential roles in microbial systems, yet most phage proteins remain poorly characterised. Tertiary and quaternary structure provide valuable information about protein function. As many phage proteins function as homooligomers, complexes that consist of multiple identical subunits, there is great interest in computationally predicting their configurations. Here we present a computational framework, the Phage Homomer Level Estimate and Generation Method (PHLEGM), for inferring homooligomeric states directly from the protein sequence by combining AlphaFold-Multimer modelling with inter-subunit interface quality assessment. We experimentally validated two of nine predicted homooligomers using size exclusion chromatography and complementary hydrodynamic techniques. These efforts confirmed our predictions for a dimer and a trimer, highlighting the value of experimentally benchmarked computational predictions and showing the challenges of heterologous phage protein production. Applied to >22,000 phage protein sequences in the PHROGs database, our approach revealed extensive diversity in phage homooligomeric protein complexes. Benchmarking against protein language model-based predictors on a curated reference set of known phage homooligomers demonstrated superior accuracy of our structure-based method, which classified protein homooligomeric states robustly, with the highest accuracy observed for trimers and higher-order complexes. These results highlight the value of computational predictions for deciphering the complexities of the vast viral sequence space. All predicted complex structures and functional inferences are made publicly available to support structural and functional studies of phage proteins.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Viruses (318 papers in training set): Top 0.1%, 19.6%
2. Virus Evolution (140 papers in training set): Top 0.2%, 8.5%
3. Nucleic Acids Research (1128 papers in training set): Top 3%, 6.4%
4. PLOS Computational Biology (1633 papers in training set): Top 6%, 6.4%
5. Nature Communications (4913 papers in training set): Top 35%, 4.3%
6. PLOS Pathogens (721 papers in training set): Top 3%, 3.6%
7. Journal of Virology (456 papers in training set): Top 1%, 3.6%

50% of probability mass above

8. Acta Crystallographica Section D Structural Biology (54 papers in training set): Top 0.1%, 2.6%
9. Computational and Structural Biotechnology Journal (216 papers in training set): Top 2%, 2.6%
10. mBio (750 papers in training set): Top 7%, 1.9%
11. Scientific Reports (3102 papers in training set): Top 57%, 1.7%
12. Communications Biology (886 papers in training set): Top 8%, 1.7%
13. eLife (5422 papers in training set): Top 41%, 1.7%
14. Journal of Molecular Biology (217 papers in training set): Top 2%, 1.7%
15. Science Advances (1098 papers in training set): Top 20%, 1.5%
16. PLOS Biology (408 papers in training set): Top 11%, 1.5%
17. Journal of General Virology (46 papers in training set): Top 0.4%, 1.5%
18. mSystems (361 papers in training set): Top 5%, 1.5%
19. Journal of Chemical Information and Modeling (207 papers in training set): Top 2%, 1.3%
20. GigaScience (172 papers in training set): Top 2%, 1.3%
21. Proceedings of the National Academy of Sciences (2130 papers in training set): Top 38%, 1.2%
22. NAR Genomics and Bioinformatics (214 papers in training set): Top 3%, 1.0%
23. Protein Science (221 papers in training set): Top 1%, 1.0%
24. Frontiers in Bioinformatics (45 papers in training set): Top 0.5%, 1.0%
25. Structure (175 papers in training set): Top 3%, 0.9%
26. iScience (1063 papers in training set): Top 26%, 0.9%
27. PLOS ONE (4510 papers in training set): Top 65%, 0.8%
28. Microbial Genomics (204 papers in training set): Top 2%, 0.8%
29. Biophysical Journal (545 papers in training set): Top 7%, 0.8%
30. Frontiers in Immunology (586 papers in training set): Top 7%, 0.8%