Computational prediction resolves thousands of homooligomeric phage protein structures
Grigson, S. R.; Geliashvili, N.; Schubert, T.; Bouras, G.; Mallawaarachchi, V.; Bogacz, M.; Hellmich, U.; Edwards, R. A.; Dutilh, B. E.
Bacteriophages (phages) play essential roles in microbial systems, yet most phage proteins remain poorly characterised. Tertiary and quaternary structure provide valuable clues to protein function. Because many phage proteins function as homooligomers, complexes of multiple identical subunits, there is great interest in predicting their configurations computationally. Here we present a computational framework, the Phage Homomer Level Estimate and Generation Method (PHLEGM), for inferring homooligomeric states directly from the protein sequence by combining AlphaFold-Multimer modelling with inter-subunit interface quality assessment. We experimentally validated two out of nine predicted homooligomers using size exclusion chromatography and complementary hydrodynamic techniques, confirming our predictions for a dimer and a trimer. These efforts highlight the value of experimentally benchmarked computational predictions and illustrate the challenges of heterologous phage protein production. Applied to >22,000 phage protein sequences in the PHROGs database, our approach revealed extensive diversity in phage homooligomeric protein complexes. In a benchmark against protein language model-based predictors on a curated reference set of known phage homooligomers, our structure-based method was more accurate and classified homooligomeric states robustly, with the highest accuracy observed for trimers and higher-order complexes. These results highlight the value of computational predictions for deciphering the complexity of the vast viral sequence space. All predicted complex structures and functional inferences are made publicly available to support structural and functional studies of phage proteins.
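The abstract describes choosing a homooligomeric state by modelling a protein at several copy numbers and assessing interface quality. The Python sketch below illustrates only that selection step under stated assumptions; it is not the PHLEGM implementation. The function name `pick_oligomeric_state`, the 0-to-1 score scale, the `min_score` threshold of 0.5, and the tie-breaking rule are all hypothetical, and the scores in the usage example are invented for illustration.

```python
from typing import Dict


def pick_oligomeric_state(interface_scores: Dict[int, float],
                          min_score: float = 0.5) -> int:
    """Pick a copy number from per-stoichiometry interface scores.

    interface_scores maps a candidate copy number (2 = dimer, 3 = trimer, ...)
    to a hypothetical interface-quality score where higher is better.
    Returns 1 (monomer) if no candidate reaches min_score.
    """
    # Keep only multimeric candidates whose interface score passes the threshold.
    passing = {n: s for n, s in interface_scores.items()
               if n >= 2 and s >= min_score}
    if not passing:
        return 1
    # Highest score wins; ties go to the smaller complex (an assumption).
    return min(passing, key=lambda n: (-passing[n], n))


# Invented scores for one protein modelled as a dimer, trimer and tetramer.
example_scores = {2: 0.31, 3: 0.78, 4: 0.42}
predicted_state = pick_oligomeric_state(example_scores)
print(predicted_state)
```

With these invented scores only the trimer passes the threshold, so the sketch returns 3. A real pipeline would need to define the interface score and calibrate the threshold against proteins of known stoichiometry.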