CoEVFold suite: user friendly pipelines to visually represent protein coevolution
Graham, C. L.; Cremona, L.; Little, R.; Rodrigues, C. D.
Multiple sequence alignment (MSA) data underlies current principles in protein folding and protein-protein interaction prediction, from which large language models (LLMs), in tandem with protein datasets, can predict protein structure. However, what is missing are user-friendly tools that enable researchers to predict and demonstrate coevolution, the principal signal inferred from these MSAs. Here we present tools to identify and visualize coevolution through a pipeline (CoEVFold) that uses basic direct coupling algorithms derived from GREMLIN and sequence alignments from MMseqs2. The pipeline generates a visual representation of coevolution for a single protein, but can also represent coevolution in homomeric or heteromeric protein complexes, as well as coevolution within protein networks. The input for this pipeline can be an amino acid sequence or user-supplied protein structures from AlphaFold, the user's own files, or the PDB. In validation of CoEVFold's capabilities, using proteins from known prokaryotic and eukaryotic model systems (Bacillus subtilis, Escherichia coli and Saccharomyces cerevisiae), as well as phage proteins, CoEVFold predicts coevolution between proteins known to interact, in proteins known to oligomerise, and in proteins known to be part of a protein complex. Collectively, this suite of tools, named CoEVFold suite, has broad applicability and provides a useful toolkit for those interested in dissecting protein-protein interactions and networks.
Availability: The code is available online at https://colab.research.google.com/drive/1MSSvNTq7KZ4Lr0XTz89vUuK-J3xOTzwS?usp=sharing and on GitHub at https://github.com/MishterBluesky/CoEVFold
Supplementary information: Supplementary data are available via Figshare and in the supplementary materials.
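To make concrete the idea of reading a coevolution signal from an MSA, the following is a minimal sketch and not CoEVFold's code. It swaps in a simpler technique, mutual information with average product correction (APC), in place of the GREMLIN-derived direct coupling analysis the abstract names, and it runs on a toy alignment rather than MMseqs2 output. The function names `mutual_information` and `coevolution_scores` and the example sequences are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def coevolution_scores(msa):
    """APC-corrected mutual information for every pair of columns."""
    length = len(msa[0])
    if length < 2 or any(len(seq) != length for seq in msa):
        raise ValueError("need equal-length sequences with at least 2 columns")

    raw = {(i, j): mutual_information(msa, i, j)
           for i, j in combinations(range(length), 2)}

    # Mean MI of each column against all others, and the overall mean.
    col_sum = [0.0] * length
    for (i, j), value in raw.items():
        col_sum[i] += value
        col_sum[j] += value
    col_mean = [s / (length - 1) for s in col_sum]
    overall = sum(raw.values()) / len(raw)

    scores = {}
    for (i, j), value in raw.items():
        # APC subtracts the background expected from each column's average MI.
        apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        scores[(i, j)] = value - apc
    return scores


# Toy alignment (hypothetical): columns 0 and 1 covary, column 2 is
# conserved, column 3 varies independently of the others.
toy_msa = ["ADKG", "ADKS", "CEKG", "CEKS"]
toy_scores = coevolution_scores(toy_msa)
top_pair = max(toy_scores, key=toy_scores.get)
print("highest-scoring column pair:", top_pair)
```

Mutual information captures both direct and indirect correlations, which is the limitation that direct coupling methods such as GREMLIN are designed to address, so the scores here should be read only as an illustration of the input and output shapes.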
Matching journals
The top 2 journals account for 50% of the predicted probability mass.