Accurate Protein Domain Structure Annotation with DomainMapper
Manriquez-Sandoval, E.; Fried, S. D.
Automated domain annotation plays a number of important roles in structural informatics and typically involves searching query sequences against Hidden Markov Model (HMM) profiles. This process can be ambiguous or inaccurate when proteins contain domains with non-contiguous residue ranges, and especially when insertional domains are hosted within them. Here we present DomainMapper, an algorithm that accurately assigns a unique domain structure annotation to any query sequence, including those with complex topologies. We validate our domain assignments using the AlphaFold database and confirm that non-contiguity is pervasive (6.5% of all domains in yeast and 2.5% in human). Using this resource, we find that certain folds have strong propensities to be non-contiguous or insertional across the Tree of Life, likely underlying evolutionary preferences for domain topology. DomainMapper is freely available and can be run as a single command line function.

HIGHLIGHTS
- DomainMapper generates a unique domain structure annotation, including non-contiguous and insertional domains
- Automated annotations of non-contiguous domains are validated against the AlphaFold database
- DomainMapper can be easily installed and used by non-experts
- Certain folds have strong preferences to be non-contiguous or insertional

GRAPHICAL ABSTRACT
[Figure 1]
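
The topologies named in the abstract can be made concrete with a small sketch. This is not DomainMapper's algorithm or its output format. It assumes, for illustration only, that a domain is a list of inclusive (start, end) residue segments, that a domain is non-contiguous when it has more than one segment, and that a domain is insertional when all of its segments fall inside a gap between segments of another (host) domain. The names `gaps`, `is_insertional`, and `classify` are hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue positions


def gaps(segments: List[Segment]) -> List[Segment]:
    """Return the residue ranges lying between consecutive segments."""
    ordered = sorted(segments)
    return [
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
        if b_start - a_end > 1
    ]


def is_insertional(guest: List[Segment], host: List[Segment]) -> bool:
    """True if every guest segment sits inside some gap of the host."""
    host_gaps = gaps(host)
    return bool(guest) and bool(host_gaps) and all(
        any(g_start <= s and e <= g_end for g_start, g_end in host_gaps)
        for s, e in guest
    )


def classify(domains: Dict[str, List[Segment]]) -> Dict[str, Dict[str, bool]]:
    """Label each domain as non-contiguous and/or insertional (assumed definitions)."""
    result = {}
    for name, segments in domains.items():
        result[name] = {
            "non_contiguous": len(segments) > 1,
            "insertional": any(
                is_insertional(segments, other)
                for other_name, other in domains.items()
                if other_name != name
            ),
        }
    return result


# Made-up residue ranges, chosen only to exercise each case.
example = {
    "host": [(1, 50), (120, 200)],   # split into two segments
    "guest": [(55, 115)],            # sits in the gap of "host"
    "plain": [(210, 300)],           # single segment, no host
}
labels = classify(example)
```

With these made-up ranges, `host` is labeled non-contiguous, `guest` is labeled insertional, and `plain` receives neither label.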
Matching journals
The top 3 journals account for 50% of the predicted probability mass.