Accurate Protein Domain Structure Annotation with DomainMapper

Manriquez-Sandoval, E.; Fried, S. D.

2022-03-20 bioinformatics
10.1101/2022.03.19.484986 bioRxiv
Automated domain annotation plays a number of important roles in structural informatics and typically involves searching query sequences against Hidden Markov Model (HMM) profiles. This process can be ambiguous or inaccurate when proteins contain domains with non-contiguous residue ranges, and especially when insertional domains are hosted within them. Here we present DomainMapper, an algorithm that accurately assigns a unique domain structure annotation to any query sequence, including those with complex topologies. We validate our domain assignments using the AlphaFold database and confirm that non-contiguity is pervasive (6.5% of all domains in yeast and 2.5% in human). Using this resource, we find that certain folds have strong propensities to be non-contiguous or insertional across the Tree of Life, likely underlying evolutionary preferences for domain topology. DomainMapper is freely available and can be run as a single command line function.

HIGHLIGHTS
DomainMapper generates a unique domain structure annotation, including non-contiguous and insertional domains
Automated annotations of non-contiguous domains are validated against the AlphaFold database
DomainMapper can be easily installed and used by non-experts
Certain folds have strong preferences to be non-contiguous or insertional

GRAPHICAL ABSTRACT
[Figure 1]
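The two topologies named in the abstract can be made concrete with a small sketch. This is not DomainMapper's algorithm or its output format. It only illustrates, under assumed conventions, what "non-contiguous" and "insertional" mean when a domain is described by residue ranges. The `Segment` type, both function names, and the residue numbers are hypothetical.

```python
from typing import List, Tuple

# Assumed convention: a domain is a list of inclusive (start, end) residue ranges.
Segment = Tuple[int, int]


def is_non_contiguous(segments: List[Segment]) -> bool:
    """A domain is non-contiguous when it is made of more than one residue range."""
    return len(segments) > 1


def is_insertional(guest: List[Segment], host: List[Segment]) -> bool:
    """Return True when the guest domain lies entirely inside a gap between
    two consecutive segments of the host domain."""
    host_sorted = sorted(host)
    guest_start = min(start for start, _ in guest)
    guest_end = max(end for _, end in guest)
    # Walk over each gap between neighbouring host segments.
    for (_, left_end), (right_start, _) in zip(host_sorted, host_sorted[1:]):
        if left_end < guest_start and guest_end < right_start:
            return True
    return False


# Hypothetical residue ranges, for illustration only.
host = [(10, 80), (150, 220)]   # one domain split into two ranges
guest = [(85, 145)]             # a second domain sitting in the gap

print(is_non_contiguous(host))      # host is split across two ranges
print(is_insertional(guest, host))  # guest sits between the host's ranges
```

In this toy case the host domain is non-contiguous because another domain interrupts its sequence, and the guest is insertional because it occupies that interruption.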

Matching journals

The top 3 journals together account for more than 50% of the predicted probability mass (26.3% + 22.9% + 6.5% = 55.7%).

1. Bioinformatics: 1061 papers in training set; Top 0.9%; 26.3%
2. Protein Science: 221 papers in training set; Top 0.1%; 22.9%
3. Bioinformatics Advances: 184 papers in training set; Top 0.4%; 6.5%
(50% of probability mass above)
4. Journal of Molecular Biology: 217 papers in training set; Top 0.3%; 4.9%
5. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 1%; 3.7%
6. Structure: 175 papers in training set; Top 0.8%; 3.6%
7. Journal of Chemical Information and Modeling: 207 papers in training set; Top 1%; 3.6%
8. PLOS Computational Biology: 1633 papers in training set; Top 11%; 3.1%
9. BMC Bioinformatics: 383 papers in training set; Top 4%; 2.1%
10. NAR Genomics and Bioinformatics: 214 papers in training set; Top 2%; 1.8%
11. PLOS ONE: 4510 papers in training set; Top 53%; 1.7%
12. Journal of Proteome Research: 215 papers in training set; Top 1%; 1.4%
13. Proteins: Structure, Function, and Bioinformatics: 82 papers in training set; Top 0.6%; 1.2%
14. Frontiers in Bioinformatics: 45 papers in training set; Top 0.5%; 1.1%
15. Acta Crystallographica Section D Structural Biology: 54 papers in training set; Top 0.3%; 1.0%
16. Journal of Cheminformatics: 25 papers in training set; Top 0.5%; 0.8%
17. Scientific Reports: 3102 papers in training set; Top 72%; 0.8%
18. SoftwareX: 15 papers in training set; Top 0.4%; 0.8%
19. The Journal of Physical Chemistry B: 158 papers in training set; Top 2%; 0.7%
20. Frontiers in Molecular Biosciences: 100 papers in training set; Top 7%; 0.5%
21. PeerJ: 261 papers in training set; Top 19%; 0.5%
22. Frontiers in Genetics: 197 papers in training set; Top 12%; 0.5%
23. GigaScience: 172 papers in training set; Top 4%; 0.5%
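
The 50% cutoff in the ranking can be checked with a short calculation. The sketch below assumes the cutoff is placed at the first rank where the running sum of predicted probabilities reaches 50%. That reading is an assumption, since the page does not define it. The function name `journals_needed` is hypothetical, and the values are the five highest probabilities listed above.

```python
# Predicted probabilities (in percent) of the five top-ranked journals above.
probabilities = [26.3, 22.9, 6.5, 4.9, 3.7]


def journals_needed(probs, threshold=50.0):
    """Return (count, cumulative_sum) for the first rank at which the running
    sum reaches the threshold, or None if it is never reached."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None


count, total = journals_needed(probabilities)
print(count, round(total, 1))
```

With these values the first two journals sum to 49.2%, just short of the cutoff, so the third journal is the one that carries the running total past 50%.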