In-silico analysis of cyanobacteriochrome architectures and spectral diversity

Mihnev, A.; Amtmann, A.

2022-09-23 biochemistry
10.1101/2022.09.22.509050 bioRxiv

The cyanobacteriochrome GAF domains represent a trove of spectral diversity. These proteins are endemic to cyanobacteria and sense the color and power of light. Multiple mechanisms are used to tune the natural absorbance spectrum of the bound bilin chromophore. In practice, these mechanisms are difficult to identify from the predicted amino acid sequence, and the presence of any one of them rarely yields a consistent and predictable outcome. The absorbance characteristics of the GAF domain are a complex function of many such tuning mechanisms. This implies that a more combinatorial approach to characterizing the diversity of GAF domains would better predict spectral tuning. We reviewed the literature and constructed a dataset of predicted and confirmed cyanobacteriochrome GAF domains. This dataset was subjected to multiple sequence alignments, and 18 GAF domain families were defined. Amino acid sequence similarity correlated well with known spectral characteristics, but there were exceptions. A second approach to predicting chromotype used Principal Component Analysis to characterize the whole domain architectures of cyanobacteriochromes. This approach identified 7 conserved domain architectures, with some variations. These architectures also correlated with the spectral tuning of the GAF domains they contain, in addition to the 18 GAF families. The three-dimensional structures of 98 spectrally characterized GAF domains were predicted using Phyre2. Subsequent grouping based on distance maps offered insight into how the general spectral position of the domain is set. Finer tuning is likely achieved by six key residues within the binding pocket. Taken together, these insights allowed us to carry out a Multiple Correlation Analysis that serves as a mathematical summary of the diversity of cyanobacteriochrome GAF domains. This summary, or "cyanobacteriochrome atlas", can be used to make spectral predictions for uncharacterized GAF domains.

Matching journals

The top 12 journals account for 50% of the predicted probability mass.

1 | PLOS ONE | 4510 papers in training set | Top 18% | 10.3%
2 | Scientific Reports | 3102 papers in training set | Top 13% | 6.9%
3 | Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.1% | 4.9%
4 | Access Microbiology | 22 papers in training set | Top 0.1% | 4.9%
5 | Algal Research | 20 papers in training set | Top 0.1% | 4.9%
6 | Journal of Molecular Evolution | 21 papers in training set | Top 0.1% | 3.7%
7 | Plant Direct | 81 papers in training set | Top 0.7% | 3.1%
8 | PLOS Computational Biology | 1633 papers in training set | Top 11% | 2.9%
9 | Biochemical Journal | 80 papers in training set | Top 0.1% | 2.4%
10 | New Phytologist | 309 papers in training set | Top 3% | 2.1%
11 | Frontiers in Microbiology | 375 papers in training set | Top 4% | 2.1%
12 | Journal of Biological Chemistry | 641 papers in training set | Top 1% | 1.9%
50% of probability mass above
13 | Photosynthesis Research | 15 papers in training set | Top 0.1% | 1.9%
14 | Acta Crystallographica Section D Structural Biology | 54 papers in training set | Top 0.2% | 1.8%
15 | Protein Science | 221 papers in training set | Top 0.9% | 1.7%
16 | BMC Genomics | 328 papers in training set | Top 3% | 1.5%
17 | Microbiology | 57 papers in training set | Top 0.7% | 1.4%
18 | ACS Omega | 90 papers in training set | Top 2% | 1.4%
19 | Open Biology | 95 papers in training set | Top 0.9% | 1.4%
20 | PeerJ | 261 papers in training set | Top 10% | 1.2%
21 | Communications Biology | 886 papers in training set | Top 14% | 1.2%
22 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 3% | 1.2%
23 | International Journal of Biological Macromolecules | 65 papers in training set | Top 2% | 1.1%
24 | Nucleic Acids Research | 1128 papers in training set | Top 16% | 0.8%
25 | Biochemistry | 130 papers in training set | Top 2% | 0.8%
26 | Physiologia Plantarum | 35 papers in training set | Top 0.4% | 0.8%
27 | Royal Society Open Science | 193 papers in training set | Top 4% | 0.8%
28 | International Journal of Molecular Sciences | 453 papers in training set | Top 15% | 0.8%
29 | Biomolecules | 95 papers in training set | Top 2% | 0.8%
30 | Peer Community Journal | 254 papers in training set | Top 4% | 0.7%
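
The 50% cutoff stated for the listing above can be checked with a running sum over the ranked probabilities. The sketch below is only an illustration: the helper name `rank_reaching` is hypothetical, and the input values are the rounded percentages of the first 14 journals as displayed, so the cumulative total is approximate.

```python
from itertools import accumulate

# Displayed probabilities (percent) for ranks 1-14, in rank order.
# These are the rounded values shown in the listing, not the model's raw output.
probabilities = [10.3, 6.9, 4.9, 4.9, 4.9, 3.7, 3.1,
                 2.9, 2.4, 2.1, 2.1, 1.9, 1.9, 1.8]


def rank_reaching(threshold, probs):
    """Return the 1-based rank at which the running sum first reaches threshold.

    Returns None if the threshold is never reached. (Hypothetical helper.)
    """
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None


cutoff_rank = rank_reaching(50.0, probabilities)
print(cutoff_rank)
```

With the rounded values, the running sum is about 48.2% after rank 11 and about 50.1% after rank 12, which agrees with the position of the "50% of probability mass above" marker.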