Information theoretics for the machine learning detection of functionally conserved and coordinated protein motions

Babbitt, G. A.

2020-05-31 biophysics
10.1101/2020.05.29.089003 bioRxiv
Traditional information-theoretic analyses of functionally conserved binding interactions, as described by multiple sequence alignments, cannot provide direct insight into the underlying strength, spatial distribution, and coordination of the biophysical motions that govern protein binding interactions during signaling and regulatory function. However, molecular dynamics (MD) simulations of proteins in bound vs. unbound conformational states allow machine learning classification and information theory to be applied together to many problems posed by comparative protein dynamics. Once both bound and unbound protein dynamic states are adequately sampled in MD software, they can serve as a comparative training set for a binary classifier capable of discerning the complex dynamical consequences of protein binding interactions with DNA or other proteins. Statistical validation of the learner on MD simulations of homologs can then be used to assess its ability to recognize functional protein motions that are conserved over evolutionary time scales. Regions of proteins with functionally conserved dynamics are identifiable by their ability to induce significant correlations in local learning performance across homologous MD simulations. Through case studies of Rbp subunit 4/7 interaction in RNA Pol II and DNA-protein interactions of TATA binding protein, we demonstrate this method of detecting functionally conserved protein dynamics. We also demonstrate how the concepts of relative entropy (i.e., information gain) and mutual information, applied to the binary classification states of MD simulations, can be used to compare the impacts of molecular variation on conserved dynamics and to identify coordinated motions involved in dynamic interactions across sites.
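To make the last sentence of the abstract concrete, here is a minimal sketch of the two quantities it names, using their standard textbook definitions. It is not the author's implementation. It assumes, hypothetically, that each protein site yields one binary label per MD frame (0 for "unbound-like", 1 for "bound-like") from the trained classifier. The function names and the toy label sequences are illustrative only.

```python
import math
from collections import Counter


def class_distribution(states):
    """Return [P(0), P(1)] for a sequence of binary classification labels."""
    n = len(states)
    ones = sum(states)
    return [(n - ones) / n, ones / n]


def relative_entropy(p, q):
    """Relative entropy (KL divergence) D(p || q) in bits.

    Returns infinity if q assigns zero probability to an outcome p allows.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


def mutual_information(x, y):
    """Mutual information in bits between two equal-length label sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Toy per-frame labels (hypothetical data, one label per MD frame).
site_a = [1, 1, 0, 0, 1, 1, 0, 0]
site_b = [1, 1, 0, 0, 1, 1, 0, 0]  # switches state together with site_a
site_c = [1, 0, 1, 0, 1, 0, 1, 0]  # switches independently of site_a

# Coordination across sites: high for a/b, zero for a/c.
mi_ab = mutual_information(site_a, site_b)
mi_ac = mutual_information(site_a, site_c)

# Impact of a variant: compare its label distribution with a reference.
reference = class_distribution([1, 1, 0, 0, 1, 1, 0, 0])
variant = class_distribution([1, 1, 1, 0, 1, 1, 1, 0])
gain = relative_entropy(variant, reference)
```

In this toy case, sites that change state together share one full bit of information, while the independently switching site shares none. The relative entropy grows as the variant's distribution of labels departs from the reference distribution.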

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | PLOS Computational Biology | 1633 papers in training set | Top 0.5% | 23.3%
2 | Physical Biology | 43 papers in training set | Top 0.1% | 19.2%
3 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 0.9% | 5.0%
4 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 0.1% | 5.0%
50% of probability mass above
5 | Scientific Reports | 3102 papers in training set | Top 30% | 4.1%
6 | Physical Review E | 95 papers in training set | Top 0.3% | 3.4%
7 | Journal of Chemical Theory and Computation | 126 papers in training set | Top 0.4% | 3.0%
8 | eLife | 5422 papers in training set | Top 30% | 2.8%
9 | Biophysical Journal | 545 papers in training set | Top 2% | 2.7%
10 | PLOS ONE | 4510 papers in training set | Top 44% | 2.7%
11 | BMC Bioinformatics | 383 papers in training set | Top 4% | 2.1%
12 | Journal of The Royal Society Interface | 189 papers in training set | Top 2% | 2.0%
13 | Entropy | 20 papers in training set | Top 0.1% | 1.8%
14 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 5% | 1.4%
15 | The Journal of Physical Chemistry B | 158 papers in training set | Top 1% | 1.4%
16 | Bioinformatics | 1061 papers in training set | Top 8% | 1.0%
17 | Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.7% | 0.9%
18 | Journal of Molecular Biology | 217 papers in training set | Top 3% | 0.9%
19 | The Journal of Chemical Physics | 49 papers in training set | Top 0.4% | 0.8%
20 | Quantitative Biology | 11 papers in training set | Top 0.7% | 0.8%
21 | PeerJ | 261 papers in training set | Top 14% | 0.8%
22 | The European Physical Journal E | 15 papers in training set | Top 0.1% | 0.7%
23 | Physical Review Research | 46 papers in training set | Top 0.8% | 0.7%
24 | Bioinformatics Advances | 184 papers in training set | Top 5% | 0.7%
25 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 48% | 0.5%
26 | Journal of Theoretical Biology | 144 papers in training set | Top 2% | 0.5%