
Response to "Protein sequence landscapes are not so simple: on reference-free versus reference-based inference"

Park, Y.; Metzger, B. P. H.; Thornton, J. W.

2024-09-20 genetics
10.1101/2024.09.17.613512 bioRxiv

We recently reanalyzed 20 combinatorial mutagenesis datasets using a novel reference-free analysis (RFA) method and showed that high-order epistasis contributes negligibly to protein sequence-function relationships in every case. Dupic, Phillips, and Desai (DPD) commented on a preprint of our work. In our published paper, we addressed all the major issues they raised, but we respond directly to them here. 1) DPD's claim that RFA is equivalent to estimating reference-based analysis (RBA) models by regression neglects fundamental differences in how the two formalisms dissect the causal relationship between sequence and function. It also misinterprets the observation that using regression to estimate any truncated model of genetic architecture will always yield the same predicted phenotypes and variance partition; the resulting estimates correspond to those of the RFA formalism but are inaccurate representations of the true RBA model. 2) DPD's claim that high-order epistasis is widespread and significant while somehow explaining little phenotypic variance is an artifact of two strong biases in the use of regression to estimate RBA models: this procedure underestimates the phenotypic variance explained by RBA epistatic terms while at the same time inflating the magnitude of individual terms. 3) DPD erroneously claim that RFA is "exactly equivalent" to Fourier analysis (FA) and background-averaged analysis (BA). This error arises because DPD used an incorrect mathematical definition of RFA and were misled by a simple numerical relationship among the models that holds only for the simplest kinds of datasets. 4) DPD argue that using a nonlinear transformation to account for global nonlinearities in sequence-function relationships is often unnecessary and may artifactually absorb specific epistatic interactions.
We show that nonspecific epistasis caused by a limited dynamic range affects datasets of all types, even when the phenotype is represented on a free-energy scale. Moreover, using a nonlinear transformation in a joint fitting procedure does not underestimate specific epistasis under realistic conditions, even if the data are not affected by nonspecific epistasis. The conclusions of our work therefore hold: the genetic architecture of all 20 protein datasets we analyzed can be efficiently and accurately described in an RFA framework by first-order amino acid effects and pairwise interactions with a simple model of global nonlinearity. We are grateful for DPD's commentary, which helped us improve our paper.
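The distinction between the two formalisms can be illustrated on a toy example. The following is a minimal sketch, not the authors' implementation. It assumes the usual definitions: reference-based terms are measured relative to one designated genotype, whereas reference-free terms are measured relative to the mean over all genotypes. The two-site, two-state landscape, its phenotype values, and the function names are hypothetical and are not taken from the paper or its datasets.

```python
from statistics import mean

# Hypothetical toy landscape: two sites, two states each.
# Genotype (0, 0) plays the role of the reference (wild type) for RBA.
phenotype = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 4.0}


def rba_terms(y):
    """Reference-based terms, measured relative to genotype (0, 0)."""
    e0 = y[(0, 0)]
    first = {0: y[(1, 0)] - e0, 1: y[(0, 1)] - e0}
    pair = y[(1, 1)] - y[(1, 0)] - y[(0, 1)] + e0
    return e0, first, pair


def rfa_terms(y):
    """Reference-free terms, measured relative to the mean over all genotypes."""
    e0 = mean(y.values())
    # First-order effect of a state: mean phenotype of genotypes carrying it,
    # minus the global mean.
    first = {
        (site, state): mean(v for g, v in y.items() if g[site] == state) - e0
        for site in range(2)
        for state in (0, 1)
    }
    # With only two sites, the pairwise term is what the lower orders leave over.
    pair = {g: y[g] - e0 - first[(0, g[0])] - first[(1, g[1])] for g in y}
    return e0, first, pair


def rfa_variance_fractions(y):
    """Fraction of phenotypic variance attributed to each order under RFA."""
    e0, first, pair = rfa_terms(y)
    total = mean((v - e0) ** 2 for v in y.values())
    first_var = mean((first[(0, g[0])] + first[(1, g[1])]) ** 2 for g in y)
    pair_var = mean(pair[g] ** 2 for g in y)
    return first_var / total, pair_var / total


if __name__ == "__main__":
    print("RBA terms:", rba_terms(phenotype))
    print("RFA terms:", rfa_terms(phenotype))
    print("RFA variance fractions:", rfa_variance_fractions(phenotype))
```

In this toy case the reference-based pairwise term is larger than either reference-based single-mutant effect, while the reference-free pairwise terms account for only a small share of the variance. This shows only that the two decompositions can describe the same data very differently; it does not reproduce any result from the paper.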

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. GENETICS | 189 papers in training set | Top 0.1% | 28.7%
2. The American Journal of Human Genetics | 206 papers in training set | Top 0.6% | 7.1%
3. Journal of Molecular Biology | 217 papers in training set | Top 0.2% | 6.6%
4. G3: Genes, Genomes, Genetics | 222 papers in training set | Top 0.1% | 5.0%
5. eLife | 5422 papers in training set | Top 16% | 5.0%
50% of probability mass above
6. Genetics | 225 papers in training set | Top 0.9% | 4.5%
7. PLOS Computational Biology | 1633 papers in training set | Top 9% | 3.7%
8. Bioinformatics Advances | 184 papers in training set | Top 1% | 3.7%
9. Bioinformatics | 1061 papers in training set | Top 6% | 2.2%
10. Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.4% | 1.9%
11. Physical Biology | 43 papers in training set | Top 1% | 1.8%
12. Physical Review Letters | 43 papers in training set | Top 0.3% | 1.8%
13. Nature Communications | 4913 papers in training set | Top 50% | 1.8%
14. Molecular Systems Biology | 142 papers in training set | Top 0.6% | 1.7%
15. PLOS ONE | 4510 papers in training set | Top 56% | 1.5%
16. The Journal of Physical Chemistry B | 158 papers in training set | Top 1% | 1.5%
17. Protein Science | 221 papers in training set | Top 1% | 1.0%
18. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 39% | 1.0%
19. Cell Systems | 167 papers in training set | Top 10% | 0.9%
20. Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.9%
21. NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 0.8%
22. Nature | 575 papers in training set | Top 15% | 0.8%
23. Molecular Biology and Evolution | 488 papers in training set | Top 4% | 0.8%
24. Journal of Chemical Information and Modeling | 207 papers in training set | Top 3% | 0.7%
25. Biophysical Journal | 545 papers in training set | Top 5% | 0.7%
26. iScience | 1063 papers in training set | Top 32% | 0.7%
27. Journal of Applied Crystallography | 14 papers in training set | Top 0.1% | 0.7%
28. Nucleic Acids Research | 1128 papers in training set | Top 19% | 0.7%
29. Frontiers in Bioengineering and Biotechnology | 88 papers in training set | Top 3% | 0.7%
30. Scientific Reports | 3102 papers in training set | Top 78% | 0.7%