Y2H-SCORES: A statistical framework to infer protein-protein interactions from next-generation yeast-two-hybrid sequence data

Velasquez-Zapata, V.; Elmore, J. M.; Banerjee, S.; Dorman, K. S.; Wise, R. P.

2020-09-09 systems biology
10.1101/2020.09.08.288365 bioRxiv

Interactomes embody one of the most effective representations of cellular behavior by revealing function through protein associations. To build these models at the organism scale, high-throughput techniques are required to identify interacting pairs of proteins. Next-generation interaction screening (NGIS) protocols that combine yeast two-hybrid (Y2H) with deep sequencing are promising approaches to generate protein-protein interaction networks in any organism. However, challenges remain in mining reliable information from these screens, which limits their broader implementation. Here, we describe a statistical framework, designated Y2H-SCORES, for analyzing high-throughput Y2H screens that considers key aspects of experimental design, normalization, and controls. Three quantitative ranking scores were implemented to identify interacting partners: 1) significant enrichment under selection for positive interactions, 2) degree of interaction specificity among multi-bait comparisons, and 3) selection of in-frame interactors. Using simulation and an empirical dataset, we provide a quantitative assessment to predict interacting partners under a wide range of experimental scenarios, facilitating independent confirmation by one-to-one bait-prey tests. Simulation of Y2H-NGIS identified conditions that maximize detection of true interactors, which can be achieved with protocols such as prey library normalization, maintenance of larger culture volumes, and replication of experimental treatments. Y2H-SCORES can be implemented in different yeast-based interaction screens, accelerating the biological interpretation of experimental results. Proof-of-concept was demonstrated by discovery and validation of a novel interaction between the barley powdery mildew effector, AVRA13, and the vesicle-mediated thylakoid membrane biogenesis protein, HvTHF1.
Author Summary
Organisms respond to their environment through networks of interacting proteins and other biomolecules. Many in vitro and in vivo techniques have been used to investigate these interacting proteins. Among these, yeast two-hybrid (Y2H) has been integrated with next-generation sequencing (NGS) to study protein-protein interactions on a genome-wide scale. The fusion of these two methods has been termed next-generation interaction screening, abbreviated as Y2H-NGIS. However, the massive and diverse data sets resulting from this technology have presented unique challenges to analysis. To address these challenges, we optimized the computational and statistical evaluation of Y2H-NGIS to provide metrics that identify high-confidence interacting proteins under a variety of dataset scenarios. Our proposed framework can be extended to different yeast-based interaction settings, utilizing the general principles of enrichment, specificity, and in-frame prey selection to accurately assemble protein-protein interaction networks. Lastly, we showed how the pipeline works experimentally by identifying and validating a novel interaction between the barley powdery mildew effector AVRA13 and the barley vesicle-mediated thylakoid membrane biogenesis protein, HvTHF1. Y2H-SCORES software is available in the GitHub repository at https://github.com/Wiselab2/Y2H-SCORES.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics Advances; 184 papers in training set; Top 0.1%; 28.3%
2. Bioinformatics; 1061 papers in training set; Top 3%; 8.6%
3. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 0.5%; 6.5%
4. Frontiers in Plant Science; 240 papers in training set; Top 2%; 5.0%
5. PLOS Computational Biology; 1633 papers in training set; Top 8%; 4.4%
50% of probability mass above
6. BMC Bioinformatics; 383 papers in training set; Top 2%; 4.4%
7. Scientific Reports; 3102 papers in training set; Top 27%; 4.4%
8. Journal of Proteome Research; 215 papers in training set; Top 0.6%; 4.3%
9. PLOS ONE; 4510 papers in training set; Top 45%; 2.7%
10. NAR Genomics and Bioinformatics; 214 papers in training set; Top 1%; 1.9%
11. Cell Reports Methods; 141 papers in training set; Top 2%; 1.8%
12. iScience; 1063 papers in training set; Top 13%; 1.8%
13. International Journal of Molecular Sciences; 453 papers in training set; Top 8%; 1.7%
14. npj Systems Biology and Applications; 99 papers in training set; Top 2%; 1.0%
15. Nucleic Acids Research; 1128 papers in training set; Top 16%; 0.8%
16. Plant Communications; 35 papers in training set; Top 1%; 0.8%
17. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 2%; 0.8%
18. Molecular & Cellular Proteomics; 158 papers in training set; Top 2%; 0.7%
19. Protein Science; 221 papers in training set; Top 2%; 0.7%
20. Frontiers in Molecular Biosciences; 100 papers in training set; Top 5%; 0.7%
21. Journal of Molecular Biology; 217 papers in training set; Top 4%; 0.7%
22. Cell Communication and Signaling; 35 papers in training set; Top 1%; 0.7%
23. Patterns; 70 papers in training set; Top 3%; 0.7%
24. Open Biology; 95 papers in training set; Top 3%; 0.7%
25. Molecular Omics; 21 papers in training set; Top 0.5%; 0.7%
26. The Plant Journal; 197 papers in training set; Top 3%; 0.7%
27. Plant Biotechnology Journal; 56 papers in training set; Top 1%; 0.7%
28. Communications Biology; 886 papers in training set; Top 31%; 0.5%
29. Biochimica et Biophysica Acta (BBA) - Bioenergetics; 17 papers in training set; Top 0.3%; 0.5%
30. Advanced Science; 249 papers in training set; Top 23%; 0.5%