Scalable prediction of symmetric protein complex structures

Yu, V. S.; Demsko, P.; Castells-Graells, R.; Parker, H.; Huang, A.; Chen, C.; Huang, M.; Srinivasan, V.; Ajjarapu, K.; Tofighbakhsh, N.; Yu, R.; Lake, M.; Glanzman, D.; Warren, S.; Alzagatiti, J.

2026-02-05 bioengineering
10.1101/2025.11.14.688531 bioRxiv
All life relies on proteins to function, yet accurately modeling protein structures that exceed ≈10,000 amino acids or have higher-order geometries remains difficult. Existing solutions are limited to specific scenarios, require considerable computational resources, or are otherwise unscalable. Consequently, many large, disease-relevant protein complexes in the human proteome, as well as nearly all viruses and numerous other classes, are impractical to model with high fidelity for drug development. To modulate these protein complexes and viruses, structural information is eminently valuable, and often essential. In the last two years, machine-learning-based tools that can generate binders to a given target structure with high hit rates have emerged. Combined with high-throughput screening, these technologies can far outpace traditional drug discovery. However, they cannot function well without accurate models of their target structures. Thus, to unlock the full power of AI-driven drug discovery, a scalable method must be developed to predict large protein complex structures. To overcome this bottleneck, we introduce Plica-1, a physics-based method to rapidly and accurately predict the structure of arbitrarily large, symmetric protein complexes. Validated across four major symmetry classes (icosahedral, tetrahedral, octahedral, and cyclic), the method consistently achieves near-experimental levels of accuracy, i.e., RMSD < 5 Å. In test cases, the method runs in < 5 minutes on consumer hardware, 10^3-10^5 times faster than the closest comparable software. The largest structure currently built, at ≈40,000 amino acids, is > 8 times the limit of existing machine learning methods. The results demonstrate that protein complexes can be modeled at significantly improved speeds and scales, making Plica-1 a promising tool for protein engineering and drug development.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Nature Methods | 336 | Top 0.3% | 22.1%
2 | Cell Systems | 167 | Top 0.5% | 17.2%
3 | Science | 429 | Top 5% | 6.2%
4 | Proceedings of the National Academy of Sciences | 2130 | Top 15% | 4.8%
(50% of probability mass above)
5 | Nature Communications | 4913 | Top 41% | 3.5%
6 | Bioinformatics | 1061 | Top 6% | 3.5%
7 | Protein Engineering, Design and Selection | 14 | Top 0.1% | 2.7%
8 | Nature Biotechnology | 147 | Top 3% | 2.7%
9 | PLOS Computational Biology | 1633 | Top 12% | 2.6%
10 | eLife | 5422 | Top 36% | 2.0%
11 | Structure | 175 | Top 1% | 2.0%
12 | Protein Science | 221 | Top 0.7% | 1.9%
13 | Journal of Chemical Information and Modeling | 207 | Top 2% | 1.9%
14 | Scientific Reports | 3102 | Top 54% | 1.9%
15 | Communications Biology | 886 | Top 11% | 1.5%
16 | Chemical Science | 71 | Top 1% | 1.5%
17 | Advanced Science | 249 | Top 13% | 1.3%
18 | Frontiers in Molecular Biosciences | 100 | Top 3% | 1.1%
19 | Nature Computational Science | 50 | Top 1% | 0.9%
20 | ACS Synthetic Biology | 256 | Top 3% | 0.9%
21 | Science Advances | 1098 | Top 27% | 0.9%
22 | Journal of Molecular Biology | 217 | Top 3% | 0.9%
23 | Journal of Cheminformatics | 25 | Top 0.5% | 0.9%
24 | PLOS ONE | 4510 | Top 65% | 0.9%
25 | Nature Machine Intelligence | 61 | Top 3% | 0.8%
26 | Nature | 575 | Top 15% | 0.8%
27 | Proteins: Structure, Function, and Bioinformatics | 82 | Top 0.9% | 0.8%
28 | iScience | 1063 | Top 33% | 0.7%
29 | Angewandte Chemie International Edition | 81 | Top 3% | 0.7%
30 | Cell Reports | 1338 | Top 34% | 0.7%