
Rapidly evolving aphid gall effector proteins exhibit saposin-like folds

Bhoinderwala, F.; Korgaonkar, A.; Gopalakrishna, K.; Mathers, T. C.; Shigenobu, S.; Bazan, F. J.; Hogenhout, S. A.; Gronenborn, A.; Stern, D.

2026-03-28 molecular biology
10.64898/2026.03.27.712717 bioRxiv

Many insects manipulate plants by injecting effector proteins. In one extreme example of this molecular "hijacking", Hormaphis cornu aphids inject bicycle proteins into Hamamelis virginiana (Witch Hazel), contributing to the development of novel organs called galls. Bicycle proteins share no amino acid sequence similarity with proteins of known function. Here, we report the crystal structures of two divergent bicycle proteins. Both proteins contain saposin-like folds: one with multiple disulfide bonds exhibits a helix swap; the other has no disulfide bonds and possesses two tandem domains. To explore the structural evolution of bicycle proteins, we predicted bicycle protein structures with AlphaFold2 (AF2). While AF2 did not recover the two experimental structures using existing databases, it succeeded after we provided multiple sequence alignments (MSAs) containing protein sequences encoded in new genome sequences from closely related aphid species. Using this customized approach at scale, we generated 2400 high-confidence predictions for bicycle proteins from seven aphid species. This dataset revealed that bicycle proteins without cysteines are outliers in fold space and appear to have evolved from ancestral proteins with disulfide-bonded saposin-like folds. While all bicycle proteins contain predicted saposin-like folds, they display a vast diversity of structural and physicochemical properties. While this diversity thwarts prediction of conserved functions encoded in structure, it suggests that bicycle proteins have evolved to target diverse plant processes and/or to evade plant immune surveillance.

Significance statement

Parasites introduce specialized "effector" proteins into hosts, both to suppress host immunity and to release nutrients. The molecular functions and structures of most effector proteins are unknown. Effector proteins often evolve rapidly and share no similarity with proteins of known function.
Here, we demonstrate that machine learning algorithms can accurately predict the structures of aphid "bicycle" effector proteins when supplemented with data from closely related species. We exploit this finding to generate predictions of 2400 bicycle protein structures. These proteins exploit a common motif, yet exhibit diverse structures that form distinct structural clusters. Despite the clustering of these proteins in structure space, they occupy physicochemical space nearly uniformly, suggesting that they encode a large diversity of molecular functions.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Cell | 370 papers in training set | Top 0.2% | 17.9%
2. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 3% | 13.8%
3. Science | 429 papers in training set | Top 3% | 9.7%
4. Nature | 575 papers in training set | Top 4% | 6.6%
5. eLife | 5422 papers in training set | Top 15% | 6.1%
50% of probability mass above
6. Cell Host & Microbe | 113 papers in training set | Top 1% | 4.7%
7. Nature Communications | 4913 papers in training set | Top 34% | 4.7%
8. PLOS Biology | 408 papers in training set | Top 5% | 2.6%
9. PLOS Computational Biology | 1633 papers in training set | Top 12% | 2.5%
10. Molecular Biology and Evolution | 488 papers in training set | Top 2% | 2.3%
11. Current Biology | 596 papers in training set | Top 7% | 2.3%
12. Nature Plants | 84 papers in training set | Top 0.9% | 2.0%
13. Structure | 175 papers in training set | Top 1% | 2.0%
14. Cell Reports | 1338 papers in training set | Top 23% | 1.8%
15. Developmental Cell | 168 papers in training set | Top 8% | 1.7%
16. Cell Genomics | 162 papers in training set | Top 4% | 1.6%
17. PLOS Pathogens | 721 papers in training set | Top 6% | 1.6%
18. Science Advances | 1098 papers in training set | Top 19% | 1.6%
19. Neuron | 282 papers in training set | Top 6% | 1.6%
20. The Plant Cell | 141 papers in training set | Top 1% | 1.6%
21. Cell Systems | 167 papers in training set | Top 8% | 1.4%
22. Nature Structural & Molecular Biology | 218 papers in training set | Top 4% | 1.2%
23. PLOS Genetics | 756 papers in training set | Top 14% | 0.8%
24. Communications Biology | 886 papers in training set | Top 26% | 0.7%
25. Nature Genetics | 240 papers in training set | Top 9% | 0.6%
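
The statement that the top 5 journals account for 50% of the probability mass can be checked with a running sum of the listed probabilities. The sketch below assumes the 50% line marks the smallest set of top-ranked journals whose cumulative probability reaches 50%; that reading, the function name `journals_needed`, and the choice to include only the first ten probabilities are illustrative assumptions, not something the page states.

```python
from itertools import accumulate

# Probabilities (in percent) for the ten highest-ranked journals,
# copied from the list above in rank order.
probabilities = [17.9, 13.8, 9.7, 6.6, 6.1, 4.7, 4.7, 2.6, 2.5, 2.3]


def journals_needed(probs, threshold=50.0):
    """Return (count, cumulative_sum) for the shortest prefix of probs
    whose running sum reaches threshold, or None if it never does.

    Hypothetical helper: assumes the 50% marker is a cumulative cutoff.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count, total
    return None


count, total = journals_needed(probabilities)
print(count, round(total, 1))  # prints: 5 54.1
```

Under this assumption the first four journals sum to 48.0%, just short of the cutoff, and adding eLife brings the total to 54.1%, so "50%" is best read as a threshold that the top five cross rather than an exact share.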