A structural grammar of truncation across the human homodimer landscape

Karagöl, T.; Karagöl, A.

2026-05-09 bioinformatics
10.64898/2026.05.06.723091 bioRxiv

Alternative splicing and proteolytic truncation generate tens of thousands of protein isoforms in the human proteome, but the structural consequences for quaternary state, the level at which most signaling, enzymatic and regulatory function operates, have largely been examined one molecule at a time. Leveraging the recent expansion of the AlphaFold Database to predicted human homodimers, we systematically compared 5,168 canonical-versus-truncated homodimer pairs across the human proteome. In high-confidence canonical homodimers, truncation is associated with predicted structural conservation in 56.4% of pairs (mean 85 residues lost), complete interface ablation in 26.1% (mean 178 residues lost), and partial destabilization in 17.5% (mean 134 residues lost); a distinct fourth class (4.0% of the dataset, n = 208) shows truncation-associated emergence of a predicted high-confidence interface from a sub-threshold canonical baseline. Two reproducible rules govern these transitions: a topological asymmetry in which N-terminal losses are preferentially enriched ~1.6-fold in interface preservation while C-terminal losses are rare overall (~6% of pairs) and modestly under-represented in the conservation class, and a biophysical rule in which emergence-class proteins show substantially elevated intrinsic disorder content relative to ablation-class proteins, as measured by both AlphaFold pLDDT-defined disorder of the canonical structure (Cohen's d ≈ 1.39) and AIUPred peak binding propensity of the truncated isoform (Cohen's d ≈ 0.65). Formal pathway enrichment recovered only a small nucleotide-metabolism signal, indicating that these rules operate across diverse gene-functional categories. Truncation-associated remodeling of homodimer architecture thus constitutes a structural grammar of the human proteome rather than a specialty of any single regulatory family.
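
The effect sizes quoted in the abstract are Cohen's d values comparing two protein classes. As a rough illustration of what such a number measures, the sketch below computes Cohen's d with a pooled standard deviation. The abstract does not say which variant of the statistic the authors used, so the pooled-SD form is an assumption here, and the function name `cohens_d` and the input values are hypothetical, not data from the preprint.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation.

    Assumption: pooled-SD variant; the preprint may use a different one.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical disorder fractions for two classes (illustrative only).
emergence_disorder = [0.42, 0.55, 0.38, 0.61, 0.47]
ablation_disorder = [0.21, 0.30, 0.18, 0.35, 0.26]
d = cohens_d(emergence_disorder, ablation_disorder)
```

A positive d means the first group has the higher mean. By the usual convention, values around 0.5 are read as medium effects and values above 0.8 as large, which is how the reported 0.65 and 1.39 would typically be interpreted.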

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Cell Systems; 167 papers in training set; Top 0.6%; 14.6%
2. Nature Communications; 4913 papers in training set; Top 10%; 14.6%
3. Nature; 575 papers in training set; Top 4%; 8.2%
4. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 9%; 7.1%
5. Nature Structural & Molecular Biology; 218 papers in training set; Top 0.7%; 6.3%

50% of probability mass above

6. Science; 429 papers in training set; Top 6%; 4.8%
7. Nature Genetics; 240 papers in training set; Top 2%; 3.9%
8. eLife; 5422 papers in training set; Top 24%; 3.6%
9. Molecular Cell; 308 papers in training set; Top 5%; 2.7%
10. Nature Methods; 336 papers in training set; Top 3%; 2.6%
11. Science Advances; 1098 papers in training set; Top 12%; 2.3%
12. Molecular Systems Biology; 142 papers in training set; Top 0.4%; 2.1%
13. Nature Biotechnology; 147 papers in training set; Top 4%; 2.1%
14. PLOS Computational Biology; 1633 papers in training set; Top 16%; 1.7%
15. Cell; 370 papers in training set; Top 11%; 1.7%
16. Genome Biology; 555 papers in training set; Top 4%; 1.7%
17. Cell Genomics; 162 papers in training set; Top 4%; 1.5%
18. Journal of Molecular Biology; 217 papers in training set; Top 2%; 1.3%
19. Nucleic Acids Research; 1128 papers in training set; Top 13%; 1.3%
20. Advanced Science; 249 papers in training set; Top 15%; 1.1%
21. Nature Cell Biology; 99 papers in training set; Top 4%; 0.8%
22. Genome Medicine; 154 papers in training set; Top 8%; 0.8%
23. Scientific Reports; 3102 papers in training set; Top 75%; 0.7%
24. Genome Research; 409 papers in training set; Top 5%; 0.7%
25. The American Journal of Human Genetics; 206 papers in training set; Top 4%; 0.6%