Decoupling Structure and Property in Discrete Protein Diffusion: Alignment Dynamics and Collapse Mechanisms

Wu, J.; Dong, L.; Jia, N.; Li, L.; Zhang, H.

2026-02-08 bioinformatics
10.64898/2026.02.05.701726 bioRxiv

Direct Preference Optimization (DPO) has emerged as a powerful paradigm for aligning generative models, yet its temporal optimization dynamics in the discrete diffusion space of proteins remain poorly understood. Existing approaches often assume that maintaining structural integrity while optimizing physicochemical properties requires simultaneous, tightly coupled reinforcement learning constraints. In this work, we challenge this assumption by uncovering a fundamental temporal decoupling between structural and functional alignment. Using antibody design as a testbed, extensive trajectory analysis reveals two distinct regimes: (1) Instant Structural Alignment, where the strong generative prior of discrete diffusion rapidly eliminates structural hallucinations via denoising within the first few epochs; and (2) Slow Property Adaptation, where physicochemical attributes improve gradually over a prolonged optimization window. We further identify a critical transition point around Epoch 50, which empirically defines a Pareto-optimal boundary between property improvement and structural stability. Beyond this point, continued optimization induces a sharp phase transition into a Structural Collapse regime. To isolate the physical driver underlying this collapse, we introduce a counterfactual preference experiment targeting negative charge. We observe a striking symmetrical collapse: while hydrophilicity optimization induces a Poly-Arginine (+) degeneration, negative charge optimization drives a Poly-Aspartate (-) degeneration. Despite opposite physicochemical trajectories, including extreme shifts in isoelectric point (> 11 vs. < 4.5), both regimes converge to the same structural failure. This symmetry demonstrates that generic Coulombic repulsion, rather than residue-specific bias, constitutes the fundamental physical constraint being violated. Our findings reveal that discrete diffusion models possess strong intrinsic structural robustness, enabling minimalist alignment strategies provided optimization halts before this physical boundary. More broadly, this work offers a mechanistic warning against unchecked reward optimization in biological generation, illustrating a concrete manifestation of Goodhart's Law in protein design. Code and data are available at https://github.com/Wu-Junqi/DPO-Protein-Diffusion.
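
The charge symmetry described in the abstract can be made concrete with a small calculation. The following is a minimal sketch, not the authors' code: it estimates net charge with the Henderson-Hasselbalch relation and finds the isoelectric point by bisection. The pKa constants are approximate textbook-style values chosen for illustration (published scales differ), the function names are hypothetical, and idealized 20-residue homopolymers stand in for the collapsed antibody sequences.

```python
# Illustrative sketch only: approximate pKa values, assumed for this example.
# Real pKa scales differ between sources and depend on local environment.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}            # basic side chains
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # acidic side chains
N_TERM_PKA = 9.0   # free amino terminus (assumed value)
C_TERM_PKA = 3.0   # free carboxyl terminus (assumed value)


def net_charge(sequence, ph=7.0):
    """Estimate the net charge of a peptide at a given pH."""
    # Termini: the N-terminus contributes positive charge, the C-terminus negative.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            # Fraction of the basic group that is protonated (charge +1).
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            # Fraction of the acidic group that is deprotonated (charge -1).
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-3):
    """Find the pH in [lo, hi] where the estimated net charge crosses zero.

    Uses bisection, which works because net charge decreases
    monotonically as pH increases.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid   # still positively charged: the crossing is at higher pH
        else:
            hi = mid   # negatively charged: the crossing is at lower pH
    return (lo + hi) / 2.0


poly_arg = "R" * 20   # stand-in for a Poly-Arginine (+) degeneration
poly_asp = "D" * 20   # stand-in for a Poly-Aspartate (-) degeneration

print(net_charge(poly_arg), isoelectric_point(poly_arg))
print(net_charge(poly_asp), isoelectric_point(poly_asp))
```

Under these assumed constants, the two homopolymers carry net charges of nearly equal magnitude and opposite sign at pH 7, and their estimated isoelectric points fall on opposite sides of the > 11 and < 4.5 thresholds quoted in the abstract. That is the sense in which the two failure modes mirror each other: the accumulated like charge is the same in magnitude regardless of which residue supplies it.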

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

Rank  Journal                                             Papers in training set  Percentile  Probability
1     Cell Systems                                        167                     Top 0.1%    43.5%
2     Proceedings of the National Academy of Sciences     2130                    Top 10%     6.7%
(50% of probability mass above)
3     Nature Communications                               4913                    Top 27%     6.7%
4     Nature Neuroscience                                 216                     Top 1%      6.6%
5     Nature Machine Intelligence                         61                      Top 0.8%    3.8%
6     PLOS Computational Biology                          1633                    Top 9%      3.8%
7     Nature Biotechnology                                147                     Top 4%      1.8%
8     eLife                                               5422                    Top 44%     1.6%
9     Science                                             429                     Top 15%     1.6%
10    Physical Review X                                   23                      Top 0.3%    1.4%
11    Scientific Reports                                  3102                    Top 69%     1.0%
12    Neuron                                              282                     Top 7%      1.0%
13    Science Advances                                    1098                    Top 24%     1.0%
14    Cell Reports                                        1338                    Top 30%     0.9%
15    Nature Computational Science                        50                      Top 1%      0.9%
16    Cell Genomics                                       162                     Top 6%      0.8%
17    PRX Life                                            34                      Top 0.8%    0.8%
18    Nature Methods                                      336                     Top 6%      0.8%
19    Computational and Structural Biotechnology Journal  216                     Top 9%      0.8%
20    Physical Review Research                            46                      Top 0.8%    0.8%
21    Nature                                              575                     Top 15%     0.8%
22    Bioinformatics                                      1061                    Top 9%      0.8%
23    Advanced Science                                    249                     Top 21%     0.7%
24    Biophysical Journal                                 545                     Top 6%      0.7%
25    iScience                                            1063                    Top 39%     0.5%