Decoupling Structure and Property in Discrete Protein Diffusion: Alignment Dynamics and Collapse Mechanisms
Wu, J.; Dong, L.; Jia, N.; Li, L.; Zhang, H.
Direct Preference Optimization (DPO) has emerged as a powerful paradigm for aligning generative models, yet its temporal optimization dynamics in the discrete diffusion space of proteins remain poorly understood. Existing approaches often assume that maintaining structural integrity while optimizing physicochemical properties requires simultaneous, tightly coupled reinforcement learning constraints. In this work, we challenge this assumption by uncovering a fundamental temporal decoupling between structural and functional alignment. Using antibody design as a testbed, extensive trajectory analysis reveals two distinct regimes: (1) Instant Structural Alignment, where the strong generative prior of discrete diffusion rapidly eliminates structural hallucinations via denoising within the first few epochs; and (2) Slow Property Adaptation, where physicochemical attributes improve gradually over a prolonged optimization window. We further identify a critical transition point around Epoch 50, which empirically defines a Pareto-optimal boundary between property improvement and structural stability. Beyond this point, continued optimization induces a sharp phase transition into a Structural Collapse regime. To isolate the physical driver underlying this collapse, we introduce a counterfactual preference experiment targeting negative charge. We observe a striking symmetrical collapse: while hydrophilicity optimization induces a Poly-Arginine (+) degeneration, negative charge optimization drives a Poly-Aspartate (-) degeneration. Despite opposite physicochemical trajectories, including extreme shifts in isoelectric point (> 11 vs. < 4.5), both regimes converge to the same structural failure. This symmetry demonstrates that generic Coulombic repulsion, rather than residue-specific bias, constitutes the fundamental physical constraint being violated. 
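The two degenerate regimes described above (Poly-Arginine and Poly-Aspartate) can be flagged with a simple sequence-level diagnostic. The following is a minimal sketch, not the authors' analysis code: the function names `charge_profile` and `longest_run` are hypothetical, and the charge model is a crude assumption that counts R/K as +1 and D/E as -1 at neutral pH while ignoring histidine and terminal groups.

```python
from collections import Counter

# Assumption: simplified side-chain charges at neutral pH.
POSITIVE = {"R", "K"}
NEGATIVE = {"D", "E"}


def charge_profile(seq: str) -> dict:
    """Return net charge and charged-residue fractions for a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    pos = sum(counts[aa] for aa in POSITIVE)
    neg = sum(counts[aa] for aa in NEGATIVE)
    n = len(seq)
    return {"net_charge": pos - neg, "frac_pos": pos / n, "frac_neg": neg / n}


def longest_run(seq: str, residues: set) -> int:
    """Length of the longest consecutive stretch drawn from `residues`."""
    best = current = 0
    for aa in seq.upper():
        current = current + 1 if aa in residues else 0
        best = max(best, current)
    return best
```

A long run of like-charged residues, together with a charged fraction far above that of the starting sequences, would be one plausible signature of the homopolymer-like collapse; any cutoff for "too long" or "too high" would have to be chosen empirically.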
Our findings reveal that discrete diffusion models possess strong intrinsic structural robustness, enabling minimalist alignment strategies provided optimization halts before this physical boundary. More broadly, this work offers a mechanistic warning against unchecked reward optimization in biological generation, illustrating a concrete manifestation of Goodhart's Law in protein design. Code and data are available at https://github.com/Wu-Junqi/DPO-Protein-Diffusion.
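Halting optimization before the collapse boundary can be illustrated with a simple early-stopping rule. This is a hedged sketch under stated assumptions, not the procedure used in the paper: `last_safe_epoch` is a hypothetical helper, `validity` is assumed to be a per-epoch structural-quality score where higher is better, and the `tolerance` default is arbitrary.

```python
def last_safe_epoch(validity: list, tolerance: float = 0.05) -> int:
    """Return the index of the last epoch before structural quality
    first falls more than `tolerance` below its running peak.

    Assumption: `validity[i]` is a structural-quality score for epoch i
    (higher is better). If no such drop occurs, the final epoch is returned.
    """
    if not validity:
        raise ValueError("validity must contain at least one epoch")
    peak = validity[0]
    for epoch, score in enumerate(validity):
        peak = max(peak, score)  # track the best score seen so far
        if score < peak - tolerance:
            return max(epoch - 1, 0)
    return len(validity) - 1
```

Comparing against the running peak rather than the first epoch reflects the abstract's observation that structural quality improves rapidly in the first few epochs, so the initial score is not a reliable reference.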