Protein Language Modeling and Evolutionary Analysis Reveal an N-terminal Determinant of Functional Divergence in Cytochrome P450s from Sophora tonkinensis
Qiao, Z.; Wang, J.; Qin, B.; Wei, F.; Liang, Y.
The N-terminal signal sequences of plant cytochrome P450 enzymes are recognized as critical determinants for subcellular localization and functional diversification, yet their evolutionary drivers and mechanisms remain largely unresolved.
In this study, the evolutionary trajectories of these signals were systematically decoded through the integration of the protein language model ESM-2 with phylogenetic and selection analyses. A conserved functional fingerprint was identified. This region may serve as the essential endoplasmic reticulum targeting signal and be evolutionarily decoupled from adjacent surfaces under positive selection during lineage-specific expansions.
A functional-adaptive decoupling model is proposed to explain this pattern, wherein a conserved functional core is maintained while surrounding interfaces diversify. This evolutionary architecture is interpreted as the outcome of a two-step cycle: an initial phase of positive selection driving functional innovation, followed by pervasive neutral evolution that facilitates structural exploration and potentiates future adaptations.
This work demonstrates how interpretable machine learning can be integrated with evolutionary theory to reconcile neutralist and selectionist perspectives on protein evolution. A novel framework is thus provided for understanding the layered evolution of protein modules, where structural constraint, adaptive innovation, and neutral drift operate on distinct tiers to generate functional diversity.
Matching journals
The top 7 journals account for 50% of the predicted probability mass.