
Protein Language Modeling and Evolutionary Analysis Reveal an N-terminal Determinant of Functional Divergence in Cytochrome P450s from Sophora tonkinensis

Qiao, Z.; Wang, J.; Qin, B.; Wei, F.; Liang, Y.

2026-03-07 plant biology
10.64898/2026.03.06.710024 bioRxiv

The N-terminal signal sequences of plant cytochrome P450 enzymes are recognized as critical determinants for subcellular localization and functional diversification, yet their evolutionary drivers and mechanisms remain largely unresolved.

In this study, the evolutionary trajectories of these signals were systematically decoded through the integration of the protein language model ESM-2 with phylogenetic and selection analyses. A conserved functional fingerprint was identified. This region may serve as the essential endoplasmic reticulum targeting signal and be evolutionarily decoupled from adjacent surfaces under positive selection during lineage-specific expansions.

A functional-adaptive decoupling model is proposed to explain this pattern, wherein a conserved functional core is maintained while surrounding interfaces diversify. This evolutionary architecture is interpreted as the outcome of a two-step cycle: an initial phase of positive selection driving functional innovation, followed by pervasive neutral evolution that facilitates structural exploration and potentiates future adaptations.

This work demonstrates how interpretable machine learning can be integrated with evolutionary theory to reconcile neutralist and selectionist perspectives on protein evolution. A novel framework is thus provided for understanding the layered evolution of protein modules, where structural constraint, adaptive innovation, and neutral drift operate on distinct tiers to generate functional diversity.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology: 1633 papers in training set, Top 3%, 9.8%
2. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 0.2%, 8.9%
3. Nature Communications: 4913 papers in training set, Top 24%, 8.1%
4. Journal of Chemical Information and Modeling: 207 papers in training set, Top 0.8%, 6.6%
5. eLife: 5422 papers in training set, Top 12%, 6.6%
6. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 10%, 6.6%
7. The Journal of Physical Chemistry Letters: 58 papers in training set, Top 0.2%, 6.1%

50% of probability mass above

8. Physical Chemistry Chemical Physics: 34 papers in training set, Top 0.1%, 3.5%
9. Advanced Science: 249 papers in training set, Top 6%, 3.5%
10. National Science Review: 22 papers in training set, Top 0.4%, 3.5%
11. Journal of Molecular Biology: 217 papers in training set, Top 0.9%, 2.6%
12. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 2%, 2.5%
13. Communications Biology: 886 papers in training set, Top 4%, 2.5%
14. PLOS Biology: 408 papers in training set, Top 10%, 1.6%
15. Biophysical Journal: 545 papers in training set, Top 3%, 1.6%
16. Computational Biology and Chemistry: 23 papers in training set, Top 0.2%, 1.4%
17. Journal of Chemical Theory and Computation: 126 papers in training set, Top 0.6%, 1.3%
18. Cell Reports: 1338 papers in training set, Top 27%, 1.3%
19. Molecular Biology and Evolution: 488 papers in training set, Top 3%, 1.3%
20. Science Advances: 1098 papers in training set, Top 24%, 1.2%
21. Journal of Genetics and Genomics: 36 papers in training set, Top 2%, 1.1%
22. Scientific Reports: 3102 papers in training set, Top 72%, 0.9%
23. Science China Life Sciences: 26 papers in training set, Top 2%, 0.8%
24. iScience: 1063 papers in training set, Top 34%, 0.7%
25. Angewandte Chemie International Edition: 81 papers in training set, Top 4%, 0.6%
26. Frontiers in Genetics: 197 papers in training set, Top 12%, 0.6%
27. International Journal of Molecular Sciences: 453 papers in training set, Top 18%, 0.6%
28. PLOS ONE: 4510 papers in training set, Top 72%, 0.6%
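
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked directly from the percentages listed above. The following is a minimal sketch, assuming the listed values are percentages in rank order; the helper name `journals_to_reach` is hypothetical and is not part of the listing page.

```python
# Predicted probabilities (percent), in rank order, copied from the list above.
probs = [
    9.8, 8.9, 8.1, 6.6, 6.6, 6.6, 6.1,   # ranks 1-7
    3.5, 3.5, 3.5, 2.6, 2.5, 2.5, 1.6,   # ranks 8-14
    1.6, 1.4, 1.3, 1.3, 1.3, 1.2, 1.1,   # ranks 15-21
    0.9, 0.8, 0.7, 0.6, 0.6, 0.6, 0.6,   # ranks 22-28
]


def journals_to_reach(values, threshold):
    """Return how many top-ranked entries are needed for the running
    sum to reach `threshold`, or None if it is never reached.

    Hypothetical helper, written only to illustrate the cutoff.
    """
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        if total >= threshold:
            return count
    return None


if __name__ == "__main__":
    # Number of journals needed to cover half of the probability mass.
    print(journals_to_reach(probs, 50.0))
```

The first six entries sum to about 46.6%, so the seventh journal is the one that carries the running total past the 50% mark.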