Multiscale conformational sampling of multidomain fusion proteins by a physics informed diffusion model

Su, Z.; Wang, B.; Wu, Y.

2026-03-13 bioinformatics
10.64898/2026.03.11.711061 bioRxiv
Multidomain fusion proteins, such as bispecific antibodies, rely on highly flexible linker regions for their therapeutic efficacy. Characterizing the vast conformational ensembles of these proteins is crucial for rational drug design. All-atom molecular dynamics (MD) is the traditional gold standard, but its computational cost makes simulating large-scale domain motions prohibitively expensive. Recently, deep generative diffusion models have emerged as a rapid alternative for sampling protein dynamics. Yet, because they are trained primarily on massive databases of structured, static domains, these generic models often lack the biophysical constraints required to thoroughly sample the large-scale dynamics of highly flexible multidomain architectures. To overcome this, we use microsecond MD trajectories of a multidomain protein construct with various linkers to train a multiscale diffusion framework built on an Equivariant Graph Neural Network (EGNN). To model the dynamics of large molecular complexes efficiently, we employ a coarse-grained spatial graph that condenses rigid domains into center-of-mass anchors while preserving explicit backbone resolution for the flexible linker. By further integrating foundational rules of biophysics directly into both the training objective and the inference process, our model generates high-fidelity conformational ensembles that reproduce the thermodynamic distributions of long-timescale MD. This physics-informed approach provides a mathematically stable, highly scalable platform for the rapid multiscale characterization of flexible biologics, significantly accelerating the rational design of fusion protein therapeutics.
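
The coarse-graining step described in the abstract (rigid domains reduced to center-of-mass anchors, linker kept at backbone resolution) could be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function name `coarse_grain`, the convention that a domain id of -1 marks linker atoms, and the node ordering are all assumptions made here for the example.

```python
import numpy as np

def coarse_grain(coords, masses, domain_ids):
    """Build coarse-grained node positions (hypothetical helper).

    coords:     (N, 3) atom or bead coordinates
    masses:     (N,) masses used for the center-of-mass average
    domain_ids: (N,) integers; values >= 0 label rigid domains,
                -1 marks flexible-linker backbone beads (assumed convention)
    Returns (nodes, labels): domain anchors first, then linker beads.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    domain_ids = np.asarray(domain_ids)

    nodes, labels = [], []
    # Each rigid domain collapses to one mass-weighted center-of-mass anchor.
    for d in sorted(set(domain_ids[domain_ids >= 0].tolist())):
        mask = domain_ids == d
        nodes.append(np.average(coords[mask], axis=0, weights=masses[mask]))
        labels.append(f"domain_{d}")
    # Linker beads are kept one-to-one, preserving explicit resolution.
    for i, xyz in enumerate(coords[domain_ids < 0]):
        nodes.append(xyz)
        labels.append(f"linker_{i}")
    return np.vstack(nodes), labels

# Toy input: two rigid domains joined by a two-bead linker.
coords = [[0, 0, 0], [4, 0, 0], [1, 1, 1], [2, 2, 2], [0, 8, 0]]
masses = [1.0, 3.0, 1.0, 1.0, 2.0]
domain_ids = [0, 0, -1, -1, 1]
nodes, labels = coarse_grain(coords, masses, domain_ids)
```

In this toy input the five particles reduce to four graph nodes: one anchor per domain plus the two linker beads. How edges are then defined between nodes is not stated in the abstract and is left out of the sketch.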

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. Nature Communications: 4913 papers in training set, Top 20%, 9.3%
2. Cell Systems: 167 papers in training set, Top 2%, 6.5%
3. Nature Biotechnology: 147 papers in training set, Top 2%, 4.9%
4. Nature Machine Intelligence: 61 papers in training set, Top 0.5%, 4.9%
5. Nature Computational Science: 50 papers in training set, Top 0.1%, 4.9%
6. Nano Letters: 63 papers in training set, Top 0.6%, 4.4%
7. Bioinformatics: 1061 papers in training set, Top 5%, 4.0%
8. Advanced Science: 249 papers in training set, Top 4%, 4.0%
9. eLife: 5422 papers in training set, Top 25%, 3.6%
10. Nature Methods: 336 papers in training set, Top 3%, 3.6%

50% of probability mass above

11. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 19%, 3.6%
12. Briefings in Bioinformatics: 326 papers in training set, Top 2%, 3.1%
13. Nucleic Acids Research: 1128 papers in training set, Top 7%, 2.6%
14. Journal of Chemical Information and Modeling: 207 papers in training set, Top 2%, 2.1%
15. PLOS Computational Biology: 1633 papers in training set, Top 13%, 2.1%
16. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 3%, 2.1%
17. Communications Biology: 886 papers in training set, Top 6%, 1.9%
18. Patterns: 70 papers in training set, Top 0.7%, 1.8%
19. Science Advances: 1098 papers in training set, Top 17%, 1.7%
20. Biophysical Journal: 545 papers in training set, Top 3%, 1.5%
21. Cell Reports: 1338 papers in training set, Top 27%, 1.3%
22. iScience: 1063 papers in training set, Top 24%, 1.0%
23. The Journal of Physical Chemistry Letters: 58 papers in training set, Top 1%, 1.0%
24. Science: 429 papers in training set, Top 18%, 0.9%
25. Scientific Reports: 3102 papers in training set, Top 72%, 0.8%
26. Communications Chemistry: 39 papers in training set, Top 0.9%, 0.8%
27. Genome Research: 409 papers in training set, Top 4%, 0.8%
28. PRX Life: 34 papers in training set, Top 0.9%, 0.8%
29. The American Journal of Human Genetics: 206 papers in training set, Top 4%, 0.8%
30. National Science Review: 22 papers in training set, Top 2%, 0.7%
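
The 50% cutoff in the list can be checked with a running sum of the listed probabilities. The sketch below uses the ten highest values exactly as displayed; the function name `journals_to_reach` is a hypothetical helper, and because the displayed percentages are rounded, the sum only approximates the underlying model output.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-10, as displayed in the list.
top10 = [9.3, 6.5, 4.9, 4.9, 4.9, 4.4, 4.0, 4.0, 3.6, 3.6]

def journals_to_reach(probs, threshold):
    """Return the smallest number of top-ranked entries whose
    cumulative probability reaches the threshold, or None if never."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k
    return None

n_needed = journals_to_reach(top10, 50.0)
total_top10 = sum(top10)
```

With these rounded values the cumulative sum passes 50% only at the tenth entry, consistent with the statement that the top 10 journals account for about half of the predicted probability mass.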