
MDZip: Neural Compression of Molecular Dynamics Trajectories for Scalable Storage and Ensemble Reconstruction

De Silva, N.; Perez, A.

2025-08-01 biophysics
10.1101/2025.07.31.667955 bioRxiv

The size of molecular dynamics (MD) trajectories remains a major obstacle to data sharing, long-term storage, and large-scale ensemble analysis. Existing solutions often rely on frame subsampling or reduced atom representations, which limit the utility of shared datasets. Here, we present MDZip, a neural compression framework based on convolutional autoencoders, trained separately for each system, that reconstruct atomic trajectories with high geometric fidelity from compact latent representations. MDZip reduces storage size by more than 95% across a diverse benchmark of proteins, protein-peptide complexes, and nucleic acids. Although the method is physics-agnostic, the reconstructed trajectories accurately preserve ensemble-level features, including RMSD fluctuations, pairwise distance distributions, radius of gyration, and projections onto principal components and time-lagged independent components. A residual (skip-connected) autoencoder variant consistently improves reconstruction accuracy and reduces outliers. While local structural deviations can impair energetic fidelity, a short energy minimization partially restores physically reasonable conformations. The framework allows customizable compression-accuracy trade-offs and supports a modular workflow for sharing latent representations, decoder models, and reconstruction protocols. MDZip offers a scalable solution to current storage limitations, facilitating broader dissemination of MD data without sacrificing essential dynamical information.
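The storage claim can be put in perspective with simple arithmetic: raw Cartesian coordinates cost 3N numbers per frame for N atoms, whereas a latent code costs k numbers per frame, plus a one-time decoder stored per system. The sketch below is a minimal illustration under stated assumptions (float32 storage everywhere; the latent size and decoder parameter count are hypothetical inputs, not values reported for MDZip, and no MDZip code or API is implied). It also includes a plain RMSD between an original and a reconstructed frame, one of the fidelity checks named in the abstract, computed here without any superposition step.

```python
import math

# Assumption: both raw coordinates and latent codes are stored as float32.
BYTES_PER_FLOAT32 = 4


def storage_reduction(n_atoms, latent_dim, n_frames, decoder_params=0):
    """Fractional storage saving of (latents + decoder) versus raw xyz.

    Hypothetical helper: latent_dim and decoder_params are illustrative
    inputs, not figures taken from the MDZip paper.
    """
    raw_bytes = n_frames * n_atoms * 3 * BYTES_PER_FLOAT32
    compressed_bytes = (n_frames * latent_dim + decoder_params) * BYTES_PER_FLOAT32
    return 1.0 - compressed_bytes / raw_bytes


def rmsd(frame_a, frame_b):
    """Root-mean-square deviation between two frames of (x, y, z) tuples.

    No superposition is performed: this assumes both frames already share
    a common reference frame (e.g. coordinates aligned before encoding).
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(frame_a, frame_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(frame_a))


if __name__ == "__main__":
    # 1000 atoms, 10,000 frames, hypothetical 20-dimensional latent per frame.
    print(f"{storage_reduction(1000, 20, 10_000):.4f}")  # 0.9933
    # Same setup, charging a hypothetical 1M-parameter decoder to the archive.
    print(f"{storage_reduction(1000, 20, 10_000, decoder_params=1_000_000):.4f}")  # 0.9600
```

The second call shows why a per-system decoder matters: its size is a fixed cost, so the overall saving improves as the number of frames sharing that decoder grows.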

Matching journals

The top 3 journals together account for just over 50% of the predicted probability mass (33.3% + 12.6% + 6.9% = 52.8%).

1. Nature Methods: 336 papers in training set, Top 0.1%, predicted probability 33.3%
2. Nature Computational Science: 50 papers in training set, Top 0.1%, predicted probability 12.6%
3. Nature Biotechnology: 147 papers in training set, Top 1%, predicted probability 6.9%

50% of probability mass above

4. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 11%, predicted probability 6.4%
5. Nature Communications: 4913 papers in training set, Top 32%, predicted probability 4.9%
6. Journal of Chemical Information and Modeling: 207 papers in training set, Top 2%, predicted probability 2.6%
7. Nucleic Acids Research: 1128 papers in training set, Top 9%, predicted probability 1.9%
8. Science: 429 papers in training set, Top 14%, predicted probability 1.7%
9. Cell Systems: 167 papers in training set, Top 7%, predicted probability 1.7%
10. IUCrJ: 29 papers in training set, Top 0.2%, predicted probability 1.5%
11. eLife: 5422 papers in training set, Top 47%, predicted probability 1.3%
12. Bioinformatics: 1061 papers in training set, Top 8%, predicted probability 1.3%
13. PLOS ONE: 4510 papers in training set, Top 60%, predicted probability 1.2%
14. Structure: 175 papers in training set, Top 2%, predicted probability 1.2%
15. Journal of Structural Biology: 58 papers in training set, Top 1%, predicted probability 1.1%
16. Scientific Reports: 3102 papers in training set, Top 69%, predicted probability 1.0%
17. Nature: 575 papers in training set, Top 14%, predicted probability 0.9%
18. PLOS Computational Biology: 1633 papers in training set, Top 22%, predicted probability 0.9%
19. Briefings in Bioinformatics: 326 papers in training set, Top 6%, predicted probability 0.8%
20. ACS Nano: 99 papers in training set, Top 4%, predicted probability 0.6%
21. Protein Science: 221 papers in training set, Top 2%, predicted probability 0.6%
22. Bioinformatics Advances: 184 papers in training set, Top 5%, predicted probability 0.6%
23. Biophysical Journal: 545 papers in training set, Top 6%, predicted probability 0.6%
24. Communications Biology: 886 papers in training set, Top 28%, predicted probability 0.6%
25. Patterns: 70 papers in training set, Top 3%, predicted probability 0.5%
26. iScience: 1063 papers in training set, Top 40%, predicted probability 0.5%