Correcting Preprocessing Bias in Sparse Chromatin Contact Data Enables Physically Interpretable Reconstruction of Genome Architecture

Sys, S.; Misak, M.; Soliman, A.; Herrera-Rodriguez, R.; Lambuta, R.-A.; Weissbach, S.; Everschor, K.; Schweiger, S.; Michels, J.; Padeken, J.; Gerber, S.

2026-04-02 biophysics
10.64898/2026.03.31.715622 bioRxiv

DNA is the largest biopolymer in nature, and chromatin contact maps are widely interpreted as quantitative readouts of its three-dimensional organization. However, the validity of such interpretations critically depends on how these maps are processed. Here, we identify a previously overlooked but fundamental source of bias in chromatin contact data analysis. We demonstrate that a widely adopted preprocessing convention, namely whole-matrix percentile clipping, systematically distorts sparse contact maps by collapsing their dynamic range. This effect is strongest in near-diagonal interactions, precisely the regime encoding chromatin domains and looping structures, thereby compromising quantitative interpretation while preserving superficial structural features. We show that this distortion represents a sparsity-dependent failure mode of current preprocessing standards and affects the comparability of datasets and computational methods across technologies and sequencing depths. To address this, we introduce a statistically consistent preprocessing framework based on nonzero-percentile clipping and log-space normalization, which preserves the intrinsic dynamic range of observed contacts. Building on this foundation, we present CCUT, a modular deep learning framework for chromatin contact map reconstruction. Under corrected preprocessing, reconstructed maps recover domain organization, contact decay, and scaling behavior consistent with polymer physics. Importantly, we demonstrate quantitative agreement between reconstructed maps and simulated contact patterns derived from a kinetic Monte Carlo loop extrusion model, enabling direct comparison between experimental data and physical models. Together, our results establish preprocessing as a decisive determinant of the physical interpretability of chromatin contact maps and provide a principled framework for robust and comparable analysis across chromatin conformation capture technologies.
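The difference between whole-matrix and nonzero-percentile clipping described in the abstract can be illustrated on a toy sparse matrix. The following is a minimal sketch, not the authors' implementation: the function names, the 99th-percentile threshold, the `log1p` scaling, and the toy contact values are all assumptions chosen for illustration.

```python
import numpy as np


def clip_whole_matrix(m, q=99.0):
    """Clip at the q-th percentile taken over ALL entries, zeros included."""
    cap = float(np.percentile(m, q))
    return np.minimum(m, cap), cap


def clip_nonzero(m, q=99.0):
    """Clip at the q-th percentile taken over nonzero entries only."""
    nz = m[m > 0]
    cap = float(np.percentile(nz, q)) if nz.size else 0.0
    return np.minimum(m, cap), cap


def log_normalize(m, cap):
    """Scale clipped counts to [0, 1] in log space (assumed form: log1p ratio)."""
    if cap <= 0:
        return np.zeros_like(m, dtype=float)
    return np.log1p(m) / np.log1p(cap)


# Toy sparse map (hypothetical values): strong diagonal, weaker first
# off-diagonal, zeros everywhere else (about 98.5% of entries are zero).
n = 200
idx = np.arange(n)
m = np.zeros((n, n))
m[idx, idx] = 50
m[idx[:-1], idx[1:]] = 10
m[idx[1:], idx[:-1]] = 10

clipped_all, cap_all = clip_whole_matrix(m)
clipped_nz, cap_nz = clip_nonzero(m)

norm_all = log_normalize(clipped_all, cap_all)
norm_nz = log_normalize(clipped_nz, cap_nz)

print("whole-matrix cap:", cap_all, "levels:", np.unique(norm_all).size)
print("nonzero cap:     ", cap_nz, "levels:", np.unique(norm_nz).size)
```

In this toy case the zeros dominate the whole-matrix percentile, so the cap falls to the off-diagonal value and the diagonal becomes indistinguishable from its neighbours after clipping. Computing the percentile over nonzero entries keeps the two intensity levels separate. Real contact maps have a far richer value distribution, so the size of the effect will depend on sparsity and on the percentile chosen.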

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Nature Methods (336 papers in training set): Top 0.8%, 12.0%
2. Nature Communications (4913 papers in training set): Top 20%, 9.8%
3. Nucleic Acids Research (1128 papers in training set): Top 2%, 9.8%
4. Nature Biotechnology (147 papers in training set): Top 1.0%, 8.2%
5. Proceedings of the National Academy of Sciences (2130 papers in training set): Top 13%, 6.1%
6. ACS Nano (99 papers in training set): Top 1.0%, 3.8%
7. PLOS Computational Biology (1633 papers in training set): Top 10%, 3.5%

50% of probability mass above

8. Cell Systems (167 papers in training set): Top 4%, 3.2%
9. Physical Review X (23 papers in training set): Top 0.1%, 3.2%
10. Nature (575 papers in training set): Top 8%, 3.0%
11. Biophysical Journal (545 papers in training set): Top 2%, 2.5%
12. Science Advances (1098 papers in training set): Top 13%, 2.0%
13. Advanced Science (249 papers in training set): Top 12%, 1.6%
14. Nature Computational Science (50 papers in training set): Top 0.7%, 1.6%
15. Genome Biology (555 papers in training set): Top 5%, 1.4%
16. eLife (5422 papers in training set): Top 46%, 1.4%
17. Science (429 papers in training set): Top 16%, 1.3%
18. Cell Reports (1338 papers in training set): Top 27%, 1.3%
19. Nano Letters (63 papers in training set): Top 2%, 1.3%
20. Cell (370 papers in training set): Top 13%, 1.3%
21. Scientific Reports (3102 papers in training set): Top 67%, 1.2%
22. Molecular Cell (308 papers in training set): Top 8%, 1.2%
23. ACS Photonics (13 papers in training set): Top 0.3%, 1.1%
24. Nature Structural & Molecular Biology (218 papers in training set): Top 4%, 0.9%
25. Nature Physics (39 papers in training set): Top 0.9%, 0.9%
26. Biophysical Reports (36 papers in training set): Top 0.5%, 0.7%
27. Nature Materials (21 papers in training set): Top 1%, 0.7%
28. Cell Reports Methods (141 papers in training set): Top 6%, 0.6%
29. Journal of the American Chemical Society (199 papers in training set): Top 6%, 0.6%
30. Communications Biology (886 papers in training set): Top 30%, 0.6%