Correcting Preprocessing Bias in Sparse Chromatin Contact Data Enables Physically Interpretable Reconstruction of Genome Architecture
Sys, S.; Misak, M.; Soliman, A.; Herrera-Rodriguez, R.; Lambuta, R.-A.; Weissbach, S.; Everschor, K.; Schweiger, S.; Michels, J.; Padeken, J.; Gerber, S.
DNA is the largest biopolymer in nature, and chromatin contact maps are widely interpreted as quantitative readouts of its three-dimensional organization. However, the validity of such interpretations critically depends on how these maps are processed. Here, we identify a previously overlooked but fundamental source of bias in chromatin contact data analysis. We demonstrate that a widely adopted preprocessing convention, namely whole-matrix percentile clipping, systematically distorts sparse contact maps by collapsing their dynamic range. This effect is strongest in near-diagonal interactions, precisely the regime encoding chromatin domains and looping structures, thereby compromising quantitative interpretation while preserving superficial structural features. We show that this distortion represents a sparsity-dependent failure mode of current preprocessing standards and affects the comparability of datasets and computational methods across technologies and sequencing depths. To address this, we introduce a statistically consistent preprocessing framework based on nonzero-percentile clipping and log-space normalization, which preserves the intrinsic dynamic range of observed contacts. Building on this foundation, we present CCUT, a modular deep learning framework for chromatin contact map reconstruction. Under corrected preprocessing, reconstructed maps recover domain organization, contact decay, and scaling behavior consistent with polymer physics. Importantly, we demonstrate quantitative agreement between reconstructed maps and simulated contact patterns derived from a kinetic Monte Carlo loop extrusion model, enabling direct comparison between experimental data and physical models. Together, our results establish preprocessing as a decisive determinant of the physical interpretability of chromatin contact maps and provide a principled framework for robust and comparable analysis across chromatin conformation capture technologies.
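The contrast between the two clipping conventions named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the toy matrix, the 90th-percentile threshold, and the choice of log1p scaled to [0, 1] are all assumptions made for illustration. The abstract specifies only "nonzero-percentile clipping and log-space normalization".

```python
import numpy as np

def clip_whole_matrix(contacts, pct):
    """Clip at a percentile taken over ALL entries, zeros included."""
    cap = np.percentile(contacts, pct)
    return np.minimum(contacts, cap)

def clip_nonzero_log(contacts, pct):
    """Clip at a percentile of the NONZERO entries, then scale log1p to [0, 1].

    Hypothetical helper: the exact normalization used by the paper is not
    given in the abstract; log1p divided by log1p(cap) is an assumption.
    """
    nonzero = contacts[contacts > 0]
    if nonzero.size == 0:
        return np.zeros_like(contacts, dtype=float)
    cap = np.percentile(nonzero, pct)
    clipped = np.minimum(contacts, cap)
    return np.log1p(clipped) / np.log1p(cap)

# Toy sparse map: 100 bins pairs, only 5 carry counts (95% zeros).
toy = np.zeros((10, 10))
toy[np.arange(5), np.arange(5)] = [1, 2, 4, 8, 16]

# With zeros included, the 90th percentile is itself zero, so every
# observed contact is clipped away.
whole = clip_whole_matrix(toy, 90)

# With zeros excluded, the threshold sits inside the observed counts and
# the ordering of the five contact values is kept.
corrected = clip_nonzero_log(toy, 90)

print("whole-matrix max:", whole.max())
print("distinct nonzero values after correction:",
      np.unique(corrected[corrected > 0]).size)
```

In this toy case the percentile is set to 90 so that the effect shows up in a 10 x 10 matrix; in real maps, a stricter percentile would have the same effect only when the fraction of zero entries exceeds it.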