Toward a probabilistic definition of chromatin accessible regions at the single-cell level

Sanchez-Escabias, E.; Rico, D.; Reyes, J. C.

2026-05-04 genomics

10.64898/2026.05.01.722232 bioRxiv


Understanding cis-regulatory elements (CREs) at the single-cell level is fundamental to deciphering transcriptional changes during development, cell differentiation, and homeostasis. Recent studies have shown that arbitrary peak-calling thresholds complicate data interpretation and cross-study comparisons. Furthermore, due to the inherent sparsity of single-nucleus ATAC-seq (snATAC-seq) data, distinguishing between truly inaccessible regions and technical dropouts remains challenging. Our analysis of snATAC-seq experiments performed in a well-established cell line suggests that the dichotomy between accessible (open) and inaccessible (closed) CREs is misleading. Thousands of accessible regions are detected in only a very small fraction of the cells in the population, yet they are repeatedly identified, suggesting that they have low accessibility or are only transiently accessible. However, depending on the detection threshold selected, they could be considered either genuine CREs or noise. To resolve this inconsistency, we propose a model in which chromatin accessibility is treated as a continuum, defined by a probability of accessibility (Pa) for each accessible region across cell types and conditions. Through computational simulations, we demonstrate that snATAC-seq results can be explained by a simple "balls into bins" probability model, offering a theoretical framework for calculating Pa distributions from any snATAC-seq dataset. Furthermore, we examine how Pa distributions shift following activation of the TGFβ signaling pathway. This probabilistic approach removes the reliance on arbitrary thresholds and supports a more quantitative and dynamic understanding of the function of accessible regions.
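
The "balls into bins" idea mentioned in the abstract can be illustrated with a toy simulation. What follows is a minimal sketch under stated assumptions, not the authors' model: the abstract does not give the formulation, so the region weights, the fixed number of fragments per cell, and the function names `simulate_detection` and `expected_detection` are all hypothetical. Each cell throws a fixed number of fragments (balls) into regions (bins) with probability proportional to a per-region weight, and the fraction of cells in which a region receives at least one fragment is compared with the closed-form expectation 1 - (1 - p)^n.

```python
import random


def simulate_detection(weights, n_cells, fragments_per_cell, seed=0):
    """Toy balls-into-bins simulation (illustrative assumption only).

    For each cell, draw `fragments_per_cell` fragments; each fragment lands
    in a region with probability proportional to its weight. Returns, per
    region, the fraction of cells in which it received at least one fragment.
    """
    rng = random.Random(seed)
    regions = list(range(len(weights)))
    detected = [0] * len(weights)
    for _ in range(n_cells):
        # The set collapses repeated hits: only presence/absence is recorded.
        hits = set(rng.choices(regions, weights=weights, k=fragments_per_cell))
        for r in hits:
            detected[r] += 1
    return [d / n_cells for d in detected]


def expected_detection(weights, fragments_per_cell):
    """Closed-form probability that a region gets at least one fragment."""
    total = sum(weights)
    return [1 - (1 - w / total) ** fragments_per_cell for w in weights]


# Hypothetical weights: one highly accessible region and three rarer ones.
weights = [0.5, 0.3, 0.15, 0.05]
observed = simulate_detection(weights, n_cells=2000, fragments_per_cell=5)
expected = expected_detection(weights, fragments_per_cell=5)
for w, o, e in zip(weights, observed, expected):
    print(f"weight={w:.2f} observed={o:.3f} expected={e:.3f}")
```

In this toy setting, a region with a small weight is still detected in a non-zero fraction of cells, which is the kind of low-frequency but recurring signal that a fixed detection threshold would classify inconsistently.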

