
Toward a probabilistic definition of chromatin accessible regions at the single-cell level

Sanchez-Escabias, E.; Rico, D.; Reyes, J. C.

2026-05-04 genomics
10.64898/2026.05.01.722232 bioRxiv

Understanding cis-regulatory elements (CREs) at the single-cell level is fundamental to deciphering transcriptional changes during development, cell differentiation, and homeostasis. Recent studies have shown that arbitrary peak-calling thresholds complicate data interpretation and cross-study comparisons. Furthermore, because of the inherent sparsity of single-nucleus ATAC-seq (snATAC-seq) data, distinguishing truly inaccessible regions from technical dropouts remains challenging. Our analysis of snATAC-seq experiments performed in a well-established cell line suggests that the dichotomy between accessible (open) and inaccessible (closed) CREs is misleading. Thousands of accessible regions are detected in only a very small fraction of the cells in the population, yet they are identified repeatedly, suggesting that they have low accessibility or are only transiently accessible. Depending on the detection threshold selected, however, they could be classified either as genuine CREs or as noise. To resolve this inconsistency, we propose a model in which chromatin accessibility is treated as a continuum, defined by a probability of accessibility (Pa) for each accessible region across cell types and conditions. Through computational simulations, we demonstrate that snATAC-seq results can be explained by a simple "balls into bins" probability model, offering a theoretical framework for calculating Pa distributions from any snATAC-seq dataset. Furthermore, we examine how Pa distributions shift following activation of the TGF-β signaling pathway. This probabilistic approach removes the reliance on arbitrary thresholds and supports a more quantitative and dynamic understanding of the function of accessible regions.
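The abstract does not spell out the model, so what follows is only a hypothetical sketch of one way a "balls into bins" view of snATAC-seq could look, not the authors' implementation. It assumes that region i is open in a given cell with probability Pa_i, that a fixed number of fragments per cell (the balls) land uniformly at random on that cell's open regions (the bins), and that a region is detected when it receives at least one fragment. Under these assumptions, the expected number of occupied bins for n balls in m bins is m(1 − (1 − 1/m)^n), which can be inverted to estimate a capture probability and correct raw detection fractions into Pa estimates. The function names `expected_occupied`, `simulate_detection`, and `estimate_pa`, and all parameter values, are illustrative.

```python
import numpy as np


def expected_occupied(m: float, n: int) -> float:
    """Expected number of non-empty bins after n balls fall uniformly into m bins."""
    return m * (1.0 - (1.0 - 1.0 / m) ** n)


def simulate_detection(pa, n_cells: int, n_fragments: int, rng) -> np.ndarray:
    """Toy generative model (an assumption, not the authors' code).

    In every cell, region i is open with probability pa[i]; then n_fragments
    reads ("balls") land uniformly at random on the open regions ("bins").
    A region is detected in a cell if it receives at least one read.
    Returns a boolean matrix of shape (n_cells, n_regions).
    """
    pa = np.asarray(pa, dtype=float)
    detected = np.zeros((n_cells, pa.size), dtype=bool)
    for c in range(n_cells):
        open_idx = np.flatnonzero(rng.random(pa.size) < pa)
        if open_idx.size:
            reads = rng.choice(open_idx, size=n_fragments, replace=True)
            detected[c, reads] = True
    return detected


def estimate_pa(detected: np.ndarray, n_fragments: int) -> np.ndarray:
    """Moment-based Pa estimate under the same toy model.

    f[i] is the fraction of cells in which region i is detected, and
    s = f.sum() is the mean number of detected regions per cell. Solve
    expected_occupied(m, n_fragments) = s for m (mean open regions per cell)
    by bisection, then divide f by the capture probability q = s / m.
    """
    f = detected.mean(axis=0)
    s = f.sum()
    if s == 0:
        return np.zeros_like(f)
    if s >= n_fragments:
        raise ValueError("no read collisions observed; m is not identifiable")
    lo, hi = max(s, 1.0), max(2.0 * s, 2.0)
    while expected_occupied(hi, n_fragments) < s:  # occupancy grows with m
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if expected_occupied(mid, n_fragments) < s:
            lo = mid
        else:
            hi = mid
    q = s / hi
    return np.clip(f / q, 0.0, 1.0)


# Illustrative run with made-up parameters.
rng = np.random.default_rng(0)
true_pa = rng.uniform(0.02, 0.9, size=300)
det = simulate_detection(true_pa, n_cells=2000, n_fragments=100, rng=rng)
raw_fraction = det.mean(axis=0)
pa_hat = estimate_pa(det, n_fragments=100)
```

With these illustrative settings, the raw per-region detection fraction comes out at roughly half of the simulated Pa, mimicking how dropouts make moderately accessible regions look rare, while dividing by the estimated capture probability brings the values back close to the simulated Pa. Real data break the uniform-read assumption (regions differ in size and signal strength), so this is a toy illustration of the idea only.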

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Genetics: 225 papers in training set, Top 0.4%, 12.1%
2. PLOS Computational Biology: 1633 papers in training set, Top 3%, 12.1%
3. Bioinformatics: 1061 papers in training set, Top 4%, 6.7%
4. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 0.4%, 6.7%
5. PLOS Genetics: 756 papers in training set, Top 2%, 6.7%
6. NAR Genomics and Bioinformatics: 214 papers in training set, Top 0.2%, 6.3%
50% of probability mass above
7. Nucleic Acids Research: 1128 papers in training set, Top 3%, 6.2%
8. Scientific Reports: 3102 papers in training set, Top 25%, 4.8%
9. PLOS ONE: 4510 papers in training set, Top 36%, 3.9%
10. Genome Research: 409 papers in training set, Top 1%, 3.5%
11. eLife: 5422 papers in training set, Top 36%, 2.0%
12. Frontiers in Genetics: 197 papers in training set, Top 4%, 2.0%
13. Genome Biology: 555 papers in training set, Top 4%, 1.9%
14. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 1%, 1.9%
15. Cell Reports: 1338 papers in training set, Top 29%, 1.2%
16. Cell Systems: 167 papers in training set, Top 10%, 1.1%
17. Chromosoma: 10 papers in training set, Top 0.1%, 0.9%
18. Epigenetics & Chromatin: 42 papers in training set, Top 0.2%, 0.9%
19. GENETICS: 189 papers in training set, Top 1%, 0.9%
20. Nucleus: 11 papers in training set, Top 0.1%, 0.9%
21. Nature Communications: 4913 papers in training set, Top 60%, 0.9%
22. BMC Genomics: 328 papers in training set, Top 5%, 0.8%
23. iScience: 1063 papers in training set, Top 30%, 0.8%
24. BMC Bioinformatics: 383 papers in training set, Top 7%, 0.7%
25. Frontiers in Cell and Developmental Biology: 218 papers in training set, Top 9%, 0.7%
26. Communications Biology: 886 papers in training set, Top 25%, 0.7%
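As a quick arithmetic check of the statement that the top 6 journals account for 50% of the predicted probability mass, the listed probabilities can be accumulated in rank order until a target mass is reached. The helper name `journals_for_mass` is illustrative only; the values are copied from the ranking, and because they are rounded to 0.1%, the cumulative figures are approximate.

```python
# Predicted probabilities (%) for ranks 1-26, copied from the ranking above.
probs = [12.1, 12.1, 6.7, 6.7, 6.7, 6.3, 6.2, 4.8, 3.9, 3.5, 2.0, 2.0, 1.9,
         1.9, 1.2, 1.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.7, 0.7, 0.7]


def journals_for_mass(probs, target):
    """Return (k, cumulative) for the smallest k whose top-k mass reaches target."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= target:
            return k, total
    return None, total


k, mass = journals_for_mass(probs, 50.0)
print(f"top {k} journals: {mass:.1f}%")  # top 6 journals: 50.6%
```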