A layered standards framework for integrating single-cell and spatial omics data into brain cell atlases

Ray, P. L.; Miller, J. A.; Jarecka, D.; Smith, K. A.; Baker, P. M.; Ng, L.; Martone, M. E.; Trivedi, P.; Abeysinghe, R.; Anderson, L.; Bandrowski, A. E.; Edyta, V.; Bhandiwad, A. A.; Chhetri, T. R.; Cui, L.; Giglio, M.; Goldy, J.; Hong, N.; Huang, H.; Huang, Y.; Hussain, Y.; Johansen, N.; Kenney, M.; Kruse, L.; Li, X.; Meldrim, J.; Mollenkopf, T.; Nadendla, S.; Osumi-Sutherland, D.; Sanchez, R.; Scheuermann, R. H.; Tao, S.; Vanderburg, C. R.; Yang, Y.; Ropelewski, A.; Mufti, S.; Lein, E.; Xu, H.; Zheng, W. J.; Ghosh, S. S.; White, O.; Hawrylycz, M.; Zhang, G.-Q.; Thompson, C. L.

2026-05-04 genomics
10.64898/2026.04.30.722039 bioRxiv

The BRAIN Initiative Cell Atlas Network (BICAN) is generating large-scale multimodal datasets to profile cell types in the human, non-human primate, and mouse brain. The diversity of single-cell and spatial transcriptomic and epigenomic assays, combined with varied experimental contexts, multiple data-generating laboratories, and distributed infrastructure, poses substantial challenges for data integration and reuse in BICAN. To address this, we implemented a standards framework that enables layered integration of these data into knowledge-ready products for interoperable brain cell atlases. This framework organizes data into three progressively structured layers. First, we introduced an assay-agnostic modeling layer that unifies the representation of single-cell and spatial omics data using a common set of biological entities and processes assessed by diverse experimental techniques. Second, we implemented harmonized metadata standards that capture key experimental features linked to biospecimen provenance across heterogeneous tissue sources, species, and preparations, supporting integration and validation while minimizing burden on data contributors. Third, we developed an extensible representation for data-driven cell type taxonomies that integrates molecular data with annotations, ontology mappings, and evidence. Together, these contributions represent an end-to-end framework that transforms heterogeneous datasets into structured, interoperable resources that support broad community reuse via mapping algorithms, annotation systems, and visualization platforms. This approach links biospecimen provenance with cell-level outputs and embeds these in a standardized taxonomy format, enabling downstream applications such as cross-dataset integration, reference mapping, and knowledge-driven analysis. More broadly, our work demonstrates a generalizable strategy for enabling an efficient data-to-knowledge pipeline in a large-scale consortium setting.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Nature Biotechnology: 147 papers in training set; Top 0.2%; 18.2%
2. Scientific Data: 174 papers in training set; Top 0.1%; 10.2%
3. Nature Methods: 336 papers in training set; Top 1%; 9.9%
4. Nature Communications: 4913 papers in training set; Top 24%; 8.2%
5. Bioinformatics: 1061 papers in training set; Top 5%; 4.3%

50% of probability mass above

6. Nucleic Acids Research: 1128 papers in training set; Top 5%; 4.1%
7. Genome Biology: 555 papers in training set; Top 3%; 3.5%
8. Genome Medicine: 154 papers in training set; Top 2%; 3.5%
9. GigaScience: 172 papers in training set; Top 0.6%; 3.5%
10. PLOS Computational Biology: 1633 papers in training set; Top 12%; 2.6%
11. Cell Genomics: 162 papers in training set; Top 3%; 2.0%
12. PLOS ONE: 4510 papers in training set; Top 50%; 1.8%
13. Nature: 575 papers in training set; Top 11%; 1.7%
14. Genome Research: 409 papers in training set; Top 2%; 1.6%
15. Cell Reports: 1338 papers in training set; Top 26%; 1.5%
16. Scientific Reports: 3102 papers in training set; Top 64%; 1.3%
17. Briefings in Bioinformatics: 326 papers in training set; Top 5%; 1.3%
18. Bioinformatics Advances: 184 papers in training set; Top 3%; 1.3%
19. Nature Computational Science: 50 papers in training set; Top 1%; 1.2%
20. eLife: 5422 papers in training set; Top 51%; 1.1%
21. Science: 429 papers in training set; Top 18%; 0.9%
22. Heliyon: 146 papers in training set; Top 4%; 0.9%
23. Nature Genetics: 240 papers in training set; Top 7%; 0.8%
24. NAR Genomics and Bioinformatics: 214 papers in training set; Top 4%; 0.8%
25. Cell: 370 papers in training set; Top 17%; 0.8%
26. iScience: 1063 papers in training set; Top 30%; 0.8%
27. Communications Biology: 886 papers in training set; Top 22%; 0.8%
28. Database: 51 papers in training set; Top 1%; 0.7%
29. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 6%; 0.7%
30. American Journal of Respiratory Cell and Molecular Biology: 38 papers in training set; Top 0.8%; 0.6%