scFAIR Consortium: a decentralized hub for single-cell RNA-Seq data standardization and unification

Gardeux, V.; Carsanaro, S.; Chen, W. J.; David, F. P. A.; Goutte-Gattat, D.; Hilton, J. A.; Lubiana, T.; Patel, N.; Raymor, B.; Zucchi, I.; Deplancke, B.; Ernst, C.; Osumi-Sutherland, D.; Robinson-Rechavi, M.; Sternberg, P. W.; Bastian, F. B.

2026-06-08 bioinformatics
10.64898/2026.06.05.730084 bioRxiv

The rapid accumulation of single-cell RNA-Seq (scRNA-seq) data across multiple repositories presents major challenges for data accessibility, integration, and reproducibility. While primary repositories provide raw data, they rarely include structured cell-type annotations or descriptions of analytical workflows, limiting the ability to reuse and integrate datasets in a FAIR (Findable, Accessible, Interoperable, Reusable) manner. Here we present scFAIR, a consortium of single-cell data resources that has developed a unified metadata schema and common curation framework to improve the FAIRness of scRNA-seq data. Building on and extending the CZ CELLxGENE Discover metadata schema, the scFAIR consortium has been instrumental in driving key schema improvements, including the expansion of supported organisms, richer biological context, and structured reporting of computational workflows. To provide unified access to decentralized datasets, the consortium developed the sc-fair.org portal, which currently aggregates 2,346 datasets across partner resources through ontology-aware semantic search. We demonstrate the practical value of FAIR-compliant datasets through a cross-species validation between human and mouse Allen Brain Atlases, showing that standardized ontology annotations enable reliable annotation transfer across species, with 90% of neuronal clusters receiving an exact or equivalent label. Together, the scFAIR schema, validator, and portal constitute a community-driven framework that advances single-cell data standardization and lays the foundation for reproducible, large-scale integration of single-cell datasets.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Nature Methods: 336 papers in training set; Top 0.3%; 22.0%
2. Nucleic Acids Research: 1128 papers in training set; Top 0.7%; 17.1%
3. Nature Biotechnology: 147 papers in training set; Top 0.4%; 14.0%
50% of probability mass above
4. Nature Communications: 4913 papers in training set; Top 34%; 4.7%
5. Genome Biology: 555 papers in training set; Top 2%; 3.9%
6. Genome Medicine: 154 papers in training set; Top 2%; 3.5%
7. Genome Research: 409 papers in training set; Top 1%; 3.2%
8. Bioinformatics Advances: 184 papers in training set; Top 2%; 2.5%
9. Bioinformatics: 1061 papers in training set; Top 6%; 2.5%
10. Cell Genomics: 162 papers in training set; Top 2%; 2.3%
11. GigaScience: 172 papers in training set; Top 1%; 2.0%
12. NAR Genomics and Bioinformatics: 214 papers in training set; Top 1%; 2.0%
13. Cell Systems: 167 papers in training set; Top 6%; 2.0%
14. Nature Genetics: 240 papers in training set; Top 5%; 1.7%
15. Nature: 575 papers in training set; Top 13%; 1.3%
16. Briefings in Bioinformatics: 326 papers in training set; Top 5%; 1.2%
17. PLOS ONE: 4510 papers in training set; Top 60%; 1.2%
18. Nature Machine Intelligence: 61 papers in training set; Top 3%; 0.9%
19. Scientific Reports: 3102 papers in training set; Top 75%; 0.7%
20. Scientific Data: 174 papers in training set; Top 3%; 0.7%
21. Cell Reports Methods: 141 papers in training set; Top 5%; 0.7%
22. BMC Bioinformatics: 383 papers in training set; Top 7%; 0.7%
23. PLOS Computational Biology: 1633 papers in training set; Top 26%; 0.7%
24. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 7%; 0.7%
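
The "top 3 journals" cutoff can be checked directly from the listed probabilities. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, and it assumes the cutoff is defined as the smallest number of top-ranked journals whose probabilities sum to at least 50%, which the page does not state explicitly.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the five top-ranked journals,
# copied from the list above in rank order.
probs = [22.0, 17.1, 14.0, 4.7, 3.9]


def journals_to_reach(probabilities, threshold=50.0):
    """Return (k, total): the smallest k whose top-k sum is >= threshold.

    Assumed definition of the cutoff; not confirmed by the source page.
    """
    for k, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return k, total
    raise ValueError("threshold not reached by the given probabilities")


k, total = journals_to_reach(probs)
print(k, round(total, 1))
```

Under that assumption the first three journals together carry 53.1% of the probability mass, so the stated 50% figure reads as a threshold that was crossed rather than an exact sum.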