scFAIR Consortium: a decentralized hub for single-cell RNA-Seq data standardization and unification
Gardeux, V.; Carsanaro, S.; Chen, W. J.; David, F. P. A.; Goutte-Gattat, D.; Hilton, J. A.; Lubiana, T.; Patel, N.; Raymor, B.; Zucchi, I.; Deplancke, B.; Ernst, C.; Osumi-Sutherland, D.; Robinson-Rechavi, M.; Sternberg, P. W.; Bastian, F. B.
The rapid accumulation of single-cell RNA-Seq (scRNA-Seq) data across multiple repositories presents major challenges for data accessibility, integration, and reproducibility. While primary repositories provide raw data, they rarely include structured cell-type annotations or descriptions of analytical workflows, limiting the ability to reuse and integrate datasets in a FAIR (Findable, Accessible, Interoperable, Reusable) manner. Here we present scFAIR, a consortium of single-cell data resources that has developed a unified metadata schema and common curation framework to improve the FAIRness of scRNA-Seq data. Building on and extending the CZ CELLxGENE Discover metadata schema, the scFAIR consortium has driven key schema improvements, including the expansion of supported organisms, richer biological context, and structured reporting of computational workflows. To provide unified access to decentralized datasets, the consortium developed the sc-fair.org portal, which currently aggregates 2,346 datasets across partner resources through ontology-aware semantic search. We demonstrate the practical value of FAIR-compliant datasets through a cross-species validation between the human and mouse Allen Brain Atlases, showing that standardized ontology annotations enable reliable annotation transfer across species, with 90% of neuronal clusters receiving an exact or equivalent label. Together, the scFAIR schema, validator, and portal constitute a community-driven framework that advances single-cell data standardization and lays the foundation for reproducible, large-scale integration of single-cell datasets.