
Building an Interoperable Rare Disease Multi-omic Resource: The GREGoR Data Model and Dataset

Heavner, B. D.; Wheeler, M. M.; Bengtsson, J. D.; Carvalho, C. M. B.; Cheung, W. A.; Conomos, M. P.; Delot, E. C.; DiTroia, S.; Ganesh, V. S.; Gogarten, S. M.; Grochowski, C. M.; Jhangiani, S. N.; King, C. H.; LeMaster, C.; Marvin, C. T.; Marwaha, S.; Miller, D. E.; O'Donnell-Luria, A.; Pais, L.; Patterson, K.; Qi, G.; Richardson, M.; Smail, C.; Stilp, A. M.; Tong, C. C.; Ungar, R. A.; Weisburd, B.; Bamshad, M. J.; Bernstein, J. A.; Eichler, E. E.; Gibbs, R. A.; Lupski, J. R.; May, S. J.; Montgomery, S. B.; Pastinen, T.; Posey, J.; Rehm, H. L.; Shojaie, A.; Talkowski, M. E.; Vilain, E.; Wei, C

2026-05-19 genomics
10.64898/2026.05.15.725546 bioRxiv

Rare disease research and diagnosis rely on the integration of genomic and phenotypic data generated across diverse clinical sites; however, the absence of widely adopted standards for representing genomic data and associated metadata has limited data interoperability, reuse, and cross-study analysis. The Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium was established to investigate challenging rare disease cases and evaluate emerging multi-omic technologies for clinical translation. To support coordinated data integration across distributed research sites, we developed a common Consortium Data Model in partnership with domain experts to standardize the capture of participant-, family-, phenotype-, and assay-level metadata. The model uses a modular architecture with two particular aims: linking multiple data versions from multiple omic technologies to a single individual, and attributing each genetic finding to the specific technology used for its initial discovery. Adoption of the GREGoR Data Model has enabled continued generation and public release of a harmonized, analysis-ready Consortium Dataset. The most recent release includes phenotypic, family, and multi-omic data from 12,292 participants in 5,029 families. Other rare disease data sharing efforts are beginning to adopt this data model, which will facilitate cross-consortium analyses and empower rare disease research. This work demonstrates that a collaborative, flexible, and scalable data model can support large-scale rare disease research, facilitate cross-center data harmonization, and enable data interoperability.
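The modular linkage described in the abstract can be illustrated with a minimal Python sketch. Every class, field, and identifier below (`Participant`, `Experiment`, `DataFile`, `GeneticFinding`, `discovery_experiment_id`, and the sample IDs) is a hypothetical name chosen for illustration and is not taken from the GREGoR Data Model itself. The sketch only shows one way that several experiments and file versions could attach to a single participant, and how a finding could point back to the technology credited with its discovery.

```python
from dataclasses import dataclass, field
from typing import List

# All names here are hypothetical; they are NOT the GREGoR table or column names.


@dataclass
class Participant:
    participant_id: str
    family_id: str
    phenotype_terms: List[str] = field(default_factory=list)


@dataclass
class DataFile:
    file_id: str
    version: int  # several processed versions of one experiment may coexist


@dataclass
class Experiment:
    experiment_id: str
    participant_id: str  # link back to a single individual
    technology: str      # e.g. "short-read genome", "long-read genome"
    files: List[DataFile] = field(default_factory=list)


@dataclass
class GeneticFinding:
    finding_id: str
    participant_id: str
    discovery_experiment_id: str  # experiment credited with the initial discovery


def experiments_for(participant_id: str, experiments: List[Experiment]) -> List[Experiment]:
    """Return every experiment linked to one participant."""
    return [e for e in experiments if e.participant_id == participant_id]


def discovery_technology(finding: GeneticFinding, experiments: List[Experiment]) -> str:
    """Look up the technology of the experiment that first detected a finding."""
    by_id = {e.experiment_id: e for e in experiments}
    return by_id[finding.discovery_experiment_id].technology


# Made-up example records for one participant with two assays.
person = Participant("P1", "F1", ["HP:0001250"])
experiments = [
    Experiment("exp1", "P1", "short-read genome", [DataFile("f1", 1), DataFile("f2", 2)]),
    Experiment("exp2", "P1", "long-read genome", [DataFile("f3", 1)]),
]
finding = GeneticFinding("g1", "P1", "exp2")

print(len(experiments_for(person.participant_id, experiments)))
print(discovery_technology(finding, experiments))
```

Keeping experiments and files in their own records, rather than as columns on the participant, is what lets new technologies or reprocessed versions be added without changing the participant record.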

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Genome Medicine | 154 | Top 0.2% | 14.5% |
| 2 | Nucleic Acids Research | 1128 | Top 1% | 12.5% |
| 3 | Database | 51 | Top 0.1% | 6.7% |
| 4 | Human Mutation | 29 | Top 0.1% | 6.7% |
| 5 | Cell Genomics | 162 | Top 0.6% | 6.2% |
| 6 | Scientific Data | 174 | Top 0.5% | 3.5% |

50% of probability mass above

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 7 | PLOS ONE | 4510 | Top 40% | 3.5% |
| 8 | Genetics in Medicine | 69 | Top 0.5% | 3.2% |
| 9 | Nature Communications | 4913 | Top 43% | 3.0% |
| 10 | GigaScience | 172 | Top 1.0% | 2.1% |
| 11 | Frontiers in Genetics | 197 | Top 4% | 1.9% |
| 12 | BMC Medical Genomics | 36 | Top 0.4% | 1.9% |
| 13 | BMC Bioinformatics | 383 | Top 4% | 1.9% |
| 14 | Scientific Reports | 3102 | Top 59% | 1.7% |
| 15 | The American Journal of Human Genetics | 206 | Top 2% | 1.7% |
| 16 | NAR Genomics and Bioinformatics | 214 | Top 2% | 1.6% |
| 17 | Computational and Structural Biotechnology Journal | 216 | Top 5% | 1.5% |
| 18 | Bioinformatics Advances | 184 | Top 3% | 1.3% |
| 19 | Nature Biotechnology | 147 | Top 6% | 1.3% |
| 20 | PLOS Computational Biology | 1633 | Top 19% | 1.3% |
| 21 | Genomics, Proteomics & Bioinformatics | 171 | Top 4% | 1.3% |
| 22 | Genome Research | 409 | Top 4% | 0.9% |
| 23 | BMC Genomics | 328 | Top 5% | 0.9% |
| 24 | Nature Medicine | 117 | Top 5% | 0.8% |
| 25 | Nature Methods | 336 | Top 6% | 0.7% |
| 26 | PLOS Genetics | 756 | Top 15% | 0.7% |
| 27 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.7% |
| 28 | Communications Biology | 886 | Top 27% | 0.7% |
| 29 | Bioinformatics | 1061 | Top 10% | 0.6% |
| 30 | American Journal of Respiratory Cell and Molecular Biology | 38 | Top 0.8% | 0.6% |
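
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below uses the first eight values from the list; the function name `journals_to_reach` is an illustrative choice, and because the displayed percentages are rounded, the running total is only approximate.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-8, copied from the list above.
probs = [14.5, 12.5, 6.7, 6.7, 6.2, 3.5, 3.5, 3.2]


def journals_to_reach(probabilities, threshold):
    """Return the smallest rank whose cumulative probability meets the threshold."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None  # threshold not reached by the values supplied


cutoff = journals_to_reach(probs, 50.0)
print(cutoff)  # → 6
```

The first five journals sum to about 46.6%, and adding the sixth brings the total to about 50.1%, which is where the cutoff falls.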