Dissecting the functional landscape of rare diseases through genomic variation in a heterogeneous cohort of 11,000 patients

Uria-Regojo, G.; Fernandez-Caballero, L.; Lopez-Alcojor, A.; Lopez-Lopez, L.; Benitez, Y.; Rodilla, C.; Avila Fernandez, A.; Trujillo-Tiebas, M. J.; Osorio, A.; Corton, M.; Almoguera, B.; Ayuso, C.; Minguez, P.

2026-06-11 genetic and genomic medicine
10.64898/2026.06.10.26355349 medRxiv

Rare diseases (RDs) remain a major diagnostic challenge. Genetic and phenotypic heterogeneity, incomplete knowledge of disease mechanisms, and limitations in the clinical interpretation of variants leave many patients without a molecular diagnosis. Meanwhile, the growing volume of genomic data generated in clinical practice offers an opportunity to develop data-driven methodologies for exploring disease mechanisms and improving the reanalysis of unsolved cases.

We aggregated real-world genomic data from 11,084 unrelated patients with suspected RD. Patients were clinically classified into 122 diseases. We built a multi-disease genomic variant frequency database (FJD-DB), which enabled the development of variant-disease and gene-disease association scores by means of case-control subcohort comparisons across 32 disease groups. Functional enrichment analyses were then used to highlight disease-associated protein domains, pathways, biological processes, and phenotypes. Finally, the resulting knowledge was integrated into a data-driven framework for the guided reanalysis of unsolved RD patients, applied to patients with inherited retinal dystrophies (IRD) as a first use case.

FJD-DB contained more than 45 million unique variants, including ~185,000 potentially pathogenic variants. Disease-specific analyses identified disease-associated pathogenic variants and highlighted both established and candidate disease genes. We detected 179 significantly enriched protein domains across 23 diseases, 124 Human Phenotype Ontology terms across 13 diseases, 79 Reactome pathways across 10 diseases, and 72 Gene Ontology biological processes across 8 diseases, revealing highly disease-specific functional signatures. Integration of disease-specific variant, gene, and functional association signals enabled the development of a data-driven framework for guided reanalysis of unsolved RD cases. Applied to more than 1,100 unsolved IRD cases, the framework generated clinically relevant findings in 26 patients, including four molecular diagnoses, seven candidate diagnoses, and 15 cases upgraded from non-informative findings to variants of uncertain significance.

Aggregated real-world genomic data can be leveraged to identify disease-associated molecular signals and to generate novel biological hypotheses. A unified analytical framework provides a scalable strategy for knowledge discovery and guided reanalysis, facilitating the identification of overlooked and potentially novel genetic causes of RDs.
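The abstract does not specify how the case-control subcohort comparisons were scored, so the following is only a minimal sketch of one common approach: a one-sided Fisher exact test asking whether carriers of a variant are over-represented in one disease subcohort relative to the remaining patients. The function name `fisher_enrichment_p` and all counts are hypothetical and are not taken from the paper or from FJD-DB.

```python
from math import comb


def fisher_enrichment_p(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher exact p-value for carrier enrichment in cases.

    Under the null hypothesis, the number of carriers that fall in the case
    subcohort follows a hypergeometric distribution. The p-value is the
    probability of seeing at least `case_carriers` carriers among the cases.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    # Number of ways to choose which patients form the case subcohort.
    denominator = comb(total, case_total)
    upper = min(carriers, case_total)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: a variant seen in 6 of 400 patients in one disease
# group (cases) and in 3 of 4,000 patients from other groups (controls).
p_value = fisher_enrichment_p(6, 400, 3, 4000)
print(f"one-sided p-value: {p_value:.3g}")
```

In a design like the one described, patients from the other disease groups would act as controls for each disease in turn, and the resulting p-values would need multiple-testing correction across variants and diseases. Both points are assumptions here rather than details confirmed by the abstract.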

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Genome Medicine | 154 papers in training set | Top 0.1% | 40.7%
2. Nature Communications | 4913 papers in training set | Top 26% | 7.0%
3. Cell Genomics | 162 papers in training set | Top 0.5% | 6.6%
50% of probability mass above
4. The American Journal of Human Genetics | 206 papers in training set | Top 0.7% | 6.6%
5. Nature Genetics | 240 papers in training set | Top 2% | 3.7%
6. Genetics in Medicine | 69 papers in training set | Top 0.6% | 2.0%
7. Nature Medicine | 117 papers in training set | Top 2% | 2.0%
8. Human Mutation | 29 papers in training set | Top 0.3% | 1.8%
9. Nucleic Acids Research | 1128 papers in training set | Top 10% | 1.8%
10. Cell Systems | 167 papers in training set | Top 7% | 1.8%
11. Med | 38 papers in training set | Top 0.3% | 1.5%
12. npj Genomic Medicine | 33 papers in training set | Top 0.5% | 1.4%
13. Scientific Reports | 3102 papers in training set | Top 63% | 1.4%
14. Cell | 370 papers in training set | Top 15% | 1.0%
15. Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.0%
16. eLife | 5422 papers in training set | Top 52% | 0.9%
17. Bioinformatics | 1061 papers in training set | Top 9% | 0.9%
18. Communications Biology | 886 papers in training set | Top 20% | 0.8%
19. Cell Reports Medicine | 140 papers in training set | Top 7% | 0.8%
20. BMC Medical Genomics | 36 papers in training set | Top 1% | 0.7%
21. Annals of Neurology | 57 papers in training set | Top 2% | 0.7%
22. PLOS Genetics | 756 papers in training set | Top 16% | 0.7%
23. Human Genetics | 25 papers in training set | Top 0.5% | 0.7%
24. Molecular Systems Biology | 142 papers in training set | Top 2% | 0.7%
25. Nature Methods | 336 papers in training set | Top 7% | 0.7%
26. Science Translational Medicine | 111 papers in training set | Top 7% | 0.7%
27. Brain | 154 papers in training set | Top 5% | 0.5%
28. Science Advances | 1098 papers in training set | Top 34% | 0.5%
29. Nature Biomedical Engineering | 42 papers in training set | Top 3% | 0.5%
30. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 49% | 0.5%
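
The 50% cutoff marked in the list can be checked by accumulating the predicted probabilities in rank order. The snippet below is a small sketch of that arithmetic using the first five listed probabilities. The helper name `journals_to_reach` is hypothetical and is not part of the site that produced the ranking.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Return how many top-ranked entries are needed to reach `threshold` percent."""
    cumulative = 0.0
    for rank, probability in enumerate(probabilities, start=1):
        cumulative += probability
        if cumulative >= threshold:
            return rank
    return None  # threshold not reached with the probabilities supplied


# Predicted probabilities (percent) for ranks 1 to 5, copied from the list.
top_probabilities = [40.7, 7.0, 6.6, 6.6, 3.7]
print(journals_to_reach(top_probabilities))  # 3
```

The first two journals sum to 47.7%, and adding the third brings the total to about 54.3%, which is why the cutoff falls after rank 3.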