GA4GH Phenopacket-Driven Characterization of Genotype-Phenotype Correlations in Mendelian Disorders

Rekerle, L.; Danis, D.; Rehburg, F.; Graefe, A. S.; Bily, V.; Caballero-Oteyza, A.; Cacheiro, P.; Chimirri, L.; Chong, J. X.; Connelly, E.; de Vries, B. B.; Dingemans, A. J.; Duyzend, M. H.; Freiberger, T.; Gehle, P.; Groza, T.; Hansen, P.; Jacobsen, J.; Klocperk, A.; Ladewig, M. S.; Love, M. I.; Marcello, A. J.; Mordhorst, A.; Munoz-Torres, M. C.; Reese, J.; Schuetz, C.; Smedley, D.; Strauss, T.; Vladyka, O.; Zocche, D.; Thun, S.; Mungall, C. J.; Haendel, M. A.; Robinson, P. N.

2025-03-06 genetic and genomic medicine
10.1101/2025.03.05.25323315 medRxiv

Comprehensively characterizing genotype-phenotype correlations (GPCs) in Mendelian disease would create new opportunities for improving clinical management and understanding disease biology. However, heterogeneous approaches to data sharing, reuse, and analysis have hindered progress in the field. We developed Genotype Phenotype Evaluation of Statistical Association (GPSEA), a software package that leverages the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema to represent case-level clinical and genetic data about individuals. GPSEA applies an independent filtering strategy to boost statistical power to detect categorical GPCs represented by Human Phenotype Ontology terms. GPSEA additionally enables visualization and analysis of continuous phenotypes, clinical severity scores, and survival data such as age of onset of disease or clinical manifestations. We applied GPSEA to 85 cohorts with 6613 previously published individuals with variants in one of 80 genes associated with 122 Mendelian diseases and identified 225 significant GPCs, with 48 cohorts having at least one statistically significant GPC. These results highlight the power of standardized representations of clinical data for scalable discovery of GPCs in Mendelian disease.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Genome Medicine | 154 | Top 0.1% | 44.0%
2 | Nature Communications | 4913 | Top 25% | 7.2%
(50% of probability mass above)
3 | Nature Genetics | 240 | Top 1% | 6.7%
4 | The American Journal of Human Genetics | 206 | Top 0.9% | 5.1%
5 | Bioinformatics | 1061 | Top 5% | 4.2%
6 | Genetics in Medicine | 69 | Top 0.5% | 2.8%
7 | Nucleic Acids Research | 1128 | Top 8% | 2.2%
8 | Nature Medicine | 117 | Top 2% | 2.0%
9 | Journal of the American Medical Informatics Association | 61 | Top 1% | 1.9%
10 | Cell Genomics | 162 | Top 3% | 1.8%
11 | Genome Biology | 555 | Top 5% | 1.3%
12 | Cell Systems | 167 | Top 9% | 1.2%
13 | Scientific Reports | 3102 | Top 68% | 1.0%
14 | Frontiers in Genetics | 197 | Top 7% | 1.0%
15 | Med | 38 | Top 0.6% | 0.8%
16 | PLOS Genetics | 756 | Top 14% | 0.8%
17 | Nature | 575 | Top 15% | 0.8%
18 | European Journal of Human Genetics | 49 | Top 1% | 0.8%
19 | npj Digital Medicine | 97 | Top 4% | 0.7%
20 | Human Mutation | 29 | Top 0.9% | 0.5%
21 | Briefings in Bioinformatics | 326 | Top 8% | 0.5%
22 | Molecular Systems Biology | 142 | Top 2% | 0.5%
23 | Human Genetics | 25 | Top 0.5% | 0.5%
24 | Cell | 370 | Top 19% | 0.5%
25 | PLOS Computational Biology | 1633 | Top 28% | 0.5%
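
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked from the listed probabilities. The sketch below is only an arithmetic check; the helper name `journals_needed` is hypothetical, and the values are the percentages from the listing in rank order.

```python
# Predicted probabilities (percent) in rank order, copied from the listing.
probabilities = [
    44.0, 7.2, 6.7, 5.1, 4.2, 2.8, 2.2, 2.0, 1.9, 1.8,
    1.3, 1.2, 1.0, 1.0, 0.8, 0.8, 0.8, 0.8, 0.7, 0.5,
    0.5, 0.5, 0.5, 0.5, 0.5,
]

def journals_needed(probs, threshold):
    """Return (k, cumulative) for the shortest top-k prefix reaching threshold."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return k, round(total, 1)
    # Threshold never reached: report the full cumulative mass instead.
    return None, round(total, 1)

k, mass = journals_needed(probabilities, 50.0)
print(k, mass)
```

The first journal alone stays below the threshold, so the cutoff falls after the second entry, which matches the "50% of probability mass above" marker.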