
Combining phenotypic similarity and network propagation to improve performance and clinical consistency of rare disease diagnosis

Chahdil, M.; Fabrizzi, C.; Hanauer, M.; Lucano, C.; Rath, A.; Lagorce, D.; Tichit, L.

2026-02-17 health informatics
10.64898/2026.02.15.26346357 medRxiv

Achieving timely diagnosis for rare diseases remains challenging due to factors such as phenotypic heterogeneity and incomplete clinical data. While the Solve-RD project developed a phenotype-based gene prioritisation method, this approach did not account for the clinical consistency among related diseases in Orphanet's hierarchical classifications. We present a phenotype-based computational pipeline that ranks candidate ORPHAcodes based on patient phenotypes. The pipeline computes patient-disease similarity using asymmetric semantic aggregation of Human Phenotype Ontology terms, filtering subsumed terms and incorporating Orphanet frequency annotations. Evaluated on 139 expert-curated Solve-RD cases representing 78 distinct ORPHAcodes, our methodology outperformed the established Solve-RD baseline method, achieving a harmonic mean rank of 4.64 for confirmed diagnoses (versus 7.97) and retrieving the correct suspected rare disease within the top 10 positions for 39% of patients (versus 29%). We then explore a disease similarity network using Random Walk with Restart (RWR) to generate ranked candidate lists. Two complementary experiments demonstrate that RWR-ranked candidates exhibited improved clinical consistency, reflected by their proximity within the Orphanet nomenclature of rare diseases. This approach provides more interpretable and actionable differential diagnosis hypotheses to guide clinical decision-making.

Author summary

Many patients with rare diseases face prolonged diagnostic delays due to the extreme heterogeneity of rare disorders and the variability of their clinical manifestations, which complicates interpretation and requires structured phenotypic representations and expert knowledge. We developed a computational pipeline that compares patients' phenotypes with those documented for rare diseases in the Orphanet database. Rather than relying solely on direct matching of clinical signs and symptoms, our approach leverages relationships between diseases by propagating information through a network connecting patients and diseases. Tested on 139 cases from the European Solve-RD project, our method improved identification of correct diagnoses and generated more clinically coherent candidate lists by accounting for the Orphanet nomenclature. This work provides a methodology dedicated to assisting clinicians in developing diagnostic hypotheses for rare diseases.
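The abstract names two building blocks that a small example can make concrete: Random Walk with Restart over a patient-disease network, and the harmonic mean rank and top-k retrieval metrics used for evaluation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the node names, edge weights, restart probability of 0.3, and all function names are hypothetical, and the similarity scores that would weight the edges are taken as given.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    adjacency: dict node -> {neighbour: weight}; assumed symmetric, and every
               neighbour must also appear as a key (hypothetical input format).
    seeds:     dict node -> starting probability, summing to 1.
    """
    nodes = list(adjacency)
    # Total edge weight leaving each node, used to normalise transitions.
    out_weight = {u: sum(adjacency[u].values()) for u in nodes}
    p0 = {u: seeds.get(u, 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: send its mass back to the seeds.
                for v in nodes:
                    nxt[v] += (1 - restart) * p[u] * p0[v]
                continue
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def harmonic_mean_rank(ranks):
    """Harmonic mean of the ranks (1-based) of the correct diagnoses."""
    return len(ranks) / sum(1.0 / r for r in ranks)


def top_k_rate(ranks, k=10):
    """Fraction of cases whose correct diagnosis is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


# Toy network (all values invented for illustration): one patient linked to
# two diseases by phenotypic similarity, plus one disease-disease edge.
toy_network = {
    "patient": {"D1": 0.9, "D2": 0.4},
    "D1": {"patient": 0.9, "D3": 0.8},
    "D2": {"patient": 0.4},
    "D3": {"D1": 0.8},
}
scores = random_walk_with_restart(toy_network, {"patient": 1.0})
# Rank disease nodes by their steady-state probability.
ranking = sorted((n for n in scores if n != "patient"), key=scores.get, reverse=True)
print(ranking)
```

In this toy setting, D3 receives a nonzero score even though it has no direct edge to the patient, which illustrates how propagation can surface diseases related to a strong candidate. The harmonic mean rank rewards placing the correct diagnosis near the top more strongly than an arithmetic mean would.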

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. Scientific Reports | 3102 papers in training set | Top 7% | 10.0%
2. Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.3% | 9.0%
3. Bioinformatics | 1061 papers in training set | Top 3% | 8.3%
4. Journal of Biomedical Informatics | 45 papers in training set | Top 0.2% | 6.3%
5. IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 0.5% | 3.5%
6. PLOS Computational Biology | 1633 papers in training set | Top 10% | 3.5%
7. npj Digital Medicine | 97 papers in training set | Top 1% | 3.0%
8. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 1.0% | 2.8%
9. BMC Bioinformatics | 383 papers in training set | Top 3% | 2.7%
10. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.3% | 2.7%
50% of probability mass above
11. Orphanet Journal of Rare Diseases | 18 papers in training set | Top 0.2% | 2.6%
12. Journal of Personalized Medicine | 28 papers in training set | Top 0.2% | 2.3%
13. Human Mutation | 29 papers in training set | Top 0.3% | 2.1%
14. GENETICS | 189 papers in training set | Top 0.5% | 1.9%
15. iScience | 1063 papers in training set | Top 15% | 1.7%
16. Computers in Biology and Medicine | 120 papers in training set | Top 2% | 1.7%
17. Database | 51 papers in training set | Top 0.4% | 1.7%
18. NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.7%
19. PLOS Digital Health | 91 papers in training set | Top 2% | 1.7%
20. Nature Communications | 4913 papers in training set | Top 52% | 1.6%
21. BMC Medical Genomics | 36 papers in training set | Top 0.6% | 1.5%
22. Frontiers in Digital Health | 20 papers in training set | Top 0.8% | 1.3%
23. eBioMedicine | 130 papers in training set | Top 2% | 1.3%
24. PLOS ONE | 4510 papers in training set | Top 61% | 1.2%
25. JAMIA Open | 37 papers in training set | Top 1% | 1.1%
26. Artificial Intelligence in Medicine | 15 papers in training set | Top 0.5% | 0.9%
27. Patterns | 70 papers in training set | Top 2% | 0.9%
28. International Journal of Medical Informatics | 25 papers in training set | Top 1% | 0.9%
29. Journal of Medical Internet Research | 85 papers in training set | Top 4% | 0.8%
30. Genome Medicine | 154 papers in training set | Top 8% | 0.8%