
Topological Deep Learning Identifies Polygenic Variant Clusters Across Familial Multimorbid Disorders

Vomo-Donfack, K. L.; Bousquet, G.; Falgarone, G.; Ginot, G.; Morilla, I.

2026-06-09 health informatics
10.64898/2026.06.03.26354242 medRxiv

Whole-genome sequencing comprehensively captures coding, non-coding and structural variation in families with suspected inherited disorders, yet its clinical utility remains constrained by an interpretation bottleneck: selecting a handful of relevant variants from millions of candidates. Current rule-based pipelines, anchored in ACMG/AMP criteria, excel at identifying highly penetrant Mendelian alleles but frequently miss variants of low-to-moderate penetrance, non-coding alterations and germline-somatic interactions. Here we introduce PolyCLIP-T, a topology-guided multimodal framework that transforms variant selection from a classification problem into a geometric discovery task. By contrastively aligning DNA-sequence embeddings with functional annotations, PolyCLIP-T constructs a unified latent space in which the displacement between reference and alternate embeddings quantifies the molecular perturbation induced by each variant. Persistent homology then identifies stable topological components (coherent variant groups shared among affected relatives) that transcend single-variant scoring logic. Applied to six families with multimorbid cancer, autoimmune and cardiovascular disease, PolyCLIP-T recovered non-coding and structural candidates overlooked by conventional pipelines and revealed pleiotropic networks spanning disease categories. This approach provides an interpretable, scalable solution for genome-first investigations of disorders driven by polygenic architectures that evade single-variant analysis. The framework was developed and benchmarked on deeply characterised familial cohorts selected for transgenerational multimorbidity; validation in larger, independent populations will be essential to establish its generalisability. An interactive web tool is freely available at https://www.polyclip-t.uma.es/.
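
The two geometric ideas in the abstract, a per-variant displacement between reference and alternate embeddings and the grouping of variants into stable connected components, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy arrays stand in for learned embeddings, the function names (`displacement`, `h0_deaths`, `components_at_scale`) are hypothetical, and only 0-dimensional persistent homology (equivalent to single-linkage merging) is computed, using plain NumPy rather than a dedicated topology library.

```python
import numpy as np


def displacement(ref_emb, alt_emb):
    """Return per-variant displacement vectors (alt - ref) and their norms."""
    delta = alt_emb - ref_emb
    return delta, np.linalg.norm(delta, axis=1)


def _sorted_edges(points):
    """All pairwise Euclidean distances as (distance, i, j), ascending."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return sorted((float(dist[i, j]), i, j)
                  for i in range(n) for j in range(i + 1, n))


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def h0_deaths(points):
    """Scales at which connected components merge (0-dim persistence).

    Every component is born at scale 0; each merge kills one component,
    so n points yield n - 1 finite death values. A large gap between
    consecutive deaths indicates components that persist over a wide
    range of scales.
    """
    parent = list(range(len(points)))
    deaths = []
    for d, i, j in _sorted_edges(points):
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


def components_at_scale(points, scale):
    """Label each point by its connected component at a given scale."""
    parent = list(range(len(points)))
    for d, i, j in _sorted_edges(points):
        if d > scale:
            break
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
    roots = [_find(parent, i) for i in range(len(points))]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


# Toy data (assumed, not from the paper): six variants whose alternate
# embeddings shift in one of two distinct directions.
ref = np.zeros((6, 3))
alt = np.array([[1.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.9, 0.1, 0.0],
                [0.0, 5.0, 0.0], [0.0, 5.1, 0.0], [0.1, 4.9, 0.0]])

delta, magnitude = displacement(ref, alt)
deaths = h0_deaths(delta)
labels = components_at_scale(delta, scale=1.0)
```

In this toy case the last merge happens at a much larger scale than the others, so a scale chosen inside that gap separates the variants into two groups. A production pipeline would more likely rely on a persistent-homology library and on higher-dimensional features, which this sketch does not attempt.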

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Communications | 4913 papers in training set | Top 0.7% | 33.2%
2. Nature Genetics | 240 papers in training set | Top 2% | 4.0%
3. Nature Machine Intelligence | 61 papers in training set | Top 0.9% | 3.6%
4. Patterns | 70 papers in training set | Top 0.2% | 3.6%
5. Science Translational Medicine | 111 papers in training set | Top 0.8% | 3.6%
6. Advanced Science | 249 papers in training set | Top 5% | 3.6%

50% of probability mass above

7. Scientific Reports | 3102 papers in training set | Top 39% | 3.3%
8. Communications Biology | 886 papers in training set | Top 4% | 2.4%
9. Bioinformatics | 1061 papers in training set | Top 6% | 2.1%
10. Nature Medicine | 117 papers in training set | Top 2% | 2.1%
11. EMBO Molecular Medicine | 85 papers in training set | Top 1% | 2.1%
12. Nature | 575 papers in training set | Top 9% | 2.1%
13. Nature Biotechnology | 147 papers in training set | Top 4% | 1.9%
14. Nature Methods | 336 papers in training set | Top 4% | 1.9%
15. Genome Medicine | 154 papers in training set | Top 4% | 1.9%
16. eBioMedicine | 130 papers in training set | Top 0.9% | 1.9%
17. Cell Reports Medicine | 140 papers in training set | Top 3% | 1.8%
18. NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.7%
19. Nature Computational Science | 50 papers in training set | Top 0.6% | 1.7%
20. Nature Biomedical Engineering | 42 papers in training set | Top 0.9% | 1.7%
21. Science Advances | 1098 papers in training set | Top 23% | 1.2%
22. Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.0%
23. Med | 38 papers in training set | Top 0.8% | 0.8%
24. Genome Biology | 555 papers in training set | Top 7% | 0.8%
25. Frontiers in Immunology | 586 papers in training set | Top 7% | 0.8%
26. Science | 429 papers in training set | Top 21% | 0.7%
27. iScience | 1063 papers in training set | Top 37% | 0.6%
28. Cell Genomics | 162 papers in training set | Top 8% | 0.5%
29. Communications Medicine | 85 papers in training set | Top 2% | 0.5%
30. GENETICS | 189 papers in training set | Top 2% | 0.5%