Integrative Multi-Omics Analysis Reveals Novel Molecular Signatures, Disease Stratification and Therapeutic Opportunities in Primary Ciliary Dyskinesia: First AI-ML empowered platform towards precision medicine targeting human ciliopathies

Jitender; Hossain, M. W.; Mohanty, S.; Kateriya, S.

2026-01-14 health informatics
10.64898/2026.01.12.26343910 medRxiv

Primary ciliary dyskinesia (PCD) is a rare genetic disorder that is extremely hard to diagnose and treat. Current diagnostic modalities detect only 70% of cases and are technically demanding, which necessitates novel computational approaches for biomarker discovery and the identification of therapeutic targets. We developed an integrative computational pipeline analysing transcriptomic data from 6 PCD patients and 9 healthy controls. We identified 1,249 differentially expressed genes (false discovery rate below 0.05, absolute log2 fold-change exceeding 1), revealing oxidative stress as a central pathophysiological mechanism, with glutathione S-transferase theta 2B (GSTT2B) emerging as a master regulatory hub. WGCNA detected 12 co-expression modules, three of which were significantly disease-associated. Machine learning achieved outstanding diagnostic performance with a minimal 10-gene signature that maintained an accuracy of 0.93; the Random Forest area under the receiver operating characteristic curve was estimated at 0.96 ± 0.03. The study also highlighted uncharacterized genes, such as FRMPD3, C1orf194, and METTL26, that were not previously associated with PCD. The drug repurposing methodology identified FDA-approved drugs, including N-acetylcysteine, metformin, and resveratrol, as top candidates for therapeutic intervention in PCD. Age-dependent classification revealed 156 genes with significant disease progression interactions, and gender-associated classification identified 342 sex-specific responsive genes.

Background

Primary ciliary dyskinesia (PCD) is a rare genetic disorder that arises from ciliary dysfunction. It causes severe respiratory illness, including chronic infections, bronchiectasis, and morbidity.
Although more than 50 PCD genes have been identified, the molecular mechanisms underlying PCD pathophysiology remain unclear. This gap leads to failed therapeutic interventions and highlights the need for robust PCD-specific molecular characterization.

Methods

This study performed an integrated computational analysis of transcriptomic data from the GSE25186 dataset, which comprises nasal epithelial cell samples from six confirmed PCD cases and nine healthy controls. The analysis combined empirical Bayes moderated t-statistics, weighted gene co-expression network analysis (WGCNA) with soft threshold β = 6, pathway enrichment across the KEGG, Reactome, and GO databases, machine learning classification using Random Forest and Support Vector Machines, temporal trajectory inference through pseudotime analysis, and systematic drug repurposing screening against the DrugBank v5.1.8 and ChEMBL v29 databases.

Results

We identified 1,249 differentially expressed genes (adjusted p-value < 0.05, |log2FC| > 1), comprising 533 upregulated and 716 downregulated genes. WGCNA identified 12 co-expression modules, three of which were significantly associated with disease (brown module: r = 0.78, p = 2x10-; blue module: r = -0.65, p = 0.008; green module: r = 0.82, p = 0.001). The machine learning classifiers yielded outstanding diagnostic performance, with a Random Forest AUC of 0.96 ± 0.03, and led to a minimal 10-gene diagnostic signature. N-acetylcysteine (NAC) emerged as the top therapeutic candidate for PCD with a composite score of 2.46, ahead of metformin (1.85) and resveratrol (0.28). Systems biology-based classification by age revealed progressive molecular deterioration.
A total of 156 genes showed a significant age × disease interaction (false discovery rate < 0.05). Gender stratification identified 342 differentially responsive genes, informing the design of sex-specific therapeutic interventions.

Conclusions

This multi-omics analysis provides significant insights into PCD molecular pathophysiology. Oxidative stress (GSTT2B, GPX1, SOD2) and disruption of protein homeostasis (HSPA8, PDIA3, CALR) served as central regulators of disease progression. The study offers novel insights into reliable diagnostic markers and into FDA-approved, readily available drug candidates for therapeutic intervention in PCD. Further, the age- and gender-associated classification of biological markers in PCD offers a novel path toward tailored medicine. This study establishes a robust molecular framework for therapeutics in rare genetic diseases.
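
The abstract's selection rule for differentially expressed genes (adjusted p-value < 0.05 and |log2FC| > 1) can be illustrated with a minimal sketch. It assumes the adjusted p-values are Benjamini-Hochberg false discovery rates, which the abstract does not state explicitly, and it does not reproduce the empirical Bayes moderated t-test that generates the raw p-values. The function names and the four toy genes are hypothetical and are not taken from GSE25186.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes passing both cutoffs into up- and downregulated lists."""
    adjusted = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, adj_p in zip(genes, log2fc, adjusted):
        if adj_p < fdr_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Toy input (hypothetical values, for illustration only).
genes = ["geneA", "geneB", "geneC", "geneD"]
log2fc = [2.1, -1.6, 0.4, 1.3]
pvalues = [0.001, 0.004, 0.002, 0.6]

up, down = select_degs(genes, log2fc, pvalues)
print("upregulated:", up)
print("downregulated:", down)
```

In this toy input, geneC fails the fold-change cutoff despite a small p-value and geneD fails the FDR cutoff despite a large fold change, which is why both thresholds are applied jointly.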

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. European Respiratory Journal, 54 papers in training set, Top 0.1%, 13.1%
2. Journal of Personalized Medicine, 28 papers in training set, Top 0.1%, 9.5%
3. Scientific Reports, 3102 papers in training set, Top 11%, 7.4%
4. PLOS ONE, 4510 papers in training set, Top 26%, 6.5%
5. Advanced Biology, 29 papers in training set, Top 0.1%, 5.0%
6. Computers in Biology and Medicine, 120 papers in training set, Top 0.7%, 3.7%
7. Environmental Research, 46 papers in training set, Top 0.6%, 2.4%
8. BMC Medical Genomics, 36 papers in training set, Top 0.3%, 2.2%
9. Frontiers in Neurology, 91 papers in training set, Top 2%, 2.1%

50% of probability mass above

10. Journal of Medical Virology, 137 papers in training set, Top 2%, 1.8%
11. Orphanet Journal of Rare Diseases, 18 papers in training set, Top 0.3%, 1.8%
12. eClinicalMedicine, 55 papers in training set, Top 0.5%, 1.8%
13. eBioMedicine, 130 papers in training set, Top 1%, 1.8%
14. Biomedicines, 66 papers in training set, Top 1%, 1.5%
15. Advanced Science, 249 papers in training set, Top 14%, 1.3%
16. Genes, 126 papers in training set, Top 2%, 1.3%
17. Frontiers in Public Health, 140 papers in training set, Top 6%, 1.3%
18. Human Molecular Genetics, 130 papers in training set, Top 2%, 1.3%
19. BMC Medicine, 163 papers in training set, Top 5%, 1.1%
20. iScience, 1063 papers in training set, Top 23%, 1.0%
21. Frontiers in Digital Health, 20 papers in training set, Top 1%, 1.0%
22. Frontiers in Medicine, 113 papers in training set, Top 5%, 1.0%
23. Acta Neuropsychiatrica, 12 papers in training set, Top 0.7%, 1.0%
24. Critical Care, 14 papers in training set, Top 0.5%, 0.9%
25. Experimental Neurology, 57 papers in training set, Top 1%, 0.8%
26. Frontiers in Immunology, 586 papers in training set, Top 7%, 0.8%
27. EBioMedicine, 39 papers in training set, Top 1%, 0.8%
28. Clinical and Translational Medicine, 30 papers in training set, Top 1%, 0.7%
29. Bioinformatics, 1061 papers in training set, Top 10%, 0.7%
30. Investigative Ophthalmology & Visual Science, 37 papers in training set, Top 0.7%, 0.7%
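
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. This is a minimal sketch: the percentages are copied from the first ten entries of the ranking, and the helper name `journals_to_reach` is hypothetical rather than part of the site's own tooling.

```python
def journals_to_reach(percentages, target=50.0):
    """Return (count, cumulative) for the first prefix reaching the target."""
    total = 0.0
    for count, pct in enumerate(percentages, start=1):
        total += pct
        if total >= target:
            return count, total
    # Target never reached: report the full sum instead.
    return None, total


# Predicted probabilities (%) for ranks 1 to 10, as listed in the ranking.
top_ten = [13.1, 9.5, 7.4, 6.5, 5.0, 3.7, 2.4, 2.2, 2.1, 1.8]

count, cumulative = journals_to_reach(top_ten)
print(f"{count} journals reach {cumulative:.1f}% of the probability mass")
```

The first eight entries sum to just under the 50% mark, so the ninth journal is the one that crosses it.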