
Feasibility of Machine Learning Analysis for the Identification of Patients with Possible Primary Ciliary Dyskinesia

Burns, G.; Kauffman, C.; Manion, M.; Pai, R.-A.; Milla, C.; O'Connor, M. G.; Shapiro, A. J.; Bjornson-Pennell, H.

2025-04-20 respiratory medicine
10.1101/2025.04.18.25326065 medRxiv

BACKGROUND: Significant diagnostic delays are common in primary ciliary dyskinesia (PCD), a rare disease that is significantly underdiagnosed. Scalable screening methods could improve early identification and health outcomes.

RESEARCH QUESTION: Can machine learning (ML) be used to screen for PCD in pediatric patients?

STUDY DESIGN AND METHODS: We evaluated the feasibility of a random forest model to screen for PCD using data from the PCD Foundation Registry and a national claims database. We identified a cohort of pediatric patients with diagnostic codes indicative of conditions potentially associated with PCD, and studied diagnostic, procedural, and pharmaceutical codes associated with PCD to develop ML features. Models were trained on composite claims data from confirmed patients with PCD, patients with Q34.8 (Specific Congenital Malformation of the Respiratory System) diagnosed within six months of an Electron Microscopy procedure (Q34.8+EM), and a randomly-selected, matched control group. Model performance was tested through 5-fold cross-validation.

RESULTS: Using 82 confirmed PCD cases and 4,161 matched controls, the model demonstrated variable performance (positive predictive value 0.45-0.73, sensitivity 0.75-0.94). Synthetic data augmentation did not improve results (positive predictive value 0.45-0.67, sensitivity 0.71-1.00). Expanding the dataset to include 319 Q34.8+EM patients and 8,214 controls improved performance (positive predictive value 0.51-0.54, sensitivity 0.82-0.90), suitable for screening. In a cohort of 1.32 million pediatric patients, 7,705 were classified as positive, consistent with the estimated prevalence of PCD (1:7,554).

INTERPRETATION: This study demonstrates the feasibility of using ML to screen for PCD using claims data, even in the absence of a specific International Classification of Disease (ICD) code. Such screening approaches may aid in the identification of individuals who may benefit from timely diagnostic testing and targeted interventions.
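
For readers less familiar with the two metrics quoted in the abstract, the sketch below uses their standard definitions: positive predictive value is TP / (TP + FP) and sensitivity is TP / (TP + FN). The function names and the confusion counts are hypothetical illustration values, not figures or code from the study.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of positive calls that are true cases."""
    if tp + fp == 0:
        raise ValueError("no positive predictions, PPV is undefined")
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): share of true cases that the model flags."""
    if tp + fn == 0:
        raise ValueError("no true cases, sensitivity is undefined")
    return tp / (tp + fn)


# Hypothetical counts for a single validation fold (NOT taken from the study).
tp, fp, fn = 14, 12, 2
print(f"PPV = {ppv(tp, fp):.2f}")
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
```

Under cross-validation, these two values would be computed once per held-out fold, which is one way a range such as those quoted in the abstract could arise.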

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Annals of Clinical and Translational Neurology | 29 papers in training set | Top 0.1% | 33.3%
2. ERJ Open Research | 44 papers in training set | Top 0.2% | 6.4%
3. Genetics in Medicine | 69 papers in training set | Top 0.3% | 4.9%
4. European Respiratory Journal | 54 papers in training set | Top 0.4% | 4.2%
5. BMJ Open Respiratory Research | 32 papers in training set | Top 0.2% | 3.6%

50% of probability mass above

6. Human Molecular Genetics | 130 papers in training set | Top 0.7% | 3.6%
7. Scientific Reports | 3102 papers in training set | Top 36% | 3.6%
8. Frontiers in Molecular Biosciences | 100 papers in training set | Top 0.7% | 2.8%
9. Frontiers in Pediatrics | 29 papers in training set | Top 0.2% | 2.6%
10. Journal of the American Medical Informatics Association | 61 papers in training set | Top 1% | 1.9%
11. PLOS ONE | 4510 papers in training set | Top 51% | 1.8%
12. British Journal of General Practice | 22 papers in training set | Top 0.3% | 1.5%
13. Pediatric Pulmonology | 14 papers in training set | Top 0.2% | 1.3%
14. JAMA Pediatrics | 10 papers in training set | Top 0.1% | 1.2%
15. Pediatrics | 10 papers in training set | Top 0.2% | 1.0%
16. Orphanet Journal of Rare Diseases | 18 papers in training set | Top 0.5% | 0.9%
17. Journal of Clinical Medicine | 91 papers in training set | Top 5% | 0.9%
18. BJGP Open | 12 papers in training set | Top 0.5% | 0.9%
19. Frontiers in Medicine | 113 papers in training set | Top 6% | 0.8%
20. International Journal of Epidemiology | 74 papers in training set | Top 2% | 0.8%
21. American Journal of Medical Genetics Part A | 17 papers in training set | Top 0.3% | 0.8%
22. Communications Medicine | 85 papers in training set | Top 0.9% | 0.8%
23. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 42% | 0.8%
24. Human Mutation | 29 papers in training set | Top 0.7% | 0.8%
25. International Journal of Medical Informatics | 25 papers in training set | Top 2% | 0.7%
26. Frontiers in Neurology | 91 papers in training set | Top 5% | 0.7%
27. Genetics in Medicine Open | 10 papers in training set | Top 0.1% | 0.7%
28. Cureus | 67 papers in training set | Top 6% | 0.6%
29. BMJ Open | 554 papers in training set | Top 13% | 0.6%
30. Journal of Internal Medicine | 12 papers in training set | Top 1.0% | 0.5%
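
The 50% cutoff marked after rank 5 can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch of that arithmetic; the function name `journals_to_reach` is a hypothetical helper, and only the ten highest-ranked values from the list are used.

```python
from itertools import accumulate

# Predicted probabilities (%) for the ten highest-ranked journals, as listed.
probs = [33.3, 6.4, 4.9, 4.2, 3.6, 3.6, 3.6, 2.8, 2.6, 1.9]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_to_reach(probs, 50.0))
```

With the listed values, the first four journals sum to 48.8% and the fifth brings the total to 52.4%, which agrees with the cutoff falling after rank 5.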