Integrative multiomic analysis on single-nucleotide variants identifies candidate genes for human craniofacial malformation

Yam, M. H.; So, K. K. H.; Tong, K. K.; Choy, K. W.; Sham, M. H.

2025-12-29 genetics
10.64898/2025.12.29.696805 bioRxiv
Craniofacial malformation (CFM) is a congenital defect encompassing a wide range of phenotypic presentations and is largely driven by genetics. Despite the discovery of more than 300 causal genes, many CFM cases still have unknown genetic etiology. Complex gene regulation and heterogeneous cellular interactions in the developing head complicate disease-gene identification and prenatal genetic diagnosis. Recent progress in multiomic profiling of human embryogenesis enables the discovery of novel candidates from established GWAS data. Here, we developed an approach to prioritize GWAS variants using the epigenomes and single-cell transcriptomes of embryonic tissues and progenitor cells by implementing machine learning classifiers and combinatorial analysis. Systematic evaluation revealed significant improvement in machine learning model performance after integrating the transcriptomes of neural crest cells (NCCs) and cranial placodes, as well as the epigenomic profiles of early craniofacial tissues. We identified 249 genes from the best-performing classifier, which include documented CFM-associated genes. Gene regulatory network (GRN) inference showed that 24 candidate genes were involved in NCC- and placode-specific regulons, of which 15 (F11R, ISL1, KANK4, L1TD1, LAMB1, MIA, PRDM1, S100A10, S100A11, STOM, STT3B, TESK2, USP43, WDR86, ZNF439) were novel candidates for human CFM. Motif analysis revealed putative functional SNPs that may contribute to CFM pathogenesis by disrupting transcription factor binding motifs in neural crest and placodes. Our analyses suggested that PRDM1 and ISL1 are strong candidates for human CFM, as supported by functional studies in animals. This study demonstrates a successful method for disease gene identification using epigenomic and single-cell transcriptomic profiles, and sheds light on the link between early cell lineages and the pathogenic process of CFM.
Author Summary

Craniofacial malformation is among the most common congenital disorders and affects patients' food ingestion, speech and social interaction. Identifying craniofacial disease genes is difficult because gene expression is dynamic and multiple cell types contribute during embryonic development. In this study, we combine artificial intelligence with patient genetic and embryo multiomic information to identify new candidate genes for human craniofacial malformation. Using machine learning classifiers and combinatorial analyses, we prioritized single-nucleotide variants from patient datasets and identified 249 candidate genes. Annotation of the variants and candidate genes showed that some of them overlapped with known disease genes, demonstrating the efficacy of our approach. Further lineage reconstruction and motif analyses revealed a number of promising novel candidates; in particular, PRDM1 and ISL1 are likely to be causative genes for human CFM. Our study demonstrates a translatable approach for disease gene identification using machine learning algorithms and multiomic data, and provides a gene list for improving diagnostic panels and understanding the pathogenic processes of craniofacial disorders.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Journal of Genetics and Genomics | 36 papers in training set | Top 0.1% | 14.5%
2 | Scientific Reports | 3102 papers in training set | Top 5% | 10.6%
3 | Human Molecular Genetics | 130 papers in training set | Top 0.1% | 10.2%
4 | Frontiers in Genetics | 197 papers in training set | Top 0.3% | 10.2%
5 | PLOS ONE | 4510 papers in training set | Top 33% | 4.4%
6 | American Journal of Medical Genetics Part A | 17 papers in training set | Top 0.1% | 3.6%
50% of probability mass above
7 | Human Genetics | 25 papers in training set | Top 0.1% | 3.1%
8 | Genes | 126 papers in training set | Top 0.4% | 2.9%
9 | Disease Models & Mechanisms | 119 papers in training set | Top 0.9% | 1.9%
10 | BioMed Research International | 25 papers in training set | Top 2% | 1.7%
11 | Human Mutation | 29 papers in training set | Top 0.4% | 1.5%
12 | BMC Medical Genomics | 36 papers in training set | Top 0.5% | 1.5%
13 | PLOS Genetics | 756 papers in training set | Top 10% | 1.3%
14 | iScience | 1063 papers in training set | Top 21% | 1.2%
15 | Journal of Medical Genetics | 28 papers in training set | Top 0.4% | 1.2%
16 | Biochemistry and Biophysics Reports | 28 papers in training set | Top 0.8% | 1.2%
17 | European Journal of Human Genetics | 49 papers in training set | Top 0.8% | 1.2%
18 | International Journal of Molecular Sciences | 453 papers in training set | Top 10% | 1.2%
19 | BMC Bioinformatics | 383 papers in training set | Top 6% | 0.9%
20 | eLife | 5422 papers in training set | Top 53% | 0.9%
21 | Biology | 43 papers in training set | Top 2% | 0.9%
22 | Genomics | 60 papers in training set | Top 2% | 0.9%
23 | Biomedicines | 66 papers in training set | Top 3% | 0.8%
24 | Frontiers in Cell and Developmental Biology | 218 papers in training set | Top 8% | 0.8%
25 | npj Genomic Medicine | 33 papers in training set | Top 0.1% | 0.8%
26 | Biomolecules | 95 papers in training set | Top 2% | 0.8%
27 | Developmental Dynamics | 50 papers in training set | Top 0.7% | 0.8%
28 | BioTechniques | 24 papers in training set | Top 0.3% | 0.8%
29 | Human Genetics and Genomics Advances | 70 papers in training set | Top 0.8% | 0.8%
30 | PLOS Computational Biology | 1633 papers in training set | Top 24% | 0.8%
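
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration: the page does not say how the cutoff is computed, so the rule used here (smallest number of top-ranked journals whose cumulative probability reaches the threshold) is an assumption, and the function name `journals_to_reach` is hypothetical. The values are the percentages listed for ranks 1 to 10.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-10, copied from the list above.
probabilities = [14.5, 10.6, 10.2, 10.2, 4.4, 3.6, 3.1, 2.9, 1.9, 1.7]


def journals_to_reach(probs, threshold):
    """Return the smallest k such that the first k values sum to at least
    `threshold`, or None if the threshold is never reached.

    Assumed rule for illustration only; not taken from the page itself.
    """
    for k, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return k
    return None


cutoff = journals_to_reach(probabilities, 50.0)
print(cutoff)
```

Under this assumed rule the first five journals sum to about 49.9% and the sixth brings the total to about 53.5%, which is consistent with the cutoff being placed after rank 6.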