Integrative multiomic analysis on single-nucleotide variants identifies candidate genes for human craniofacial malformation
Yam, M. H.; So, K. K. H.; Tong, K. K.; Choy, K. W.; Sham, M. H.
Craniofacial malformation (CFM) is a congenital defect encompassing a wide range of phenotypic presentations and is largely driven by genetics. Despite the discovery of more than 300 causal genes, many CFM cases still have unknown genetic etiology. Complex gene regulation and heterogeneous cellular interactions in the developing head complicate disease-gene identification and prenatal genetic diagnosis. Recent progress in multiomic profiling of human embryogenesis enables the discovery of novel candidates from established GWAS data. Here, we developed an approach to prioritize GWAS variants using the epigenomes and single-cell transcriptomes of embryonic tissues and progenitor cells by implementing machine learning classifiers and combinatorial analysis. Systematic evaluation revealed significant improvement in machine learning model performance after integrating the transcriptomes of neural crest cells (NCCs) and cranial placodes, as well as the epigenomic profiles of early craniofacial tissues. We identified 249 genes from the best-performing classifier, including documented CFM-associated genes. Gene regulatory network (GRN) inference showed that 24 candidate genes were involved in NCC- and placode-specific regulons, of which 15 (F11R, ISL1, KANK4, L1TD1, LAMB1, MIA, PRDM1, S100A10, S100A11, STOM, STT3B, TESK2, USP43, WDR86, ZNF439) were novel candidates for human CFM. Motif analysis revealed putative functional SNPs contributing to CFM pathogenesis by disrupting transcription factor binding motifs in neural crest and placodes. Our analyses suggested that PRDM1 and ISL1 are strong candidates for human CFM, as supported by functional studies in animal models. This study demonstrates a successful method for disease gene identification using epigenomic and single-cell transcriptomic profiles, and sheds light on the linkage between early cell lineages and the pathogenic process of CFM.
Author Summary
Craniofacial malformation is one of the most common congenital disorders, affecting patients' food ingestion, speech and social interaction. Identifying craniofacial disease genes is difficult because gene expression is dynamic and multiple cell types contribute during embryonic development. In this study, we combine artificial intelligence with patient genetic and embryo multiomic information to identify new candidate genes for human craniofacial malformation. Using machine learning classifiers and combinatorial analyses, we prioritized single-nucleotide variants from patient datasets and identified 249 candidate genes. Annotation of the variants and candidate genes showed that some of them overlapped with known disease genes, demonstrating the efficacy of our approach. Further lineage reconstruction and motif analyses revealed a number of promising novel candidates, in particular PRDM1 and ISL1, which are likely to be causative genes for human CFM. Our study demonstrates a translatable approach for disease gene identification that uses machine learning algorithms and multiomic data, and provides a gene list for improving diagnostic panels and understanding the pathogenic processes of craniofacial disorders.
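The motif analysis mentioned above asks whether a SNP weakens a transcription factor binding site. The abstract does not specify the tool or scoring scheme used, so the following is only a minimal sketch of one common way to frame the question: score the reference and alternate alleles against a log-odds position weight matrix and compare the best match overlapping the SNP. The count matrix, sequence, pseudocount, background frequency, and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical 4-position count matrix for an illustrative motif (consensus AGCG).
# This is NOT a real transcription factor motif.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 8, 1],
    "G": [0, 8, 0, 5],
    "T": [0, 0, 0, 1],
}


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert base counts into a per-position log2 odds matrix."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return matrix


def best_window_score(seq, pwm):
    """Return the highest motif score over all windows of the sequence."""
    width = len(pwm)
    return max(
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def motif_disruption(ref_seq, snp_index, alt_base, pwm):
    """Score change (alt minus ref) for the best window overlapping the SNP.

    A negative value means the alternate allele weakens the best match.
    Assumes the sequence is at least as long as the motif.
    """
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    width = len(pwm)
    # Restrict scoring to the windows that actually contain the SNP position.
    start = max(0, snp_index - width + 1)
    end = min(len(ref_seq), snp_index + width)
    ref_score = best_window_score(ref_seq[start:end], pwm)
    alt_score = best_window_score(alt_seq[start:end], pwm)
    return alt_score - ref_score


pwm = log_odds_matrix(COUNTS)
# Hypothetical reference sequence with the motif at positions 2-5; SNP G>T at index 3.
delta = motif_disruption("TTAGCGTT", 3, "T", pwm)
print(f"score change: {delta:.2f}")
```

In this toy case the alternate allele replaces a strongly preferred base, so the score change is negative. A real analysis would use curated motif collections, scan both strands, and assess significance against a background model; none of that is attempted here.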
Matching journals
The top 6 journals account for 50% of the predicted probability mass.