Prioritizing Cardiovascular Disease-Associated Variants Altering NKX2-5 Binding through an Integrative Computational Approach

Pena-Martinez, E. G.; Pomales-Matos, D. A.; Rivera-Madera, A.; Messon-Bird, J. L.; Medina-Feliciano, J. G.; Sanabria-Alberto, L.; Barreiro-Rosario, A. C.; Rodriguez-Rios, J. M.; Rodriguez-Martinez, J. A.

2023-09-02 genetic and genomic medicine
10.1101/2023.09.01.23294951 medRxiv
Cardiovascular diseases (CVDs) are the leading cause of death worldwide and are heavily influenced by genetic factors. Genome-wide association studies (GWAS) have mapped >90% of CVD-associated variants to the non-coding genome, where they can alter the function of regulatory proteins such as transcription factors (TFs). However, due to the overwhelming number of GWAS single nucleotide polymorphisms (SNPs) (>500,000), prioritizing variants for in vitro analysis remains challenging. In this work, we implemented a computational approach that combines support vector machine (SVM)-based TF binding site classification with cardiac expression quantitative trait loci (eQTL) analysis to identify and prioritize potential CVD-causing SNPs. We identified 1,535 CVD-associated SNPs that occur within human heart footprints/enhancers and 9,309 variants in linkage disequilibrium (LD) with differential gene expression profiles in cardiac tissue. Using hiPSC-CM ChIP-seq data from NKX2-5 and TBX5, two cardiac TFs essential for proper heart development, we trained a large-scale gapped k-mer SVM (LS-GKM-SVM) predictive model that can identify binding sites altered by CVD-associated SNPs. The computational predictive model was tested by scoring human heart footprints and enhancers in vitro through electrophoretic mobility shift assay (EMSA). Three variants (rs59310144, rs6715570, and rs61872084) were prioritized for in vitro validation based on their eQTL in cardiac tissue and LS-GKM-SVM prediction to alter NKX2-5 DNA binding. All three variants altered NKX2-5 DNA binding. In summary, we present a bioinformatic approach that combines tissue-specific eQTL analysis and SVM-based TF binding site classification to prioritize CVD-associated variants for in vitro experimental analysis.
[Figure 1: Graphical Abstract]

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Computers in Biology and Medicine (120 papers in training set; Top 0.1%; 10.0%)
2. Frontiers in Genetics (197 papers in training set; Top 0.3%; 10.0%)
3. Human Genomics (21 papers in training set; Top 0.1%; 6.3%)
4. Scientific Reports (3102 papers in training set; Top 20%; 6.2%)
5. Briefings in Bioinformatics (326 papers in training set; Top 2%; 3.9%)
6. Journal of Biomedical Informatics (45 papers in training set; Top 0.5%; 3.5%)
7. International Journal of Molecular Sciences (453 papers in training set; Top 3%; 3.5%)
8. Human Genetics and Genomics Advances (70 papers in training set; Top 0.1%; 3.5%)
9. BMC Genomics (328 papers in training set; Top 0.9%; 3.5%)
50% of probability mass above
10. Journal of Translational Medicine (46 papers in training set; Top 0.2%; 3.5%)
11. Nucleic Acids Research (1128 papers in training set; Top 7%; 2.7%)
12. Genomics (60 papers in training set; Top 0.5%; 2.6%)
13. PLOS Computational Biology (1633 papers in training set; Top 12%; 2.6%)
14. Communications Biology (886 papers in training set; Top 5%; 2.0%)
15. Genome Medicine (154 papers in training set; Top 4%; 1.9%)
16. Cell Proliferation (12 papers in training set; Top 0.1%; 1.7%)
17. Bioinformatics (1061 papers in training set; Top 7%; 1.6%)
18. Human Genetics (25 papers in training set; Top 0.2%; 1.3%)
19. PLOS ONE (4510 papers in training set; Top 59%; 1.3%)
20. Gene (41 papers in training set; Top 1%; 1.2%)
21. Genetic Epidemiology (46 papers in training set; Top 0.6%; 1.2%)
22. eLife (5422 papers in training set; Top 49%; 1.2%)
23. Journal of the American Heart Association (119 papers in training set; Top 3%; 0.9%)
24. PLOS Genetics (756 papers in training set; Top 13%; 0.9%)
25. iScience (1063 papers in training set; Top 27%; 0.9%)
26. Frontiers in Immunology (586 papers in training set; Top 7%; 0.8%)
27. BMC Medical Genomics (36 papers in training set; Top 1%; 0.7%)
28. Biomedicines (66 papers in training set; Top 3%; 0.7%)
29. BMC Bioinformatics (383 papers in training set; Top 7%; 0.7%)
30. IEEE/ACM Transactions on Computational Biology and Bioinformatics (32 papers in training set; Top 0.6%; 0.7%)
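
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below is only an illustration: the function name `journals_needed` and the rule it applies (smallest number of top-ranked journals whose cumulative probability reaches the threshold) are assumptions, not the page's documented method, and the listed percentages are rounded.

```python
from itertools import accumulate

# Predicted probabilities (%) for the 30 ranked journals, copied in rank order
# from the list above. These are rounded display values.
probabilities = [
    10.0, 10.0, 6.3, 6.2, 3.9, 3.5, 3.5, 3.5, 3.5, 3.5,
    2.7, 2.6, 2.6, 2.0, 1.9, 1.7, 1.6, 1.3, 1.3, 1.2,
    1.2, 1.2, 0.9, 0.9, 0.9, 0.8, 0.7, 0.7, 0.7, 0.7,
]


def journals_needed(probs, threshold=50.0):
    """Return the smallest number of top-ranked entries whose cumulative
    probability reaches `threshold`, or None if it is never reached.

    Hypothetical helper: the cutoff rule is an assumption for illustration.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


cutoff = journals_needed(probabilities)
print(cutoff)
```

With these values the first eight journals sum to about 46.9% and the first nine to about 50.4%, which agrees with the cutoff placed after BMC Genomics.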