Leveraging hierarchical structures for genetic block interaction studies using the hierarchical transformer

Li, S.; Arora, S.; Attaoua, R.; Hamet, P.; Tremblay, J.; Bihlo, A.; Liu, B.; Rutter, G.

2024-11-18 genetic and genomic medicine
10.1101/2024.11.18.24317486 medRxiv

First introduced by William Bateson in 1909, classic epistasis (genetic variant interaction) refers to the phenomenon in which one variant prevents a variant at a different locus from manifesting its effects. The potential effects of genetic variant interactions on complex diseases have been recognized for decades. Moreover, studies have demonstrated that leveraging the combined SNP effects within a genetic block can significantly increase calculation power and reduce background noise, ultimately leading to the discovery of novel epistasis that single-SNP statistical epistasis studies might overlook. However, how best to combine gene structure representation modelling and interaction learning into an end-to-end model for gene interaction searching remains an open question. In this study, we developed a neural genetic block interaction searching model that can effectively process large SNP chip inputs and output a heatmap of potential genetic block interactions. Our model augments a previously published hierarchical transformer architecture (Liu and Lapata, 2019) with the ability to model genetic blocks. The cross-block relationship mapping was achieved via a hierarchical attention mechanism that allows information relevant to specific phenotypes to be shared, in contrast to simple unsupervised dimensionality reduction methods such as PCA. Results on both simulation and UK Biobank studies show that our model brings substantial improvements over traditional exhaustive searching and neural network methods.
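As a rough illustration of the two-level attention idea described in the abstract, the sketch below applies self-attention first among the SNPs inside each block and then across the pooled block vectors, and reads the block-level attention weights as an interaction heatmap. It is not the authors' model: the embedding size, the mean pooling, the single attention head, the random untrained weights, and all names (`self_attention`, `block_interaction_heatmap`, `block_ids`, `params`) are assumptions made only for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over the rows of x.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v, weights

def block_interaction_heatmap(snp_embeddings, block_ids, params):
    # Level 1: attention among the SNPs inside each block, then mean pooling
    # (an assumed pooling choice) to get one vector per block.
    blocks = np.unique(block_ids)
    block_vectors = []
    for b in blocks:
        x = snp_embeddings[block_ids == b]
        h, _ = self_attention(x, *params["local"])
        block_vectors.append(h.mean(axis=0))
    block_matrix = np.stack(block_vectors)
    # Level 2: attention across blocks; the weight matrix is the heatmap.
    _, heatmap = self_attention(block_matrix, *params["global"])
    return blocks, heatmap

# Toy input: 12 SNP embeddings of dimension 8, grouped into 3 hypothetical blocks.
rng = np.random.default_rng(0)
d = 8
snp_embeddings = rng.normal(size=(12, d))
block_ids = np.array([0] * 4 + [1] * 5 + [2] * 3)
params = {
    "local": [rng.normal(size=(d, d)) * 0.1 for _ in range(3)],
    "global": [rng.normal(size=(d, d)) * 0.1 for _ in range(3)],
}

blocks, heatmap = block_interaction_heatmap(snp_embeddings, block_ids, params)
print(heatmap.shape)  # one row and one column per block
```

With untrained weights the heatmap carries no biological meaning. In a supervised setting the projection matrices would be learned against a phenotype, which is what distinguishes this kind of mapping from unsupervised reductions such as PCA.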

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|---|---|---|---|---|
| 1 | Frontiers in Genetics | 197 | Top 0.1% | 21.9% |
| 2 | Bioinformatics | 1061 | Top 1% | 18.9% |
| 3 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 | Top 0.1% | 9.8% |
| 4 | Briefings in Bioinformatics | 326 | Top 1% | 6.2% |
| 5 | Bioinformatics Advances | 184 | Top 0.8% | 4.7% |
| 6 | PLOS Computational Biology | 1633 | Top 11% | 3.0% |
| 7 | Scientific Reports | 3102 | Top 45% | 2.7% |
| 8 | Heliyon | 146 | Top 2% | 1.6% |
| 9 | BMC Genomics | 328 | Top 3% | 1.6% |
| 10 | European Journal of Human Genetics | 49 | Top 0.7% | 1.6% |
| 11 | Genetic Epidemiology | 46 | Top 0.5% | 1.6% |
| 12 | BMC Medical Genomics | 36 | Top 0.7% | 1.3% |
| 13 | BMC Bioinformatics | 383 | Top 6% | 1.2% |
| 14 | PLOS ONE | 4510 | Top 61% | 1.2% |
| 15 | GigaScience | 172 | Top 2% | 0.9% |
| 16 | Journal of Biomedical Informatics | 45 | Top 1% | 0.9% |
| 17 | Communications Biology | 886 | Top 20% | 0.9% |
| 18 | Frontiers in Bioinformatics | 45 | Top 0.7% | 0.9% |
| 19 | Computers in Biology and Medicine | 120 | Top 4% | 0.8% |
| 20 | Frontiers in Neuroscience | 223 | Top 8% | 0.7% |
| 21 | Journal of Bioinformatics and Systems Biology | 14 | Top 0.8% | 0.7% |
| 22 | Genes | 126 | Top 4% | 0.7% |
| 23 | Genomics | 60 | Top 3% | 0.7% |
| 24 | Human Genetics and Genomics Advances | 70 | Top 1% | 0.6% |
| 25 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.6% |
| 26 | Frontiers in Molecular Biosciences | 100 | Top 6% | 0.6% |
| 27 | NAR Genomics and Bioinformatics | 214 | Top 4% | 0.6% |