
RatsPub: a webservice aided by deep learning to mine PubMed for addiction-related genes

Gunturkun, M. H.; Flashner, E.; Wang, T.; Mulligan, M. K.; Williams, R. W.; Prins, P.; Chen, H.

2020-11-05 bioinformatics
10.1101/2020.09.17.297358 bioRxiv

Interpreting and integrating results from omics studies typically requires a comprehensive and time-consuming survey of the extant literature. Here, we introduce GeneCup, an easy-to-use literature mining web service that searches all PubMed abstracts for user-provided gene symbols in conjunction with a set of custom keywords organized into a customized ontology, as well as results from human genome-wide association studies (GWAS). As an example, we organized over 300 keywords related to drug addiction into seven categories. The literature search is conducted by querying the NIH PubMed server through a programming interface and then retrieving the matching abstracts from a local copy of the PubMed archive. The main results presented to the user are individual sentences containing the gene symbol, organized by the keywords they also contain. These sentences are presented through an interactive graphical interface or as tables. GWAS results are displayed using a similar method. All results are linked to the original abstract in PubMed. In addition, a convolutional neural network is employed to distinguish sentences describing systemic stress from those describing cellular stress. The automated and comprehensive search strategy provided by GeneCup facilitates the integration of new discoveries from omics studies with existing literature. GeneCup is free and open source software. The source code of GeneCup and a link to a running instance are available at https://github.com/hakangunturkun/GeneCup
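
The sentence-level step described in the abstract (keep sentences that contain the gene symbol, then organize them by the keywords they also contain) can be illustrated with a minimal Python sketch. This is not GeneCup's code: the two-category ontology, the naive sentence splitter, the function names, and the toy abstracts with placeholder identifiers are all assumptions made for illustration, and the sketch takes abstracts as an in-memory dictionary rather than querying PubMed or a local archive.

```python
import re
from collections import defaultdict

# Hypothetical mini-ontology (category -> keywords). The abstract mentions
# over 300 keywords in seven categories; these two are placeholders.
ONTOLOGY = {
    "drug": ["cocaine", "nicotine"],
    "stress": ["stress"],
}

def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract)
    return [p.strip() for p in parts if p.strip()]

def mine_sentences(gene, abstracts, ontology=ONTOLOGY):
    """Group sentences that mention `gene` by ontology category and keyword.

    Returns {category: {keyword: [(abstract_id, sentence), ...]}}.
    Matching is case-insensitive and on whole words only.
    """
    gene_re = re.compile(r"\b" + re.escape(gene) + r"\b", re.IGNORECASE)
    hits = defaultdict(lambda: defaultdict(list))
    for abstract_id, text in abstracts.items():
        for sentence in split_sentences(text):
            if not gene_re.search(sentence):
                continue  # sentence does not mention the gene symbol
            for category, keywords in ontology.items():
                for kw in keywords:
                    kw_re = r"\b" + re.escape(kw) + r"\b"
                    if re.search(kw_re, sentence, re.IGNORECASE):
                        hits[category][kw].append((abstract_id, sentence))
    return hits

# Invented toy abstracts with placeholder IDs (not real PubMed records).
ABSTRACTS = {
    "PMID-A": "Drd2 expression was reduced after cocaine exposure. "
              "Locomotor activity was unchanged.",
    "PMID-B": "Chronic stress altered DRD2 binding. "
              "Nicotine had no effect on Bdnf.",
}

results = mine_sentences("Drd2", ABSTRACTS)
for category, by_keyword in results.items():
    for keyword, sentences in by_keyword.items():
        for abstract_id, sentence in sentences:
            print(f"{category}/{keyword}: [{abstract_id}] {sentence}")
```

Keeping the abstract identifier next to each sentence mirrors the abstract's statement that all results link back to the original abstract in PubMed.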

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Bioinformatics Advances | 184 | Top 0.1% | 22.7%
2 | Bioinformatics | 1061 | Top 1% | 18.9%
3 | PLOS ONE | 4510 | Top 28% | 6.4%
4 | Patterns | 70 | Top 0.1% | 4.9%
(50% of probability mass above)
5 | BMC Bioinformatics | 383 | Top 2% | 4.9%
6 | Nucleic Acids Research | 1128 | Top 5% | 3.7%
7 | Database | 51 | Top 0.2% | 2.8%
8 | GigaScience | 172 | Top 0.7% | 2.8%
9 | PLOS Computational Biology | 1633 | Top 13% | 2.1%
10 | Scientific Reports | 3102 | Top 50% | 2.1%
11 | BioData Mining | 15 | Top 0.3% | 1.8%
12 | Journal of Biomedical Informatics | 45 | Top 0.9% | 1.5%
13 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 1% | 1.2%
14 | Nature Communications | 4913 | Top 56% | 1.2%
15 | Genome Medicine | 154 | Top 7% | 0.9%
16 | Briefings in Bioinformatics | 326 | Top 7% | 0.8%
17 | eLife | 5422 | Top 57% | 0.8%
18 | iScience | 1063 | Top 34% | 0.7%
19 | eNeuro | 389 | Top 9% | 0.7%
20 | Communications Biology | 886 | Top 26% | 0.7%
21 | Computational and Structural Biotechnology Journal | 216 | Top 11% | 0.6%
22 | Genomics, Proteomics & Bioinformatics | 171 | Top 7% | 0.6%
23 | PeerJ | 261 | Top 17% | 0.6%
24 | Frontiers in Neuroinformatics | 38 | Top 1% | 0.5%
25 | BMC Medical Genomics | 36 | Top 2% | 0.5%
26 | Biological Psychiatry | 119 | Top 3% | 0.5%
27 | Neuroinformatics | 40 | Top 1% | 0.5%
28 | NAR Genomics and Bioinformatics | 214 | Top 5% | 0.5%
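
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked from the listed percentages with a short cumulative sum. The sketch below is only an arithmetic check on the numbers shown on this page; the function name `journals_to_reach` is hypothetical and the 50% threshold is taken from the statement itself.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the five top-ranked journals,
# copied from the list above.
top_probs = [22.7, 18.9, 6.4, 4.9, 4.9]

def journals_to_reach(probs, threshold=50.0):
    """Return the smallest k whose first k probabilities sum to >= threshold."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return k
    return None  # threshold not reached with the given values

k = journals_to_reach(top_probs)
print(k, round(sum(top_probs[:k]), 1))
```

The first three journals sum to about 48.0%, so the fourth is the one that carries the running total past 50%.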