
ChexMix: A Literature Content Extraction Tool for Bioentities

Yang, H.; Park, B.; Park, J.; Lee, J.; Jang, H. S.; Lee, N.; Yoo, H.

2021-03-10 bioinformatics
10.1101/2021.03.09.434525 bioRxiv

Biomedical databases grow by more than a thousand new publications every day. This volume of literature, published at an unprecedented rate, makes it difficult to find knowledge relevant to keywords of interest, gather new insights, and form hypotheses. PubTator, a text-mining tool, automatically annotates bioentities such as species, chemicals, genes, and diseases in PubMed abstracts and full-text articles. However, manually reorganizing and analyzing the annotated bioentities is a non-trivial and highly time-consuming task. ChexMix was designed to extract the unique identifiers of bioentities from query results. Herein, ChexMix was used to construct a taxonomic tree of allied species among Korean native plants and to extract the Medical Subject Headings (MeSH) unique identifiers of bioentities that co-occurred with the keywords in the same literature. ChexMix discovered allied species related to a keyword of interest, and its usefulness for multi-species analysis was demonstrated experimentally.
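
The co-occurrence step described in the abstract can be illustrated with a minimal sketch. This is not ChexMix's implementation: the function name `cooccurring_ids`, the input shape (a mapping from document ID to the set of annotated identifiers), and the toy identifiers are all assumptions made for illustration.

```python
from collections import Counter

def cooccurring_ids(annotations, keyword_id):
    """Count identifiers annotated in the same document as keyword_id.

    annotations: dict mapping a document ID to the set of bioentity
    identifiers found in that document (hypothetical input shape).
    """
    counts = Counter()
    for ids in annotations.values():
        if keyword_id in ids:
            # Each document contributes at most once per identifier.
            counts.update(ids - {keyword_id})
    return counts

# Made-up annotations; the identifiers are placeholders, not real MeSH IDs.
toy = {
    "doc1": {"KEYWORD", "ENTITY_A", "ENTITY_B"},
    "doc2": {"KEYWORD", "ENTITY_A"},
    "doc3": {"ENTITY_B", "ENTITY_C"},
}
result = cooccurring_ids(toy, "KEYWORD")
```

Counting per document rather than per mention keeps a single long article from dominating the result; a real pipeline would read the identifier sets from annotation output instead of a hand-written dictionary.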

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

1. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 0.1%; 54.6%
(50% of probability mass above)
2. PLOS ONE: 4510 papers in training set; Top 26%; 6.6%
3. Briefings in Bioinformatics: 326 papers in training set; Top 1%; 5.1%
4. BMC Bioinformatics: 383 papers in training set; Top 4%; 2.0%
5. Database: 51 papers in training set; Top 0.3%; 2.0%
6. Scientific Reports: 3102 papers in training set; Top 54%; 1.9%
7. Horticulture Research: 43 papers in training set; Top 0.9%; 1.9%
8. Scientific Data: 174 papers in training set; Top 1.0%; 1.8%
9. Plant Communications: 35 papers in training set; Top 0.8%; 1.7%
10. Metabolites: 50 papers in training set; Top 0.5%; 1.6%
11. Journal of Genetics and Genomics: 36 papers in training set; Top 1%; 1.4%
12. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 6%; 1.2%
13. Molecular Plant: 36 papers in training set; Top 1%; 0.9%
14. Nucleic Acids Research: 1128 papers in training set; Top 15%; 0.9%
15. PLOS Computational Biology: 1633 papers in training set; Top 23%; 0.8%
16. Bioinformatics: 1061 papers in training set; Top 9%; 0.8%
17. Journal of Chemical Information and Modeling: 207 papers in training set; Top 3%; 0.8%
18. PeerJ: 261 papers in training set; Top 14%; 0.8%
19. GigaScience: 172 papers in training set; Top 3%; 0.8%
20. Advanced Science: 249 papers in training set; Top 19%; 0.8%
21. ACS Omega: 90 papers in training set; Top 4%; 0.8%
22. Journal of Cheminformatics: 25 papers in training set; Top 0.7%; 0.5%
23. Biomedical Signal Processing and Control: 18 papers in training set; Top 0.6%; 0.5%
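
The statement that the top-ranked journal alone accounts for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below assumes that the final percentage in each entry is that journal's predicted probability; the helper name `journals_to_reach` is hypothetical and is not part of the listing.

```python
def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the cumulative
    probability to reach the threshold, or None if it is never reached."""
    total = 0.0
    for count, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        if total >= threshold:
            return count
    return None

# Percentages of the first five journals in the listing above.
listed = [54.6, 6.6, 5.1, 2.0, 2.0]
n = journals_to_reach(listed, 50.0)
```

With these values the first entry (54.6%) already exceeds the 50% threshold, which matches the "top 1 journal" statement.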