
BioASQ-QA: A manually curated corpus for Biomedical Question Answering

Krithara, A.; Nentidis, A.; Bougiatiotis, K.; Paliouras, G.

2022-12-16 bioinformatics
10.1101/2022.12.14.520213 bioRxiv

The BioASQ question answering (QA) benchmark dataset contains questions in English, along with gold-standard (reference) answers and related material. The dataset has been designed to reflect real information needs of biomedical experts and is therefore more realistic and challenging than most existing datasets. Furthermore, unlike most previous QA benchmarks that contain only exact answers, the BioASQ-QA dataset also includes ideal answers (in effect summaries), which are particularly useful for research on multi-document summarization. The dataset combines structured and unstructured data. The material linked with each question comprises documents and snippets, which are useful for Information Retrieval and Passage Retrieval experiments, as well as concepts that are useful in concept-to-text Natural Language Generation. Researchers working on paraphrasing and textual entailment can also measure the degree to which their methods improve the performance of biomedical QA systems. Last but not least, the dataset is continuously extended, as the BioASQ challenge is ongoing and new data are generated.
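
As a rough illustration of the material the abstract lists for each question (exact answers, ideal answers, documents, snippets, concepts), the sketch below models one record in Python. The class and field names are hypothetical and are not taken from the dataset's actual file format; the values in the example record are placeholders, not real BioASQ data.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Snippet:
    # Hypothetical fields: which document the passage comes from, and its text.
    document: str
    text: str


@dataclass
class QuestionRecord:
    # Field names are assumptions chosen to mirror the abstract's wording.
    question: str
    ideal_answer: str                                        # summary-style answer
    exact_answers: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)       # document identifiers
    snippets: List[Snippet] = field(default_factory=list)
    concepts: List[str] = field(default_factory=list)        # concept identifiers


def retrieval_pairs(record: QuestionRecord) -> List[Tuple[str, str]]:
    """Pair the question with each linked snippet, e.g. for passage-retrieval experiments."""
    return [(record.question, s.text) for s in record.snippets]


# Placeholder record for illustration only.
example = QuestionRecord(
    question="<question text>",
    ideal_answer="<ideal answer text>",
    exact_answers=["<exact answer>"],
    documents=["<doc-1>", "<doc-2>"],
    snippets=[Snippet("<doc-1>", "<snippet A>"), Snippet("<doc-2>", "<snippet B>")],
    concepts=["<concept-1>"],
)
pairs = retrieval_pairs(example)
```

Keeping snippets tied to their source document in this way lets the same record serve both document-level and passage-level retrieval experiments.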

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1 | Database | 51 papers in training set | Top 0.1% | 37.9%
2 | Nucleic Acids Research | 1128 papers in training set | Top 2% | 8.5%
3 | Bioinformatics | 1061 papers in training set | Top 4% | 4.9%
(50% of probability mass above)
4 | Journal of Biomedical Informatics | 45 papers in training set | Top 0.4% | 3.6%
5 | BMC Bioinformatics | 383 papers in training set | Top 3% | 3.3%
6 | Scientific Reports | 3102 papers in training set | Top 43% | 2.9%
7 | BioData Mining | 15 papers in training set | Top 0.1% | 2.7%
8 | Computers in Biology and Medicine | 120 papers in training set | Top 1% | 2.4%
9 | GigaScience | 172 papers in training set | Top 0.9% | 2.1%
10 | Scientific Data | 174 papers in training set | Top 0.8% | 2.1%
11 | Bioinformatics Advances | 184 papers in training set | Top 2% | 1.9%
12 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 1% | 1.5%
13 | Artificial Intelligence in Medicine | 15 papers in training set | Top 0.5% | 1.1%
14 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 3% | 1.1%
15 | Biology Methods and Protocols | 53 papers in training set | Top 2% | 0.9%
16 | BMC Medical Informatics and Decision Making | 39 papers in training set | Top 2% | 0.9%
17 | PLOS ONE | 4510 papers in training set | Top 66% | 0.8%
18 | Computer Methods and Programs in Biomedicine | 27 papers in training set | Top 0.8% | 0.8%
19 | npj Digital Medicine | 97 papers in training set | Top 3% | 0.7%
20 | IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 2% | 0.7%
21 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 10% | 0.7%
22 | Cureus | 67 papers in training set | Top 5% | 0.7%
23 | The Lancet Digital Health | 25 papers in training set | Top 1% | 0.7%
24 | iScience | 1063 papers in training set | Top 34% | 0.7%
25 | International Journal of Medical Informatics | 25 papers in training set | Top 2% | 0.7%
26 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 7% | 0.6%
27 | Frontiers in Genetics | 197 papers in training set | Top 12% | 0.5%
28 | Journal of Personalized Medicine | 28 papers in training set | Top 2% | 0.5%
29 | BMC Biology | 248 papers in training set | Top 7% | 0.5%
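
The 50% cutoff marked in the list can be checked by accumulating the listed probabilities in rank order. The sketch below does this for the first five journals; the function name `journals_to_reach` is a hypothetical helper, and the percentages are copied from the list as displayed (so they carry its rounding).

```python
from itertools import accumulate

# Predicted probabilities (percent) for the first five ranked journals,
# copied from the list above.
ranked = [
    ("Database", 37.9),
    ("Nucleic Acids Research", 8.5),
    ("Bioinformatics", 4.9),
    ("Journal of Biomedical Informatics", 3.6),
    ("BMC Bioinformatics", 3.3),
]


def journals_to_reach(ranked, threshold):
    """Return how many top-ranked journals are needed for the cumulative
    probability (percent) to reach the threshold, or None if it is never reached."""
    running = accumulate(prob for _, prob in ranked)
    for count, total in enumerate(running, start=1):
        # Small tolerance guards against floating-point rounding at the boundary.
        if total >= threshold - 1e-9:
            return count
    return None


count = journals_to_reach(ranked, 50.0)
```

With these values the first two journals sum to 46.4% and the first three to 51.3%, so the threshold is crossed at the third journal, in line with the statement that the top 3 journals account for 50% of the predicted probability mass.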