An end-to-end workflow for statistical analysis and inference of large-scale biomedical datasets

Heidari, E.; Sharifi-Zarchi, A.; Sadeghi, M. A.; Mirzaei, M.; Ahmadi, N.; Balazadeh-Meresht, V.; Sadr, M.

2020-01-13 health informatics
10.1101/2020.01.09.20017095 medRxiv

Over time, as medical and epidemiological studies have grown larger in scale, the challenges associated with extracting useful and relevant information from these data have mounted. General health surveys provide a good example of such studies, as they usually cover large populations and are conducted over long periods in multiple locations. The challenges associated with interpreting the results of such studies include: the presence of both categorical and continuous variables and the need to compare them within a single statistical framework; the presence of variations in data resulting from technical limitations in data collection; the danger of selection and information biases in hypothesis-directed study design and implementation; and the complete inadequacy of p values in identifying significant relationships. As a solution to these challenges, we propose an end-to-end analysis workflow using the MUltivariate analysis and VISualization (MUVIS) package within the R statistical software. MUVIS consists of a comprehensive set of statistical tools that follow the basic tenet of unbiased exploration of associations within a dataset. We validate its performance by applying MUVIS to data from the Yazd Health Study (YaHS). YaHS is a prospective cohort study consisting of a general health survey of more than 30 health-related measurements and a questionnaire with over 300 questions acquired from 10050 participants. Given the nature of the YaHS dataset, most of the identified associations are corroborated by a large body of medical literature. Nevertheless, some more interesting and less investigated connections were also found, which are presented here. We conclude that MUVIS provides a robust statistical framework for extraction of useful and relevant information from medical datasets and their visualization in easily comprehensible ways.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. PLOS ONE (4510 papers in training set): Top 16%, 12.2%
2. Bioinformatics (1061 papers in training set): Top 3%, 10.3%
3. SoftwareX (15 papers in training set): Top 0.1%, 8.1%
4. Scientific Reports (3102 papers in training set): Top 15%, 6.7%
5. JAMIA Open (37 papers in training set): Top 0.2%, 6.7%
6. BMC Medical Research Methodology (43 papers in training set): Top 0.1%, 6.3%

50% of probability mass above

7. BMC Medical Informatics and Decision Making (39 papers in training set): Top 0.6%, 4.3%
8. European Journal of Epidemiology (40 papers in training set): Top 0.1%, 3.9%
9. Journal of Biomedical Informatics (45 papers in training set): Top 0.5%, 3.5%
10. Computer Methods and Programs in Biomedicine (27 papers in training set): Top 0.2%, 2.7%
11. BMC Bioinformatics (383 papers in training set): Top 3%, 2.6%
12. International Journal of Medical Informatics (25 papers in training set): Top 0.7%, 2.1%
13. PLOS Computational Biology (1633 papers in training set): Top 15%, 1.9%
14. PeerJ (261 papers in training set): Top 6%, 1.9%
15. Journal of Medical Internet Research (85 papers in training set): Top 2%, 1.8%
16. BMC Research Notes (29 papers in training set): Top 0.1%, 1.6%
17. NAR Genomics and Bioinformatics (214 papers in training set): Top 2%, 1.3%
18. Patterns (70 papers in training set): Top 2%, 0.8%
19. Journal of the American Medical Informatics Association (61 papers in training set): Top 2%, 0.8%
20. Data in Brief (13 papers in training set): Top 0.4%, 0.8%
21. GigaScience (172 papers in training set): Top 3%, 0.8%
22. JMIR Public Health and Surveillance (45 papers in training set): Top 3%, 0.8%
23. Frontiers in Bioinformatics (45 papers in training set): Top 1.0%, 0.7%
24. Frontiers in Artificial Intelligence (18 papers in training set): Top 0.8%, 0.7%
25. PLOS Digital Health (91 papers in training set): Top 3%, 0.7%
26. Wellcome Open Research (57 papers in training set): Top 2%, 0.7%
27. Artificial Intelligence in Medicine (15 papers in training set): Top 0.8%, 0.7%