
Data quality and Big Data in the health industry: a scoping review protocol

Tomaz Santos, L. C.; Bublitz, F. M.

2024-10-18 health systems and quality improvement
10.1101/2024.10.18.24315741 medRxiv

Introduction

Big Data is characterized by the large volume of data, the variety of types and formats, the speed at which data are generated, and the veracity and value that can be extracted from them. However, the results obtained with this technology depend on the quality of the information derived from the data. Big Data has great potential in healthcare and can be used to advance diagnosis, treatment, and healthcare management. Health data are highly vulnerable because of their sensitive nature, since they contain personal and confidential information. If exposed or compromised, they could lead to privacy violations, inaccuracies, misuse, incorrect diagnoses, or misguided decision-making in patient care. It is therefore important to prioritize confidentiality, adhere to regulatory requirements, and maintain data integrity. This requires efficient methods for obtaining high-quality data that can serve the intended purpose.

Objective

In this context, this scoping review protocol aims to identify and map existing strategies, methods, or models that improve the quality of medical and health data in Big Data environments. The review explores methods that support the effective use of Big Data in healthcare while addressing the challenges of maintaining data integrity and ensuring safe decision-making.

Methods and analysis

This scoping review will follow the six-step framework proposed by Levac et al. in "Scoping Studies: Advancing the methodology" and will be reported according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist. The research team will search the Scopus Document Search, IEEE Xplore Digital Library, and ACM Digital Library databases for primary studies, using terms related to Data Quality, Big Data, and Health.
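The Methods paragraph says the search combines three concept groups (Data Quality, Big Data, Health). A common way to express such a search is to join synonyms with OR inside each concept and join the concepts with AND, so that every retrieved record matches all three. The sketch below assumes that structure. The protocol does not publish its final search string, so the synonym lists, the name `CONCEPT_GROUPS` and the helper functions are hypothetical. Each database would also need its own field syntax wrapped around the result (for example, Scopus's `TITLE-ABS-KEY(...)` field code).

```python
# Hypothetical synonym lists: the protocol only names the three concept
# groups (Data Quality, Big Data, Health); the extra terms are illustrative.
CONCEPT_GROUPS = {
    "data_quality": ["data quality", "data integrity", "data cleaning"],
    "big_data": ["big data"],
    "health": ["health", "healthcare", "medical"],
}


def or_group(terms):
    """Join one concept's terms with OR, quoting each phrase."""
    quoted = [f'"{term}"' for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(groups):
    """Combine concept groups with AND so every hit matches all concepts."""
    return " AND ".join(or_group(terms) for terms in groups.values())


if __name__ == "__main__":
    # Prints a database-agnostic Boolean string; field codes are added per database.
    print(build_query(CONCEPT_GROUPS))
```

Keeping the concepts in a dictionary makes it easy to add or drop synonyms during the iterative search refinement that scoping reviews typically involve, without rewriting the Boolean logic.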

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. BMC Medical Informatics and Decision Making (39 papers in training set; Top 0.1%; probability 28.7%)
2. PLOS ONE (4510 papers in training set; Top 17%; probability 10.5%)
3. BMJ Open (554 papers in training set; Top 3%; probability 6.6%)
4. JMIRx Med (31 papers in training set; Top 0.1%; probability 6.6%)
(50% of probability mass above)
5. Journal of Medical Internet Research (85 papers in training set; Top 0.7%; probability 6.6%)
6. Journal of Biomedical Informatics (45 papers in training set; Top 0.4%; probability 3.7%)
7. Computer Methods and Programs in Biomedicine (27 papers in training set; Top 0.2%; probability 2.7%)
8. F1000Research (79 papers in training set; Top 0.9%; probability 2.4%)
9. Scientific Reports (3102 papers in training set; Top 52%; probability 2.0%)
10. Sensors (39 papers in training set; Top 1.0%; probability 1.7%)
11. GigaScience (172 papers in training set; Top 1%; probability 1.7%)
12. Frontiers in Public Health (140 papers in training set; Top 5%; probability 1.4%)
13. BMJ Open Quality (15 papers in training set; Top 0.5%; probability 1.4%)
14. BMJ Health & Care Informatics (13 papers in training set; Top 0.5%; probability 1.4%)
15. JMIR Research Protocols (18 papers in training set; Top 0.9%; probability 1.3%)
16. Healthcare (16 papers in training set; Top 1.0%; probability 1.3%)
17. JMIR Medical Informatics (17 papers in training set; Top 1.0%; probability 1.3%)
18. JAMIA Open (37 papers in training set; Top 1%; probability 1.1%)
19. Journal of the American Medical Informatics Association (61 papers in training set; Top 2%; probability 1.0%)
20. Database (51 papers in training set; Top 0.7%; probability 0.9%)
21. BMC Medical Research Methodology (43 papers in training set; Top 1%; probability 0.9%)
22. PLOS Computational Biology (1633 papers in training set; Top 23%; probability 0.8%)
23. PLOS Digital Health (91 papers in training set; Top 2%; probability 0.8%)
24. Frontiers in Digital Health (20 papers in training set; Top 1%; probability 0.8%)
25. Heliyon (146 papers in training set; Top 7%; probability 0.7%)
26. International Journal of Environmental Research and Public Health (124 papers in training set; Top 7%; probability 0.7%)
27. JMIR Formative Research (32 papers in training set; Top 2%; probability 0.7%)
28. PLOS Global Public Health (293 papers in training set; Top 6%; probability 0.5%)
29. Physiological Measurement (12 papers in training set; Top 0.5%; probability 0.5%)
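The statement that the top four journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. A minimal sketch, using only the first six values from the list; the function name `journals_to_reach` is illustrative and not part of the original page.

```python
# Predicted probabilities (percent) for the six top-ranked journals,
# copied from the "Matching journals" list.
PROBS = [
    ("BMC Medical Informatics and Decision Making", 28.7),
    ("PLOS ONE", 10.5),
    ("BMJ Open", 6.6),
    ("JMIRx Med", 6.6),
    ("Journal of Medical Internet Research", 6.6),
    ("Journal of Biomedical Informatics", 3.7),
]


def journals_to_reach(probs, target):
    """Return (k, cumulative) for the smallest k whose running sum reaches target.

    Returns None if the listed probabilities never reach the target.
    """
    cumulative = 0.0
    for k, (_name, p) in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= target:
            return k, cumulative
    return None


if __name__ == "__main__":
    k, mass = journals_to_reach(PROBS, 50.0)
    print(f"{k} journals reach {mass:.1f}% of the predicted mass")
```

With these values the cumulative mass is 45.8% after three journals and about 52.4% after four, so the 50% threshold is first crossed at rank 4, which matches the marker placed after JMIRx Med.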