
Real time scalable data acquisition of COVID-19 in six continents through PySpark - a big data tool

Patel, T. S.; Patel, D. P.; Patel, C. N.

2021-07-06 health informatics
10.1101/2021.07.04.21259983 medRxiv

The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was declared a global emergency in January 2020. Various platforms are now generating data to examine the effects of the resulting coronavirus disease 2019 (COVID-19). This study focuses on clinical COVID-19 data and relies on Python programming. We propose a machine learning approach to provide insights into COVID-19 information. PySpark, the Python interface to Apache Spark, is described as an accurate tool that returns results in less time than Hadoop and other tools. The World Health Organization (WHO) started gathering data on coronavirus patients in the last week of February 2020. On March 11, 2020, the WHO declared COVID-19 a global pandemic. Cases became more evident and common after mid-March. This paper uses the live OWID (Our World in Data) dataset to analyse and report the following: (1) the daily coronavirus scenario on each continent, computed with PySpark and timed in microseconds of processor time; (2) graphs showing how daily new cases changed after the various antibodies (vaccines) were introduced; and (3) a tabular representation of new COVID-19 cases on all continents.
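To make item (1) and item (3) of the abstract concrete, the sketch below sums daily new cases per continent and times the work with `time.process_time`. It is not the authors' code: it swaps PySpark for the Python standard library so that it runs without a Spark installation, and it performs the same group-and-sum that a PySpark `groupBy("date", "continent")` followed by a sum would express. The sample rows are made up for illustration; the column names `continent`, `location`, `date` and `new_cases` follow the OWID CSV layout, and skipping rows with an empty continent (aggregates such as "World") is an assumption of this sketch.

```python
import csv
import io
import time
from collections import defaultdict

# Hypothetical sample in the OWID column layout; the figures are NOT real data.
SAMPLE_CSV = """continent,location,date,new_cases
Asia,India,2021-07-01,100
Asia,Japan,2021-07-01,20
Europe,France,2021-07-01,30
Europe,Italy,2021-07-01,
,World,2021-07-01,150
Asia,India,2021-07-02,90
Europe,France,2021-07-02,25
"""


def daily_new_cases_by_continent(csv_text):
    """Sum new_cases per (date, continent); plain-Python stand-in for a Spark groupBy."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["continent"]:
            # Assumption: rows without a continent are aggregates and are skipped.
            continue
        # Treat a missing new_cases value as zero.
        cases = float(row["new_cases"]) if row["new_cases"] else 0.0
        totals[(row["date"], row["continent"])] += cases
    return dict(totals)


# process_time counts CPU time of this Python process only.
start = time.process_time()
table = daily_new_cases_by_continent(SAMPLE_CSV)
elapsed_us = (time.process_time() - start) * 1e6

# Tabular view: one line per date and continent.
for (date, continent), cases in sorted(table.items()):
    print(f"{date}  {continent:<8} {cases:>8.0f}")
print(f"processor time: {elapsed_us:.0f} microseconds")
```

With PySpark, `time.process_time` in the driver would not include CPU time spent inside the JVM executors, so timings from the two setups are not directly comparable.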

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Informatics in Medicine Unlocked | 21 | Top 0.1% | 12.5% |
| 2 | PLOS ONE | 4510 | Top 21% | 8.5% |
| 3 | Scientific Reports | 3102 | Top 10% | 8.3% |
| 4 | JMIR Public Health and Surveillance | 45 | Top 0.3% | 4.9% |
| 5 | International Journal of Medical Informatics | 25 | Top 0.2% | 4.9% |
| 6 | BMC Medical Informatics and Decision Making | 39 | Top 0.6% | 4.4% |
| 7 | BioMed Research International | 25 | Top 0.5% | 4.4% |
| 8 | Journal of Medical Internet Research | 85 | Top 1% | 4.4% |
| | 50% of probability mass above | | | |
| 9 | Chaos, Solitons & Fractals | 32 | Top 0.5% | 3.6% |
| 10 | Physica A: Statistical Mechanics and its Applications | 10 | Top 0.1% | 2.1% |
| 11 | Frontiers in Applied Mathematics and Statistics | 10 | Top 0.1% | 2.1% |
| 12 | Heliyon | 146 | Top 1% | 2.1% |
| 13 | SoftwareX | 15 | Top 0.1% | 1.9% |
| 14 | PeerJ | 261 | Top 7% | 1.7% |
| 15 | Computers in Biology and Medicine | 120 | Top 2% | 1.7% |
| 16 | JAMIA Open | 37 | Top 1.0% | 1.4% |
| 17 | Data in Brief | 13 | Top 0.2% | 1.2% |
| 18 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 1% | 1.1% |
| 19 | Cureus | 67 | Top 4% | 1.1% |
| 20 | Advanced Biology | 29 | Top 0.8% | 0.9% |
| 21 | JMIR Medical Informatics | 17 | Top 1% | 0.9% |
| 22 | Frontiers in Neuroinformatics | 38 | Top 0.6% | 0.9% |
| 23 | Infectious Disease Modelling | 50 | Top 1% | 0.9% |
| 24 | Frontiers in Bioinformatics | 45 | Top 0.6% | 0.9% |
| 25 | BMC Bioinformatics | 383 | Top 6% | 0.8% |
| 26 | Biomedicines | 66 | Top 3% | 0.8% |
| 27 | Frontiers in Microbiology | 375 | Top 9% | 0.7% |
| 28 | Journal of Medical Virology | 137 | Top 4% | 0.7% |
| 29 | Frontiers in Public Health | 140 | Top 9% | 0.7% |
| 30 | Bioinformatics | 1061 | Top 10% | 0.7% |
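The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below uses the values shown for the ten highest-ranked journals; the function name `journals_needed` and the 50% default threshold are illustrative choices, not part of the source page.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the ten highest-ranked journals, in rank order.
probabilities = [12.5, 8.5, 8.3, 4.9, 4.9, 4.4, 4.4, 4.4, 3.6, 2.1]


def journals_needed(probs, threshold=50.0):
    """Return (count, cumulative total) for the first rank whose running sum reaches threshold."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count, running_total
    return None  # threshold never reached with the values given


count, total = journals_needed(probabilities)
print(count, round(total, 1))
```

The first seven entries sum to 47.9%, so the eighth journal is the one that carries the cumulative total past 50% (to 52.3%).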