
Covid-19 Pandemic Data Analysis and Forecasting using Machine Learning Algorithms

Sengupta, S.; Mugde, S.; Sharma, G.

2020-08-12 public and global health
10.1101/2020.06.25.20140004 medRxiv

India reported its first Covid-19 case on 30 January 2020, and the number of reported cases escalated sharply from March 2020. This paper analyses COVID-19 data first at a global level and then drills down to the situation in India. Data are gathered from multiple sources, including several authentic government websites. The need of the hour is to forecast accurately when case numbers will peak and then decline. Such forecasts would greatly help public welfare professionals plan preventive measures while maintaining the economic balance of the country. Variables such as gender, geographical location and age are presented using Python and data visualization techniques. Time series forecasting techniques, including machine learning models such as Linear Regression, Support Vector Regression and Polynomial Regression, and a deep learning forecasting model, LSTM (long short-term memory), are deployed to study the probable rise in cases in the near future. A comparative analysis is carried out to identify which model best fits the data. Data are considered up to 30 July 2020. The results show that a statistical model, the sigmoid model, outperforms the other models. The sigmoid model also estimates the day on which the number of active cases can be expected to peak and when the curve will start to flatten. The strength of the sigmoid model lies in providing these date estimates, which no other model considered here offers; this makes it the best model for predicting Covid case counts and is a unique feature of the analysis in this paper. Feature engineering techniques are used to transform the data to a logarithmic scale, as it affords better comparison by damping data extremities and outliers. Based on the short-term predictions, the model can be tuned to forecast over longer time intervals.
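To illustrate how a sigmoid model can yield date estimates, here is a minimal sketch. It is not the authors' implementation: the abstract does not give their parameterization, so this assumes a three-parameter logistic curve fitted to cumulative counts. The data are synthetic, and the names `logistic`, `fit_logistic` and `day_reaching_fraction` are hypothetical. The fit linearizes the curve with a logit transform (a logarithmic rescaling) and grid-searches the plateau value.

```python
import math

def logistic(t, cap, rate, midpoint):
    """Three-parameter logistic (sigmoid) curve; cap is the plateau."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))

def fit_logistic(days, counts, cap_candidates):
    """Grid-search the plateau; for each candidate, fit rate and midpoint
    by ordinary least squares on the logit-transformed counts."""
    n = len(days)
    mean_t = sum(days) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    best = None
    for cap in cap_candidates:
        if cap <= max(counts):
            continue  # logit is undefined unless cap exceeds every count
        z = [math.log(c / (cap - c)) for c in counts]
        mean_z = sum(z) / n
        sxz = sum((t - mean_t) * (zi - mean_z) for t, zi in zip(days, z))
        rate = sxz / sxx
        if rate <= 0:
            continue  # a growth curve needs a positive rate
        midpoint = mean_t - mean_z / rate
        sse = sum((logistic(t, cap, rate, midpoint) - c) ** 2
                  for t, c in zip(days, counts))
        if best is None or sse < best[0]:
            best = (sse, cap, rate, midpoint)
    if best is None:
        raise ValueError("no admissible plateau candidate")
    return best[1:]

def day_reaching_fraction(fraction, rate, midpoint):
    """Day on which the fitted curve reaches a given fraction of its plateau."""
    return midpoint + math.log(fraction / (1.0 - fraction)) / rate

# Synthetic, noise-free cumulative counts (NOT real Covid-19 data),
# truncated before the plateau is reached.
days = list(range(90))
counts = [logistic(t, 100_000, 0.1, 60) for t in days]

cap_hat, rate_hat, midpoint_hat = fit_logistic(
    days, counts, range(50_000, 200_001, 5_000))
flatten_day = day_reaching_fraction(0.95, rate_hat, midpoint_hat)
```

Under this assumption the fitted midpoint is the inflection point of the cumulative curve, that is, the day on which daily growth is largest. Taking 95% of the plateau as the point where the curve "starts to flatten" is an arbitrary illustrative threshold, not one taken from the paper.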

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | JMIRx Med | 31 papers in training set | Top 0.1% | 14.5%
2 | PLOS ONE | 4510 papers in training set | Top 15% | 12.5%
3 | Scientific Reports | 3102 papers in training set | Top 9% | 8.5%
4 | Journal of Medical Internet Research | 85 papers in training set | Top 0.6% | 7.2%
5 | Cureus | 67 papers in training set | Top 0.9% | 4.3%
6 | Biomedical Signal Processing and Control | 18 papers in training set | Top 0.1% | 3.7%
50% of probability mass above
7 | JMIR Public Health and Surveillance | 45 papers in training set | Top 0.6% | 3.6%
8 | International Journal of Medical Informatics | 25 papers in training set | Top 0.4% | 3.6%
9 | Heliyon | 146 papers in training set | Top 0.6% | 2.8%
10 | IEEE Access | 31 papers in training set | Top 0.2% | 2.1%
11 | Frontiers in Public Health | 140 papers in training set | Top 4% | 1.8%
12 | Chaos, Solitons & Fractals | 32 papers in training set | Top 0.9% | 1.8%
13 | Expert Systems with Applications | 11 papers in training set | Top 0.1% | 1.8%
14 | BMC Medical Informatics and Decision Making | 39 papers in training set | Top 1% | 1.7%
15 | PeerJ | 261 papers in training set | Top 7% | 1.7%
16 | F1000Research | 79 papers in training set | Top 3% | 1.2%
17 | Computer Methods and Programs in Biomedicine | 27 papers in training set | Top 0.6% | 1.1%
18 | PLOS Digital Health | 91 papers in training set | Top 2% | 1.0%
19 | JMIR Formative Research | 32 papers in training set | Top 1% | 0.9%
20 | International Journal of Environmental Research and Public Health | 124 papers in training set | Top 6% | 0.9%
21 | Frontiers in Physiology | 93 papers in training set | Top 5% | 0.9%
22 | Data in Brief | 13 papers in training set | Top 0.3% | 0.9%
23 | BMC Infectious Diseases | 118 papers in training set | Top 5% | 0.8%
24 | Frontiers in Physics | 20 papers in training set | Top 1% | 0.7%
25 | Nonlinear Dynamics | 10 papers in training set | Top 0.5% | 0.7%
26 | Applied Sciences | 24 papers in training set | Top 1% | 0.7%
27 | PLOS Computational Biology | 1633 papers in training set | Top 27% | 0.6%
28 | Wellcome Open Research | 57 papers in training set | Top 3% | 0.5%
29 | PLOS Global Public Health | 293 papers in training set | Top 7% | 0.5%
30 | BMC Public Health | 147 papers in training set | Top 7% | 0.5%
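
As a quick arithmetic check on the statement that the top 6 journals account for 50% of the predicted probability mass, the following sketch accumulates the listed percentages in rank order. It assumes the final percentage in each entry is that journal's predicted probability; the helper name `journals_to_reach` is hypothetical.

```python
# Predicted probabilities (in percent) of the first ten listed journals.
probabilities = [14.5, 12.5, 8.5, 7.2, 4.3, 3.7, 3.6, 3.6, 2.8, 2.1]

def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative mass) at which the running total of
    probabilities first meets the threshold, or None if it never does."""
    running = 0.0
    for rank, p in enumerate(probabilities, start=1):
        running += p
        if running >= threshold:
            return rank, running
    return None

rank, mass = journals_to_reach(probabilities, 50.0)
```

The running total is 47.0% after five journals and crosses 50% at the sixth, which agrees with the position of the cutoff line in the list.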