Using a decision tree to predict COVID case numbers: a tutorial for beginners

Moctezuma Tan, L.; Benitez, L.; van Nouhuijs, F.; Orcales, F.; Kim, A.; Campbell, R.; Fuse, M.; Pennings, P. S.

2023-12-20 scientific communication and education
10.1101/2023.12.19.572463 bioRxiv

This manuscript describes the development of a module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial "NIGMS Sandbox" at the beginning of this Supplement. This module delivers learning materials on machine learning and decision tree concepts in an interactive format that uses appropriate cloud resources for data access and analyses. Machine learning (ML) is an important tool in biomedical research and can lead to improvements in the diagnosis, treatment, and prevention of diseases. During the COVID pandemic, ML was used for predictions at the patient and community levels. Given its ubiquity, it is important that future doctors, researchers, and teachers become acquainted with ML and its contributions to research. Our goal is to make it easier for everyone to learn about machine learning. The learning module we present here combines a small COVID dataset, videos, and annotated code, and runs on Google Colab or the Google Cloud Platform (GCP). The benefit of these platforms is that students do not have to set up a programming environment on their own computers, which saves time and is also an important democratizing factor. The module focuses on teaching the basics of decision trees by applying them to COVID data. It introduces basic terminology used in supervised machine learning and explains its relevance to research. Our experience with biology students at San Francisco State University suggests that the material increases interest in ML.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Heliyon: 146 papers in training set, Top 0.1%, 17.4%
2. PLOS Computational Biology: 1633 papers in training set, Top 4%, 8.1%
3. PLOS ONE: 4510 papers in training set, Top 24%, 7.1%
4. Frontiers in Artificial Intelligence: 18 papers in training set, Top 0.1%, 6.8%
5. F1000Research: 79 papers in training set, Top 0.2%, 6.3%
6. GigaScience: 172 papers in training set, Top 0.3%, 4.3%
7. Entropy: 20 papers in training set, Top 0.1%, 3.6%
50% of probability mass above
8. Patterns: 70 papers in training set, Top 0.2%, 3.6%
9. eLife: 5422 papers in training set, Top 31%, 2.7%
10. PLOS Biology: 408 papers in training set, Top 6%, 2.3%
11. BioData Mining: 15 papers in training set, Top 0.2%, 2.3%
12. Acta Crystallographica Section D Structural Biology: 54 papers in training set, Top 0.2%, 2.1%
13. Bioinformatics: 1061 papers in training set, Top 7%, 1.9%
14. Journal of Cheminformatics: 25 papers in training set, Top 0.3%, 1.9%
15. Scientific Reports: 3102 papers in training set, Top 56%, 1.8%
16. Communications Biology: 886 papers in training set, Top 9%, 1.7%
17. Artificial Intelligence in the Life Sciences: 11 papers in training set, Top 0.1%, 1.6%
18. FEBS Open Bio: 29 papers in training set, Top 0.2%, 1.3%
19. Frontiers in Behavioral Neuroscience: 46 papers in training set, Top 0.8%, 0.9%
20. BMC Medical Research Methodology: 43 papers in training set, Top 1%, 0.9%
21. BMC Genomics: 328 papers in training set, Top 5%, 0.8%
22. Biology Methods and Protocols: 53 papers in training set, Top 2%, 0.8%
23. Bioinformatics Advances: 184 papers in training set, Top 4%, 0.8%
24. Neuroinformatics: 40 papers in training set, Top 1%, 0.7%
25. Journal of the American Medical Informatics Association: 61 papers in training set, Top 2%, 0.7%
26. Artificial Intelligence in Medicine: 15 papers in training set, Top 0.7%, 0.7%
27. Frontiers in Molecular Biosciences: 100 papers in training set, Top 5%, 0.7%
28. Quantitative Biology: 11 papers in training set, Top 0.8%, 0.7%
29. PeerJ: 261 papers in training set, Top 16%, 0.7%
30. FASEB BioAdvances: 15 papers in training set, Top 0.3%, 0.7%