
Characterization of long-term patient-reported symptoms of COVID-19: an analysis of social media data

Banda, J. M.; Adderley, N.; Ahmed, W.-U.-R.; AlGhoul, H.; Alser, O.; Alser, M.; Areia, C.; Cogenur, M.; Fister, K.; Gombar, S.; Huser, V.; Jonnagaddala, J.; Lai, L.; Leis, A.; Mateu, L.; Mayer, M. A.; Minty, E.; Morales, D. R.; Natarajan, K.; Paredes, R.; Periyakoil, V. S.; Prats-Uribe, A.; Ross, E. G.; Singh, G. V.; Subbian, V.; Vivekanantham, A.; Prieto-Alhambra, D.

medRxiv, 2021-07-15 (infectious diseases)
doi:10.1101/2021.07.13.21260449

As SARS-CoV-2, the virus that causes COVID-19, continues to affect people across the globe, the long-term implications for infected patients remain poorly understood [1-3]. Some of these patients have follow-ups documented in clinical records or take part in longitudinal surveys, but these datasets are usually designed by clinicians and are not granular enough to capture the natural history of long COVID or patients' experiences of it. A complete picture requires patient-generated data that track the long-term impact of COVID-19 on recovered patients in real time, characterizing their experiences from infection through the months that follow with a granularity that clinician narratives lack. In this work, we present a longitudinal characterization of post-COVID-19 symptoms using social media data from Twitter. Using a combination of machine learning, natural language processing techniques, and clinician review, we mined 296,154 tweets to characterize the post-acute course of the disease, creating detailed timelines of symptoms and conditions and analyzing the patients' symptomatology over a period of more than 150 days.
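To make the idea of a "symptom timeline" concrete, here is a minimal sketch under stated assumptions. It is not the authors' pipeline: they combined machine learning, NLP, and clinician review, whereas this example swaps in simple keyword matching. It assumes each user's positive-test date is already known, and the lexicon (`SYMPTOM_LEXICON`), the function `symptom_timeline`, and the 30-day bucket width are all hypothetical. It counts distinct users mentioning each symptom per bucket of days since infection.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical symptom lexicon: canonical symptom -> surface forms seen in tweets.
SYMPTOM_LEXICON = {
    "fatigue": ["fatigue", "exhausted", "tired"],
    "anosmia": ["loss of smell", "can't smell", "anosmia"],
    "dyspnea": ["shortness of breath", "short of breath", "breathless"],
}

# One case-insensitive, word-bounded regex per canonical symptom.
PATTERNS = {
    symptom: re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    for symptom, terms in SYMPTOM_LEXICON.items()
}


def symptom_timeline(tweets, infection_dates, bucket_days=30):
    """Count distinct users mentioning each symptom per bucket of days since infection.

    tweets: iterable of (user_id, tweet_date, text).
    infection_dates: dict mapping user_id -> date the user reported testing positive.
    Returns {bucket_start_day: {symptom: number_of_users}}, sorted by bucket.
    """
    users_seen = defaultdict(set)  # (bucket, symptom) -> set of user ids
    for user, when, text in tweets:
        start = infection_dates.get(user)
        if start is None:
            continue  # no known infection date for this user
        days = (when - start).days
        if days < 0:
            continue  # tweet predates the reported infection
        bucket = (days // bucket_days) * bucket_days
        for symptom, pattern in PATTERNS.items():
            if pattern.search(text):
                users_seen[(bucket, symptom)].add(user)

    timeline = defaultdict(dict)
    for (bucket, symptom), users in users_seen.items():
        timeline[bucket][symptom] = len(users)
    return {bucket: dict(counts) for bucket, counts in sorted(timeline.items())}


if __name__ == "__main__":
    tweets = [
        ("u1", date(2020, 4, 10), "Tested positive today"),
        ("u1", date(2020, 5, 20), "Still so tired and short of breath"),
        ("u2", date(2020, 6, 1), "Day 3: I can't smell anything"),
        ("u2", date(2020, 8, 15), "Fatigue is back again"),
    ]
    infection = {"u1": date(2020, 4, 10), "u2": date(2020, 5, 29)}
    print(symptom_timeline(tweets, infection))
```

Counting distinct users rather than raw mentions keeps one prolific poster from dominating a bucket; a real pipeline would also need to handle negation ("no fatigue") and figurative uses of "tired", which is where the learned models and clinician review come in.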

Matching journals

The top 5 journals together account for about half (51.6%) of the predicted probability mass.

Rank. Journal (papers in training set; Top percentile): predicted probability

1. Scientific Reports (3102; Top 0.8%): 18.8%
2. Nature Computational Science (50; Top 0.1%): 12.5%
3. Nature Medicine (117; Top 0.2%): 8.5%
4. Nature Communications (4913; Top 26%): 6.9%
5. npj Digital Medicine (97; Top 0.9%): 4.9%
[50% of probability mass above]
6. iScience (1063; Top 3%): 4.4%
7. Journal of Medical Internet Research (85; Top 1%): 3.6%
8. Patterns (70; Top 0.3%): 2.9%
9. Nature Biotechnology (147; Top 4%): 2.1%
10. Scientific Data (174; Top 0.8%): 2.1%
11. Nature (575; Top 10%): 1.8%
12. Med (38; Top 0.3%): 1.5%
13. Cell Reports Methods (141; Top 3%): 1.5%
14. PLOS ONE (4510; Top 58%): 1.3%
15. Communications Biology (886; Top 14%): 1.2%
16. Science Advances (1098; Top 25%): 1.0%
17. Heliyon (146; Top 4%): 1.0%
18. Epidemics (104; Top 1%): 0.9%
19. The Lancet Digital Health (25; Top 0.9%): 0.8%
20. The Lancet Infectious Diseases (71; Top 3%): 0.8%
21. Viruses (318; Top 5%): 0.8%
22. Bioinformatics (1061; Top 9%): 0.8%
23. Nano Letters (63; Top 2%): 0.8%
24. Clinical Infectious Diseases (231; Top 5%): 0.7%
25. Communications Medicine (85; Top 1%): 0.7%
26. Computational and Structural Biotechnology Journal (216; Top 11%): 0.7%
27. Frontiers in Physics (20; Top 1%): 0.5%
28. Frontiers in Digital Health (20; Top 2%): 0.5%
29. JMIR Public Health and Surveillance (45; Top 5%): 0.5%
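The "top 5 journals account for 50% of the probability mass" cutoff is a cumulative sum over the ranked predictions. The sketch below reproduces that arithmetic from the first six listed probabilities; the function name `journals_for_mass` is hypothetical, and the page's own cutoff logic is not documented, so this only assumes it finds the smallest top-k prefix that reaches the threshold.

```python
from itertools import accumulate

# Predicted probabilities (%) for the six highest-ranked journals in the list.
PROBS = [
    ("Scientific Reports", 18.8),
    ("Nature Computational Science", 12.5),
    ("Nature Medicine", 8.5),
    ("Nature Communications", 6.9),
    ("npj Digital Medicine", 4.9),
    ("iScience", 4.4),
]


def journals_for_mass(probs, threshold):
    """Return (k, cumulative %) for the smallest top-k prefix reaching threshold (%)."""
    ranked = sorted(probs, key=lambda item: item[1], reverse=True)
    running = accumulate(p for _, p in ranked)  # running totals 18.8, 31.3, ...
    for k, total in enumerate(running, start=1):
        if total >= threshold:
            return k, round(total, 1)
    # Threshold never reached: report every journal and the total mass.
    return len(ranked), round(sum(p for _, p in ranked), 1)


if __name__ == "__main__":
    print(journals_for_mass(PROBS, 50))
```

With these values the running total passes 50% at the fifth journal (51.6%), matching the position of the "50% of probability mass above" divider. All 29 listed journals together sum to 84.5%, so the remaining mass is spread over journals not shown.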