
LLM-Based Annotation and Token-Augmented Modeling for Emotional Tone Classification in Online Cancer Peer-Support Posts

Xu, S.; Wang, Z.; Wang, H.; Ding, Z.; Zou, Y.; Cao, Y.

2026-01-30 · oncology
medRxiv · DOI: 10.64898/2026.01.27.26344999

Online cancer peer-support communities generate large volumes of patient-authored and caregiver-authored text that may reflect distress, coping, and informational needs. Automated emotional tone classification could support scalable monitoring, but supervised modeling depends on label quality and may benefit from explicit context features. Using the Mental Health Insights: Vulnerable Cancer Survivors & Caregivers dataset, we compared five model families (TF-IDF Logistic Regression, Random Forest, LightGBM, GRU, and fine-tuned ALBERT) on a three-class target (Negative/Neutral/Positive) derived from four original categories. We introduced two extensions: (i) LLM-based annotation to generate parallel "AI labels" and (ii) token-based augmentation that prepends LLM-extracted structured variables (reporter role and cancer type) to the post text. Models were trained with a 60/20/20 stratified train/validation/test split, with hyperparameters selected on validation data only. Test performance was summarized using weighted F1 and macro one-vs-rest AUC with bootstrap confidence intervals, with paired comparisons based on McNemar tests and false discovery rate adjustment. The LLM annotator produced substantial redistribution in the four-class label space, shifting prevalence toward very negative relative to the original labels; the shift persisted but attenuated after collapsing to three classes. Across all model families, token augmentation improved held-out performance, with the largest gains for GRU and consistent improvements for ALBERT. Augmentation also reduced polarity-reversing errors (Negative ↔ Positive) for ALBERT, while adjacent errors (Negative ↔ Neutral) remained the dominant residual failure mode.
These results indicate that LLM-based supervision can introduce systematic measurement shifts that require auditing, yet LLM-extracted context incorporated via simple token augmentation provides a pragmatic, model-agnostic mechanism to improve downstream emotional tone classification for supportive oncology decision support.

Author summary

We studied how to better monitor emotional tone in posts from online cancer peer-support communities, where patients and caregivers share experiences that may signal distress, coping, or unmet needs. Automated classification could help organizations and moderators identify when additional support may be needed, but these systems depend on the quality of the labels used for training and may miss clinical context. Using a public dataset of cancer survivor and caregiver posts, we trained and compared several machine-learning and deep-learning models to classify each post as negative, neutral, or positive. We tested two practical improvements. First, we used a large language model to generate an additional set of "AI labels" and examined how these differed from the original categories. Second, we extracted simple context information (whether the writer was a patient or caregiver, and which cancer type was mentioned) and added this context to the text before model training. We found that adding context consistently improved performance across model types. However, the AI-generated labels shifted class distributions, indicating that automated labeling can introduce systematic changes that should be audited. Overall, simple context extraction can make emotional tone monitoring more accurate and useful for supportive oncology decision support.
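The token-augmentation idea described in the abstract (prepending LLM-extracted reporter role and cancer type to the post text) can be sketched as follows. The bracketed token format, the helper name, and the example post are illustrative assumptions, not the authors' exact implementation.

```python
def augment_post(text: str, reporter_role: str, cancer_type: str) -> str:
    """Prepend structured-context tokens to a post before model training.

    Token format is a hypothetical choice; any consistent scheme that the
    tokenizer preserves (e.g. special tokens registered with the model)
    would serve the same purpose.
    """
    role_token = f"[ROLE={reporter_role.lower().replace(' ', '_')}]"
    cancer_token = f"[CANCER={cancer_type.lower().replace(' ', '_')}]"
    return f"{role_token} {cancer_token} {text}"


# Illustrative post; not taken from the dataset.
post = "Mom finished her second round of chemo and is feeling hopeful."
print(augment_post(post, "caregiver", "breast cancer"))
# [ROLE=caregiver] [CANCER=breast_cancer] Mom finished her second round of chemo and is feeling hopeful.
```

Because the context is injected at the input-text level, the same augmented strings can be fed to every model family (TF-IDF pipelines, GRU, ALBERT) without architecture changes, which is what makes the mechanism model-agnostic.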

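The paired-comparison procedure named in the abstract (McNemar tests with false discovery rate adjustment) can be sketched with standard-library Python. This uses the exact binomial form of McNemar's test on discordant-pair counts and Benjamini-Hochberg adjustment; the authors' exact test variant and software are not stated, so treat this as one plausible formulation.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value.

    b, c: counts of discordant pairs (model A right / model B wrong,
    and vice versa). Under the null, min(b, c) ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, p)


def bh_adjust(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * n / rank)
        adjusted[i] = running
    return adjusted


# Illustrative discordant counts for three pairwise model comparisons.
raw = [mcnemar_exact_p(2, 8), mcnemar_exact_p(15, 40), mcnemar_exact_p(5, 5)]
print(bh_adjust(raw))
```

Running several such pairwise comparisons (e.g. augmented vs. baseline for each model family) inflates the chance of a spurious finding, which is why the raw McNemar p-values are FDR-adjusted before interpretation.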
Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | JCO Clinical Cancer Informatics | 18 | Top 0.1% | 33.1%
2 | Artificial Intelligence in Medicine | 15 | Top 0.1% | 6.4%
3 | npj Digital Medicine | 97 | Top 0.9% | 4.9%
4 | Biology Methods and Protocols | 53 | Top 0.1% | 4.6%
5 | Scientific Reports | 3102 | Top 29% | 4.2%
(50% of probability mass above this line)
6 | PLOS Computational Biology | 1633 | Top 9% | 4.0%
7 | JAMA Network Open | 127 | Top 0.8% | 4.0%
8 | iScience | 1063 | Top 8% | 2.6%
9 | European Journal of Cancer | 10 | Top 0.1% | 1.9%
10 | PLOS ONE | 4510 | Top 52% | 1.8%
11 | Frontiers in Bioinformatics | 45 | Top 0.2% | 1.8%
12 | Journal of Medical Internet Research | 85 | Top 3% | 1.7%
13 | Proceedings of the National Academy of Sciences | 2130 | Top 35% | 1.5%
14 | Frontiers in Oncology | 95 | Top 3% | 1.3%
15 | Nature Communications | 4913 | Top 56% | 1.2%
16 | Cancer Epidemiology, Biomarkers & Prevention | 17 | Top 0.4% | 1.2%
17 | Cancer Medicine | 24 | Top 1% | 1.0%
18 | Clinical Cancer Research | 58 | Top 2% | 0.9%
19 | PLOS Digital Health | 91 | Top 2% | 0.9%
20 | Computers in Biology and Medicine | 120 | Top 4% | 0.9%
21 | BMC Bioinformatics | 383 | Top 6% | 0.8%
22 | Journal of Translational Medicine | 46 | Top 2% | 0.8%
23 | Database | 51 | Top 0.8% | 0.8%
24 | BMC Infectious Diseases | 118 | Top 5% | 0.7%
25 | JAMIA Open | 37 | Top 1% | 0.7%
26 | Journal of Pathology Informatics | 13 | Top 0.4% | 0.7%
27 | JMIR Medical Informatics | 17 | Top 1% | 0.7%
28 | PeerJ | 261 | Top 15% | 0.7%
29 | npj Precision Oncology | 48 | Top 1% | 0.7%
30 | Cancers | 200 | Top 5% | 0.7%