Multi-Task Learning and Soft-Label Supervision for Psychosocial Burden Profiling in Cancer Peer-Support Text

Wang, Z.; Cao, Y.; Shen, X.; Ding, Z.; Liu, Y.; Zhang, Y.

2026-04-04 health informatics
10.64898/2026.04.03.26350034 medRxiv

Objective: Online cancer peer-support text contains signals of psychosocial burden beyond emotional tone, including treatment burden, financial strain, uncertainty, and unmet support needs. We evaluated 2 modeling extensions: multi-task learning (MTL) for joint prediction of health economics and outcomes research (HEOR) burden dimensions, and soft-label supervision using large language model (LLM)-derived probability distributions. Materials and Methods: We analyzed 10,392 cancer peer-support posts. GPT-4o-mini generated proxy annotations for HEOR burden subscales, composite burden, high-need status, speaker role, cancer type, and emotion probabilities. Study 1 trained a shared ALBERT encoder under 4 MTL conditions: composite and subscale burden targets, each with and without auxiliary heads, using Kendall uncertainty weighting. Study 2 compared soft-label training on LLM emotion distributions with hard-label baselines under regular and token-augmented inputs, evaluating performance against both human labels and AI distributions. Results: Composite-only MTL achieved R² = 0.446 for burden regression and weighted F1 = 0.810 for high-need screening; subscale classification achieved mean weighted F1 = 0.646. Adding auxiliary role and cancer-type heads reduced regression performance (ΔR² = -0.209). Soft-label training reduced weighted F1 by 0.16 versus hard-label baselines (0.68 vs. 0.86), and token augmentation did not improve performance under soft supervision. Discussion: Composite-only MTL supported modeling of multidimensional burden-related signals from forum text, whereas auxiliary prediction heads appeared to compete with primary tasks. Soft-label training aligned poorly with human-labeled emotion categories, suggesting that uncalibrated LLM distributions may propagate bias rather than improve supervision. Conclusion: Composite-only MTL was the strongest burden-modeling approach, and hard-label supervision remained preferable for emotion classification.
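
The two loss constructions named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not give their exact formulation, so the function names, the uniform 0.5 factors in the uncertainty-weighted sum (a commonly used simplification of Kendall-style weighting), and the example numbers are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_cross_entropy(logits, target_probs):
    """Cross-entropy against a full target distribution.

    A hard label is the special case where target_probs is one-hot.
    """
    probs = softmax(logits)
    return -sum(p * math.log(q) for p, q in zip(target_probs, probs) if p > 0)

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using learned log-variances s_i = log(sigma_i^2).

    Assumed simplified form: sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i.
    A larger s_i down-weights task i but pays a penalty of 0.5 * s_i.
    """
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(task_losses, log_vars))

# Hypothetical 3-class example: same logits, soft versus hard target.
logits = [2.0, 0.5, -1.0]
soft_loss = soft_label_cross_entropy(logits, [0.6, 0.3, 0.1])
hard_loss = soft_label_cross_entropy(logits, [1.0, 0.0, 0.0])

# Hypothetical two-task combination with both log-variances at zero.
combined = uncertainty_weighted_loss([2.0, 4.0], [0.0, 0.0])
```

In a real training loop the log-variances would be trainable parameters updated alongside the encoder; here they are fixed numbers so the arithmetic stays visible.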

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Journal of Biomedical Informatics: 45 papers in training set, Top 0.1%, 12.2%
2. Journal of the American Medical Informatics Association: 61 papers in training set, Top 0.3%, 10.0%
3. JCO Clinical Cancer Informatics: 18 papers in training set, Top 0.1%, 6.8%
4. npj Digital Medicine: 97 papers in training set, Top 0.9%, 4.8%
5. The Lancet Digital Health: 25 papers in training set, Top 0.1%, 4.8%
6. JAMA Network Open: 127 papers in training set, Top 0.8%, 3.9%
7. Journal of Medical Internet Research: 85 papers in training set, Top 1%, 3.6%
8. Scientific Reports: 3102 papers in training set, Top 38%, 3.6%
9. eBioMedicine: 130 papers in training set, Top 0.4%, 3.6%
50% of probability mass above
10. International Journal of Medical Informatics: 25 papers in training set, Top 0.5%, 3.0%
11. BMC Medicine: 163 papers in training set, Top 2%, 2.3%
12. BMC Medical Research Methodology: 43 papers in training set, Top 0.5%, 2.1%
13. JAMIA Open: 37 papers in training set, Top 0.8%, 1.7%
14. Annals of Internal Medicine: 27 papers in training set, Top 0.4%, 1.7%
15. PLOS ONE: 4510 papers in training set, Top 55%, 1.6%
16. Cancer Medicine: 24 papers in training set, Top 0.8%, 1.6%
17. Artificial Intelligence in Medicine: 15 papers in training set, Top 0.3%, 1.6%
18. JMIR Medical Informatics: 17 papers in training set, Top 0.9%, 1.5%
19. Scientific Data: 174 papers in training set, Top 1%, 1.5%
20. Frontiers in Digital Health: 20 papers in training set, Top 0.8%, 1.3%
21. Nature Communications: 4913 papers in training set, Top 55%, 1.3%
22. Journal of Clinical Epidemiology: 28 papers in training set, Top 0.4%, 1.3%
23. Bioinformatics: 1061 papers in training set, Top 8%, 1.3%
24. PLOS Computational Biology: 1633 papers in training set, Top 21%, 0.9%
25. Frontiers in Artificial Intelligence: 18 papers in training set, Top 0.6%, 0.9%
26. BMJ Open: 554 papers in training set, Top 12%, 0.8%
27. BMJ Health & Care Informatics: 13 papers in training set, Top 0.9%, 0.7%
28. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 3%, 0.7%
29. Biological Psychiatry: 119 papers in training set, Top 2%, 0.7%
30. Journal of Personalized Medicine: 28 papers in training set, Top 1%, 0.7%
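
The 50% cutoff stated above the list can be checked directly from the listed probabilities. The sketch below is only an illustration of that cumulative-sum arithmetic; the function name is hypothetical, and it assumes the final percentage on each entry is that journal's predicted probability, as the sentence about probability mass suggests.

```python
def ranks_to_reach(probabilities, threshold):
    """Return (rank, cumulative mass) at the first rank whose running total
    meets the threshold, or (None, total) if the threshold is never reached."""
    cumulative = 0.0
    for rank, p in enumerate(probabilities, start=1):
        cumulative += p
        if cumulative >= threshold:
            return rank, cumulative
    return None, cumulative

# Predicted probabilities (in percent) for ranks 1 to 10, copied from the list.
top_probs = [12.2, 10.0, 6.8, 4.8, 4.8, 3.9, 3.6, 3.6, 3.6, 3.0]

rank, mass = ranks_to_reach(top_probs, 50.0)
print(f"50% of probability mass is reached at rank {rank} ({mass:.1f}% cumulative)")
```

The running total is about 49.7% after rank 8 and about 53.3% after rank 9, which is why the cutoff falls after the ninth journal.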