Thyroid Cancer Risk Prediction from Multimodal Datasets Using Large Language Model

Ray, P.

2026-03-06 health informatics
10.64898/2026.03.05.26347766 medRxiv
Thyroid carcinoma is one of the most prevalent endocrine malignancies worldwide, and accurate preoperative differentiation between benign and malignant thyroid nodules remains clinically challenging. Current diagnostic practice relies on clinicians' subjective interpretation of imaging results and separately reported clinical tests, which introduces inconsistency and can lead to misdiagnosis. Integrating radiological imaging with clinical information can make predictions more reliable and support clinical decision-making. This study introduces a multimodal deep learning framework that combines magnetic resonance imaging (MRI) data with clinical text to predict thyroid cancer. A Vision Transformer (ViT) extracts high-level features from MRI scans, while a domain-adapted language model processes clinical documents containing patient medical history, symptoms, and laboratory results. A cross-modal attention mechanism fuses the imaging and text representations and captures the relationships between the two modalities. A classification layer then maps the fused features to a probability of malignancy. Experimental results show that the proposed multimodal system outperforms the unimodal baselines in accuracy, sensitivity, specificity, and AUC, which can help clinicians make better preoperative decisions.
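
The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the abstract does not give layer sizes, the direction of attention, or the pooling strategy. The sketch assumes that text tokens act as queries over image patches, that the fused tokens are mean-pooled, and that a single logistic unit serves as the classification layer. The random arrays are stand-ins for ViT patch embeddings and language model token embeddings, and every name (`cross_modal_attention`, `malignancy_probability`, `d_model`) is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modal_attention(text_tokens, image_patches, w_q, w_k, w_v):
    """Scaled dot-product attention with text queries over image keys and values.

    Assumption: text attends to image. The abstract does not state the direction.
    """
    q = text_tokens @ w_q        # (num_tokens, d_model)
    k = image_patches @ w_k      # (num_patches, d_model)
    v = image_patches @ w_v      # (num_patches, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (num_tokens, num_patches)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


def malignancy_probability(fused, w_out, b_out):
    """Mean-pool the fused tokens and apply a logistic classification layer."""
    pooled = fused.mean(axis=0)
    logit = pooled @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))


# Random stand-ins for encoder outputs (hypothetical shapes, untrained weights).
rng = np.random.default_rng(0)
d_model = 16
text_tokens = rng.normal(size=(5, d_model))     # e.g. 5 clinical-text tokens
image_patches = rng.normal(size=(9, d_model))   # e.g. 9 MRI patches
w_q = rng.normal(size=(d_model, d_model)) * 0.1
w_k = rng.normal(size=(d_model, d_model)) * 0.1
w_v = rng.normal(size=(d_model, d_model)) * 0.1
w_out = rng.normal(size=d_model) * 0.1

fused, weights = cross_modal_attention(text_tokens, image_patches, w_q, w_k, w_v)
prob = malignancy_probability(fused, w_out, 0.0)
print(fused.shape, weights.shape, float(prob))
```

Each row of `weights` shows how strongly one text token attends to each image patch, which is one way to inspect how the two modalities are related. With untrained random weights the resulting probability carries no clinical meaning.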

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.1% | 14.4%
2. IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 0.1% | 8.4%
3. Scientific Reports | 3102 papers in training set | Top 12% | 7.2%
4. Artificial Intelligence in Medicine | 15 papers in training set | Top 0.1% | 4.2%
5. Computers in Biology and Medicine | 120 papers in training set | Top 0.7% | 4.0%
6. PLOS Digital Health | 91 papers in training set | Top 0.7% | 3.7%
7. Journal of Biomedical Informatics | 45 papers in training set | Top 0.4% | 3.6%
8. JAMIA Open | 37 papers in training set | Top 0.4% | 3.6%
9. PLOS ONE | 4510 papers in training set | Top 39% | 3.6%

50% of probability mass above

10. npj Digital Medicine | 97 papers in training set | Top 1% | 3.6%
11. JMIR Medical Informatics | 17 papers in training set | Top 0.5% | 2.1%
12. Informatics in Medicine Unlocked | 21 papers in training set | Top 0.4% | 1.9%
13. IEEE Access | 31 papers in training set | Top 0.3% | 1.8%
14. International Journal of Medical Informatics | 25 papers in training set | Top 0.8% | 1.7%
15. IEEE Transactions on Biomedical Engineering | 38 papers in training set | Top 0.5% | 1.7%
16. Expert Systems with Applications | 11 papers in training set | Top 0.1% | 1.7%
17. Cureus | 67 papers in training set | Top 3% | 1.3%
18. Journal of Medical Internet Research | 85 papers in training set | Top 3% | 1.3%
19. Sensors | 39 papers in training set | Top 1% | 1.3%
20. JMIR Public Health and Surveillance | 45 papers in training set | Top 2% | 1.3%
21. Frontiers in Physiology | 93 papers in training set | Top 4% | 1.3%
22. Biomedical Signal Processing and Control | 18 papers in training set | Top 0.4% | 1.1%
23. Nature Communications | 4913 papers in training set | Top 58% | 0.9%
24. Frontiers in Medicine | 113 papers in training set | Top 6% | 0.9%
25. Computer Methods and Programs in Biomedicine | 27 papers in training set | Top 0.8% | 0.9%
26. Neurocomputing | 13 papers in training set | Top 0.4% | 0.9%
27. Applied Sciences | 24 papers in training set | Top 0.7% | 0.9%
28. Cells | 232 papers in training set | Top 6% | 0.8%
29. Bioinformatics | 1061 papers in training set | Top 9% | 0.8%
30. Journal of Personalized Medicine | 28 papers in training set | Top 1% | 0.8%
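
The 50% cutoff stated for the listing above can be checked with a short calculation. The sketch below takes the displayed probabilities of the ten highest-ranked journals and counts how many entries are needed for the cumulative sum to reach 50%. It assumes the displayed values, which are rounded to one decimal place, are close enough to the underlying probabilities; the function name `journals_to_reach` is hypothetical.

```python
# Displayed probabilities (in percent) for the ten highest-ranked journals.
probabilities = [14.4, 8.4, 7.2, 4.2, 4.0, 3.7, 3.6, 3.6, 3.6, 3.6]


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running
    total to reach the threshold, or None if it is never reached."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count
    return None


cutoff = journals_to_reach(probabilities, 50.0)
print(cutoff)  # 9
```

The running total is about 49.1% after eight journals and about 52.7% after nine, which agrees with the statement that the top 9 journals account for 50% of the predicted probability mass.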