MyeGPT: an AI agent for Multiple Myeloma

Chang, J. G.; Gout, A. M.; Rodiger, J.; Chung, T.-H.; Mulligan, G.; Chng, W. J.

2026-05-20 hematology
10.64898/2026.05.14.26353252 medRxiv

Today, advances in our understanding of cancer biology are increasingly attributed to large-scale clinical-molecular datasets. The case in point for multiple myeloma, the second-most prevalent haematological malignancy, is the CoMMpass study, a dataset with the paired clinical and sequencing data of 1,143 patients. Because of its complexity, analysing the multi-omics data of CoMMpass demands programming skills, which poses a hurdle for experimental myeloma researchers who want to validate their hypotheses on population data. The rise of agentic AI over the past few years presents unparalleled opportunities to bridge this technical gap. We propose MyeGPT (Myeloma Generative Pretrained Transformer), an AI bioinformatician for multiple myeloma that relies on the CoMMpass dataset as its ground truth. MyeGPT converts natural language queries such as 'What are the characteristics of patients who relapse after induction therapy' or 'Compare the overall survival of high vs normal NSD2 expression' into de novo analyses backed by real data, then proactively generates plots to visualize the results. We develop a set of evaluation questions based on CoMMpass, complete with scoring criteria, and run benchmarks to identify the best choice of LLM and text-embedding model. We package MyeGPT as a ready-to-use browser application, enabling CoMMpass-grounded hypothesis validation from a smartphone.

Matching journals

The top 11 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology; 1633 papers in training set; Top 4%; 8.5%
2. Cell; 370 papers in training set; Top 2%; 7.3%
3. Nature Methods; 336 papers in training set; Top 2%; 4.4%
4. Cell Systems; 167 papers in training set; Top 3%; 4.2%
5. npj Precision Oncology; 48 papers in training set; Top 0.1%; 4.0%
6. eLife; 5422 papers in training set; Top 22%; 4.0%
7. Nature; 575 papers in training set; Top 7%; 3.7%
8. Nature Communications; 4913 papers in training set; Top 39%; 3.6%
9. Bioinformatics Advances; 184 papers in training set; Top 1%; 3.6%
10. Nature Medicine; 117 papers in training set; Top 0.8%; 3.6%
11. Leukemia; 39 papers in training set; Top 0.3%; 3.6%
50% of probability mass above
12. npj Digital Medicine; 97 papers in training set; Top 1%; 3.6%
13. Bioinformatics; 1061 papers in training set; Top 6%; 2.6%
14. Journal of The Royal Society Interface; 189 papers in training set; Top 2%; 2.4%
15. Genome Biology; 555 papers in training set; Top 3%; 2.4%
16. Science Advances; 1098 papers in training set; Top 12%; 2.1%
17. Nature Biotechnology; 147 papers in training set; Top 4%; 1.9%
18. Blood Advances; 54 papers in training set; Top 0.6%; 1.9%
19. Nature Machine Intelligence; 61 papers in training set; Top 2%; 1.7%
20. PLOS ONE; 4510 papers in training set; Top 52%; 1.7%
21. Patterns; 70 papers in training set; Top 1%; 1.5%
22. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 36%; 1.4%
23. Heliyon; 146 papers in training set; Top 3%; 1.4%
24. iScience; 1063 papers in training set; Top 21%; 1.2%
25. Frontiers in Immunology; 586 papers in training set; Top 6%; 1.1%
26. Nature Genetics; 240 papers in training set; Top 6%; 1.0%
27. PLOS Genetics; 756 papers in training set; Top 12%; 0.9%
28. Blood; 67 papers in training set; Top 1%; 0.9%
29. Genome Research; 409 papers in training set; Top 4%; 0.8%
30. BMC Bioinformatics; 383 papers in training set; Top 6%; 0.8%