
Understanding the Sources of Performance in Deep Learning Drug Response Prediction Models

Branson, N.; Cutillas, P. R.; Bessant, C.

2024-06-06 · bioRxiv (bioinformatics)
doi:10.1101/2024.06.05.597337
Abstract

Anti-cancer drug response prediction (DRP) using cancer cell lines plays a vital role in stratified medicine and drug discovery. Recently there has been a surge of new deep learning (DL) models for DRP that improve on the performance of their predecessors. However, different models use different input data types and neural network architectures, making it hard to pinpoint the source of these improvements. Here we consider multiple published DRP models that report state-of-the-art performance in predicting continuous drug response values. These models take the chemical structures of drugs and omics profiles of cell lines as input. By experimenting with these models and comparing them with our own simple benchmarks, we show that none of their performance comes from the drug features; instead, it derives from the transcriptomic cell line profiles. Furthermore, we show that, depending on the testing type, much of the currently reported performance is a property of the training target values. To address these limitations we create novel models (BinaryET and BinaryCB) that predict binary drug response values, guided by the hypothesis that binarisation reduces the noise in the drug efficacy data and thus better aligns the targets with biochemistry that can be learnt from the input data. BinaryCB leverages a chemical foundation model, while BinaryET is trained from scratch using a transformer-type model. We show that these models learn useful chemical drug features, which, to our knowledge, is the first time this has been demonstrated across multiple DRP testing types. We further show that binarising the drug response values is what causes the models to learn useful chemical drug features. We also show that BinaryET improves performance over BinaryCB and over the published models that report state-of-the-art performance.
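The abstract does not describe how the continuous responses were binarised. As a purely illustrative sketch of the general idea, the snippet below thresholds each drug's responses at that drug's own median ln(IC50); the threshold rule, the column names, and the use of ln(IC50) are assumptions for illustration, not the paper's actual procedure.

```python
import pandas as pd

def binarise_responses(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative binarisation of continuous drug responses.

    Assumes a long-format frame with hypothetical columns
    ['drug', 'cell_line', 'ln_ic50']. Each drug's responses are
    thresholded at that drug's median ln(IC50): below the median is
    labelled sensitive (1), otherwise resistant (0).
    """
    out = df.copy()
    per_drug_median = out.groupby("drug")["ln_ic50"].transform("median")
    out["sensitive"] = (out["ln_ic50"] < per_drug_median).astype(int)
    return out

# Toy example: three cell lines screened against two drugs.
toy = pd.DataFrame({
    "drug": ["A", "A", "A", "B", "B", "B"],
    "cell_line": ["c1", "c2", "c3", "c1", "c2", "c3"],
    "ln_ic50": [-2.0, 0.5, 1.0, 3.0, 2.5, 4.0],
})
print(binarise_responses(toy))
```

A per-drug threshold is used here because absolute IC50 scales differ widely between compounds; a single global cutoff would mostly separate drugs rather than sensitive and resistant cell lines.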

Matching journals

The top 6 journals account for 50% of the predicted probability mass (a quick check of this figure follows the table).

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|------|---------|------------------------|------------|-----------------------|
| 1 | Bioinformatics | 1061 | Top 0.7% | 28.6% |
| 2 | PLOS Computational Biology | 1633 | Top 5% | 6.6% |
| 3 | Artificial Intelligence in the Life Sciences | 11 | Top 0.1% | 5.0% |
| 4 | Journal of Chemical Information and Modeling | 207 | Top 1% | 4.3% |
| 5 | Bioinformatics Advances | 184 | Top 1.0% | 4.1% |
| 6 | Journal of Cheminformatics | 25 | Top 0.1% | 4.1% |
| 7 | BMC Bioinformatics | 383 | Top 2% | 3.8% |
| 8 | iScience | 1063 | Top 4% | 3.7% |
| 9 | Scientific Reports | 3102 | Top 34% | 3.7% |
| 10 | npj Systems Biology and Applications | 99 | Top 0.9% | 2.0% |
| 11 | Frontiers in Molecular Biosciences | 100 | Top 1% | 1.8% |
| 12 | PLOS ONE | 4510 | Top 56% | 1.5% |
| 13 | Nature Machine Intelligence | 61 | Top 2% | 1.4% |
| 14 | Briefings in Bioinformatics | 326 | Top 4% | 1.4% |
| 15 | BioData Mining | 15 | Top 0.4% | 1.4% |
| 16 | Biology Methods and Protocols | 53 | Top 1% | 1.4% |
| 17 | Metabolites | 50 | Top 0.7% | 1.3% |
| 18 | Frontiers in Bioinformatics | 45 | Top 0.5% | 1.1% |
| 19 | Patterns | 70 | Top 2% | 0.9% |
| 20 | Frontiers in Genetics | 197 | Top 8% | 0.9% |
| 21 | Cancer Research Communications | 46 | Top 0.9% | 0.9% |
| 22 | Proceedings of the National Academy of Sciences | 2130 | Top 42% | 0.8% |
| 23 | Frontiers in Pharmacology | 100 | Top 4% | 0.8% |
| 24 | Cell Systems | 167 | Top 11% | 0.8% |
| 25 | JCO Clinical Cancer Informatics | 18 | Top 0.8% | 0.8% |
| 26 | Cancers | 200 | Top 5% | 0.8% |
| 27 | Computational and Structural Biotechnology Journal | 216 | Top 9% | 0.8% |
| 28 | NAR Genomics and Bioinformatics | 214 | Top 4% | 0.7% |
| 29 | Nature Communications | 4913 | Top 63% | 0.7% |
| 30 | eLife | 5422 | Top 60% | 0.7% |