
Deep Discriminative Fine-Tuning for Cancer Type Classification

Harley, A.

2019-11-13 cancer biology
10.1101/841056 bioRxiv

Determining the primary site of origin for metastatic tumors is one of the open problems in cancer care, because the efficacy of treatment often depends on the cancer's tissue of origin. Classification methods that can leverage tumor genomic data to predict the site of origin are therefore of great value. Because tumor DNA point mutation data is very sparse, methods that rely on point mutations as features have previously demonstrated only limited accuracy (64.5% for 12 tumor classes) (1). Tumor classification accuracy can be greatly improved (to over 90% for 33 classes) by relying on gene expression data (2). However, expression data is often not readily available in clinical settings, where mutational profiling primarily targets point mutations. Here we sought to develop an accurate deep transfer learning and fine-tuning method for tumor sub-type classification, where the predicted class is indicative of the primary site of origin. Our method significantly outperforms the state of the art for tumor classification using DNA point mutations, reducing the error by more than 30% while discriminating among many more classes on The Cancer Genome Atlas (TCGA) dataset. Using our method, we achieve state-of-the-art tumor type classification accuracy of 78.3% for 29 tumor classes, relying only on DNA point mutations in the tumor.
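The "more than 30%" error reduction can be checked with a short calculation. The sketch below assumes the figure compares the previously reported 64.5% accuracy with the 78.3% reported here; the abstract does not state this explicitly, and the two numbers come from different class counts (12 versus 29), so the comparison is indicative rather than like-for-like. The function name `relative_error_reduction` is hypothetical.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction by which the error rate shrinks when accuracy rises
    from baseline_acc to new_acc (both given as fractions in [0, 1])."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# Assumed inputs: accuracies quoted in the abstract.
baseline_acc = 0.645  # prior point-mutation method, 12 classes
new_acc = 0.783       # method described here, 29 classes

reduction = relative_error_reduction(baseline_acc, new_acc)
print(f"Relative error reduction: {reduction:.1%}")
```

Under that assumption the error rate falls from 35.5% to 21.7%, a relative reduction of roughly 39%, which is consistent with the "more than 30%" claim.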

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. npj Precision Oncology (48 papers in training set): Top 0.1%, 15.1%
2. Nature Communications (4913 papers in training set): Top 16%, 10.7%
3. Genome Medicine (154 papers in training set): Top 0.4%, 10.3%
4. Scientific Reports (3102 papers in training set): Top 11%, 7.4%
5. Cell Reports Medicine (140 papers in training set): Top 1%, 3.7%
6. JCO Clinical Cancer Informatics (18 papers in training set): Top 0.2%, 3.7%

50% of probability mass above

7. Communications Biology (886 papers in training set): Top 6%, 1.9%
8. PLOS ONE (4510 papers in training set): Top 49%, 1.9%
9. Cancers (200 papers in training set): Top 2%, 1.9%
10. Molecular Cancer (14 papers in training set): Top 0.2%, 1.9%
11. npj Digital Medicine (97 papers in training set): Top 2%, 1.9%
12. Clinical Cancer Research (58 papers in training set): Top 1%, 1.5%
13. PLOS Computational Biology (1633 papers in training set): Top 17%, 1.5%
14. Bioinformatics (1061 papers in training set): Top 8%, 1.4%
15. Nucleic Acids Research (1128 papers in training set): Top 12%, 1.4%
16. Cancer Research Communications (46 papers in training set): Top 0.6%, 1.4%
17. Cancer Research (116 papers in training set): Top 2%, 1.4%
18. Cancer Medicine (24 papers in training set): Top 1%, 1.0%
19. Cell Reports Methods (141 papers in training set): Top 4%, 0.9%
20. eBioMedicine (130 papers in training set): Top 3%, 0.9%
21. Clinical Chemistry (22 papers in training set): Top 0.7%, 0.8%
22. Translational Oncology (18 papers in training set): Top 0.4%, 0.8%
23. Journal of Translational Medicine (46 papers in training set): Top 2%, 0.8%
24. Laboratory Investigation (13 papers in training set): Top 0.2%, 0.8%
25. Frontiers in Oncology (95 papers in training set): Top 3%, 0.8%
26. Modern Pathology (21 papers in training set): Top 0.4%, 0.8%
27. Med (38 papers in training set): Top 0.9%, 0.7%
28. npj Breast Cancer (18 papers in training set): Top 0.2%, 0.7%
29. Bioinformatics Advances (184 papers in training set): Top 5%, 0.7%
30. Annals of Oncology (13 papers in training set): Top 1%, 0.7%
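
The statement that the top 6 journals account for 50% of the predicted probability mass can be verified by accumulating the listed probabilities in rank order. The sketch below is illustrative only: it hard-codes the percentages for ranks 1 through 7 as they appear in the list, and the helper name `ranks_to_reach` is hypothetical, not part of any tool referenced on this page.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1..7, copied from the list.
top_probs = [15.1, 10.7, 10.3, 7.4, 3.7, 3.7, 1.9]


def ranks_to_reach(probs, threshold):
    """Return (rank, cumulative total) at the first rank whose running
    sum meets or exceeds threshold, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank, total
    return None


rank, total = ranks_to_reach(top_probs, 50.0)
print(f"Threshold reached at rank {rank} with {total:.1f}% cumulative mass")
```

The running sum after rank 5 is about 47.2%, so the sixth journal is the first to push the total past 50%.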