
Machine learning predicts metastatic progression using novel differentially expressed lncRNAs as potential markers in pancreatic cancer

Alsharoh, H.

2023-11-07 oncology
10.1101/2023.11.01.23297724 medRxiv

Abstract

Pancreatic cancer (PC) is associated with high overall mortality. Recent literature has investigated long noncoding RNAs (lncRNAs) in several cancers, but studies on their functions in PC are lacking. The purpose of this study was to identify novel lncRNAs and to use machine learning (ML) techniques to predict metastatic cases of PC from the identified lncRNAs. To identify significantly altered lncRNA expression in PC, data were collected from The Cancer Genome Atlas (TCGA), and RNA-sequencing (RNA-seq) transcriptomic profiles of pancreatic carcinomas were extracted for differential gene expression analysis. To assess the contribution of these lncRNAs to metastatic progression, several ML algorithms were used: logistic regression (LR), support vector machine (SVM), random forest classifier (RFC) and eXtreme Gradient Boosting Classifier (XGBC). To improve the predictive accuracy of these models, hyperparameter tuning was performed, and class-imbalance bias was reduced with the synthetic minority oversampling technique (SMOTE). Out of 60,660 gene transcripts shared among 151 PC patients, 38 significantly differentially expressed lncRNAs were identified. To further investigate the functions of the novel lncRNAs, gene set enrichment analysis (GSEA) was performed on the population lncRNA panel. GSEA results revealed enrichment of several terms implicated in proliferation. Moreover, using the 4 ML algorithms to predict metastatic progression yielded 76% accuracy for both SVM and RFC, based solely on the novel lncRNA panel. To the best of my knowledge, this is the first study to identify this lncRNA panel as a means of differentiating non-metastatic PC from metastatic PC, and many of the novel lncRNAs were previously unmapped to PC. The ML accuracy score indicates an important involvement of the detected lncRNAs. Based on these findings, I suggest further investigation of this lncRNA panel in vitro and in vivo, as these lncRNAs could be targeted for improved outcomes in PC patients and could assist in diagnosing metastatic progression from RNA-seq data of primary pancreatic tumors.
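
The abstract names the synthetic minority oversampling technique but does not say how it was applied. As a rough illustration of the general idea only, and not the author's code, the sketch below creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy expression values are all hypothetical; a real analysis would more likely use an established library implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from minority-class samples.

    minority: list of equal-length feature vectors (lists of floats).
    This is a simplified sketch of the technique, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new sample at a random point between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up values for two hypothetical lncRNA features (not data from the study)
non_metastatic = [[0.1, 1.2], [0.3, 1.0], [0.2, 1.1],
                  [0.4, 0.9], [0.0, 1.3], [0.2, 0.8]]
metastatic = [[2.0, 0.1], [2.4, 0.3]]

# Oversample the minority class until both classes have the same size
new_samples = smote(metastatic, n_new=len(non_metastatic) - len(metastatic))
balanced_metastatic = metastatic + new_samples
```

Because every synthetic point lies on a segment between two real minority samples, the minority class grows without exact duplicates being added.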

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | BMC Cancer | 52 papers in training set | Top 0.1% | 26.6%
2 | PeerJ | 261 papers in training set | Top 0.2% | 9.4%
3 | Scientific Reports | 3102 papers in training set | Top 13% | 7.0%
4 | PLOS ONE | 4510 papers in training set | Top 26% | 6.5%
5 | Frontiers in Oncology | 95 papers in training set | Top 0.8% | 4.5%
50% of probability mass above
6 | Frontiers in Genetics | 197 papers in training set | Top 1% | 4.3%
7 | Molecular Biology Reports | 19 papers in training set | Top 0.1% | 3.7%
8 | International Journal of Molecular Sciences | 453 papers in training set | Top 3% | 3.2%
9 | Biology Methods and Protocols | 53 papers in training set | Top 0.5% | 2.1%
10 | Gene Reports | 13 papers in training set | Top 0.2% | 1.9%
11 | Cancer Medicine | 24 papers in training set | Top 0.6% | 1.8%
12 | Heliyon | 146 papers in training set | Top 2% | 1.7%
13 | Genes | 126 papers in training set | Top 0.9% | 1.7%
14 | Frontiers in Pharmacology | 100 papers in training set | Top 2% | 1.7%
15 | Cancers | 200 papers in training set | Top 3% | 1.5%
16 | Computers in Biology and Medicine | 120 papers in training set | Top 2% | 1.5%
17 | Diagnostics | 48 papers in training set | Top 1% | 1.4%
18 | Oncotarget | 15 papers in training set | Top 0.2% | 1.0%
19 | Informatics in Medicine Unlocked | 21 papers in training set | Top 0.8% | 1.0%
20 | Bioscience Reports | 25 papers in training set | Top 0.9% | 0.9%
21 | International Journal of Biological Macromolecules | 65 papers in training set | Top 3% | 0.9%
22 | Cell Death & Disease | 126 papers in training set | Top 2% | 0.8%
23 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.8% | 0.8%
24 | Journal of Translational Medicine | 46 papers in training set | Top 2% | 0.8%
25 | BMC Bioinformatics | 383 papers in training set | Top 8% | 0.7%
26 | Frontiers in Immunology | 586 papers in training set | Top 9% | 0.5%
27 | Brain and Behavior | 37 papers in training set | Top 2% | 0.5%
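
The "50% of probability mass" marker in the ranking above can be checked with a running total. The sketch below assumes the marker sits after the first rank at which the cumulative predicted probability reaches 50%; that reading is an assumption, since the page does not define it. The helper name `journals_to_reach` is hypothetical, and the percentages are copied from the first six entries of the list.

```python
# (journal, predicted probability in percent), copied from the ranking above
ranked = [
    ("BMC Cancer", 26.6),
    ("PeerJ", 9.4),
    ("Scientific Reports", 7.0),
    ("PLOS ONE", 6.5),
    ("Frontiers in Oncology", 4.5),
    ("Frontiers in Genetics", 4.3),
]


def journals_to_reach(ranked, threshold):
    """Return (count, cumulative) for the first rank reaching the threshold."""
    total = 0.0
    for count, (_, prob) in enumerate(ranked, start=1):
        total += prob
        if total >= threshold:
            return count, total
    return None  # threshold never reached with the journals given


count, total = journals_to_reach(ranked, 50.0)
print(count, round(total, 1))
```

With the listed values, the first four journals sum to 49.5% and the fifth brings the total to 54.0%, so five is the smallest number of journals that covers at least half of the predicted probability.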