
Accurate classification of CNS tumors through DNA methylation data analysis of select genomic regions

Moradi, E.; Vuorinen, J.; Rodriguez-Martinez, A.; Pekkarinen, M.; Vulli, M.; Lehtipuro, S.; Fey, V.; Tabaro, F.; Hartewig, A.; Ampuja, S.; De Koker, A.; Paemel, R. V.; De Wilde, B.; Callewaert, N.; Kuusisto, M. E. L.; Teppo, H. R.; Kuittinen, O.; Nordfors, K.; Haapasalo, H.; Haapasalo, J.; Nykter, M.; Kesseli, J.; Rautajoki, K. J.

2025-10-13 oncology
10.1101/2025.10.07.25337348 medRxiv

Background: Current clinical neuropathology practice, which uses DNA methylation information to support the diagnosis of central nervous system (CNS) tumors, could benefit from increased interpretability and reduced costs.

Methods: We identified and characterized limited sets of genomic regions (i.e. features) that can be used for accurate classification of CNS tumors based on DNA methylation data. The features were selected using a hybrid strategy combining filtering and Elastic Net Logistic Regression (ENLR). A Support Vector Machine (SVM)-based classifier was trained using 1003 selected informative features and an established cohort of 60 diagnostic tumor classes comprising 82 tumor DNA methylation classes and 9 control classes. Validation was performed using external microarray and targeted DNA methylation sequencing cohorts.

Results: Informative regions were enriched in enhancers and associated with genes involved in neural development and morphogenesis. In the microarray validation cohort of 1993 samples representing 76 DNA methylation classes, the overall accuracy of our SVM classifier was 0.96 when using 1003 features, after discrepancies with the molecular neuropathology classifier were evaluated based on the reported final tumor diagnosis and diagnostic relevance. Its performance remained similar (overall accuracy 0.95-0.96) when the number of features was further decreased, down to 163. An accuracy of 0.94 was observed in the in-house targeted sequencing cohort of 17 cases.

Conclusions: Classification of CNS tumors is feasible and accurate based on a very limited set of genomic regions, which facilitates further method development and the interpretation of classification results, likely benefiting CNS tumor diagnostics worldwide.

Highlights
- Hybrid feature selection identifies 1,003 CpGs strongly linked to CNS tumors
- SVM model achieves 0.96 accuracy with confidence and top-3 predictions
- Robust across sequencing and microarray platforms for clinical use
- Reliable even when reduced to 163 CpG features, lowering cost and complexity

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1 | Neuropathology and Applied Neurobiology | 14 papers in training set | Top 0.1% | 19.0%
2 | Scientific Reports | 3102 papers in training set | Top 6% | 10.3%
3 | Journal of the Neurological Sciences | 17 papers in training set | Top 0.1% | 4.9%
4 | BMC Bioinformatics | 383 papers in training set | Top 2% | 4.0%
5 | PLOS ONE | 4510 papers in training set | Top 38% | 3.7%
6 | Biology Methods and Protocols | 53 papers in training set | Top 0.3% | 3.7%
7 | Diagnostics | 48 papers in training set | Top 0.6% | 2.8%
8 | Neuro-Oncology Advances | 24 papers in training set | Top 0.2% | 2.8%
50% of probability mass above
9 | Epigenetics | 43 papers in training set | Top 0.3% | 2.4%
10 | PeerJ | 261 papers in training set | Top 5% | 2.1%
11 | Neuro-Oncology | 30 papers in training set | Top 0.4% | 1.9%
12 | Cancers | 200 papers in training set | Top 2% | 1.9%
13 | Clinical Epigenetics | 53 papers in training set | Top 0.5% | 1.7%
14 | Clinical Chemistry | 22 papers in training set | Top 0.5% | 1.4%
15 | Acta Neuropathologica | 51 papers in training set | Top 0.8% | 1.4%
16 | Journal of Neurology | 26 papers in training set | Top 0.8% | 1.2%
17 | iScience | 1063 papers in training set | Top 21% | 1.2%
18 | BMC Cancer | 52 papers in training set | Top 2% | 1.2%
19 | Frontiers in Oncology | 95 papers in training set | Top 3% | 1.2%
20 | Brain, Behavior, & Immunity - Health | 27 papers in training set | Top 0.3% | 1.1%
21 | Artificial Intelligence in Medicine | 15 papers in training set | Top 0.5% | 1.1%
22 | The Journal of Pathology | 22 papers in training set | Top 0.3% | 1.1%
23 | Brain and Behavior | 37 papers in training set | Top 1.0% | 1.0%
24 | Gigabyte | 60 papers in training set | Top 1% | 1.0%
25 | npj Systems Biology and Applications | 99 papers in training set | Top 2% | 1.0%
26 | Journal of Translational Medicine | 46 papers in training set | Top 2% | 1.0%
27 | International Journal of Cancer | 42 papers in training set | Top 0.9% | 1.0%
28 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 5% | 0.9%
29 | eBioMedicine | 130 papers in training set | Top 4% | 0.8%
30 | Acta Neuropathologica Communications | 81 papers in training set | Top 1% | 0.8%
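
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The following is a minimal sketch of that arithmetic, not the site's own computation; the function name `journals_needed` and the 50% default threshold are illustrative assumptions, and only the ten highest-ranked percentages from the list are used.

```python
from itertools import accumulate

# Predicted probabilities (%) of the ten highest-ranked journals,
# copied from the ranking above in rank order.
probabilities = [19.0, 10.3, 4.9, 4.0, 3.7, 3.7, 2.8, 2.8, 2.4, 2.1]


def journals_needed(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    sum to reach `threshold`, or None if it is never reached.
    (Hypothetical helper, introduced here for illustration only.)"""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


cutoff = journals_needed(probabilities)
print(cutoff)
```

With the listed values, the running total is 48.4% after seven journals and 51.2% after eight, which is consistent with the cutoff marker placed after rank 8.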