Accurate classification of CNS tumors through DNA methylation data analysis of select genomic regions
Moradi, E.; Vuorinen, J.; Rodriguez-Martinez, A.; Pekkarinen, M.; Vulli, M.; Lehtipuro, S.; Fey, V.; Tabaro, F.; Hartewig, A.; Ampuja, S.; De Koker, A.; Paemel, R. V.; De Wilde, B.; Callewaert, N.; Kuusisto, M. E. L.; Teppo, H. R.; Kuittinen, O.; Nordfors, K.; Haapasalo, H.; Haapasalo, J.; Nykter, M.; Kesseli, J.; Rautajoki, K. J.
Background: Current clinical neuropathology practice, which uses DNA methylation information to support the diagnosis of central nervous system (CNS) tumors, could benefit from increased interpretability and reduced costs.

Methods: We identified and characterized limited sets of genomic regions (i.e., features) that can be used for accurate classification of CNS tumors based on DNA methylation data. The features were selected using a hybrid strategy combining filtering and Elastic Net Logistic Regression (ENLR). A Support Vector Machine (SVM)-based classifier was trained using 1003 selected informative features and an established cohort of 60 diagnostic tumor classes comprising 82 tumor DNA methylation classes and 9 control classes. Validation was performed using external microarray and targeted DNA methylation sequencing cohorts.

Results: Informative regions were enriched in enhancers and associated with genes involved in neural development and morphogenesis. In the microarray validation cohort of 1993 samples representing 76 DNA methylation classes, the overall accuracy of our SVM classifier was 0.96 when using 1003 features, after discrepancies with the molecular neuropathology classifier were evaluated against the reported final tumor diagnosis and diagnostic relevance. Performance remained similar (overall accuracy 0.95-0.96) when the number of features was decreased further, down to 163. An accuracy of 0.94 was observed in the in-house targeted sequencing cohort of 17 cases.

Conclusions: Classification of CNS tumors is feasible and accurate based on a very limited set of genomic regions, which facilitates further method development and the interpretation of classification results, likely benefiting CNS tumor diagnostics worldwide.
Highlights
- Hybrid feature selection identifies 1,003 CpGs strongly linked to CNS tumors
- SVM model achieves 0.96 accuracy with confidence and top-3 predictions
- Robust across sequencing and microarray platforms for clinical use
- Reliable even when reduced to 163 CpG features, lowering cost and complexity
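
The hybrid strategy summarized above (a filter step, then elastic-net logistic regression to pick features, then an SVM trained on the survivors) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the variance-based filter is an assumed stand-in for whatever filtering criterion was actually used, and the linear kernel and all hyperparameters (`n_keep`, `l1_ratio`, `C`) are hypothetical choices made only so the example runs with scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a (samples x genomic regions) methylation matrix.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=20, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1 (filter; assumed criterion): keep the n_keep most variable features,
# ranked on the training split only.
n_keep = 100
filtered = np.argsort(X_train.var(axis=0))[::-1][:n_keep]

# Step 2 (embedded selection): elastic-net logistic regression. The L1 part of
# the penalty drives some coefficients to exactly zero; a feature is kept if
# its coefficient is nonzero for at least one class.
enlr = LogisticRegression(
    penalty="elasticnet", solver="saga", l1_ratio=0.5, C=0.1, max_iter=5000
)
enlr.fit(X_train[:, filtered], y_train)
selected = filtered[np.any(enlr.coef_ != 0, axis=0)]

# Step 3: train an SVM on the selected features only (linear kernel assumed).
svm = SVC(kernel="linear", decision_function_shape="ovr")
svm.fit(X_train[:, selected], y_train)

# Rank classes per test sample by decision score to get top-1 and top-3 calls.
scores = svm.decision_function(X_test[:, selected])
ranked = svm.classes_[np.argsort(-scores, axis=1)]
top1_acc = float(np.mean(ranked[:, 0] == y_test))
top3_acc = float(np.mean((ranked[:, :3] == y_test[:, None]).any(axis=1)))

print(f"features kept: {selected.size} of {X.shape[1]}")
print(f"top-1 accuracy: {top1_acc:.2f}, top-3 accuracy: {top3_acc:.2f}")
```

Ranking classes by decision score is only one way to obtain top-3 predictions; a calibrated probability output would be needed for the kind of confidence reporting mentioned in the highlights.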
Matching journals
The top 8 journals account for 50% of the predicted probability mass.