Deep Learning for Automated Meningioma Segmentation: Toward Clinical Integration and Workflow Efficiency
Fenney, E.; Muralidharan, L.; Ruffle, J. K.; Pandit, A.; Millip, M.; Hammam, A.; Brookes, T.; Jabeen, F.; Colman, J.; Sarwani, O.; Alattar, K.; Efthymiou, E.; Kallam, N.; Siddiqui, J.; Marcus, H. J.; Nachev, P.; Hyare, H.
Background: Meningiomas are the most common primary intracranial tumors in adults, and volumetric assessment increasingly guides surveillance and treatment decisions. Automated segmentation could enable standardized volumetry but requires robust validation.
Purpose: To develop a fully automated three-dimensional deep learning model for meningioma segmentation on multiparametric MRI, and to evaluate segmentation accuracy, external generalizability, failure modes, radiologist-rated clinical plausibility, and workflow feasibility.
Methods: From 2024 to 2026, this retrospective study trained a custom 3D nnU-Net residual encoder model. Expert segmentations covered enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Dice similarity coefficient (DSC) was the primary metric. External validation used an independent single-institution dataset (n = 310 intracranial cases) with incomplete MRI protocols. Failure modes, model equity, and inference time were assessed. In a blinded multi-rater study (10 radiologists; 510 cases), TC segmentations were rated on a 0-10 Likert scale and the ratings were analyzed with linear mixed-effects models.
Results: Model training used the BraTS Meningioma 2023 dataset (n = 1000; mean age 60.2 ± 14.5; 705 female). In cross-validation, mean DSC was 0.939 for ET, 0.937 for TC, and 0.921 for WT. In external validation, mean DSC was 0.872 for TC and 0.842 for WT, despite heterogeneous protocols and incomplete sequences. Predicted TC volumes correlated strongly with reference volumes in cross-validation (r = 0.995) and external validation (r = 0.971). The most common failure modes involved skull base and intraosseous tumors; performance was equitable across demographic subgroups. Mean inference time was 1.2 seconds. In blinded evaluation (1120 ratings), model segmentations received higher scores than reference annotations (+0.32 for BraTS; +1.38 for external validation).
Conclusion: A fully automated deep-learning model achieved high meningioma segmentation accuracy across multi-institutional training data and external clinical imaging. In a blinded study, model segmentation quality exceeded reference annotations, and 1.2-second inference supported workflow integration. Prospective evaluation is warranted before routine deployment.
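For readers unfamiliar with the primary metric, the Dice similarity coefficient between a predicted mask P and a reference mask R is 2|P ∩ R| / (|P| + |R|). The following is a minimal sketch of that formula in Python with NumPy, not the authors' evaluation code. The function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = np.logical_and(pred, ref).sum()
    return float(2.0 * overlap / denom)


# Toy 3D example (hypothetical data, not from the study).
ref_mask = np.zeros((4, 4, 4), dtype=bool)
ref_mask[1:3, 1:3, 1:3] = True   # 8 reference voxels

pred_mask = np.zeros((4, 4, 4), dtype=bool)
pred_mask[1:3, 1:3, 1:4] = True  # 12 predicted voxels, 8 of them overlapping

toy_dsc = dice_coefficient(pred_mask, ref_mask)  # 2 * 8 / (8 + 12) = 0.8
```

In a multi-label setting such as ET, TC, and WT, one would typically binarize each region separately and apply the same function per region; how the study aggregated per-case scores is not stated in the abstract.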