Strengthening Deep-learning Models for Intracranial Hemorrhage Detection: Strongly Annotated Computed Tomography Images and Model Ensembles
Kang, D.-W.; Park, G.-H.; Ryu, W.-S.; Schellingerhout, D.; Kim, M.; Kim, Y. S.; Park, C.-Y.; Lee, K.-J.; Han, M.-K.; Jeong, H.-G.; Kim, D.-E.
Multiple attempts have been made to detect intracranial hemorrhage (ICH) using deep-learning techniques, but they have been plagued by clinical failures. Most ICH detection studies have relied on insufficient data or weak annotations. We sought to determine whether a deep-learning algorithm for ICH detection trained on a strongly annotated dataset outperforms one trained on a weakly annotated dataset, and whether a weighted ensemble model that integrates separate models trained on datasets with different ICH subtypes is more accurate. For training, we used publicly available brain CT scans from the Radiological Society of North America (27,861 CT scans, 3,528 ICHs) and AI-Hub (53,045 CT scans, 7,013 ICHs). For external testing, we used 600 CT scans (327 with ICH) from Dongguk University Medical Center and 386 CT scans (160 with ICH) from Qure.ai. DenseNet121, InceptionResNetV2, MobileNetV2, and VGG19 were each trained on strongly and weakly annotated datasets and compared. We then developed a weighted ensemble model combining separate models trained on all ICH, subdural hemorrhage (SDH), subarachnoid hemorrhage (SAH), and small-lesion ICH cases. The final weighted ensemble model was compared with four well-known deep-learning models. Six neurologists reviewed difficult ICH cases after external testing. The InceptionResNetV2, MobileNetV2, and VGG19 models performed better when trained on strongly annotated datasets than when trained on weakly annotated datasets. A weighted ensemble model combining models trained on SDH, SAH, and small-lesion ICH had a higher AUC than a model trained only on all ICH cases. This model outperformed four well-known deep-learning models in terms of sensitivity, specificity, and AUC. Strongly annotated data are superior to weakly annotated data for training deep-learning algorithms. Because no single model can capture all aspects of a complex task well, we developed a weighted ensemble model for ICH detection after training with large-scale strongly annotated CT scans.
We also showed that a better understanding and management of cases that are challenging for both AI and humans is required to facilitate clinical use of ICH detection algorithms.
Key Points
Question: Can a weighted ensemble method and strongly annotated training datasets yield a deep-learning model that detects intracranial hemorrhage with high accuracy?
Findings: A deep-learning algorithm for detecting ICH trained with a strongly annotated dataset outperformed models trained with a weakly annotated dataset. A weighted ensemble of separate models trained with only SDH, SAH, and small-lesion ICH cases had a higher AUC.
Meaning: This study suggests that to enhance the performance of deep-learning models, researchers should consider the distinct imaging characteristics of each hemorrhage subtype and use strongly annotated training datasets.
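The weighted ensemble idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the ensemble weights were chosen or how the sub-model outputs were combined, so the normalized weighted average below is an assumption, and the function name `weighted_ensemble`, the probabilities, the weights, and the 0.5 threshold are all hypothetical values used only for illustration.

```python
import numpy as np

def weighted_ensemble(probs, weights):
    """Combine per-model ICH probabilities with a normalized weighted average.

    probs:   shape (n_models, n_scans), each entry a probability in [0, 1]
    weights: shape (n_models,), non-negative and not all zero
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if probs.ndim != 2 or weights.shape != (probs.shape[0],):
        raise ValueError("expected probs (n_models, n_scans) and weights (n_models,)")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ probs             # one ensemble probability per scan

# Hypothetical per-scan outputs of four sub-models for three CT scans.
model_probs = np.array([
    [0.90, 0.20, 0.55],  # model trained on all ICH cases
    [0.80, 0.10, 0.70],  # model trained on SDH cases
    [0.95, 0.30, 0.40],  # model trained on SAH cases
    [0.85, 0.15, 0.65],  # model trained on small-lesion ICH cases
])
# Hypothetical weights; in practice these might be tuned on a validation set.
model_weights = np.array([0.4, 0.2, 0.2, 0.2])

ensemble_probs = weighted_ensemble(model_probs, model_weights)
predictions = ensemble_probs >= 0.5  # illustrative decision threshold
print(ensemble_probs, predictions)
```

A weighted average is only one way to combine sub-models. It is used here because it keeps the output a probability and lets a subtype-specific model raise the score for a scan that the general model rates as borderline.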
Matching journals
The top 8 journals account for 50% of the predicted probability mass.