Advancing Breast Cancer Detection: A Comprehensive Evaluation of Machine Learning Models on Mammogram Imaging

Al Muttaki, M. R. R.; Afrin, S.; Anil, A. I. A.; Shawon, M. M. H.

2025-10-10 medical education

10.1101/2025.10.08.25337620 medRxiv

Breast cancer is among the leading causes of cancer-related death in women worldwide, which underscores the need for effective and rapid diagnostic tools, especially for early diagnosis, to improve survival. Although advances in machine learning (ML) have found a growing number of applications in medical imaging, limited dataset diversity and applicability, together with model interpretability and efficiency, remain challenges for clinical use. This paper evaluates eight popular ML models, namely a Convolutional Neural Network (CNN), a Kolmogorov-Arnold Network (KAN), k-Nearest Neighbors, Support Vector Machine, XGBoost, Random Forest, Naive Bayes, and a Hybrid model, on the Mammogram Mastery dataset from Sulaymaniyah, Iraq, which consists of 745 original and 9,685 augmented mammogram images. The Hybrid model achieves the highest accuracy (0.9667) and F1 Score (0.9444), while the KAN model achieves the highest ROC AUC (0.9760) and the lowest Log Loss (0.1421), indicating the strongest discriminative power and the best probability calibration. Random Forest, which produced the fewest false negatives (3) in comparison with Fast Multinomial and Fast Text, emerged as the safest option for clinical screening because it balances sensitivity and computational efficiency. Two practical challenges remain, however: the slow inference time of the KAN model (0.323 seconds) and the high training cost (1009.10 seconds) of the Hybrid model. These results suggest that the Hybrid and KAN models are promising means of improving diagnostic accuracy, and that Random Forest can serve as a practical tool for reducing missed diagnoses. Future research should address multi-dataset validation across multiple institutions, faster inference, multi-class classification, and improved interpretability to support integration into clinical settings.
By addressing these gaps, ML-based diagnostics could increase breast cancer detection rates, minimize diagnostic errors, and improve patient outcomes across clinical contexts, helping to scale screening services worldwide.
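The abstract ranks the models by accuracy, F1 Score, ROC AUC, Log Loss, and the number of false negatives. As an illustration only, the following is a minimal sketch of how these metrics are conventionally computed for binary (benign vs. malignant) classification, written in plain Python. It is not the authors' evaluation code: the labels and probabilities are made-up toy values, and the 0.5 decision threshold and the helper names (`confusion_counts`, `accuracy`, `f1_score`, `log_loss`, `roc_auc`) are assumptions introduced here.

```python
import math

# Hypothetical toy data (NOT from the Mammogram Mastery dataset):
# 1 = malignant (positive class), 0 = benign.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]  # predicted P(malignant)
THRESHOLD = 0.5  # assumed decision threshold
y_pred = [1 if p >= THRESHOLD else 0 for p in probs]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn); fn counts malignant cases the model missed."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; lower means better-calibrated probabilities."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print("confusion (tp, fp, tn, fn):", confusion_counts(y_true, y_pred))
print("accuracy:", round(accuracy(y_true, y_pred), 4))
print("F1:", round(f1_score(y_true, y_pred), 4))
print("ROC AUC:", round(roc_auc(y_true, probs), 4))
print("log loss:", round(log_loss(y_true, probs), 4))
```

Accuracy and F1 depend on the chosen threshold, whereas ROC AUC and Log Loss use the raw probabilities. This is why a model can lead on one pair of metrics (as the Hybrid model does on accuracy and F1) while another leads on the other pair (as KAN does on ROC AUC and Log Loss). The false-negative count is reported separately because, in screening, a missed malignancy is usually costlier than a false alarm.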
