Development and validation of a lesion-supervised deep learning system for diabetic retinopathy grading according to UK national screening criteria
Chowdhury, P. N.; Akter, Y.; Chowdhury, P.; Kaur, A.; Uddin, M.; Chowdhury, A.; Chowdhury, P. K.; Muqit, M.
Background
Diabetic retinopathy (DR) is the leading cause of preventable blindness among working-age adults worldwide, yet screening coverage remains inadequate, particularly in low- and middle-income countries (LMICs). Automated deep learning systems could help address the global shortage of expert graders, but most existing models lack lesion-level interpretability and are not aligned with established clinical referral frameworks. We developed and validated DRAGS (Diabetic Retinopathy Automated Grading System), a hybrid deep learning model that grades DR according to the UK Diabetic Eye Screening Programme (DESP) classification and provides lesion-level explainability.

Methods
We trained and validated a DenseNet-201-based convolutional neural network on 20,281 anonymised fundus images from two tertiary eye care institutions in Bangladesh. Fellowship-trained retinal specialists graded the images using the UK DESP framework, yielding 10 clinically interpretable classes that combine retinopathy grade (R0-R3) and maculopathy status (M0/M1). A companion dataset of 2,936 pixel-level lesion masks spanning nine pathological categories was used to train a parallel multi-label lesion-detection head. The data were partitioned 70:15:15 (patient-stratified) into training, validation, and test sets. Performance was evaluated using macro-averaged AUROC (DeLong estimator), sensitivity, specificity, F1 score, quadratically weighted Cohen's κ, and expected calibration error (ECE), with 95% CIs from 2,000 bootstrap resamples. Spatial alignment between Grad-CAM activations and ground-truth lesion masks was assessed using the Dice coefficient and intersection over union (IoU). This study follows the TRIPOD+AI reporting guidelines.

Findings
On the held-out test set (Component I: n = 3,044; Component II: n ≈ 440), DRAGS achieved class-wise precision, recall, and F1 scores ranging from 0·88 to 0·99 across all ten UK DESP grades, with advanced proliferative stages (R3-M0, R3-M1) consistently exceeding 0·95.
Overall accuracy was approximately 91·1% and quadratically weighted Cohen's κ was approximately 0·90. For referable versus non-referable DR, sensitivity was 90·7% and specificity was 91·9%. The companion lesion-detection head achieved a macro-averaged sensitivity of 93·9%, specificity of 99·5%, and AUC of 0·997 across the nine lesion classes; seven of the nine classes achieved an AUC of 1·00. Grad-CAM activations showed a progressive spatial shift from diffuse patterns (normal) to lesion-dense peripheral patterns (proliferative DR), with maximal agreement for microaneurysms and exudates. Mean inference time was 110-160 ms per image.

Interpretation
DRAGS demonstrates high diagnostic accuracy for ten-class UK DESP-aligned DR grading, with clinically interpretable lesion-level explainability, on a large real-world LMIC dataset. External validation and prospective clinical evaluation are warranted before deployment.

Funding
The present study received no funding.
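The 70:15:15 patient-stratified partition can be illustrated as a patient-level split, in which no patient's images appear in more than one subset. This is a minimal plain-Python sketch, not the authors' code: the function name `patient_level_split`, the fixed seed, and the choice to shuffle patient IDs are assumptions. The sketch also does not balance DR grades across subsets, which "patient-stratified" may additionally imply.

```python
import random
from collections import defaultdict


def patient_level_split(image_patient_ids, train_frac=0.70, val_frac=0.15, seed=0):
    """Hypothetical helper: split image indices into train/val/test so that
    no patient contributes images to more than one subset.

    Fractions are applied to patients, not images; the test set receives
    all remaining patients (about 15% with the defaults).
    """
    # Group image indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(image_patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patient IDs reproducibly (the seed is an arbitrary assumption).
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(train_frac * len(patients))
    n_val = round(val_frac * len(patients))
    groups = (
        patients[:n_train],
        patients[n_train:n_train + n_val],
        patients[n_train + n_val:],  # remaining patients form the test set
    )
    # Expand each patient group back into a sorted list of image indices.
    return tuple(sorted(i for pid in g for i in by_patient[pid]) for g in groups)


# Example: 20 hypothetical patients with two images each.
ids = [f"patient{i // 2}" for i in range(40)]
train, val, test = patient_level_split(ids)
print(len(train), len(val), len(test))  # → 28 6 6
```

Because the fractions apply to patients, image counts per subset only approximate 70:15:15 when patients contribute different numbers of images.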
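The class-wise precision, recall, and F1 scores reported across the UK DESP grades are one-vs-rest quantities: each grade is scored as "this class" versus "any other class". A minimal sketch, assuming labels are strings in a hypothetical "R<grade>-M<status>" encoding (the abstract does not specify one) and a hypothetical function name `per_class_prf`:

```python
def per_class_prf(y_true, y_pred, labels):
    """One-vs-rest precision, recall and F1 for each label (0.0 when undefined)."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


# Hypothetical labels in an "R<grade>-M<status>" encoding.
y_true = ["R0-M0", "R0-M0", "R1-M0", "R3-M1"]
y_pred = ["R0-M0", "R1-M0", "R1-M0", "R3-M1"]
labels = ["R0-M0", "R1-M0", "R3-M1"]
for label, (p, r, f) in per_class_prf(y_true, y_pred, labels).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Averaging the per-class values with equal weight gives the macro-averaged figures used elsewhere in the abstract.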
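Quadratically weighted Cohen's κ penalises each disagreement by the squared distance between true and predicted grades, κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij with w_ij = (i − j)² / (K − 1)², so it requires an ordinal index for every class. The sketch assumes the ten classes are mapped to integers 0..9 in order of severity; the abstract does not say how the R/M combinations were ordered. The function name is hypothetical; scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same statistic.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights w_ij = (i - j)^2 / (K - 1)^2.

    Labels must be integers 0..n_classes-1 in ordinal (severity) order,
    with n_classes >= 2. Undefined (division by zero) if both raters put
    every case in the same single class.
    """
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes))
                  for j in range(n_classes)]

    weighted_observed = weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += w * observed[i][j]
            # Expected count if the two raters' marginals were independent.
            weighted_expected += w * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


print(quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], 2))  # → 0.5
```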
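The referable versus non-referable figures come from collapsing the multi-class grades into two groups. The abstract does not state the rule; the sketch assumes the usual UK DESP referral thresholds (R2 or R3 retinopathy, or any M1 maculopathy, counts as referable), the hypothetical "R<grade>-M<status>" label format, and hypothetical function names.

```python
def is_referable(label):
    """Assumed UK DESP-style rule: referable if retinopathy is R2/R3 or
    maculopathy is M1. `label` uses a hypothetical "R<grade>-M<status>"
    format such as "R2-M0"."""
    r_part, m_part = label.split("-")
    return int(r_part[1:]) >= 2 or m_part == "M1"


def referable_sensitivity_specificity(y_true, y_pred):
    """Binarise grades as referable/non-referable, then return
    (sensitivity, specificity). Requires both groups in y_true."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        truth, pred = is_referable(t), is_referable(p)
        if truth and pred:
            tp += 1
        elif truth:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


y_true = ["R0-M0", "R1-M1", "R2-M0", "R1-M0"]
y_pred = ["R0-M0", "R1-M0", "R2-M0", "R2-M0"]
print(referable_sensitivity_specificity(y_true, y_pred))  # → (0.5, 0.5)
```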
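Expected calibration error compares predicted confidence with observed accuracy inside confidence bins: ECE = Σ_b (|B_b| / N) · |acc(B_b) − conf(B_b)|. A minimal sketch with equal-width bins, assuming 10 bins and the top-class softmax probability as the confidence; the abstract states neither the bin count nor the binning scheme, and the function name is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins on [0, 1].

    confidences: top-class probability for each prediction (assumption).
    correct: 1 if the prediction matched the reference grade, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        b = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[b].append((c, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(ok for _, ok in members) / len(members)
        conf = sum(c for c, _ in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += len(members) / n * abs(acc - conf)
    return ece


print(round(expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0]), 4))  # → 0.05
```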
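The 95% CIs from 2,000 bootstrap resamples can be illustrated with a percentile bootstrap: resample test cases with replacement, recompute the metric on each resample, and read off the 2.5th and 97.5th percentiles. This sketch resamples individual images and uses simple order-statistic percentiles; whether the authors resampled images or patients, and which interval type they used, is not stated in the abstract. The helper names are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement and recompute `metric`."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    # Simple order-statistic percentiles (no interpolation).
    lower = stats[int(alpha / 2 * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


y_true = [0, 1] * 50
y_pred = y_true[:80] + [1 - v for v in y_true[80:]]  # 80% of cases correct
print(bootstrap_ci(y_true, y_pred, accuracy))
```

Any metric with the signature `metric(y_true, y_pred)`, such as the kappa or sensitivity helpers sketched alongside this one, can be passed in place of `accuracy`.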
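Grad-CAM alignment with lesion masks is scored with Dice = 2|P ∩ G| / (|P| + |G|) and IoU = |P ∩ G| / |P ∪ G|, where P is the binarised heatmap and G the ground-truth mask. The sketch assumes the heatmap has already been upsampled to mask resolution and normalised to [0, 1], and uses a hypothetical threshold of 0.5; the abstract does not report the threshold or the function used.

```python
def dice_and_iou(heatmap, mask, threshold=0.5):
    """Binarise a Grad-CAM heatmap (values in [0, 1]) at `threshold` and
    compare it with a binary lesion mask of the same shape.

    Returns (dice, iou); both are defined as 1.0 when P and G are empty.
    """
    inter = pred_sum = mask_sum = 0
    for h_row, m_row in zip(heatmap, mask):
        for h, m in zip(h_row, m_row):
            p = 1 if h >= threshold else 0  # predicted "lesion" pixel
            g = 1 if m else 0               # ground-truth lesion pixel
            inter += p * g
            pred_sum += p
            mask_sum += g
    union = pred_sum + mask_sum - inter
    dice = 2 * inter / (pred_sum + mask_sum) if pred_sum + mask_sum else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


heatmap = [[0.9, 0.8], [0.1, 0.2]]
mask = [[1, 0], [1, 0]]
print(dice_and_iou(heatmap, mask))  # → (0.5, 0.3333333333333333)
```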