Validity of deep learning algorithms for detecting wheezes and crackles from lung sound recordings in adults
Melbye, H.; Ravn, J.; Pabiszczak, M.; Bongo, L. A.; Aviles Solis, J. C.
We validated our state-of-the-art deep learning algorithm for detecting wheezes and crackles in sound files by comparing its classifications with those of human experts. We used two validation sets, classified by experienced raters and not used to train the algorithm, containing 615 (A) and 120 (B) sound files, respectively. We calculated the Area Under the Curve (AUC) of the algorithm's probability scores for wheezes and crackles. We then dichotomized the scores and calculated sensitivity and specificity as well as kappa agreement. In set A, the AUC was 0.88 (95% CI 0.84 - 0.92) for wheezes and 0.88 (95% CI 0.84 - 0.92) for crackles. After dichotomization, the sensitivity and specificity were 81% and 89% for wheezes and 67% and 96% for crackles. In set B, the kappa agreement between the algorithm and the reference classification of the validation set was 0.78 (95% CI 0.58 - 0.99) for wheezes and 0.75 (95% CI 0.59 - 0.92) for crackles. The 24 observers who had rated the same 120 sound files agreed less with the reference classification, with a mean kappa of 0.68 for wheezes and 0.55 for crackles. We found the algorithm to be superior to doctors in detecting wheezes and crackles in lung sound files.
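As a rough illustration of the agreement measures named in the abstract, the following Python sketch computes AUC from probability scores (using the Mann-Whitney formulation), sensitivity and specificity after dichotomizing the scores at a threshold, and Cohen's kappa between two binary classifications. The function names, the 0.5 cut-off and the toy data are hypothetical. The abstract does not state the threshold or the software the authors used, and confidence intervals are omitted here.

```python
from itertools import product


def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic.

    Over all (positive, negative) pairs, count how often the positive
    file receives the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative file")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Dichotomize scores at `threshold` (hypothetical cut-off) and compare with reference labels."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters assigning binary labels to the same files."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    pa = sum(a) / n  # share of files rater A labels positive
    pb = sum(b) / n  # share of files rater B labels positive
    expected = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical toy data: algorithm probabilities and reference labels (1 = wheeze present).
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(auc(scores, labels))                      # 8/9, about 0.889
    print(sensitivity_specificity(scores, labels))  # (2/3, 2/3)
    print(cohen_kappa(preds, labels))               # about 0.333
```

The rank-based AUC avoids choosing a threshold, which is why it is reported before dichotomization; kappa then corrects the raw agreement of the dichotomized labels for the agreement expected by chance.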