Assessing the Impact of Downsampled ECGs and Alternative Loss Functions in Multi-Label Classification of 12-Lead ECGs

Singstad, B.-J.; Muten, E. M.

2022-11-18 cardiovascular medicine

10.1101/2022.11.16.22282373 medRxiv

Abstract

The electrocardiogram (ECG) is an almost universally accessible diagnostic tool for heart disease. An ECG is recorded with an electrocardiograph, and today's electrocardiographs use built-in software to interpret the ECG automatically after it is recorded. However, these algorithms show limited performance, so clinicians usually have to interpret the ECG manually, regardless of whether an algorithm has already interpreted it. Manual ECG interpretation can be time-consuming and requires specific skills. A better algorithm is therefore needed to make correct ECG interpretation more accessible and time-efficient. Over the last few years, algorithms based on artificial intelligence have shown promising performance in many fields, including ECG interpretation, and might offer an alternative to manual interpretation. In this study, we used a dataset of 88,253 12-lead ECGs from multiple databases, annotated with SNOMED-CT codes by medical experts. We employed a supervised convolutional neural network with an Inception architecture to classify 30 of the most frequently annotated diagnoses in the dataset. Each patient could have more than one diagnosis, which makes this a multi-label classification problem. Using 10-fold cross-validation, we compared the Inception model's performance under different ECG preprocessing methods and different model settings. We compared the model's classification performance using binary cross-entropy (BCE) loss and double soft F1 loss. Furthermore, we compared the classification performance when the input ECG was downsampled from its original sampling rate. Finally, we trained 30 interpretable linear models to provide class activation maps that explain the relative importance of each sample in the ECG with respect to the 30 diagnoses considered in this study.
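
The abstract compares BCE with double soft F1 loss but does not state either formula. The sketch below assumes the commonly used formulation of double soft F1. In that formulation, true positives, false positives, false negatives, and true negatives are counted "softly" from predicted probabilities rather than thresholded labels, which keeps the loss differentiable, and a soft F1 is computed for both the positive and the negative class of every label. NumPy is used here only for illustration; the function names are hypothetical, and a training implementation would use the deep learning framework's own tensor operations.

```python
import numpy as np


def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all (ECG, label) pairs.

    Multi-label setting: one independent sigmoid probability per label.
    """
    p = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))


def double_soft_f1_loss(y_true, y_prob, eps=1e-16):
    """Double soft F1 loss (assumed formulation, not taken from the paper).

    y_true, y_prob: arrays of shape (n_ecgs, n_labels).
    For each label, soft F1 is computed for the positive class and for the
    negative class; the per-label cost is the mean of (1 - F1) for both, and
    the loss is the unweighted mean of the per-label costs.
    """
    tp = np.sum(y_prob * y_true, axis=0)              # soft true positives
    fp = np.sum(y_prob * (1 - y_true), axis=0)        # soft false positives
    fn = np.sum((1 - y_prob) * y_true, axis=0)        # soft false negatives
    tn = np.sum((1 - y_prob) * (1 - y_true), axis=0)  # soft true negatives
    soft_f1_pos = 2 * tp / (2 * tp + fn + fp + eps)
    soft_f1_neg = 2 * tn / (2 * tn + fn + fp + eps)
    cost_per_label = 0.5 * ((1 - soft_f1_pos) + (1 - soft_f1_neg))
    return float(np.mean(cost_per_label))
```

With perfect predictions the double soft F1 loss is close to 0, and with fully inverted predictions (`y_prob = 1 - y_true`) it is exactly 1. Because the final mean is taken over per-label costs, each diagnosis contributes equally regardless of how many positive ECGs it has, which is one plausible reason such a loss can help on imbalanced data.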
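
The abstract reports downsampling from 500 Hz but does not say how the resampling was done. As a hedged illustration, the sketch below assumes band-limited Fourier resampling (the same idea behind `scipy.signal.resample`): the spectrum of each lead is truncated at the new Nyquist frequency (37.5 Hz for 75 Hz), which also acts as the anti-aliasing filter. The function name `fft_downsample` is hypothetical, and the example assumes a 10 s, 12-lead recording stored as an array of shape `(12, n_samples)`.

```python
import numpy as np


def fft_downsample(ecg, fs_in, fs_out):
    """Resample an ECG of shape (n_leads, n_samples) from fs_in to fs_out Hz.

    Assumption: Fourier (spectrum-truncation) resampling. Frequency content
    above fs_out / 2 is discarded, so it cannot alias into the output.
    """
    n_in = ecg.shape[-1]
    n_out = int(round(n_in * fs_out / fs_in))
    spectrum = np.fft.rfft(ecg, axis=-1)
    # Bin k is k / duration Hz at both rates, so keep bins up to the new Nyquist.
    truncated = spectrum[..., : n_out // 2 + 1]
    # irfft normalizes by n_out, while rfft did not normalize by n_in: rescale.
    return np.fft.irfft(truncated, n=n_out, axis=-1) * (n_out / n_in)


# Example: 10 s of a synthetic 12-lead signal at 500 Hz, downsampled to 75 Hz.
fs_in, fs_out, seconds = 500, 75, 10
t_in = np.arange(fs_in * seconds) / fs_in
# A 2 Hz component plus a 100 Hz component that lies above the new Nyquist frequency.
x = np.sin(2 * np.pi * 2 * t_in) + 0.5 * np.sin(2 * np.pi * 100 * t_in)
ecg = np.tile(x, (12, 1))
ecg_75hz = fft_downsample(ecg, fs_in, fs_out)
```

In this example the output has 750 samples per lead; the 2 Hz component is preserved and the 100 Hz component is removed rather than aliased. Fourier resampling treats the recording as periodic, so a real pipeline might prefer a polyphase filter such as `scipy.signal.resample_poly` to reduce edge effects.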
Because the class distribution in our dataset is heavily imbalanced, we placed the most emphasis on the F1 score when evaluating the performance of the models. The best F1 score was achieved when the Inception model used double soft F1 as the loss function and ECGs downsampled to 75 Hz. This model achieved an F1 score of 0.420 ± 0.017, an accuracy of 0.954 ± 0.002, and an AUROC of 0.832 ± 0.019. An aggregation of the saliency maps generated with Local Interpretable Model-Agnostic Explanations (LIME) showed that the Inception model paid the most attention to the limb leads and the augmented leads, and less attention to the precordial leads. One of the more significant contributions of this study is the use of aggregated saliency maps to obtain ECG lead importance for different diagnoses. In addition, we emphasize the relevance of evaluating different loss functions; in this specific case, we found double soft F1 loss to be slightly better than BCE. Finally, we found it somewhat surprising that downsampling the ECG led to higher performance than the original 500 Hz sampling rate. These findings contribute in several ways to our understanding of artificial intelligence-based ECG interpretation, but further studies are needed to validate them in datasets from other patient cohorts.
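
The gap between the reported accuracy (0.954) and F1 score (0.420) is what one would expect with rare labels. The sketch below is a toy illustration with fabricated-for-illustration data, not the study's data: 30 labels, each positive in 3% of 1,000 ECGs. A classifier that never predicts any diagnosis still reaches 97% accuracy, yet its F1 is 0. It assumes accuracy is averaged over all (ECG, label) decisions and F1 is macro-averaged over labels; the abstract does not state which averaging the authors used, and the helper names are hypothetical.

```python
import numpy as np


def multilabel_accuracy(y_true, y_pred):
    """Fraction of all (ECG, label) decisions that are correct."""
    return float(np.mean(y_true == y_pred))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1; a label with no true positives scores 0."""
    tp = np.sum((y_pred == 1) & (y_true == 1), axis=0)
    fp = np.sum((y_pred == 1) & (y_true == 0), axis=0)
    fn = np.sum((y_pred == 0) & (y_true == 1), axis=0)
    denom = 2 * tp + fp + fn
    f1 = np.divide(2 * tp, denom, out=np.zeros(tp.shape, dtype=float), where=denom > 0)
    return float(np.mean(f1))


# Toy data: 1,000 ECGs, 30 labels, each label positive for 30 distinct ECGs (3%).
y_true = np.zeros((1000, 30), dtype=int)
for j in range(30):
    y_true[30 * j : 30 * (j + 1), j] = 1

# A trivial classifier that never predicts any diagnosis.
y_all_negative = np.zeros_like(y_true)
acc = multilabel_accuracy(y_true, y_all_negative)
f1 = macro_f1(y_true, y_all_negative)
```

Here `acc` is 0.97 while `f1` is 0.0, which is why accuracy alone says little about performance on rare diagnoses.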
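
The abstract says that LIME saliency maps were aggregated to obtain lead importance, but it does not describe the aggregation. One simple possibility, sketched below as an assumption, is to take the mean absolute saliency over time and over all ECGs explained for a given diagnosis, then normalize across the 12 leads so the values sum to 1. The array layout `(n_ecgs, 12, n_samples)`, the standard lead order in `LEADS`, and the function name are hypothetical.

```python
import numpy as np

# Assumption: leads are stored in the conventional 12-lead order.
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]


def lead_importance(saliency):
    """Aggregate saliency maps into one relative importance value per lead.

    saliency: array of shape (n_ecgs, 12, n_samples) for a single diagnosis.
    Assumption: absolute values are used, so strongly negative weights also
    count as important. Returns an array of 12 values that sums to 1.
    """
    per_lead = np.abs(saliency).mean(axis=(0, 2))  # mean |weight| over ECGs and time
    return per_lead / per_lead.sum()


# Toy example: 5 ECGs, 100 samples each, where lead II carries the largest weights.
saliency = np.ones((5, 12, 100))
saliency[:, 1, :] = -3.0
importance = lead_importance(saliency)
most_important_lead = LEADS[int(np.argmax(importance))]
```

Running the function separately on the saliency maps for each diagnosis gives a per-diagnosis lead ranking; in the toy example the most important lead is "II". Grouping the 12 values into limb, augmented, and precordial leads would support the kind of comparison the abstract describes.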
