AWS Trainium vs NVIDIA CUDA for Medical Image Classification: A Comprehensive Benchmark on ChestX-ray14
Fisher, G. R.
We present a rigorous benchmark comparing AWS Trainium (trn1 instances) and NVIDIA CUDA (g5 instances with A10G GPUs) for training convolutional neural networks on medical image classification. Using the NIH ChestX-ray14 dataset with 112,120 chest radiographs and 14 thoracic disease labels, we evaluate ResNet-50 and ConvNeXt architectures across both platforms. Our key findings are threefold: (1) Trainium achieves virtually identical accuracy to CUDA for compatible architectures (ConvNeXt-Pico: F1=0.8007 vs 0.8027, Δ=0.25%), (2) modern CNN architectures using depthwise convolutions and LayerNorm (ConvNeXt-Tiny and larger) fail to compile or load on Trainium due to hardware constraints, and (3) Trainium is 3-5× more expensive than CUDA for CNN training even with correct instance sizing. We document the substantial porting effort required, including four critical XLA-specific code modifications, and provide guidance for practitioners considering Trainium for computer vision workloads.
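The reported accuracy gap can be checked with a few lines of arithmetic. The sketch below assumes that Δ denotes the relative difference between the two F1 scores, since the absolute difference is 0.0020, or 0.20 percentage points; the abstract does not state which definition is used, and the function name `relative_gap_percent` is illustrative only.

```python
def relative_gap_percent(reference: float, candidate: float) -> float:
    """Return the relative difference between two scores, in percent of the reference."""
    return abs(reference - candidate) / reference * 100.0


# F1 scores for ConvNeXt-Pico as quoted in the abstract.
f1_cuda = 0.8027
f1_trainium = 0.8007

# Assumption: the CUDA score is the reference for the relative gap.
gap = relative_gap_percent(f1_cuda, f1_trainium)
print(f"Relative F1 gap: {gap:.2f}%")
```

Under this assumption the result rounds to the 0.25% quoted above; taking the Trainium score as the reference instead gives the same value to two decimal places.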