Transformers Outperform ConvNets for Root Segmentation: A Systematic Comparison Across Nine Datasets
Smith, A. G.; Lamprinidis, S.; Seethepalli, A.; York, L. M.; Han, E.; Mohl, P.; Boulata, K.; Thorup-Kristensen, K.; Petersen, J.
Root segmentation is a fundamental yet challenging task in image-based plant phenotyping. We present the first systematic comparison of Transformer and Convolutional Neural Network (ConvNet) architectures for root segmentation, evaluating 21 architectures across nine diverse datasets and comparing pre-trained models to training from scratch. Transformer-based models significantly outperform ConvNets for segmentation accuracy and root-diameter agreement. Pre-training significantly improves mean Dice from 0.623 to 0.666 (p = 3.3 × 10⁻¹⁰). We also find that Transformers benefit more from pre-training than ConvNets, with Dice improvements of +0.072 versus +0.022 (p = 3.7 × 10⁻⁴), supporting the hypothesis that fine-tuned Transformers transfer more effectively across large domain gaps. Among evaluated models, MobileSAM achieved the highest Dice score while maintaining computational efficiency. Dataset choice explained far more performance variance (70.9%) than model architecture (6.7%), suggesting that data curation matters more than model selection.
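The Dice scores quoted above measure overlap between predicted and ground-truth root masks. A minimal sketch of the metric on binary masks (the function name, example arrays, and epsilon smoothing are illustrative assumptions, not from the paper):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks (1 = root, 0 = background)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # Dice = 2|A ∩ B| / (|A| + |B|); eps guards against empty masks
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: predicted mask covers two pixels, ground truth one; they share one pixel
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(round(dice_score(pred, target), 3))  # 2*1 / (2+1) ≈ 0.667
```

A reported gap such as 0.623 vs. 0.666 mean Dice thus corresponds to roughly 4 percentage points more mask overlap averaged over test images.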