Adaptive Frequency-Spatial Dual-Stream Network (AFS-DSN) for Nasal and Paranasal Sinus CT Segmentation
Wan, S.-Y.; Chen, W.-Y.
Accurate segmentation of nasal and paranasal sinus structures from CT scans is critical for surgical planning and treatment evaluation in rhinology. However, the complex anatomical topology and thin-walled boundaries of these structures make automated segmentation difficult. We propose AFS-DSN (Adaptive Frequency-Spatial Dual-Stream Network), a deep learning architecture that integrates multi-scale wavelet decomposition with spatial feature learning for binary segmentation of the nasal cavity complex. The method uses a dual-stream encoder whose frequency branch applies three wavelet scales (db1, db2, db4) to capture 24 frequency sub-bands, improving boundary detection in anatomically challenging regions. Cross-domain attention and adaptive routing mechanisms dynamically fuse spatial and frequency features based on local tissue characteristics. We formulate the task as binary segmentation: all five anatomical structures (maxillary sinus, sphenoid sinus, ethmoid sinus, frontal sinus, and nasal cavity) are treated as a single foreground region against the background, prioritizing clinical boundary detection over differentiation of individual structures. Evaluated on the NasalSeg dataset (130 CT volumes) with a 70/15/15 train/validation/test split, AFS-DSN achieves an overall Dice coefficient of 94.34% ± 2.30%, with statistically significant improvements over the baseline in thin-wall regions (91.34% vs. 90.57%, p=0.004) and in Surface Dice at 1 mm tolerance (0.874 vs. 0.868, p=0.010). These results indicate improved boundary precision while maintaining sub-second inference time, which makes the method suitable for surgical planning applications where sub-millimeter accuracy is clinically relevant.
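The binary formulation and the Dice metric described above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the label ids (0 for background, 1 to 5 for the five structures), the function names, and the toy voxel lists are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed label ids (not from the paper): 0 = background,
# 1..5 = the five anatomical structures, merged into one foreground class.
FOREGROUND_LABELS = frozenset({1, 2, 3, 4, 5})


def to_binary(labels: Sequence[int]) -> List[int]:
    """Collapse a multi-class label map into a binary foreground mask."""
    return [1 if value in FOREGROUND_LABELS else 0 for value in labels]


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; define Dice as 1.0 in that case.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy flattened volume: a multi-class ground truth and a binary prediction.
ground_truth = to_binary([0, 1, 1, 3, 5, 0, 0, 2])
prediction = [0, 1, 1, 1, 0, 0, 1, 1]
score = dice(prediction, ground_truth)
print(f"Dice: {score:.2f}")
```

In this toy case four of the five foreground voxels overlap, so the score is 2·4 / (5 + 5) = 0.8. Surface Dice at a tolerance is computed differently, on boundary voxels rather than whole volumes, and is not covered by this sketch.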
To address concerns regarding model complexity, we further introduce AFS-DSN-Lite, a parameter-efficient variant (27.41M parameters) that achieves comparable performance (94.37% Dice) through depthwise separable convolutions, and validate robustness via 3-fold cross-validation (mean Dice: 94.59% ± 0.31%).
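To give a sense of why depthwise separable convolutions reduce the parameter count, the sketch below compares the weights in a standard convolution with those in its depthwise separable counterpart. The channel widths, the kernel size, and the choice of 3D kernels are hypothetical; the abstract does not state the layer configuration of AFS-DSN-Lite, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3, dims: int = 3) -> int:
    """Weights in a standard convolution: every output channel sees every input channel."""
    return (k ** dims) * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3, dims: int = 3) -> int:
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = (k ** dims) * c_in  # one k^dims filter per input channel
    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 -> 128 channels, 3x3x3 kernel (not taken from the paper).
standard = standard_conv_params(64, 128)
separable = depthwise_separable_params(64, 128)
print(standard, separable, round(standard / separable, 1))
```

For this assumed layer the standard convolution has 221,184 weights and the separable version 9,920. The saving for a whole network is smaller than the per-layer ratio suggests, since not every layer is replaced and other components also contribute parameters.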
Matching journals
The top 7 journals account for 50% of the predicted probability mass.