S-IGTD: supervised tabular-to-image topology learning via between-group correlation for multiclass classification of biological data
WU, H.-M.
Motivation: Tabular-to-image methods allow convolutional neural network (CNN)-based classifiers to analyse high-dimensional biological tables by mapping features onto a two-dimensional grid. Existing layouts are usually driven by unsupervised global correlation, which can place class-discriminative features far apart when nuisance or housekeeping covariation dominates the total covariance structure.

Results: We present the Supervised Image Generator for Tabular Data (S-IGTD), a supervised extension of IGTD that optimizes tabular-to-image topology by replacing total-correlation distance with one minus the absolute between-group correlation, computed from class-wise feature means, under the Within-And-Between-Analysis (WABA) decomposition. We prove entrywise consistency of the supervised distance matrix under standard moment conditions and identify balanced-class settings in which S-IGTD improves a Signal Dispersion Score (SDS)-related topology objective. In controlled simulations targeting between-group signal, S-IGTD outperformed Euclidean- and correlation-distance IGTD variants in SDS, accuracy and macro-F1 score. Across five biological benchmarks ranging from 4- to 91-class classification, S-IGTD produced compact class-supervised layouts, with 24/35 Holm-adjusted significant SDS wins against seven non-reference layout controls. As a secondary downstream diagnostic, a CNN with batch normalization showed higher mean accuracy than random layouts and correlation-distance IGTD on all real datasets, and higher mean accuracy than Euclidean-distance IGTD on four of five datasets, with the clearest gains on large multiclass cancer and methylation benchmarks.

Availability and implementation: Source code, datasets, configuration files and reproducibility scripts are freely available at https://github.com/hanmingwu1103/S-IGTD.

Contact: wuhm@g.nccu.edu.tw
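To make the supervised distance concrete, the following is a minimal NumPy sketch of "one minus the absolute between-group correlation, computed from class-wise feature means". It is an illustration under stated assumptions, not the code from the S-IGTD repository: the function name `between_group_distance` is hypothetical, each sample is replaced by its class mean so that classes are weighted by size (one common reading of the WABA between-group component), and features with no between-group variance are assigned the maximum distance of 1.

```python
import numpy as np

def between_group_distance(X, y):
    """Feature-by-feature distance 1 - |between-group correlation|.

    Hypothetical helper for illustration only.
    X: (n_samples, n_features) table, y: (n_samples,) class labels.
    """
    X = np.asarray(X, dtype=float)
    classes, inverse = np.unique(np.asarray(y), return_inverse=True)
    inverse = inverse.ravel()

    # Class-wise feature means, shape (n_classes, n_features).
    means = np.vstack([X[inverse == k].mean(axis=0)
                       for k in range(len(classes))])

    # Replace every sample by its class mean; larger classes get more weight.
    B = means[inverse]
    B = B - B.mean(axis=0)

    # Pearson correlation of the between-group components.
    norms = np.sqrt((B ** 2).sum(axis=0))
    norms[norms == 0] = np.nan  # feature means identical across classes
    corr = (B.T @ B) / np.outer(norms, norms)

    # Assumption: undefined correlations map to the maximum distance 1.
    dist = np.nan_to_num(1.0 - np.abs(corr), nan=1.0)
    dist = np.clip(dist, 0.0, 1.0)
    np.fill_diagonal(dist, 0.0)
    return dist
```

In an IGTD-style pipeline, a matrix like this would stand in for the feature-distance matrix that the layout optimization tries to match to pixel distances; within-class noise does not affect it, because only the class means enter the correlation.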