
S-IGTD: supervised tabular-to-image topology learning via between-group correlation for multiclass classification of biological data

WU, H.-M.

2026-05-21 bioinformatics
10.64898/2026.05.19.726105 bioRxiv

Motivation: Tabular-to-image methods allow convolutional neural network (CNN)-based classifiers to analyse high-dimensional biological tables by mapping features onto a two-dimensional grid. Existing layouts are usually driven by unsupervised global correlation, which can place class-discriminative features far apart when nuisance or housekeeping covariation dominates the total covariance structure.

Results: We present the Supervised Image Generator for Tabular Data (S-IGTD), a supervised extension of IGTD that optimizes tabular-to-image topology by replacing total-correlation distance with one minus the absolute between-group correlation, computed from class-wise feature means, under the Within-And-Between-Analysis (WABA) decomposition. We prove entrywise consistency of the supervised distance matrix under standard moment conditions and identify balanced-class settings in which S-IGTD improves a Signal Dispersion Score (SDS)-related topology objective. In controlled simulations targeting between-group signal, S-IGTD outperformed Euclidean- and correlation-distance IGTD variants in SDS, accuracy and macro-F1 score. Across five biological benchmarks ranging from 4- to 91-class classification, S-IGTD produced compact class-supervised layouts, with 24/35 Holm-adjusted significant SDS wins against seven non-reference layout controls. As a secondary downstream diagnostic, a CNN with batch normalization showed higher mean accuracy than random layouts and correlation-distance IGTD on all real datasets, and higher mean accuracy than Euclidean-distance IGTD on four of five datasets, with the clearest gains on large multiclass cancer and methylation benchmarks.

Availability and implementation: Source code, datasets, configuration files and reproducibility scripts are freely available at https://github.com/hanmingwu1103/S-IGTD.

Contact: wuhm@g.nccu.edu.tw
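The supervised distance described in the abstract (one minus the absolute between-group correlation, computed from class-wise feature means) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `between_group_distance` and the toy data are hypothetical, and the sketch assumes the between-group component is formed by replacing each sample with its class mean, so that classes are weighted by their sample counts.

```python
import numpy as np

def between_group_distance(X, y):
    """Hypothetical helper: 1 - |between-group correlation| for all feature pairs.

    Assumption: the between-group component of each sample is its class mean,
    so larger classes carry proportionally more weight.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).ravel()
    classes, inverse = np.unique(y, return_inverse=True)
    # Class-wise feature means, shape (n_classes, n_features).
    means = np.vstack([X[inverse == k].mean(axis=0) for k in range(len(classes))])
    # Between-group component: every sample replaced by its class mean, then centered.
    B = means[inverse]
    B = B - B.mean(axis=0)
    cov = B.T @ B
    sd = np.sqrt(np.diag(cov))
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = cov / np.outer(sd, sd)
    # Features with no between-group variance get correlation 0 with the others.
    corr = np.nan_to_num(corr, nan=0.0)
    np.fill_diagonal(corr, 1.0)
    return 1.0 - np.abs(np.clip(corr, -1.0, 1.0))

# Toy data (made up for illustration): 3 classes, 2 samples each, 3 features.
X_demo = np.array([
    [-0.5,  1.0,  1.0],
    [ 0.5, -1.0,  1.0],
    [ 0.5, -1.0, -2.0],
    [ 1.5, -3.0, -2.0],
    [ 1.5, -3.0,  1.0],
    [ 2.5, -5.0,  1.0],
])
y_demo = np.array([0, 0, 1, 1, 2, 2])
D = between_group_distance(X_demo, y_demo)
```

In this toy example, features 0 and 1 have class means that move in exactly opposite directions, so their distance is approximately 0, while feature 2 has class means uncorrelated with feature 0, so that distance is approximately 1. A matrix like `D` would then take the place of the total-correlation distance in the layout optimization.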

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

Rank. Journal: predicted probability (papers in training set; percentile)

1. Bioinformatics: 52.8% (1061 papers in training set; Top 0.2%)
50% of probability mass above
2. Genome Biology: 8.5% (555 papers in training set; Top 0.6%)
3. BMC Bioinformatics: 4.4% (383 papers in training set; Top 2%)
4. Nature Communications: 3.6% (4913 papers in training set; Top 39%)
5. PLOS Computational Biology: 3.6% (1633 papers in training set; Top 9%)
6. Nature Biotechnology: 2.8% (147 papers in training set; Top 3%)
7. Bioinformatics Advances: 2.1% (184 papers in training set; Top 2%)
8. Briefings in Bioinformatics: 2.1% (326 papers in training set; Top 3%)
9. Nature Methods: 1.7% (336 papers in training set; Top 4%)
10. Genome Medicine: 1.5% (154 papers in training set; Top 5%)
11. Cell Systems: 1.5% (167 papers in training set; Top 8%)
12. NAR Genomics and Bioinformatics: 1.3% (214 papers in training set; Top 2%)
13. Patterns: 1.0% (70 papers in training set; Top 2%)
14. Nucleic Acids Research: 0.9% (1128 papers in training set; Top 15%)
15. Genome Research: 0.9% (409 papers in training set; Top 4%)
16. PLOS ONE: 0.8% (4510 papers in training set; Top 65%)
17. iScience: 0.8% (1063 papers in training set; Top 28%)
18. Cell Reports Methods: 0.8% (141 papers in training set; Top 5%)
19. Frontiers in Genetics: 0.8% (197 papers in training set; Top 9%)
20. Nature Machine Intelligence: 0.7% (61 papers in training set; Top 4%)
21. GigaScience: 0.7% (172 papers in training set; Top 4%)
22. Proceedings of the National Academy of Sciences: 0.5% (2130 papers in training set; Top 49%)
23. Nature Genetics: 0.5% (240 papers in training set; Top 9%)
24. Computational and Structural Biotechnology Journal: 0.5% (216 papers in training set; Top 12%)