Tsallis-Gated Autoencoder: A Nonextensive Physics-Informed Approach for Unsupervised Anomaly Detection in Glioblastoma Multiforme RNA-seq Data

Assuncao Monteiro, S.; Alves Barbosa da Silva, F.

2026-05-15 bioinformatics
10.64898/2026.05.13.724767 bioRxiv

Glioblastoma multiforme (GBM) is characterised by profound genomic heterogeneity and heavy-tailed gene-expression distributions that challenge conventional machine-learning methods. We introduce the Tsallis-Gated Autoencoder (Tsallis-GAE), a physics-informed architecture that replaces classical softmax attention with a learnable Tsallis q-softmax followed by mean-field smoothing iterations, motivated by recent work on curved statistical manifolds and dense associative networks. Trained on the full TCGA-GBM RNA-seq cohort (391 samples, top 2,000 high-variance genes) under an 80/20 hold-out protocol, the Tsallis-GAE achieves a mean AUC-ROC of 0.977 ± 0.002 across five independent seeds, compared with 0.906 ± 0.003 for a matched-capacity Vanilla autoencoder trained under the identical protocol. The matched-capacity Vanilla autoencoder is statistically indistinguishable from a LocalOutlierFactor baseline (AUC 0.906 vs 0.906), indicating that the +0.07 AUC gain over the Vanilla AE stems from the gated attention architecture rather than from the use of a neural network per se. A fixed-q Softmax-AE ablation (q ≡ 1 by construction) achieves AUC 0.976 ± 0.001, only 0.001 below the Tsallis-GAE (DeLong p = 0.44); the physically meaningful contribution of the learnable q is therefore its spontaneous convergence to the non-extensive regime described below. Each of the three attention blocks carries an independent learnable entropic index q; across 15 measurements (5 seeds × 3 blocks), q converges spontaneously to 1.554 ± 0.019, strictly bounded away from the Boltzmann-Gibbs limit q = 1 and within the moderate non-extensivity regime characteristic of complex biological systems. Cross-detector validation against OneClassSVM and LocalOutlierFactor pseudo-labels yields Tsallis-GAE AUCs of 0.998 and 0.992, respectively, indicating that the learned representation captures anomaly structure intrinsic to the data rather than the decision boundary of any single labeling heuristic.
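The abstract does not give the exact q-softmax formula. A common construction, assumed here, replaces exp in softmax with the Tsallis q-exponential e_q(x) = [1 + (1 - q)x]_+^(1/(1 - q)), which tends to exp(x) as q → 1. The NumPy sketch below is illustrative only: the function names are hypothetical, the max-subtraction step is an assumption (e_q is not shift-invariant), and the learnable q and the mean-field smoothing iterations are omitted. It shows how q > 1 yields heavier-tailed attention weights than ordinary softmax, while q < 1 yields sparse ones.

```python
import numpy as np


def q_exp(x, q):
    """Tsallis q-exponential e_q(x) = [1 + (1 - q) x]_+ ** (1 / (1 - q)).

    Tends to exp(x) as q -> 1. Intended for x <= 0 (see q_softmax), where the
    base is always >= 1 for q > 1 and may be clipped to 0 for q < 1.
    """
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-8:
        return np.exp(x)  # Boltzmann-Gibbs limit: ordinary exponential
    base = np.maximum(1.0 + (1.0 - q) * x, 0.0)  # the [.]_+ cut-off
    return base ** (1.0 / (1.0 - q))


def q_softmax(logits, q, axis=-1):
    """Hypothetical q-softmax: normalised q-exponentials of shifted logits.

    Assumption: logits are shifted by their maximum first. Unlike exp, e_q is
    not shift-invariant, so this shift is a modelling choice, not an identity.
    """
    logits = np.asarray(logits, dtype=float)
    z = logits - np.max(logits, axis=axis, keepdims=True)  # every z <= 0
    w = q_exp(z, q)  # the max entry maps to e_q(0) = 1, so the sum is >= 1
    return w / np.sum(w, axis=axis, keepdims=True)


if __name__ == "__main__":
    scores = np.array([0.0, -5.0])
    for q in (0.5, 1.0, 1.554):
        print(q, q_softmax(scores, q))
```

For logits [0, -5], q = 1 (plain softmax) gives the lower-scoring entry under 1% of the weight, q = 1.554 (the value reported above) gives it roughly 8%, and q = 0.5 gives it exactly zero: q > 1 spreads attention over a power-law tail, while q < 1 makes it sparse.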
We note that DeLong's paired test on the present test set (n = 79) does not establish the +0.07 AUC gap as formally significant (p ≈ 0.26); a 5-fold cross-validation over the full cohort, which would supply the needed statistical power, is left to future work. The source code is available upon reasonable request to the corresponding author.
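DeLong's test compares two AUCs computed on the same test samples by estimating the covariance of their Mann-Whitney statistics from per-sample "structural components". Its variance terms shrink only as 1/m + 1/n (m positives, n negatives), which is why a 79-sample test set limits power. The NumPy sketch below implements the standard DeLong et al. (1988) estimator in its simple O(m·n) form. It is not the authors' code, the function names are hypothetical, and it assumes binary labels with anomalies as the positive class.

```python
import math

import numpy as np


def _heaviside_kernel(pos, neg):
    """psi(x, y) = 1 if x > y, 0.5 if x == y, 0 otherwise, for all (pos, neg) pairs."""
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).astype(float) + 0.5 * (diff == 0)


def delong_paired_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for the difference of two correlated AUCs.

    Both score vectors must come from the same samples (a paired design).
    Returns (auc_a, auc_b, p_value). Uses O(m * n) memory, fine for n ~ 100.
    """
    y = np.asarray(y_true).astype(bool)
    m, n = int(y.sum()), int((~y).sum())
    if m < 2 or n < 2:
        raise ValueError("need at least two positives and two negatives")
    aucs, v10, v01 = [], [], []
    for s in (np.asarray(scores_a, dtype=float), np.asarray(scores_b, dtype=float)):
        k = _heaviside_kernel(s[y], s[~y])  # shape (m, n)
        aucs.append(k.mean())               # Mann-Whitney estimate of the AUC
        v10.append(k.mean(axis=1))          # structural component per positive
        v01.append(k.mean(axis=0))          # structural component per negative
    s10 = np.cov(np.vstack(v10))  # 2 x 2 covariance across positives (ddof=1)
    s01 = np.cov(np.vstack(v01))  # 2 x 2 covariance across negatives (ddof=1)
    c = np.array([1.0, -1.0])     # contrast for AUC_a - AUC_b
    var = c @ s10 @ c / m + c @ s01 @ c / n
    if var <= 0:
        raise ValueError("degenerate variance: the AUC difference cannot be tested")
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return aucs[0], aucs[1], p
```

On a four-sample toy example (labels [1, 1, 0, 0], scores [0.9, 0.8, 0.3, 0.1] versus [0.9, 0.2, 0.3, 0.1]) the two AUCs are 1.0 and 0.75, yet p ≈ 0.48: with so few samples even a 0.25 AUC gap is not significant. A faster midrank-based formulation of the same statistic exists; the quadratic version is kept here for readability.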

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Nature Communications: 4913 papers in training set, Top 1%, 28.2%
2. Nature Machine Intelligence: 61 papers in training set, Top 0.2%, 8.6%
3. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 7%, 8.6%
4. Scientific Reports: 3102 papers in training set, Top 16%, 6.4%
50% of probability mass above
5. eLife: 5422 papers in training set, Top 19%, 4.4%
6. Communications Biology: 886 papers in training set, Top 2%, 3.7%
7. PLOS Computational Biology: 1633 papers in training set, Top 11%, 3.1%
8. Patterns: 70 papers in training set, Top 0.4%, 2.7%
9. Cell Systems: 167 papers in training set, Top 6%, 2.1%
10. Briefings in Bioinformatics: 326 papers in training set, Top 3%, 1.9%
11. Bioinformatics: 1061 papers in training set, Top 8%, 1.5%
12. Nature Neuroscience: 216 papers in training set, Top 5%, 1.4%
13. PLOS ONE: 4510 papers in training set, Top 57%, 1.4%
14. BMC Bioinformatics: 383 papers in training set, Top 6%, 0.9%
15. Nucleic Acids Research: 1128 papers in training set, Top 16%, 0.8%
16. Nature Computational Science: 50 papers in training set, Top 2%, 0.8%
17. Frontiers in Genetics: 197 papers in training set, Top 9%, 0.8%
18. Chemical Science: 71 papers in training set, Top 2%, 0.8%
19. Nature Methods: 336 papers in training set, Top 7%, 0.7%
20. Ecology Letters: 121 papers in training set, Top 2%, 0.7%
21. Physical Review X: 23 papers in training set, Top 0.8%, 0.5%
22. Physical Review Research: 46 papers in training set, Top 1%, 0.5%
23. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 12%, 0.5%
24. International Journal of Molecular Sciences: 453 papers in training set, Top 19%, 0.5%
25. Advanced Science: 249 papers in training set, Top 23%, 0.5%
26. Genome Biology: 555 papers in training set, Top 9%, 0.5%
27. npj Systems Biology and Applications: 99 papers in training set, Top 3%, 0.5%
28. NeuroImage: 813 papers in training set, Top 7%, 0.5%
29. Science Advances: 1098 papers in training set, Top 35%, 0.5%