DVPNet: A New XAI-Based Interpretable Genetic Profiling Framework Using Nucleotide Transformer and Probabilistic Circuits

Kusumoto, T.

2026-01-30 bioinformatics
10.64898/2026.01.28.695053 bioRxiv

We present an XAI-based genetic profiling framework that quantifies how important each gene is for distinguishing cancer cells from normal cells, based on an interpretable AI decision process. The underlying explainable AI (XAI) classification model combines probabilistic circuits with the Nucleotide Transformer: the Nucleotide Transformer provides strong feature extraction, while the probabilistic circuit keeps the classifier tractable and probabilistically interpretable. To demonstrate the framework, we used the GSE131907 single-cell lung cancer atlas to construct a dataset with cancer-cell and normal-cell classes. From each sample, 900 gene types were randomly selected and converted into embedding vectors using the Nucleotide Transformer, after which the classification model was trained. We then extracted class-specific probabilistic contributions from the tractable model and defined a contribution score for the cancer-cell class. Genetic profiling based on these scores indicates which genes and biological pathways are most important for the classification task. Notably, 1,524 of the 9,540 observed genes showed contribution scores that contradicted what would be expected from their class-wise occurrence frequencies, suggesting that the profiling goes beyond simple statistics by leveraging biological feature representations encoded by the Nucleotide Transformer. The top-ranked genes among these contradictory cases include several well-studied genes in cancer research (e.g., ITGA5, SIGLEC9, NOTUM, and TP73). Overall, these analyses go beyond traditional statistical or gene-expression-level approaches and provide new insights for genetic research.
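The abstract does not define the contribution score or the exact criterion for a "contradictory" gene, so the following is only a minimal sketch of one plausible reading. It assumes that a positive score favors the cancer-cell class and that a gene is contradictory when the sign of its score disagrees with the sign of its class-wise frequency difference. The function names, gene labels, counts, and scores are all hypothetical toy values, not data from the study.

```python
def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_contradictory(score, cancer_count, normal_count, n_cancer, n_normal):
    """Flag a gene whose score direction disagrees with its frequency direction.

    ASSUMPTION (not stated in the abstract): score > 0 means the gene pushes
    the classifier toward the cancer-cell class.
    """
    # Relative occurrence frequency of the gene in each class.
    freq_diff = cancer_count / n_cancer - normal_count / n_normal
    # Opposite signs give a negative product; ties (zero) are never flagged.
    return sign(score) * sign(freq_diff) < 0


# Hypothetical toy data: gene -> (score, count in cancer cells, count in normal cells)
toy_genes = {
    "GENE_A": (0.8, 120, 40),   # more frequent in cancer, positive score: consistent
    "GENE_B": (0.5, 30, 90),    # more frequent in normal, positive score: contradictory
    "GENE_C": (-0.3, 25, 80),   # more frequent in normal, negative score: consistent
}
N_CANCER, N_NORMAL = 200, 200   # hypothetical number of samples per class

flagged = [
    name
    for name, (score, c_count, n_count) in toy_genes.items()
    if is_contradictory(score, c_count, n_count, N_CANCER, N_NORMAL)
]
print(flagged)
```

Under these assumptions only `GENE_B` is flagged. The actual study may use a different score definition or threshold.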

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank  Probability  Percentile  Papers in training set  Journal
1     10.1%        Top 3%      1061                    Bioinformatics
2     9.2%         Top 0.1%    17                      IEEE Transactions on Computational Biology and Bioinformatics
3     8.4%         Top 2%      249                     Advanced Science
4     4.9%         Top 32%     4913                    Nature Communications
5     4.0%         Top 2%      326                     Briefings in Bioinformatics
6     3.6%         Top 0.5%    34                      IEEE Journal of Biomedical and Health Informatics
7     3.6%         Top 39%     4510                    PLOS ONE
8     3.6%         Top 37%     3102                    Scientific Reports
9     3.3%         Top 3%      383                     BMC Bioinformatics
50% of probability mass above
10    2.9%         Top 7%      1128                    Nucleic Acids Research
11    2.1%         Top 3%      216                     Computational and Structural Biotechnology Journal
12    1.9%         Top 2%      184                     Bioinformatics Advances
13    1.9%         Top 12%     1063                    iScience
14    1.9%         Top 3%      171                     Genomics, Proteomics & Bioinformatics
15    1.9%         Top 14%     1633                    PLOS Computational Biology
16    1.7%         Top 4%      154                     Genome Medicine
17    1.7%         Top 0.2%    32                      IEEE/ACM Transactions on Computational Biology and Bioinformatics
18    1.7%         Top 7%      167                     Cell Systems
19    1.7%         Top 2%      61                      Nature Machine Intelligence
20    1.7%         Top 0.9%    70                      Patterns
21    1.7%         Top 4%      555                     Genome Biology
22    1.7%         Top 2%      214                     NAR Genomics and Bioinformatics
23    1.5%         Top 0.9%    45                      Journal of Biomedical Informatics
24    1.3%         Top 1%      99                      npj Systems Biology and Applications
25    1.2%         Top 7%      197                     Frontiers in Genetics
26    1.2%         Top 0.5%    15                      BioData Mining
27    0.9%         Top 19%     886                     Communications Biology
28    0.8%         Top 3%      172                     GigaScience
29    0.8%         Top 6%      162                     Cell Genomics
30    0.7%         Top 5%      120                     Computers in Biology and Medicine
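
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below uses the first ten rows of the list; the function name `journals_to_reach` is illustrative, and because the displayed percentages are rounded, the cumulative total is approximate.

```python
from itertools import accumulate

# (journal, predicted probability in percent), copied from the first ten rows above
ranked = [
    ("Bioinformatics", 10.1),
    ("IEEE Transactions on Computational Biology and Bioinformatics", 9.2),
    ("Advanced Science", 8.4),
    ("Nature Communications", 4.9),
    ("Briefings in Bioinformatics", 4.0),
    ("IEEE Journal of Biomedical and Health Informatics", 3.6),
    ("PLOS ONE", 3.6),
    ("Scientific Reports", 3.6),
    ("BMC Bioinformatics", 3.3),
    ("Nucleic Acids Research", 2.9),
]


def journals_to_reach(rows, threshold=50.0):
    """Return (count, cumulative percent) for the first prefix reaching threshold.

    Returns None if the rows never reach the threshold.
    """
    running = accumulate(prob for _, prob in rows)
    for count, total in enumerate(running, start=1):
        if total >= threshold:
            return count, total
    return None


count, total = journals_to_reach(ranked)
print(f"{count} journals cover about {total:.1f}% of the probability mass")
```

With the rounded values shown, the first eight journals sum to about 47.4% and the ninth brings the total to about 50.7%, which matches the cutoff marked in the list.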