
RNAGAN: Train One and Get Four, Multipurpose Human RNA-Seq Analysis Tool with Enhanced Interpretability and Small Data Size Capability

Hou, Z.; Lee, V. H.-F.; Kwong, D. L.-W.; Guan, X.; Liu, Z.; Dai, W.

2026-03-20 bioinformatics
10.64898/2026.03.17.712527 bioRxiv

The advent of artificial intelligence (AI) has brought revolutionary tools for biomedical transcriptomic (RNA-level) research. However, persistent constraints remain, including limited interpretability in terms of biomedical concepts such as functional pathways, small sample sizes, and the substantial time and computing power required for AI training. To overcome these limitations, we developed RNAGAN (https://github.com/ZhaozhengHou-HKU/RNAGAN-1.0.git), an AI tool with a generative adversarial network (GAN) structure designed to enhance transcriptomic analysis. The network was built on public human datasets comprising 4.6 million single cells from multiple organs and 5,900 sequenced samples of various cancer types with normal references. A specialized pathway neural layer was embedded to extract the activities of predefined pathways from the Human Molecular Signatures Database (MSigDB), or of newly learned pathways from single-cell data. The structure of RNAGAN (generator and discriminator) enables four applications after one shared training procedure: (1) single-cell and bulk-level patient stratification or differential diagnosis; (2) analysis of the gene and pathway markers of a selected disease; (3) pseudo-data generation for downstream analysis when sample size is limited; and (4) vectorization with gene- and pathway-level features learned from multiple datasets. RNAGAN contributes to the efficient utilization of limited data in transcriptomic studies.
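The abstract does not describe how the pathway neural layer is built. One common way to tie a layer to predefined gene sets is a masked linear layer, in which each pathway unit is connected only to its member genes. The sketch below illustrates that general idea with NumPy; it is an assumption, not the RNAGAN implementation, and the function name `pathway_layer`, the toy membership matrix, and the `tanh` activation are all hypothetical choices.

```python
import numpy as np

def pathway_layer(expr, membership, weights):
    """Masked linear layer (illustrative assumption, not the authors' code).

    expr:       (n_samples, n_genes) expression matrix
    membership: (n_genes, n_pathways) binary mask, 1 if the gene is in the pathway
    weights:    (n_genes, n_pathways) weights that would be learned in training
    """
    masked = weights * membership   # zero out gene-pathway links outside the gene set
    return np.tanh(expr @ masked)   # one bounded activity score per pathway

# Toy data: 4 hypothetical genes assigned to 2 hypothetical pathways.
rng = np.random.default_rng(0)
membership = np.array([[1, 0],
                       [1, 0],
                       [0, 1],
                       [0, 1]], dtype=float)
weights = rng.normal(size=membership.shape)
expr = rng.normal(size=(3, 4))      # 3 samples x 4 genes

activity = pathway_layer(expr, membership, weights)
print(activity.shape)               # (3, 2): one score per sample and pathway
```

Because non-member connections are zeroed, each output unit can be read as the activity of one named pathway, which is the kind of interpretability the abstract refers to.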

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Top % | Predicted probability
1 | Nature Machine Intelligence | 61 | Top 0.1% | 22.0%
2 | Nature Communications | 4913 | Top 19% | 9.9%
3 | Advanced Science | 249 | Top 2% | 6.7%
4 | Bioinformatics | 1061 | Top 4% | 6.7%
5 | Genome Medicine | 154 | Top 1% | 6.2%
50% of probability mass above
6 | Briefings in Bioinformatics | 326 | Top 1% | 6.2%
7 | Frontiers in Genetics | 197 | Top 1% | 4.7%
8 | Patterns | 70 | Top 0.4% | 2.5%
9 | Genomics, Proteomics & Bioinformatics | 171 | Top 3% | 2.3%
10 | Nucleic Acids Research | 1128 | Top 10% | 1.7%
11 | iScience | 1063 | Top 16% | 1.7%
12 | Bioinformatics Advances | 184 | Top 3% | 1.7%
13 | Computational and Structural Biotechnology Journal | 216 | Top 5% | 1.6%
14 | PLOS Computational Biology | 1633 | Top 18% | 1.4%
15 | Scientific Reports | 3102 | Top 63% | 1.4%
16 | NAR Genomics and Bioinformatics | 214 | Top 3% | 1.2%
17 | Nature Methods | 336 | Top 5% | 1.2%
18 | Genome Biology | 555 | Top 7% | 0.9%
19 | Communications Biology | 886 | Top 20% | 0.9%
20 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 2% | 0.9%
21 | GigaScience | 172 | Top 3% | 0.8%
22 | npj Systems Biology and Applications | 99 | Top 3% | 0.7%
23 | Database | 51 | Top 1% | 0.7%
24 | Nature Biomedical Engineering | 42 | Top 3% | 0.6%
25 | eLife | 5422 | Top 62% | 0.6%
26 | Cell Reports Methods | 141 | Top 6% | 0.6%
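
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked directly against the listed values. The short sketch below sums the rounded percentages from the ranked list and reports how many journals are needed to reach the threshold. It assumes the 50% mark means the smallest top-k whose cumulative probability reaches 50%; the helper name `journals_to_reach` is hypothetical.

```python
# (journal, predicted probability in percent), copied from the ranked list above.
ranked = [
    ("Nature Machine Intelligence", 22.0),
    ("Nature Communications", 9.9),
    ("Advanced Science", 6.7),
    ("Bioinformatics", 6.7),
    ("Genome Medicine", 6.2),
    ("Briefings in Bioinformatics", 6.2),
]

def journals_to_reach(ranked, threshold):
    """Return (count, cumulative %) for the smallest top-k reaching the threshold."""
    total = 0.0
    for count, (_, prob) in enumerate(ranked, start=1):
        total += prob
        if total >= threshold:
            return count, total
    return None  # threshold not reached by the journals supplied

count, total = journals_to_reach(ranked, 50.0)
print(count, round(total, 1))  # 5 51.5
```

The top four journals sum to 45.3%, so the fifth (Genome Medicine) is the one that carries the cumulative total past 50%.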