
PAH-former: Transfer Learning for Efficient Discovery of Pulmonary Arterial Hypertension-Associated Genes

Kawakami, T.; Hosokawa, S.; Masamichi, I.; Kurozumi, A.; Tanaka, R.; Minatsuki, S.; Ishida, J.; Isagawa, T.; Kodera, S.; Takeda, N.

2025-06-20 bioinformatics
10.1101/2025.06.16.660027 bioRxiv

Single-cell RNA sequencing (scRNA-seq) of patient samples holds promise for understanding disease mechanisms, but the high cost and effort of data acquisition, processing, and analysis make it essential to leverage existing data. Pulmonary arterial hypertension (PAH) is a refractory disease characterized by pulmonary vascular remodeling, and access to patient specimens is limited because tissue collection is difficult. In this study, we applied transfer learning to Geneformer, a deep learning algorithm pre-trained on scRNA-seq datasets, and fine-tuned it with public PAH lung tissue data to identify disease-relevant genes. The prediction accuracy of the resulting algorithm, which we named PAH-former, varied significantly depending on the dataset used for fine-tuning. PAH-former enabled us to perform in silico perturbation analysis, which identified PAH-related genes. Loss of function of these PAH-related genes in human pulmonary artery endothelial cells increased the expression of SOX18, a signature gene of PAH. This integration of artificial intelligence and biological experiments can significantly advance our understanding of the molecular mechanisms of PAH.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Nature Machine Intelligence | 61 papers in training set | Top 0.1% | 23.0%
2 | Advanced Science | 249 papers in training set | Top 0.5% | 15.0%
3 | PLOS Computational Biology | 1633 papers in training set | Top 9% | 3.7%
4 | Communications Biology | 886 papers in training set | Top 2% | 3.7%
5 | Scientific Reports | 3102 papers in training set | Top 35% | 3.7%
6 | Nature Communications | 4913 papers in training set | Top 41% | 3.3%
50% of probability mass above
7 | Bioinformatics | 1061 papers in training set | Top 6% | 3.1%
8 | iScience | 1063 papers in training set | Top 6% | 3.1%
9 | Genome Medicine | 154 papers in training set | Top 3% | 2.4%
10 | Briefings in Bioinformatics | 326 papers in training set | Top 3% | 2.4%
11 | Cell Genomics | 162 papers in training set | Top 3% | 1.9%
12 | Patterns | 70 papers in training set | Top 0.7% | 1.8%
13 | Frontiers in Genetics | 197 papers in training set | Top 5% | 1.7%
14 | Cell Systems | 167 papers in training set | Top 7% | 1.7%
15 | Nucleic Acids Research | 1128 papers in training set | Top 11% | 1.7%
16 | eLife | 5422 papers in training set | Top 46% | 1.4%
17 | npj Systems Biology and Applications | 99 papers in training set | Top 1% | 1.2%
18 | Arteriosclerosis, Thrombosis, and Vascular Biology | 65 papers in training set | Top 2% | 1.0%
19 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 5% | 0.9%
20 | Nature Biomedical Engineering | 42 papers in training set | Top 2% | 0.8%
21 | IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 2% | 0.8%
22 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 44% | 0.8%
23 | PLOS Genetics | 756 papers in training set | Top 17% | 0.7%
24 | National Science Review | 22 papers in training set | Top 3% | 0.7%
25 | American Journal of Respiratory and Critical Care Medicine | 39 papers in training set | Top 1% | 0.7%
26 | Molecular Therapy - Nucleic Acids | 24 papers in training set | Top 0.5% | 0.7%
27 | European Respiratory Journal | 54 papers in training set | Top 2% | 0.7%
28 | Nature Methods | 336 papers in training set | Top 7% | 0.5%
29 | Cell | 370 papers in training set | Top 19% | 0.5%
30 | Journal of Medical Internet Research | 85 papers in training set | Top 5% | 0.5%
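
The "top 6 journals" cutoff stated before the ranking can be checked with a running total of the listed probabilities. The following is a minimal sketch, assuming the cutoff is the smallest rank at which the cumulative probability reaches 50%; the function name `journals_needed` is hypothetical and is not taken from the site that produced this ranking.

```python
from itertools import accumulate

# Predicted probabilities (%) of the six highest-ranked journals, copied from the ranking.
top_probs = [23.0, 15.0, 3.7, 3.7, 3.7, 3.3]


def journals_needed(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the running total
    to reach the threshold, or None if the threshold is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


if __name__ == "__main__":
    print(journals_needed(top_probs))
```

With the listed values, the first five journals sum to roughly 49.1% and the sixth brings the total to roughly 52.4%, which agrees with the stated cutoff at rank 6.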