
Integrated RNA-seq analysis identifies ABC transporters mediating taxane export in Taxus species

Nasiri, J.; Fotuhi Siahpirani, A.; Dong, Y.; Xu, C.; Xia, Y.; Ignea, C.

2026-05-13 bioinformatics
10.64898/2026.05.10.723993 bioRxiv

RNA-seq datasets from medicinal yews are crucial for studying paclitaxel biosynthesis. However, cross-study data analyses are hindered by pronounced batch effects. Here, we compiled 45 RNA-seq samples from three studies across four tissues (bark, leaf, root, stem) and assessed 35 preprocessing pipelines combining six normalization strategies with five batch-effect correction approaches. Unsupervised clustering (HCA, k-means, Grade-of-Membership), evaluated using Jaccard and Adjusted Rand indices, revealed significant variability in batch effect removal. Supervised classification of tissue and project labels (Random Forest and linear/radial SVM) demonstrated improved accuracy in tissue type prediction, highlighting the effectiveness of correction methods. The processed data facilitated the identification of 189 putative ABC transporters across samples, six of which showed a strong correlation with the gene encoding 10-deacetylbaccatin-III-10β-O-acetyltransferase, a key biosynthetic enzyme in the taxol pathway. High expression levels in leaf and bark further support their role in trafficking taxane intermediates during taxol biosynthesis. Structural analysis and molecular docking further supported the selection of these candidates, and the agreement between transcriptomic ranking and docking-based prioritization suggests that these transporters may participate in taxane intermediate recognition, trafficking, or export. These findings demonstrate the importance of normalization and batch effect correction in RNA-seq analysis to advance gene discovery in Taxus species and, more broadly, in plant research.
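The abstract scores clustering results with Jaccard and Adjusted Rand indices. As a rough illustration of what such a comparison involves, the sketch below computes both indices between a set of known tissue labels and a set of cluster assignments. The function names, the pair-counting form of the Jaccard index, and the toy labels are assumptions made for illustration; the abstract does not state which variant or implementation the authors used.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treat as trivially identical
    # Pairs that fall in the same group under both labelings.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs that fall in the same group under each labeling on its own.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every sample in one group
    return (sum_ij - expected) / (max_index - expected)


def pairwise_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index: co-clustered in both / co-clustered in either."""
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    denom = both + only_a + only_b
    return both / denom if denom else 1.0


# Hypothetical toy data: eight samples, two per tissue.
tissues = ["bark", "bark", "leaf", "leaf", "root", "root", "stem", "stem"]
clusters_by_tissue = [0, 0, 1, 1, 2, 2, 3, 3]   # clustering that recovers tissue
clusters_by_batch = [0, 1, 0, 1, 0, 1, 0, 1]    # clustering that follows batch instead

print(adjusted_rand_index(tissues, clusters_by_tissue))
print(adjusted_rand_index(tissues, clusters_by_batch))
print(pairwise_jaccard(tissues, clusters_by_batch))
```

In this toy setup, a pipeline that removes batch effects well would be expected to give clusters that agree with tissue labels (indices near 1), while clusters that track the source study instead would score near or below 0 against tissue.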

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Plant Communications | 35 papers in training set | Top 0.1% | 17.3%
2 | Horticulture Research | 43 papers in training set | Top 0.1% | 14.5%
3 | The Plant Journal | 197 papers in training set | Top 0.5% | 8.3%
4 | Plant Direct | 81 papers in training set | Top 0.3% | 6.2%
5 | eLife | 5422 papers in training set | Top 18% | 4.8%
50% of probability mass above
6 | Molecular Plant | 36 papers in training set | Top 0.4% | 3.5%
7 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 1% | 3.0%
8 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 2% | 3.0%
9 | PLOS ONE | 4510 papers in training set | Top 45% | 2.6%
10 | New Phytologist | 309 papers in training set | Top 2% | 2.6%
11 | Nature Communications | 4913 papers in training set | Top 49% | 1.9%
12 | Physiologia Plantarum | 35 papers in training set | Top 0.2% | 1.7%
13 | Scientific Reports | 3102 papers in training set | Top 59% | 1.7%
14 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 5% | 1.7%
15 | PLOS Computational Biology | 1633 papers in training set | Top 18% | 1.5%
16 | Communications Biology | 886 papers in training set | Top 11% | 1.5%
17 | Advanced Science | 249 papers in training set | Top 14% | 1.3%
18 | International Journal of Molecular Sciences | 453 papers in training set | Top 10% | 1.3%
19 | The Plant Cell | 141 papers in training set | Top 2% | 1.2%
20 | Frontiers in Plant Science | 240 papers in training set | Top 4% | 1.2%
21 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.2%
22 | Plant Biotechnology Journal | 56 papers in training set | Top 1% | 0.9%
23 | BMC Genomics | 328 papers in training set | Top 5% | 0.9%
24 | Scientific Data | 174 papers in training set | Top 3% | 0.6%
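
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below does this for the first ten values in the list; the function name and the "first rank at which the running total reaches the threshold" rule are assumptions for illustration, since the page does not say how the cutoff is computed.

```python
def smallest_top_k(probabilities, threshold):
    """Return (k, cumulative) for the first rank k whose running total reaches threshold."""
    cumulative = 0.0
    for k, p in enumerate(probabilities, start=1):
        cumulative += p
        if cumulative >= threshold:
            return k, cumulative
    return None, cumulative  # threshold never reached


# Predicted probabilities (in percent) for ranks 1 to 10, copied from the list.
listed = [17.3, 14.5, 8.3, 6.2, 4.8, 3.5, 3.0, 3.0, 2.6, 2.6]

k, mass = smallest_top_k(listed, 50.0)
print(k, round(mass, 1))  # prints: 5 51.1
```

The first four journals sum to 46.3%, so the fifth is the one that carries the running total past the 50% mark.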