The Paipu framework enables creation of a large-scale mammalian cancer transcriptomics atlas

Smith, B. S.; Smith, L. A.; Lee, J.-H.; Cahill, J. A.; Graim, K.

2026-05-18 bioinformatics

10.64898/2026.05.14.725161 bioRxiv


A plethora of studies have identified shared molecular mechanisms involved in tumor development across humans and other mammalian species. While these two-species analyses advance understanding of human disease, extending them across many species would provide evolutionary insight into molecular mechanisms driving human cancers. However, this expansion requires knowledge transfer and harmonization across species. Genomic differences between species, including variation in genome annotation quality, have historically hindered multi-species large-scale atlas creation. To overcome these challenges, we present Paipu, a comprehensive pipeline designed to streamline querying, preprocessing, harmonization, and retrieval of large-scale RNA-seq data and associated metadata from the NCBI Sequence Read Archive (SRA). Paipu facilitates multi-species analysis by creating a harmonized atlas from user-defined search terms and species. It consists of three components: reference genome preparation, SRA metadata retrieval, and RNA-seq data processing. We apply Paipu to 188 cancer-related terms in 239 non-human mammalian species, creating a harmonized atlas of 3,484 RNA-seq samples spanning 17 species and 35 cancers. This pan-mammalian pan-cancer atlas enables myriad comparative genomics analyses that leverage genetic variation to better understand rare human cancers. As such, Paipu serves as a resource for cross-species cancer genomics and supports atlas creation for any set of species and search terms.
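The abstract describes building an atlas from user-defined search terms and species. As a rough illustration only, and not Paipu's actual implementation, the sketch below shows how one SRA search string per species and term pair might be assembled. The function names `build_sra_query` and `build_queries` are hypothetical, the example species and terms are placeholders rather than entries from the atlas, and the use of the `[Organism]` and `[Strategy]` search field tags is an assumption about how such a query could be phrased.

```python
from itertools import product


def build_sra_query(species: str, term: str) -> str:
    """Return one SRA search string for a species/term pair.

    Assumption: the query restricts results by organism and by RNA-seq
    library strategy using NCBI search field tags. Paipu's real query
    syntax is not given in the abstract.
    """
    return f'"{species}"[Organism] AND "{term}" AND "rna seq"[Strategy]'


def build_queries(species_list, terms):
    """Return a query for every combination of species and search term."""
    return [build_sra_query(s, t) for s, t in product(species_list, terms)]


# Placeholder inputs for illustration; not taken from the Paipu atlas.
queries = build_queries(
    ["Canis lupus familiaris", "Felis catus"],
    ["lymphoma", "osteosarcoma"],
)
for q in queries:
    print(q)
```

At the scale reported in the abstract (188 terms across 239 species), this cross product would yield 44,932 candidate queries, which suggests why automated metadata retrieval is one of the pipeline's three components.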

