Estimating protein isoform abundances with PAQu

Testa, L.; Klei, L.; Rengle, A.; Yocum, A.; Lewis, D. A.; Devlin, B.; Roeder, K.; MacDonald, M. L.

2026-04-22 genomics
10.64898/2026.04.20.719668 bioRxiv
A single gene can encode multiple versions of a protein, dubbed isoforms, with varying functionality. Cellular control of isoform abundances is critical for multiple aspects of biology and is only partially regulated by transcript levels. While long-read sequencing facilitates transcript quantification, quantifying the resulting protein isoforms on a large scale remains a major challenge, which complicates biological interpretation of transcript alterations. Standard "bottom-up" mass spectrometry can assess only short portions of isoforms, called peptides, and these peptides often map onto more than one isoform. We introduce PAQu, a novel Bayesian method that leverages multiomic information from the peptidome and transcriptome to provide accurate estimates of isoform abundance even when peptide mapping is ambiguous. PAQu offers several advantages over existing methods in a unified framework: it provides uncertainty quantification, integrates multiomic information for improved accuracy, and supplies a rigorous framework for hypothesis testing. Extensive simulations show that PAQu consistently outperforms competing methods in detecting differentially expressed protein isoforms and estimating their abundances. We use PAQu to investigate differences in isoform abundance levels between people with schizophrenia and control subjects, confirming a long-held hypothesis that levels of the C4A isoform of Complement Component 4 are increased in schizophrenia while levels of C4B are not. These results demonstrate that PAQu can identify significant variations in isoform abundance levels that could not previously be detected.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set; Top 0.4%; 38.7%
2. Journal of Proteome Research: 215 papers in training set; Top 0.6%; 4.4%
3. PLOS ONE: 4510 papers in training set; Top 35%; 4.1%
4. PLOS Computational Biology: 1633 papers in training set; Top 9%; 3.7%
50% of probability mass above
5. Cell Reports Methods: 141 papers in training set; Top 0.8%; 3.7%
6. BMC Bioinformatics: 383 papers in training set; Top 3%; 3.7%
7. Genetic Epidemiology: 46 papers in training set; Top 0.2%; 3.3%
8. Nature Methods: 336 papers in training set; Top 3%; 2.7%
9. Genome Biology: 555 papers in training set; Top 4%; 1.9%
10. Scientific Reports: 3102 papers in training set; Top 55%; 1.8%
11. Genome Research: 409 papers in training set; Top 2%; 1.7%
12. Nature Biotechnology: 147 papers in training set; Top 4%; 1.7%
13. Metabolites: 50 papers in training set; Top 0.5%; 1.7%
14. Nature Communications: 4913 papers in training set; Top 53%; 1.5%
15. Biological Psychiatry: 119 papers in training set; Top 2%; 1.3%
16. Molecular & Cellular Proteomics: 158 papers in training set; Top 1%; 1.3%
17. Nature Machine Intelligence: 61 papers in training set; Top 3%; 1.0%
18. iScience: 1063 papers in training set; Top 24%; 1.0%
19. Molecular Systems Biology: 142 papers in training set; Top 1%; 0.9%
20. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 7%; 0.9%
21. Briefings in Bioinformatics: 326 papers in training set; Top 6%; 0.8%
22. NAR Genomics and Bioinformatics: 214 papers in training set; Top 4%; 0.8%
23. PLOS Genetics: 756 papers in training set; Top 14%; 0.8%
24. BMC Genomics: 328 papers in training set; Top 6%; 0.7%
25. npj Systems Biology and Applications: 99 papers in training set; Top 3%; 0.7%
26. Cell Genomics: 162 papers in training set; Top 7%; 0.7%
27. eLife: 5422 papers in training set; Top 63%; 0.5%
28. Bioinformatics Advances: 184 papers in training set; Top 5%; 0.5%
29. Patterns: 70 papers in training set; Top 3%; 0.5%
30. mSystems: 361 papers in training set; Top 8%; 0.5%
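
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The following is a minimal sketch in Python, assuming the final percentage listed for each journal is its predicted probability; the function name `journals_to_reach` is hypothetical and is not part of the listing site or of PAQu.

```python
# Predicted probabilities (%) for the six top-ranked journals,
# copied from the list above (assumed to be each entry's final value).
probabilities = [
    ("Bioinformatics", 38.7),
    ("Journal of Proteome Research", 4.4),
    ("PLOS ONE", 4.1),
    ("PLOS Computational Biology", 3.7),
    ("Cell Reports Methods", 3.7),
    ("BMC Bioinformatics", 3.7),
]


def journals_to_reach(ranked, threshold):
    """Return (count, cumulative mass) for the shortest ranked prefix whose
    summed probability reaches `threshold`, or None if it is never reached."""
    total = 0.0
    for count, (_name, prob) in enumerate(ranked, start=1):
        total += prob
        if total >= threshold:
            return count, total
    return None


result = journals_to_reach(probabilities, 50.0)
if result is not None:
    count, total = result
    print(f"{count} journals reach {total:.1f}% of the probability mass")
```

With the values above, the running totals are 38.7, 43.1, 47.2, and 50.9, so the threshold is first crossed at the fourth journal, consistent with the cutoff marked in the list.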