Efficient and Tidy Manipulation of Annotated Matrix Data with plyxp

Landis, J. T.; Love, M. I.

2026-05-11 bioinformatics
10.64898/2026.05.06.721669 bioRxiv
Manipulating high-dimensional omics data, such as bulk or single-cell gene expression counts matrices, typically requires a bioinformatics analyst to learn domain-specific functions and syntax. These matrix-centric functions and syntax can be less intuitive than working with tidy data analytic principles, as exemplified by tools such as dplyr applied to tabular data. We propose an expressive grammar for manipulating annotated matrix data, with syntax to access, modify, and append matrix data and tabular row and column metadata, including row-wise or column-wise grouped operations. This grammar defines multiple contexts and provides pronouns for specific recall and assignment within and across these contexts. The plyxp package is an implementation of this grammar for the R/Bioconductor ecosystem, with efficient abstractions for the SummarizedExperiment class. We demonstrate plyxp's efficiency compared to alternative approaches on data manipulation tasks requiring computation across contexts.
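To make "computation across contexts" concrete, here is a minimal conceptual sketch in plain Python. It is not plyxp's API and not R; the toy data and the helper names `group_columns` and `rowwise_group_means` are hypothetical, chosen only for illustration. It shows one cross-context task of the kind the abstract describes: reading column metadata to group the matrix columns, summarizing each row within each group, and appending the result to the row metadata.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: 3 genes (rows) x 4 samples (columns).
counts = [
    [10, 12, 30, 34],
    [0, 2, 1, 3],
    [5, 5, 20, 22],
]
# Tabular metadata, one entry per row or per column of the matrix.
row_data = {"gene": ["g1", "g2", "g3"]}
col_data = {
    "sample": ["s1", "s2", "s3", "s4"],
    "condition": ["ctrl", "ctrl", "trt", "trt"],
}


def group_columns(col_data, by):
    """Map each group label to the column indices that carry it."""
    groups = defaultdict(list)
    for j, label in enumerate(col_data[by]):
        groups[label].append(j)
    return dict(groups)


def rowwise_group_means(counts, col_data, by):
    """For each row, average the matrix values within each column group."""
    groups = group_columns(col_data, by)
    return {
        label: [mean(row[j] for j in idx) for row in counts]
        for label, idx in groups.items()
    }


# Cross-context step: column metadata drives a matrix summary,
# and the summary is appended to the row metadata.
for label, values in rowwise_group_means(counts, col_data, "condition").items():
    row_data[f"mean_{label}"] = values
```

After this runs, `row_data` holds one new column per condition (`mean_ctrl`, `mean_trt`), each aligned with the gene rows. A grammar like the one proposed would express the same step declaratively rather than through explicit index bookkeeping.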

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics; 1061 papers in training set; Top 1%; 22.5%
2. Genome Biology; 555 papers in training set; Top 0.8%; 7.2%
3. Nature Communications; 4913 papers in training set; Top 25%; 7.2%
4. PLOS Computational Biology; 1633 papers in training set; Top 5%; 6.8%
5. Nature Biotechnology; 147 papers in training set; Top 1%; 6.8%
(50% of probability mass above)
6. Nature Methods; 336 papers in training set; Top 2%; 6.8%
7. Cell Systems; 167 papers in training set; Top 3%; 4.3%
8. PLOS ONE; 4510 papers in training set; Top 33%; 4.3%
9. BMC Bioinformatics; 383 papers in training set; Top 2%; 4.0%
10. Nucleic Acids Research; 1128 papers in training set; Top 5%; 3.7%
11. Bioinformatics Advances; 184 papers in training set; Top 1%; 3.6%
12. NAR Genomics and Bioinformatics; 214 papers in training set; Top 1%; 2.6%
13. Genome Research; 409 papers in training set; Top 2%; 2.4%
14. GigaScience; 172 papers in training set; Top 1%; 1.9%
15. iScience; 1063 papers in training set; Top 26%; 0.9%
16. Molecular Biology and Evolution; 488 papers in training set; Top 4%; 0.8%
17. Scientific Reports; 3102 papers in training set; Top 74%; 0.7%
18. Journal of Open Source Software; 22 papers in training set; Top 0.2%; 0.7%
19. Nature Computational Science; 50 papers in training set; Top 2%; 0.7%
20. Genome Medicine; 154 papers in training set; Top 9%; 0.7%
21. Journal of Molecular Biology; 217 papers in training set; Top 4%; 0.6%
22. Communications Biology; 886 papers in training set; Top 29%; 0.6%
23. Nature Genetics; 240 papers in training set; Top 9%; 0.6%
24. BMC Genomics; 328 papers in training set; Top 7%; 0.6%