
Glycan Reachability Analysis: A Bottleneck-Aware Framework for Inferring Tissue-Specific Glycan Biosynthetic Potential from Transcriptomics

Matsui, Y.

2026-03-27 systems biology
10.64898/2026.03.24.714093 bioRxiv

Glycan biosynthesis requires the coordinated expression of glycosyltransferases, modifying enzymes, and the machinery for nucleotide sugar donor synthesis and transport. Existing computational tools predict glycan structures from gene expression using binary thresholds, which discards quantitative information about relative biosynthetic capacity across tissues. Here we present glycan biosynthetic reachability analysis, a framework that quantifies a tissue's capacity to synthesize a given glycan structure from the expression of its least-abundant pathway component. The framework integrates expression-based Z-scores across biosynthetic pathway steps using AND/OR logic and a minimum-aggregation (bottleneck) principle, producing continuous, tissue-comparable scores of glycan biosynthetic capacity. Applied to 17,382 RNA-seq samples across 54 human tissues from GTEx v8, reachability scores capture biologically meaningful tissue-specific variation in sialyl Lewis X (sLeX), ganglioside, and heparan sulfate biosynthetic capacity. Critically, we show that binary threshold approaches lack the quantitative resolution to distinguish tissues in which all pathway enzymes are detectable yet expressed at uniformly low levels (e.g., pancreas for sLeX: 96% binary-positive, reachability Z = -1.86) from tissues with genuinely high biosynthetic capacity. A systematic comparison of aggregation functions shows that min-aggregation (the bottleneck principle) consistently outperforms naive mean expression in predicting downstream biological outcomes. We emphasize that reachability scores reflect transcriptomic potential, not proven enzymatic activity or glycan abundance, because post-transcriptional regulation, protein stability, and substrate competition are not captured.
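The scoring idea described above (Z-score each gene, combine alternative enzymes with OR = max, combine required steps with AND = min) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the expression matrix and tissue labels `T1`-`T3` are made-up toy values, the sLeX-like rule is a hypothetical simplification, Z-scores are computed per gene across tissues with population standard deviation, and `zscore_by_gene`, `evaluate`, and the 1.0 detection threshold are hypothetical names and parameters. The same evaluator is reused with Boolean inputs (AND = all, OR = any) to mimic a binary-threshold call, and with AND = mean as a comparison variant.

```python
import numpy as np

# Toy data (hypothetical, not GTEx): log-scale expression, rows = tissues, columns = genes.
GENES = ["FUT6", "FUT7", "ST3GAL4", "SLC35C1"]
TISSUES = ["T1", "T2", "T3"]
expr = np.array([
    [3.0, 6.0, 6.0, 6.0],  # T1: every step well expressed
    [1.0, 8.0, 8.0, 1.5],  # T2: strong enzymes, weak GDP-fucose transporter
    [2.0, 1.5, 1.5, 1.5],  # T3: everything detectable but uniformly low
])
gene_index = {g: i for i, g in enumerate(GENES)}

# Simplified sLeX-like rule (illustrative only, not the paper's definition):
# (FUT6 OR FUT7) AND ST3GAL4 AND SLC35C1
pathway = ("AND", [("OR", ["FUT6", "FUT7"]), "ST3GAL4", "SLC35C1"])


def zscore_by_gene(x):
    """Z-score each gene (column) across tissues (rows), population SD."""
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # invariant genes get Z = 0 instead of a division by zero
    return (x - x.mean(axis=0)) / sd


def evaluate(rule, values, and_fn, or_fn):
    """Recursively score a nested AND/OR rule; returns one value per tissue."""
    if isinstance(rule, str):
        return values[:, gene_index[rule]]
    op, children = rule
    stacked = np.stack([evaluate(c, values, and_fn, or_fn) for c in children], axis=-1)
    return and_fn(stacked, axis=-1) if op == "AND" else or_fn(stacked, axis=-1)


z = zscore_by_gene(expr)
reachability = evaluate(pathway, z, np.min, np.max)           # bottleneck: AND = min, OR = max
binary_call = evaluate(pathway, expr >= 1.0, np.all, np.any)  # threshold baseline
mean_variant = evaluate(pathway, z, np.mean, np.max)          # AND = mean, for comparison

for t, r, b, m in zip(TISSUES, reachability, binary_call, mean_variant):
    print(f"{t}: reachability={r:+.2f}  binary={bool(b)}  mean-aggregated={m:+.2f}")
```

In this toy example all three tissues pass the binary test, yet the min-aggregated score ranks them T1 > T2 > T3. T2's score is capped by its single weak step (SLC35C1) even though its enzyme Z-scores are the highest, while the mean-aggregated variant partly averages that bottleneck away. The paper's own Z-score definition and "naive mean expression" baseline may differ from these choices.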
Our method requires only bulk RNA-seq data (no training data or kinetic parameters) and can in principle be applied to other species through ortholog mapping, although cross-species calibration and biological validity remain context-dependent.
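Porting a pathway definition through ortholog mapping could look roughly like the sketch below. It is a hypothetical helper (`map_rule`, `HUMAN_TO_MOUSE` are illustrative names), it assumes a simple one-to-one ortholog table, and it makes one design choice that is an assumption rather than something stated in the abstract: unmapped genes are dropped from OR groups (another isoenzyme may still cover the step), while an AND step with no mappable gene makes the whole rule unmappable. The toy table omits FUT6 only to exercise the OR branch; real mappings should come from a curated resource such as Ensembl Compara or HGNC HCOP, and many-to-many orthologs and cross-species calibration are not handled here.

```python
from typing import Optional, Union

# A rule is a gene symbol or a tuple ("AND" | "OR", [sub-rules]).
Rule = Union[str, tuple]


def map_rule(rule: Rule, orthologs: dict) -> Optional[Rule]:
    """Translate gene symbols in a nested AND/OR rule via a one-to-one ortholog table.

    Unmapped genes are dropped from OR groups; an AND step with no mappable
    gene makes the whole rule unmappable (returns None).
    """
    if isinstance(rule, str):
        return orthologs.get(rule)  # None when no ortholog is listed
    op, children = rule
    mapped = [map_rule(c, orthologs) for c in children]
    if op == "OR":
        kept = [m for m in mapped if m is not None]
        return ("OR", kept) if kept else None
    # AND: every required step must survive the mapping
    return None if any(m is None for m in mapped) else ("AND", mapped)


# Hypothetical human -> mouse table; FUT6 deliberately left out to exercise the OR branch.
HUMAN_TO_MOUSE = {"FUT7": "Fut7", "ST3GAL4": "St3gal4", "SLC35C1": "Slc35c1"}

pathway = ("AND", [("OR", ["FUT6", "FUT7"]), "ST3GAL4", "SLC35C1"])
mouse_pathway = map_rule(pathway, HUMAN_TO_MOUSE)
print(mouse_pathway)
```

The mapped rule can then be scored against the other species' expression matrix in the same way as the human rule.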
