
Bayesian Estimation of Mosaic Loss of Chromosome Y from Bulk RNA Sequencing Data

Lin, J.-R.; Zhang, Z.

2026-05-23 genomics
10.64898/2026.05.20.726153 bioRxiv

Mosaic loss of chromosome Y (LOY) is a common age-associated somatic alteration in men and is typically measured from DNA-based assays. Many cohorts, however, contain bulk RNA-seq data without matched DNA-based LOY measurements. We developed a Bayesian framework to estimate the fraction of cells with LOY from male bulk RNA-seq by modeling reduced Y-linked gene expression relative to expected expression after adjustment for age, expression covariates, and autosomal/X-linked control genes. In 377 male GTEx samples, individual Y-linked genes showed negative correlations with separately obtained DNA-based LOY measurements, supporting a shared Y-expression depletion signal. The primary fast empirical Bayes estimator achieved a Pearson correlation of 0.678 with measured LOY, a mean absolute error of 1.79%, a root mean squared error of 3.72%, and 95.2% empirical coverage of measured LOY. Performance was strongest for identifying large LOY events, with an AUC of 0.964 for measured LOY greater than 20%, while fine ranking among low-LOY samples remained uncertain. A mixture/PCA hierarchical Bayesian sensitivity model provided similar validation performance and interpretable posterior quantities but did not improve point estimation. Leave-one-Y-gene-out and prior-sensitivity analyses showed that the signal was distributed across multiple Y-linked transcripts and that prior shrinkage affected calibration. In an external whole-blood RNA-seq dataset without measured LOY, estimated LOY showed a modest age-related increase, but ex vivo immune stimulation shifted RNA-derived LOY estimates and reduced multiple Y-linked transcripts, indicating transcriptional confounding. These results show that bulk RNA-seq contains usable information about LOY, especially for larger events, but RNA-derived LOY should be interpreted as a probabilistic transcriptome-based estimate rather than a direct substitute for DNA-based mosaicism measurement.
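The abstract describes estimating the LOY cell fraction from depletion of Y-linked expression relative to a covariate-adjusted expectation. The sketch below shows one way such a per-sample posterior could be computed. It is not the authors' estimator. It assumes that cells which have lost chrY contribute no Y-linked transcripts, so bulk Y expression scales by (1 - f) for LOY fraction f. It also assumes that per-gene log ratios of observed to expected expression are already available (the adjustment for age, covariates, and control genes happens upstream). Each residual is modeled as Normal(log(1 - f), sigma^2), with a Beta prior on f evaluated on a grid. The function name `loy_posterior` and the parameters `sigma`, `prior_a`, `prior_b`, and `grid_size` are hypothetical. In an empirical Bayes version, the prior hyperparameters would be estimated across the cohort rather than fixed.

```python
import math


def loy_posterior(residuals, sigma=0.3, prior_a=1.0, prior_b=9.0, grid_size=1000):
    """Hypothetical grid posterior over the LOY cell fraction f for one sample.

    residuals: per-Y-gene natural-log ratios log(observed / expected), where the
        expected values are assumed to come from an upstream covariate-adjusted model.
    Assumed model: residual_g ~ Normal(log(1 - f), sigma^2), independent across genes,
        with a Beta(prior_a, prior_b) prior on f.
    Returns (posterior mean, 2.5% quantile, 97.5% quantile) of f.
    """
    if grid_size < 1:
        raise ValueError("grid_size must be positive")
    # Grid midpoints in (0, 1) avoid log(0) at f = 0 and f = 1.
    grid = [(k + 0.5) / grid_size for k in range(grid_size)]

    log_post = []
    for f in grid:
        mu = math.log1p(-f)  # expected log ratio if a fraction f of cells lacks chrY
        # Gaussian log-likelihood up to a constant (sigma is fixed, so it cancels).
        loglik = sum(-0.5 * ((r - mu) / sigma) ** 2 for r in residuals)
        # Unnormalized Beta log-density on the grid point.
        logprior = (prior_a - 1.0) * math.log(f) + (prior_b - 1.0) * math.log1p(-f)
        log_post.append(loglik + logprior)

    # Normalize in a numerically stable way (subtract the max before exponentiating).
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    probs = [w / total for w in weights]

    mean = sum(f * p for f, p in zip(grid, probs))

    # 95% equal-tailed credible interval from the discrete cumulative distribution.
    lower = upper = None
    cumulative = 0.0
    for f, p in zip(grid, probs):
        cumulative += p
        if lower is None and cumulative >= 0.025:
            lower = f
        if upper is None and cumulative >= 0.975:
            upper = f
            break
    if upper is None:  # guard against floating-point shortfall in the final sum
        upper = grid[-1]
    return mean, lower, upper
```

For example, `loy_posterior([math.log(0.5)] * 5)` describes a sample whose five Y-linked genes are each expressed at half their expected level. Under this model, the point estimate is pulled below the naive value of 0.5 by the Beta(1, 9) prior. This illustrates how prior shrinkage can affect calibration, a sensitivity the abstract reports.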
