Generative AI-assisted Bayesian-frequentist Hybrid Inference in Single-cell RNA Sequencing Analysis for Genes Associated with Alzheimer's Disease

Han, G.; Yuan, A.; Oware, K. D.; Wright, F.; Carroll, R. J.; Smith, M.; Ory, M. G.; Yan, D.; Wang, W.; Sun, Z.; Dai, Q.; Allen, C.; Dang, A.; Liu, Y.

2026-04-20 geriatric medicine
10.64898/2026.04.17.26351142 medRxiv

Alzheimer's disease genomics and other high-dimensional omics studies demand powerful statistical methods, yet Bayesian inference remains underutilized despite its advantages in small-sample settings, owing to the prohibitive cost of eliciting reliable priors across thousands or millions of parameters. We propose an AI-assisted Bayesian-frequentist hybrid inference framework that couples large language model-based prior elicitation with the hybrid inference theory of Yuan (2009). ChatGPT-4o is queried via a standardized prompt to assess the strength of evidence linking each gene to a disease of interest, and the response is mapped to an informative normal prior via a standardized effect-size calibration. Parameters for covariates of secondary interest are treated as frequentist parameters, preserving efficiency and avoiding sensitivity to misspecified priors. We derive closed-form hybrid estimators under uniform and conjugate normal priors in linear models, establish their asymptotic equivalence to the frequentist and full Bayes estimators, and show in simulations that hybrid inference using unconditional variance estimation achieves high statistical power while accurately controlling the Type I error rate. Applied to single-cell RNA sequencing data from the ROSMAP cohort, with Alzheimer's disease as an example, the framework identifies biologically coherent pathways, such as gamma-secretase pathways, that were previously undetected. The proposed framework offers a principled and computationally scalable approach to genome-wide Bayesian analysis, with potential for broad application across omics platforms and disease settings.
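
To make the idea of an informative normal prior on one coefficient combined with frequentist treatment of secondary covariates more concrete, here is a minimal Python sketch. It is not the hybrid estimator of Yuan (2009) or the authors' implementation; it shows only a textbook normal-normal conjugate update for a single regression coefficient after residualizing on nuisance covariates. The function name, the plug-in error variance, and the `EVIDENCE_TO_PRIOR_MEAN` mapping are all illustrative assumptions, and the prior means in that mapping are invented placeholders rather than the paper's calibration.

```python
import numpy as np

# Hypothetical mapping from an evidence-strength label to a prior mean on a
# standardized effect-size scale. These numbers are placeholders chosen for
# illustration; they are NOT the calibration used in the paper.
EVIDENCE_TO_PRIOR_MEAN = {"none": 0.0, "weak": 0.2, "moderate": 0.5, "strong": 0.8}


def conjugate_normal_posterior(y, x, Z, prior_mean, prior_sd):
    """Normal-normal update for the coefficient of x in y ~ x + Z.

    Z holds the covariates of secondary interest (including an intercept
    column). They get no prior: they are removed by least-squares
    residualization. The error variance is a plug-in OLS estimate, which is
    a simplifying assumption of this sketch.
    """
    # Residualize y and x on Z (Frisch-Waugh-Lovell).
    proj = Z @ np.linalg.pinv(Z)
    y_r = y - proj @ y
    x_r = x - proj @ x

    # Ordinary least-squares estimate and residual variance.
    sxx = float(x_r @ x_r)
    beta_ols = float(x_r @ y_r) / sxx
    resid = y_r - beta_ols * x_r
    dof = len(y) - Z.shape[1] - 1
    sigma2 = float(resid @ resid) / dof

    # Posterior mean is a precision-weighted average of prior mean and OLS.
    data_prec = sxx / sigma2
    prior_prec = 1.0 / prior_sd**2
    post_var = 1.0 / (data_prec + prior_prec)
    post_mean = post_var * (data_prec * beta_ols + prior_prec * prior_mean)
    return beta_ols, post_mean, post_var
```

In this sketch the posterior mean always lies between the prior mean and the least-squares estimate, and as `prior_sd` grows it approaches the least-squares estimate, which mirrors the uniform-prior limit mentioned in the abstract.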

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Nature Genetics | 240 | Top 0.1% | 33.2%
2 | The Annals of Applied Statistics | 15 | Top 0.1% | 6.9%
3 | Bioinformatics | 1061 | Top 4% | 6.4%
4 | Biometrics | 22 | Top 0.1% | 4.9%
50% of probability mass above
5 | Neuron | 282 | Top 3% | 4.0%
6 | PLOS Computational Biology | 1633 | Top 10% | 3.6%
7 | eLife | 5422 | Top 25% | 3.6%
8 | Genome Research | 409 | Top 0.9% | 3.6%
9 | Nature Computational Science | 50 | Top 0.4% | 2.1%
10 | Statistics in Medicine | 34 | Top 0.1% | 2.1%
11 | The American Journal of Human Genetics | 206 | Top 2% | 1.9%
12 | Genetic Epidemiology | 46 | Top 0.4% | 1.8%
13 | PLOS ONE | 4510 | Top 52% | 1.8%
14 | Human Brain Mapping | 295 | Top 3% | 1.7%
15 | Genetics | 225 | Top 3% | 1.5%
16 | Biostatistics | 21 | Top 0.1% | 1.2%
17 | Proceedings of the National Academy of Sciences | 2130 | Top 38% | 1.2%
18 | Communications Biology | 886 | Top 16% | 1.1%
19 | PLOS Genetics | 756 | Top 12% | 1.0%
20 | Computer Methods and Programs in Biomedicine | 27 | Top 0.7% | 0.9%
21 | Nature Communications | 4913 | Top 61% | 0.8%
22 | BMC Genomics | 328 | Top 5% | 0.8%
23 | BMC Bioinformatics | 383 | Top 7% | 0.8%
24 | Bioinformatics Advances | 184 | Top 5% | 0.6%
25 | iScience | 1063 | Top 37% | 0.6%
26 | Cell Systems | 167 | Top 14% | 0.6%
27 | GENETICS | 189 | Top 2% | 0.6%
28 | Medical Image Analysis | 33 | Top 1% | 0.5%
29 | Nature Biotechnology | 147 | Top 9% | 0.5%
30 | Biology Methods and Protocols | 53 | Top 4% | 0.5%
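
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked directly from the percentages listed above. The short Python sketch below does that arithmetic; the list `PREDICTED_PROBABILITIES` is transcribed from this page in rank order, and the function name `journals_to_reach` is a hypothetical helper introduced only for this illustration.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1 to 30, transcribed from the list above.
PREDICTED_PROBABILITIES = [
    33.2, 6.9, 6.4, 4.9, 4.0, 3.6, 3.6, 3.6, 2.1, 2.1,
    1.9, 1.8, 1.8, 1.7, 1.5, 1.2, 1.2, 1.1, 1.0, 0.9,
    0.8, 0.8, 0.8, 0.6, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5,
]


def journals_to_reach(probabilities, threshold=50.0):
    """Return the smallest rank k whose cumulative probability meets the threshold.

    Returns None if the listed probabilities never reach the threshold.
    """
    for k, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return k
    return None


print(journals_to_reach(PREDICTED_PROBABILITIES))  # prints 4
```

The first four entries sum to 51.4%, so the 50% threshold is first crossed at rank 4, consistent with the divider shown in the list.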