STELLAR: A flexible ensemble learning framework integrating rare variants to enhance polygenic risk prediction

Chen, T.; Li, X.; Mazumder, R.; Zhang, H.; Lin, X.

2026-06-09 genetic and genomic medicine
10.64898/2026.06.07.26355109 medRxiv

Whole-exome and whole-genome sequencing technology has enabled the discovery of rare genetic variants associated with human health and disease. However, existing statistical methods used for rare variant association testing are not well-suited for building genetic risk prediction models that jointly incorporate rare and common variants. We propose STELLAR, a flexible ensemble learning-based approach to compute rare variant polygenic risk scores (PRS) using association summary statistics to enhance conventional common variant PRS. Our method combines burden-based and penalty-based rare variant analysis and leverages functional annotation information to prioritize potentially causal variants within the prediction models. In simulation studies, PRS using STELLAR consistently achieved higher prediction accuracy than models using common variants alone or rare variant burdens. Applied to UK Biobank whole-exome sequencing data (n=310,831) across eight continuous and five binary traits, STELLAR significantly improved prediction accuracy, refined stratification of individuals at the highest genetic risk beyond common variants, and prioritized biologically relevant genes. STELLAR provides a scalable strategy to incorporate rare variants into PRS in addition to common variants, advancing precision risk prediction and enabling more comprehensive assessment of genetic contributions to complex diseases.

Matching journals

The top 4 journals together account for at least 50% of the predicted probability mass (54.8% combined).

1. Nature Communications: 4913 papers in training set, Top 4%, 22.0%
2. The American Journal of Human Genetics: 206 papers in training set, Top 0.3%, 14.0%
3. Nature Genetics: 240 papers in training set, Top 0.7%, 9.9%
4. Genome Medicine: 154 papers in training set, Top 0.6%, 8.9%

50% of probability mass above

5. Cell Genomics: 162 papers in training set, Top 0.9%, 4.7%
6. Bioinformatics: 1061 papers in training set, Top 6%, 3.5%
7. Briefings in Bioinformatics: 326 papers in training set, Top 3%, 2.5%
8. Genome Biology: 555 papers in training set, Top 3%, 2.3%
9. Nucleic Acids Research: 1128 papers in training set, Top 9%, 2.0%
10. Cell Systems: 167 papers in training set, Top 6%, 2.0%
11. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 30%, 1.8%
12. Frontiers in Genetics: 197 papers in training set, Top 4%, 1.7%
13. Scientific Reports: 3102 papers in training set, Top 59%, 1.7%
14. Communications Biology: 886 papers in training set, Top 11%, 1.4%
15. Nature Medicine: 117 papers in training set, Top 3%, 1.4%
16. Human Genetics and Genomics Advances: 70 papers in training set, Top 0.5%, 1.2%
17. Nature Human Behaviour: 85 papers in training set, Top 3%, 1.2%
18. Nature: 575 papers in training set, Top 13%, 1.2%
19. Nature Neuroscience: 216 papers in training set, Top 6%, 0.9%
20. European Journal of Human Genetics: 49 papers in training set, Top 1%, 0.7%
21. Nature Biomedical Engineering: 42 papers in training set, Top 2%, 0.7%
22. Human Genetics: 25 papers in training set, Top 0.5%, 0.6%
23. PLOS Genetics: 756 papers in training set, Top 17%, 0.6%
24. PLOS Computational Biology: 1633 papers in training set, Top 28%, 0.6%
25. Science: 429 papers in training set, Top 22%, 0.6%