
Leveraging cis- and trans-variants to improve protein expression level prediction for proteome-wide association studies

Dong, R.; Lamb, D.; Wang, G.; DeWan, A.; Leal, S. M.

2026-05-28 genetics
10.64898/2026.05.28.728201 bioRxiv

Because genetic effects are often mediated through proteins, analysis of proteomic data can provide insight into disease etiology. However, most studies lack proteomic data. To address this problem, we developed TransCisPredict to perform proteome-wide association studies (PWAS) at biobank scale. TransCisPredict reduces the computational burden through linkage-disequilibrium block selection, which makes it feasible to incorporate both cis- and trans-variants when predicting protein expression, and it performs protein-phenotype association analyses. To account for differences in protein regulatory architecture, four prediction methods are used for weight estimation: BayesR, Elastic Net, LASSO, and SuSiE. Five-fold cross-validation (CV) is used to select the optimal method for each protein. Weight estimation was performed using White British UK Biobank study subjects (N=42,644) with proteomic and genotype array data. Of the 2,920 available protein expression levels, 2,339 could be predicted with a CV-R2>0.05 when cis- and trans-variants were used. Because most existing methods are limited to cis-variation, prediction was repeated for comparison using only cis-variants, yielding 466 proteins with a CV-R2>0.05. A PWAS of type 2 diabetes (T2D) was performed for the 2,339 predicted protein expression levels using White British UK Biobank study subjects without proteomic data (N=364,132), followed by validation with two-sample Mendelian randomization using a method that controls for horizontal pleiotropy. Forty proteins were associated with T2D and validated. For the 466 cis-only predicted protein expression levels, three proteins were associated with T2D and validated. Incorporating both cis- and trans-variation with TransCisPredict enables prediction of many more proteins than using cis-variants alone, thereby increasing the power of PWAS.
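The per-protein method selection described in the abstract can be illustrated with a minimal sketch. It assumes that the "optimal" method is the one with the highest CV-R2 and that a protein is kept only if that best value exceeds 0.05; the abstract does not spell out the selection rule, so both points are assumptions. The function name `select_methods`, the protein labels, and all numeric values are hypothetical and are not taken from TransCisPredict or the study.

```python
from typing import Dict, Tuple

# Hypothetical cross-validated R^2 values, keyed by protein and then by method.
# Protein labels and numbers are invented for illustration only.
CV_R2: Dict[str, Dict[str, float]] = {
    "PROT_A": {"BayesR": 0.21, "ElasticNet": 0.18, "LASSO": 0.17, "SuSiE": 0.19},
    "PROT_B": {"BayesR": 0.03, "ElasticNet": 0.04, "LASSO": 0.02, "SuSiE": 0.01},
    "PROT_C": {"BayesR": 0.06, "ElasticNet": 0.05, "LASSO": 0.08, "SuSiE": 0.12},
}


def select_methods(
    cv_r2: Dict[str, Dict[str, float]], threshold: float = 0.05
) -> Dict[str, Tuple[str, float]]:
    """Pick the method with the highest CV-R2 for each protein.

    Assumed rule: a protein is retained only if its best CV-R2 is
    strictly greater than the threshold.
    """
    selected: Dict[str, Tuple[str, float]] = {}
    for protein, scores in cv_r2.items():
        best_method = max(scores, key=scores.get)
        best_r2 = scores[best_method]
        if best_r2 > threshold:
            selected[protein] = (best_method, best_r2)
    return selected


selected = select_methods(CV_R2)
for protein, (method, r2) in selected.items():
    print(f"{protein}: {method} (CV-R2 = {r2:.2f})")
```

In this toy input, PROT_B is dropped because none of its four methods exceeds the threshold, which mirrors how a protein would fall outside the predictable set.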

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank  Journal                                           Papers in training set  Percentile  Probability
1     Bioinformatics                                    1061                    Top 1%      18.7%
2     Nature Communications                             4913                    Top 5%      18.7%
3     PLOS Computational Biology                        1633                    Top 7%      4.9%
4     The American Journal of Human Genetics            206                     Top 1%      4.3%
5     Genome Medicine                                   154                     Top 2%      3.6%
      50% of probability mass above
6     PLOS ONE                                          4510                    Top 42%     3.1%
7     Scientific Reports                                3102                    Top 41%     3.1%
8     Genetic Epidemiology                              46                      Top 0.3%    2.5%
9     Journal of Proteome Research                      215                     Top 1.0%    2.5%
10    BMC Genomics                                      328                     Top 2%      1.9%
11    BMC Bioinformatics                                383                     Top 4%      1.8%
12    Molecular Systems Biology                         142                     Top 0.6%    1.7%
13    Cell Genomics                                     162                     Top 3%      1.7%
14    Communications Biology                            886                     Top 8%      1.7%
15    International Journal of Epidemiology             74                      Top 1%      1.7%
16    Briefings in Bioinformatics                       326                     Top 4%      1.7%
17    Genomics, Proteomics & Bioinformatics             171                     Top 4%      1.5%
18    Bioinformatics Advances                           184                     Top 3%      1.5%
19    Human Molecular Genetics                          130                     Top 2%      1.5%
20    Genome Biology                                    555                     Top 5%      1.3%
21    eLife                                             5422                    Top 47%     1.3%
22    Nucleic Acids Research                            1128                    Top 13%     1.3%
23    Proceedings of the National Academy of Sciences   2130                    Top 39%     1.1%
24    Molecular & Cellular Proteomics                   158                     Top 2%      0.8%
25    PLOS Genetics                                     756                     Top 14%     0.8%
26    Nature Genetics                                   240                     Top 7%      0.7%
27    Nature Methods                                    336                     Top 6%      0.7%
28    Genome Research                                   409                     Top 5%      0.6%
29    Alzheimer's & Dementia                            143                     Top 3%      0.6%
30    Frontiers in Genetics                             197                     Top 11%     0.6%
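
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below uses the ten largest probabilities exactly as listed; because the displayed percentages are rounded, the result is approximate. The function name `journals_needed` is a hypothetical helper, not part of any tool referenced on this page.

```python
from itertools import accumulate
from typing import Optional, Sequence

# Predicted probabilities (%) for the ten highest-ranked journals, as listed.
probabilities = [18.7, 18.7, 4.9, 4.3, 3.6, 3.1, 3.1, 2.5, 2.5, 1.9]


def journals_needed(probs: Sequence[float], target: float = 50.0) -> Optional[int]:
    """Return the smallest rank at which the running total reaches the target.

    Returns None if the listed probabilities never reach the target.
    """
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return rank
    return None


rank_at_half = journals_needed(probabilities)
print(f"Journals needed to reach 50%: {rank_at_half}")
```

The running total after four journals is about 46.6%, and adding the fifth brings it to about 50.2%, which is consistent with the divider placed after rank 5.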