Design and quality control of large-scale two-sample Mendelian randomisation studies

Fatty Acids in Cancer Mendelian Randomization Collaboration; Haycock, P. C.; Borges, M. C.; Burrows, K.; Lemaitre, R. N.; Harrison, S.; Burgess, S.; Chang, X.; Westra, J.; Khankari, N. K.; Tsilidis, K. K.; Gaunt, T.; Hemani, G.; Zheng, J.; Truong, T.; O'Mara, T.; Spurdle, A. B.; Law, M. H.; Slager, S.; Birmann, B.; Hosnijeh, F. S.; Mariosa, D.; Amos, C. I.; Hung, R. J.; Zheng, W.; Gunter, M. J.; Davey Smith, G.; Relton, C.; Martin, R. M.

2021-08-01 epidemiology
10.1101/2021.07.30.21260578 medRxiv

Background: Mendelian randomization studies are susceptible to meta-data errors (e.g. incorrect specification of the effect allele column) and other analytical issues that can introduce substantial bias into analyses. We developed a quality control pipeline for the Fatty Acids in Cancer Mendelian Randomization Collaboration (FAMRC) that can be used to identify and correct for such errors.

Methods: We invited cancer GWAS to share summary association statistics with the FAMRC and subjected the collated data to a comprehensive QC pipeline. We identified meta data errors through comparison of study-specific statistics to external reference datasets (the NHGRI-EBI GWAS catalog and 1000 genome super populations) and other analytical issues through comparison of reported to expected genetic effect sizes. Comparisons were based on three sets of genetic variants: 1) GWAS hits for fatty acids, 2) GWAS hits for cancer and 3) a 1000 genomes reference set.

Results: We collated summary data from six fatty acid and 49 cancer GWAS. Meta data errors and analytical issues with the potential to introduce substantial bias were identified in seven studies (13%). After resolving analytical issues and excluding unreliable data, we created a dataset of 219,842 genetic associations with 87 cancer types.

Conclusion: In this large MR collaboration, 13% of included studies were affected by a substantial meta data error or analytical issue. By increasing the integrity of collated summary data prior to their analysis, our protocol can be used to increase the reliability of post-GWAS analyses. Our pipeline is available to other researchers via the CheckSumStats package (https://github.com/MRCIEU/CheckSumStats).
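The meta-data check described in the Methods (comparing study-specific statistics with an external reference) can be illustrated with a minimal sketch. It is not the CheckSumStats API, which is an R package; the `Variant` class, the `is_conflict` and `conflict_fraction` helpers, the `margin` threshold and the toy frequencies are all assumptions made for illustration. The idea is that if the effect allele column has been mis-specified, reported effect allele frequencies will tend to fall on the opposite side of 0.5 from the reference frequencies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    name: str             # hypothetical variant label
    reported_eaf: float   # effect allele frequency reported by the study
    reference_eaf: float  # frequency of the same allele in a reference panel


def is_informative(v: Variant, margin: float) -> bool:
    """Variants whose reference frequency is close to 0.5 are uninformative,
    because sampling noise alone can move them across the midpoint."""
    return abs(v.reference_eaf - 0.5) >= margin


def is_conflict(v: Variant, margin: float = 0.08) -> bool:
    """True when reported and reference frequencies lie on opposite sides of 0.5."""
    if not is_informative(v, margin):
        return False
    return (v.reported_eaf > 0.5) != (v.reference_eaf > 0.5)


def conflict_fraction(variants: List[Variant], margin: float = 0.08) -> float:
    """Share of informative variants whose reported frequency conflicts with the reference."""
    informative = [v for v in variants if is_informative(v, margin)]
    if not informative:
        return 0.0
    return sum(is_conflict(v, margin) for v in informative) / len(informative)


# Toy data (hypothetical values, not taken from the study).
variants = [
    Variant("snp1", 0.10, 0.12),  # agrees with the reference
    Variant("snp2", 0.85, 0.14),  # looks like the non-effect allele was reported
    Variant("snp3", 0.49, 0.52),  # too close to 0.5 to be informative
    Variant("snp4", 0.30, 0.28),  # agrees with the reference
]
print(f"conflict fraction: {conflict_fraction(variants):.2f}")
```

A fraction near zero is what one would expect from a correctly specified effect allele column, while a fraction near one suggests the column is swapped. Intermediate values are harder to interpret and could reflect, for example, a mismatch between the study population and the reference population.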

Matching journals

The top-ranked journal alone accounts for at least 50% of the predicted probability mass.

1. International Journal of Epidemiology: 74 papers in training set; Top 0.1%; 60.4%
(50% of probability mass above)
2. Cancer Epidemiology, Biomarkers & Prevention: 17 papers in training set; Top 0.1%; 4.0%
3. PLOS ONE: 4510 papers in training set; Top 38%; 3.7%
4. Trials: 25 papers in training set; Top 0.7%; 1.9%
5. Nature Communications: 4913 papers in training set; Top 53%; 1.5%
6. BMC Medicine: 163 papers in training set; Top 4%; 1.4%
7. American Journal of Epidemiology: 57 papers in training set; Top 0.9%; 1.4%
8. BMJ Open: 554 papers in training set; Top 10%; 1.2%
9. International Journal of Cancer: 42 papers in training set; Top 0.9%; 1.0%
10. Journal of Biomedical Informatics: 45 papers in training set; Top 1%; 0.9%
11. The American Journal of Human Genetics: 206 papers in training set; Top 3%; 0.8%
12. JAMA Network Open: 127 papers in training set; Top 4%; 0.8%
13. Genome Biology: 555 papers in training set; Top 7%; 0.8%
14. Patterns: 70 papers in training set; Top 2%; 0.8%
15. Frontiers in Oncology: 95 papers in training set; Top 3%; 0.8%
16. Human Mutation: 29 papers in training set; Top 0.7%; 0.8%
17. BMC Medical Genomics: 36 papers in training set; Top 1%; 0.8%
18. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 2%; 0.8%
19. JAMIA Open: 37 papers in training set; Top 1%; 0.8%
20. Journal of Clinical Epidemiology: 28 papers in training set; Top 0.6%; 0.8%
21. Nature: 575 papers in training set; Top 15%; 0.8%
22. European Journal of Epidemiology: 40 papers in training set; Top 0.7%; 0.8%
23. BMC Medical Research Methodology: 43 papers in training set; Top 1%; 0.7%
24. Genetic Epidemiology: 46 papers in training set; Top 1.0%; 0.7%
25. BMC Bioinformatics: 383 papers in training set; Top 8%; 0.7%
26. F1000Research: 79 papers in training set; Top 6%; 0.5%
27. Pharmacoepidemiology and Drug Safety: 13 papers in training set; Top 0.6%; 0.5%
28. BMC Research Notes: 29 papers in training set; Top 0.9%; 0.5%