
The Norwegian Mother, Father, and Child cohort study (MoBa) genotyping data resource: MoBaPsychGen pipeline v.1

Corfield, E. C.; Frei, O.; Shadrin, A. A.; Rahman, Z.; Lin, A.; Athanasiu, L.; Cevdet Akdeniz, B.; Hannigan, L.; Wootton, R. E.; Austerberry, C.; Hughes, A.; Tesli, M.; Westlye, L. T.; Stefansson, H.; Stefansson, K.; Njolstad, P. R.; Magnus, P.; Davies, N. M.; Appadurai, V.; Hemani, G.; Hovig, E.; Zayats, T.; Ask, H.; Reichborn-Kjennerud, T.; Andreassen, O. A.; Havdahl, A.

2022-06-26 genetics
10.1101/2022.06.23.496289 bioRxiv

Background: The Norwegian Mother, Father, and Child Cohort Study (MoBa) is a population-based pregnancy cohort that includes approximately 114,500 children, 95,200 mothers, and 75,200 fathers. Genotyping of MoBa has been conducted through multiple research projects spanning several years, using varying selection criteria, genotyping arrays, and genotyping centres. MoBa contains numerous interrelated families, which necessitated the implementation of a family-based quality control (QC) pipeline that verifies and accounts for diverse types of relatedness.

Methods: The MoBaPsychGen pipeline, comprising pre-imputation QC, phasing, imputation, and post-imputation QC, was developed based on current best-practice protocols and implemented to account for the complex structure of the MoBa genotype data. The pipeline includes QC at both the single nucleotide polymorphism (SNP) level and the individual level. Phasing and imputation were performed using the publicly available Haplotype Reference Consortium release 1.1 panel as a reference. Information from the Medical Birth Registry of Norway and MoBa questionnaires was used to identify biological sex, year of birth, reported parent-offspring (PO) relationships, and multiple births (only available in the offspring generation).

Results: In total, 207,569 unique individuals (90% of the unique individuals included in the study) and 6,981,748 autosomal SNPs passed the MoBaPsychGen pipeline. A further 174,462 chromosome X SNPs and 3,200 pseudoautosomal region (PAR) SNPs are available in subsets of these individuals (N = 204,913 and 135,593, respectively). The relatedness checks performed throughout the pipeline allowed identification of within-generation and across-generation first-degree, second-degree, and third-degree relatives. The individuals passing post-imputation QC comprised 64,471 families ranging in size from singletons to 84 unique individuals (singletons are counted as families because other family members may not have been genotyped, imputed, or passed post-imputation QC). The relationships identified include 287 monozygotic twin pairs, 22,884 full siblings, 117,004 PO pairs, 23,299 second-degree relative pairs, and 10,828 third-degree relative pairs.

Discussion: MoBa contains a highly complex relatedness structure, with a variety of family structures including singletons, PO duos, full (mother, father, child) PO trios, nuclear families, blended families, and extended families. The availability of robustly quality-controlled genetic data for such a large cohort with a unique extended family structure will allow many novel research questions to be addressed. Furthermore, the MoBaPsychGen pipeline has potential utility in similar cohorts.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. European Journal of Human Genetics (49 papers in training set; Top 0.1%; 22.7%)
2. International Journal of Epidemiology (74 papers in training set; Top 0.1%; 12.4%)
3. PLOS ONE (4510 papers in training set; Top 20%; 9.2%)
4. Human Molecular Genetics (130 papers in training set; Top 0.3%; 6.4%)
50% of probability mass above
5. Genetics in Medicine (69 papers in training set; Top 0.3%; 6.4%)
6. Scientific Reports (3102 papers in training set; Top 31%; 4.0%)
7. Genes (126 papers in training set; Top 0.2%; 3.6%)
8. Journal of Medical Genetics (28 papers in training set; Top 0.2%; 3.1%)
9. PLOS Genetics (756 papers in training set; Top 6%; 2.9%)
10. Human Reproduction (18 papers in training set; Top 0.2%; 2.1%)
11. Nature Communications (4913 papers in training set; Top 49%; 1.8%)
12. BMC Bioinformatics (383 papers in training set; Top 4%; 1.7%)
13. Bioinformatics (1061 papers in training set; Top 8%; 1.5%)
14. BMC Medicine (163 papers in training set; Top 4%; 1.3%)
15. American Journal of Epidemiology (57 papers in training set; Top 1.0%; 1.2%)
16. Genetic Epidemiology (46 papers in training set; Top 0.6%; 1.2%)
17. Psychological Medicine (74 papers in training set; Top 1%; 0.8%)
18. The American Journal of Human Genetics (206 papers in training set; Top 3%; 0.8%)
19. BMJ Open (554 papers in training set; Top 12%; 0.8%)
20. Social Science & Medicine (15 papers in training set; Top 0.9%; 0.8%)
21. Frontiers in Genetics (197 papers in training set; Top 10%; 0.8%)
22. Nature (575 papers in training set; Top 15%; 0.8%)
23. BMC Medical Genomics (36 papers in training set; Top 1%; 0.8%)
24. JAMA Network Open (127 papers in training set; Top 4%; 0.8%)
25. Canadian Medical Association Journal (15 papers in training set; Top 0.3%; 0.8%)
26. Neuron (282 papers in training set; Top 9%; 0.7%)
27. Behavior Genetics (15 papers in training set; Top 0.1%; 0.6%)
28. Genes, Brain and Behavior (29 papers in training set; Top 0.5%; 0.5%)
29. American Journal of Medical Genetics Part A (17 papers in training set; Top 0.4%; 0.5%)
30. Human Genetics and Genomics Advances (70 papers in training set; Top 1%; 0.5%)
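
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The following is a minimal sketch in Python, assuming the six highest-ranked values are copied by hand from the list above; the function name `journals_to_reach` is hypothetical and not part of the site.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the six top-ranked journals,
# copied by hand from the list above (ranks 1 to 6).
top_probabilities = [22.7, 12.4, 9.2, 6.4, 6.4, 4.0]


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the
    cumulative probability to reach `threshold` (in percent)."""
    for count, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return count
    return None  # threshold not reached with the supplied entries


print(journals_to_reach(top_probabilities, 50.0))  # prints 4
```

The first three entries sum to 44.3% and the first four to 50.7%, which agrees with the position of the "50% of probability mass above" marker after rank 4.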