Integrating enriched case data from national laboratory testing with population-based case-control analyses: a novel statistical likelihood-ratio methodology for PS4 applied to 325,345 breast cancer cases and 671,006 controls

Allen, S.; Rowlands, C. F.; Garrett, A.; Couch, F.; Richardson, M. E.; Pesaran, T.; Pethick, J.; Lavelle, K.; McRonald, F.; Vernon, S.; Torr, B.; Loong, L.; Aungraheeta, R.; Durkie, M.; Burghel, G. J.; Callaway, A.; Robinson, R.; Field, J.; Frugtniet, B.; Palmer-Smith, S.; Grant, J.; Pagan, J.; McDevitt, T.; Snape, K.; Hanson, H.; McVeigh, T.; Loveday, C.; Jones, M.; Hardy, S.; Turnbull, C.; CanVIG-UK

2026-05-17 genetic and genomic medicine
10.64898/2026.05.13.26353095 medRxiv
Background: For many evidence criteria within v3.0 of the ACMG/AMP guidelines, methodologies have been developed to empower their use outside the stipulated evidence strengths. However, no such methodology has been established for case-control data (PS4). With the release of large-scale unselected case-control datasets and expansion of nationally-collected laboratory datasets enriched for pathogenic variant carriers, there is potential to combine datasets across ascertainment contexts in a more quantitative manner using novel likelihood ratio tools. Methods: Using our published PS4-LR-Calculator, we calculated a combined log likelihood ratio (PS4-LLR) across five datasets (three unselected, and two enriched), and estimated enrichment of pathogenic variants in clinically-ascertained laboratory data using truncating variant prevalence. Results: Data were combined for 10,817 missense variants from 325,345 female breast cancer patients and 671,006 controls of Western European ancestry for five breast cancer susceptibility genes (BRCA1, BRCA2, PALB2, ATM, CHEK2). A combined LLR was produced for 4,690 missense variants; 927 variants received evidence towards pathogenicity (LLR ≥ 1), and 3,242 received evidence towards benignity (LLR ≤ -1). Conclusion: This flexible, variant-level methodology combines nationally-collected 'enriched' datasets with unselected case-control cohorts, expanding the available information for case-control analysis, boosting power, enabling exploration of atypical penetrance and empowering variant classification.
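
As a rough illustration of the thresholds quoted in the abstract, the sketch below combines per-dataset log likelihood ratios for a single variant and reports the direction of evidence. It is not the PS4-LR-Calculator and does not reproduce the authors' method: the additive combination assumes the datasets are independent, the dataset names and LLR values are invented for illustration, and the "no directional evidence" label for values between the two thresholds is a placeholder of ours. Only the cut-offs (LLR ≥ 1 and LLR ≤ -1) come from the abstract.

```python
from typing import Dict


def combine_llr(per_dataset_llr: Dict[str, float]) -> float:
    """Sum per-dataset log likelihood ratios.

    Assumption: datasets are independent, so their likelihood ratios
    multiply and their logs add. The abstract does not state this.
    """
    return sum(per_dataset_llr.values())


def evidence_direction(llr: float) -> str:
    """Apply the thresholds quoted in the abstract (LLR >= 1, LLR <= -1)."""
    if llr >= 1:
        return "towards pathogenicity"
    if llr <= -1:
        return "towards benignity"
    # Hypothetical label: the abstract does not name this middle band.
    return "no directional evidence"


# Hypothetical values for one variant across three unselected
# and two enriched datasets (names and numbers are illustrative only).
example_variant = {
    "unselected_1": 0.5,
    "unselected_2": 0.25,
    "unselected_3": -0.25,
    "enriched_1": 0.5,
    "enriched_2": 0.25,
}

combined = combine_llr(example_variant)
print(combined, evidence_direction(combined))
```

With these made-up inputs the per-dataset values sum to 1.25, which falls on the pathogenic side of the quoted threshold.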

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Genetics in Medicine (69 papers in training set): Top 0.1%, 28.5%
2. Journal of Medical Genetics (28 papers in training set): Top 0.1%, 9.4%
3. International Journal of Epidemiology (74 papers in training set): Top 0.2%, 7.0%
4. Genome Medicine (154 papers in training set): Top 0.9%, 6.6%

50% of probability mass above

5. The American Journal of Human Genetics (206 papers in training set): Top 1%, 4.3%
6. Human Mutation (29 papers in training set): Top 0.2%, 3.7%
7. European Journal of Human Genetics (49 papers in training set): Top 0.3%, 3.7%
8. JAMA Network Open (127 papers in training set): Top 1%, 3.2%
9. Nature Communications (4913 papers in training set): Top 45%, 2.7%
10. Cancer Epidemiology, Biomarkers & Prevention (17 papers in training set): Top 0.3%, 1.9%
11. Genetic Epidemiology (46 papers in training set): Top 0.4%, 1.8%
12. Scientific Reports (3102 papers in training set): Top 61%, 1.5%
13. PLOS ONE (4510 papers in training set): Top 57%, 1.4%
14. Breast Cancer Research (32 papers in training set): Top 0.4%, 1.4%
15. npj Genomic Medicine (33 papers in training set): Top 0.5%, 1.4%
16. Annals of Oncology (13 papers in training set): Top 0.7%, 1.1%
17. Human Molecular Genetics (130 papers in training set): Top 2%, 1.0%
18. International Journal of Cancer (42 papers in training set): Top 1%, 0.8%
19. The Lancet Digital Health (25 papers in training set): Top 1%, 0.7%
20. JNCI Cancer Spectrum (10 papers in training set): Top 0.6%, 0.7%
21. Trials (25 papers in training set): Top 2%, 0.7%
22. Nature (575 papers in training set): Top 18%, 0.5%
23. BMC Genomics (328 papers in training set): Top 7%, 0.5%
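
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked directly from the probabilities listed above. The snippet below is a small sketch of that arithmetic; the helper name `journals_to_reach` is ours, and the values are copied from the list in rank order.

```python
# Predicted probabilities (percent) for ranks 1 to 23, as listed above.
probabilities = [
    28.5, 9.4, 7.0, 6.6, 4.3, 3.7, 3.7, 3.2, 2.7, 1.9, 1.8, 1.5,
    1.4, 1.4, 1.4, 1.1, 1.0, 0.8, 0.7, 0.7, 0.7, 0.5, 0.5,
]


def journals_to_reach(probs, threshold):
    """Return (count, cumulative mass) once the running total reaches threshold.

    Returns None if the listed probabilities never reach the threshold.
    """
    running = 0.0
    for count, p in enumerate(probs, start=1):
        running += p
        if running >= threshold:
            return count, running
    return None


count, mass = journals_to_reach(probabilities, 50.0)
print(f"{count} journals reach {mass:.1f}% of the predicted probability mass")
```

The first four entries sum to 51.5%, so rank 4 is the first point at which the cumulative mass passes 50%.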