
A machine learning model to aid detection of familial hypercholesterolaemia

Gratton, J.; Futema, M.; Humphries, S. E.; Hingorani, A. D.; Finan, C.; Schmidt, A. F.

2022-06-17 cardiovascular medicine
10.1101/2022.06.17.22276540 medRxiv

Abstract

Background and Aims: People with monogenic familial hypercholesterolaemia (FH) are at an increased risk of premature coronary heart disease and death. Currently there is no population screening strategy for FH, and most carriers are identified late in life, delaying timely and cost-effective interventions. The aim was to derive an algorithm to improve detection of people with monogenic FH.

Methods: A penalised (LASSO) logistic regression model was used to identify predictors that most accurately identified people with a higher probability of FH in 139,779 unrelated participants of the UK Biobank, including 488 FH carriers. Candidate predictors included information on medical and family history, anthropometric measures, blood biomarkers, and an LDL-C polygenic score (PGS). Model derivation and evaluation were performed using a random split of 80% training and 20% testing data.

Results: A 14-variable algorithm for FH was derived, where the top five variables included triglyceride, LDL-C, and apolipoprotein A1 concentrations, self-reported statin use, and an LDL-C PGS. Model evaluation in the test data resulted in an area under the curve (AUC) of 0.77 (95% CI: 0.71; 0.83), and appropriate calibration (calibration-in-the-large: -0.07 (95% CI: -0.28; 0.13); calibration slope: 1.02 (95% CI: 0.85; 1.19)). Employing this model to prioritise people with suspected monogenic FH is anticipated to reduce the number of people requiring sequencing by 88% compared to a population-wide sequencing screen, and by 18% compared to prioritisation based on LDL-C and statin use.

Conclusions: The detection of individuals with monogenic FH can be improved with the inclusion of additional non-genetic variables and a PGS for LDL-C.
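As a reading aid for the AUC of 0.77 reported in the abstract: an AUC can be interpreted as the probability that a randomly chosen FH carrier receives a higher predicted risk than a randomly chosen non-carrier. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code. The function name `auc` and the toy scores and labels are hypothetical, made up for illustration, and unrelated to UK Biobank data.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    scores: predicted probabilities; labels: 1 = carrier, 0 = non-carrier.
    A carrier scoring above a non-carrier counts 1, a tie counts 0.5.
    """
    carriers = [s for s, y in zip(scores, labels) if y == 1]
    non_carriers = [s for s, y in zip(scores, labels) if y == 0]
    if not carriers or not non_carriers:
        raise ValueError("need at least one carrier and one non-carrier")
    wins = 0.0
    for c in carriers:
        for n in non_carriers:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(carriers) * len(non_carriers))


# Hypothetical toy data (NOT from the study): 3 carriers, 5 non-carriers.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(auc(toy_scores, toy_labels))  # 13 of 15 carrier/non-carrier pairs are ranked correctly
```

The quadratic loop is only meant to make the definition visible; for cohorts of the size described in the abstract, a rank-based implementation would be the practical choice.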

Matching journals

The top 3 journals are the smallest set that covers at least 50% of the predicted probability mass (56.1% combined).

1. Circulation: Genomic and Precision Medicine (42 papers in training set; Top 0.1%; 42.1%)
2. International Journal of Epidemiology (74 papers in training set; Top 0.2%; 7.2%)
3. Circulation (66 papers in training set; Top 0.5%; 6.8%)
50% of probability mass above
4. Atherosclerosis (29 papers in training set; Top 0.4%; 3.8%)
5. The Lancet Digital Health (25 papers in training set; Top 0.1%; 3.8%)
6. BMC Medicine (163 papers in training set; Top 1%; 3.8%)
7. Nature Communications (4913 papers in training set; Top 46%; 2.2%)
8. PLOS ONE (4510 papers in training set; Top 50%; 1.9%)
9. PLOS Medicine (98 papers in training set; Top 2%; 1.8%)
10. British Journal of Clinical Pharmacology (21 papers in training set; Top 0.3%; 1.8%)
11. Frontiers in Cardiovascular Medicine (49 papers in training set; Top 2%; 1.6%)
12. Genome Medicine (154 papers in training set; Top 5%; 1.6%)
13. Clinical Pharmacology & Therapeutics (25 papers in training set; Top 0.4%; 1.6%)
14. Open Heart (19 papers in training set; Top 0.9%; 1.0%)
15. European Heart Journal - Digital Health (15 papers in training set; Top 0.5%; 1.0%)
16. European Journal of Preventive Cardiology (13 papers in training set; Top 0.8%; 0.9%)
17. European Heart Journal (16 papers in training set; Top 0.6%; 0.9%)
18. European Journal of Human Genetics (49 papers in training set; Top 1.0%; 0.9%)
19. Scientific Reports (3102 papers in training set; Top 72%; 0.8%)
20. Diabetologia (36 papers in training set; Top 0.8%; 0.8%)
21. BMJ Open (554 papers in training set; Top 12%; 0.8%)
22. Arteriosclerosis, Thrombosis, and Vascular Biology (65 papers in training set; Top 2%; 0.7%)
23. Journal of the American College of Cardiology (12 papers in training set; Top 0.7%; 0.7%)
24. Heart (10 papers in training set; Top 1.0%; 0.5%)
25. Journal of Clinical Medicine (91 papers in training set; Top 8%; 0.5%)
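
The "50% of probability mass" cut-off in the list above can be reproduced by adding up the listed probabilities in rank order until the running total reaches 50%. The sketch below assumes that this is how the cut-off is placed; the page does not state its method. The function name `journals_to_reach` is hypothetical, and the input is the first six probabilities from the list.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Count how many top-ranked entries are needed for the running
    total of probabilities (in percent) to reach the threshold."""
    running = 0.0
    for count, p in enumerate(probabilities, start=1):
        running += p
        if running >= threshold:
            return count
    return None  # the listed entries never reach the threshold


# Probabilities (in percent) of the six top-ranked journals, copied from the list.
top_probabilities = [42.1, 7.2, 6.8, 3.8, 3.8, 3.8]
print(journals_to_reach(top_probabilities))  # 3
```

The first two journals sum to 49.3%, just short of the threshold, which is why the marker sits after the third entry.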