
Host genetics and COVID-19 severity: increasing the accuracy of latest severity scores by Boolean quantum features

Martelloni, G.; Turchi, A.; Fallerini, C.; Degl'Innocenti, A.; Baldassarri, M.; GEN-COVID Multicenter Study; Olmi, S.; Furini, S.; Renieri, A.

2023-02-07 genomics
10.1101/2023.02.06.527291 bioRxiv

The impact of common and rare variants on COVID-19 host genetics was studied extensively in [16]. In that work, common and rare variants were used to define an interpretable machine learning model for predicting COVID-19 severity. First, variants were converted into sets of Boolean features according to the absence or presence of variants in each gene. An ensemble of LASSO logistic regression models was then used to identify the Boolean features most informative about the genetic basis of severity. The Boolean features selected by these logistic models were combined into an Integrated PolyGenic Score, the so-called IPGS, which offers a very simple description of the contribution of host genetics to COVID-19 severity. IPGS alone reaches an accuracy of 55-60% on different cohorts; a logistic regression taking both IPGS and age as input reaches an accuracy of 75%. The goal of this paper is to improve on these results by using information on the host organs involved in the disease. We generalized the IPGS by adding a statistical weight for each organ, transforming Boolean features into "Boolean quantum features", inspired by quantum mechanics. The organ coefficients were set with the genetic algorithm library PyGAD, after which we defined two new Integrated PolyGenic Scores ([Formula] and [Formula]). By applying a logistic regression with [Formula] (or, interchangeably, [Formula]) and age as input, we reach an accuracy of 84-86%, improving on the results previously shown in [16] by about 10 percentage points.
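The first step described in the abstract, turning per-gene variant calls into Boolean features and combining selected features into a single score, can be illustrated with a minimal sketch. Everything below is hypothetical: the patient IDs, the gene placeholders, the `boolean_features` and `toy_score` helpers, and the signed weights are assumptions made for illustration. In particular, the weighted sum is not the IPGS formula from [16], which the abstract does not spell out.

```python
# Hypothetical input: each patient maps to the set of genes in which at least
# one variant was called. Patient IDs and gene names are placeholders.
patients = {
    "P1": {"GENE_A", "GENE_C"},
    "P2": {"GENE_B"},
    "P3": set(),
}
genes = ["GENE_A", "GENE_B", "GENE_C"]


def boolean_features(genes_with_variants, gene_order):
    """Return 1 for each gene carrying at least one variant, else 0."""
    return [1 if gene in genes_with_variants else 0 for gene in gene_order]


def toy_score(features, weights):
    """Weighted sum of Boolean features (assumed form, not the paper's IPGS)."""
    return sum(w * f for w, f in zip(weights, features))


# Assumed signs: +1 for a feature linked to severe disease, -1 for one linked
# to mild disease. In the abstract, the feature selection is done by LASSO
# logistic regression; here the weights are simply made up.
weights = [1, -1, 1]

matrix = {pid: boolean_features(found, genes) for pid, found in patients.items()}
scores = {pid: toy_score(row, weights) for pid, row in matrix.items()}

for pid in patients:
    print(pid, matrix[pid], scores[pid])
```

A score of this kind could then be passed, together with age, to a logistic regression, which is the second stage the abstract describes.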

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Scientific Reports | 3102 papers in training set | Top 0.8% | 18.8%
2. Journal of Personalized Medicine | 28 papers in training set | Top 0.1% | 8.5%
3. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 0.3% | 7.3%
4. Frontiers in Genetics | 197 papers in training set | Top 1% | 4.4%
5. IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.1% | 3.7%
6. PLOS Computational Biology | 1633 papers in training set | Top 9% | 3.6%
7. PLOS ONE | 4510 papers in training set | Top 38% | 3.6%
8. BMC Medical Genomics | 36 papers in training set | Top 0.2% | 3.1%
50% of probability mass above
9. Heliyon | 146 papers in training set | Top 0.6% | 2.8%
10. Genes | 126 papers in training set | Top 0.6% | 2.1%
11. BMC Bioinformatics | 383 papers in training set | Top 4% | 2.1%
12. Physical Review E | 95 papers in training set | Top 0.5% | 1.9%
13. Briefings in Bioinformatics | 326 papers in training set | Top 3% | 1.8%
14. International Journal of Molecular Sciences | 453 papers in training set | Top 7% | 1.7%
15. Vaccines | 196 papers in training set | Top 1% | 1.7%
16. Bioinformatics | 1061 papers in training set | Top 8% | 1.3%
17. Communications Biology | 886 papers in training set | Top 12% | 1.3%
18. Biology | 43 papers in training set | Top 1% | 1.2%
19. Life | 27 papers in training set | Top 0.3% | 0.8%
20. IEEE Access | 31 papers in training set | Top 0.8% | 0.8%
21. IEEE Transactions on Computational Biology and Bioinformatics | 17 papers in training set | Top 0.5% | 0.8%
22. Computers in Biology and Medicine | 120 papers in training set | Top 4% | 0.8%
23. Journal of Clinical Medicine | 91 papers in training set | Top 6% | 0.8%
24. Cells | 232 papers in training set | Top 6% | 0.8%
25. Journal of Biomedical Informatics | 45 papers in training set | Top 2% | 0.7%
26. iScience | 1063 papers in training set | Top 34% | 0.7%
27. Mathematical Biosciences and Engineering | 23 papers in training set | Top 0.7% | 0.7%
28. Epidemiology and Infection | 84 papers in training set | Top 3% | 0.7%
29. Frontiers in Applied Mathematics and Statistics | 10 papers in training set | Top 0.5% | 0.7%
30. Frontiers in Immunology | 586 papers in training set | Top 9% | 0.7%
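
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below assumes the cutoff is the smallest number of top-ranked journals whose probabilities sum to at least 50%; the `journals_needed` helper is a hypothetical name, and only the ten highest-ranked probabilities are copied in.

```python
from itertools import accumulate

# Predicted probabilities (in %) of the ten highest-ranked journals,
# copied from the ranked list.
probs = [18.8, 8.5, 7.3, 4.4, 3.7, 3.6, 3.6, 3.1, 2.8, 2.1]


def journals_needed(probabilities, threshold=50.0):
    """Return the smallest k whose top-k probabilities sum to >= threshold."""
    for k, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return k
    return None  # threshold not reached with the values supplied


print(journals_needed(probs))
```

Under that assumption the function returns 8: the first seven entries sum to 49.9%, and adding the eighth brings the total to 53.0%.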