Biologically informed genetic data transformations improve multi-omic comorbidity prediction in people with HIV

Ryan, B.; Thorball, C. W.; Ait Oumelloul, M.; Kouyos, R.; Tarr, P. E.; Fellay, J.

2026-03-10 hiv aids
10.64898/2026.03.09.26347570 medRxiv

Coronary artery disease (CAD) and chronic kidney disease (CKD) are in part genetically determined and are associated with various omics layers. Methods for integrating genomic data with other omics profiles have yet to be standardised. This study evaluates biological data transformations to optimise the integration of genomics with other omics for comorbidity prediction in people with HIV (PWH). We trained linear and deep-learning single-omic and multi-omic models on two cohorts of PWH with genotype data and one other omics layer available. A total of 436 CAD cases and 166 CKD cases were evenly split across the train/validation/test splits. For multi-omic integration, we compared feature concatenation against encoder-based architectures; performance was estimated via five-fold cross-validation on fixed patient splits, reporting mean accuracy and standard errors. Genotype data were represented in four ways: (i) raw SNP genotype matrices; (ii) principal component (PCA) embeddings; (iii) polygenic risk scores (PRS); and (iv) AlphaGenome-derived gene-level impact scores. Each genotype representation was evaluated both on its own and when integrated in a multi-omics model. The results demonstrate that biologically informed genomic transformations improve prediction in multi-omics models. In both classification tasks, integrating raw SNPs (CAD accuracy = 0.55 ± 0.03; CKD accuracy = 0.63 ± 0.01) or genotype PCs (CAD accuracy = 0.54 ± 0.03; CKD accuracy = 0.62 ± 0.03) with other omics reduced performance relative to the best corresponding single-omics models. By contrast, PRS (CAD accuracy = 0.61 ± 0.03; CKD accuracy = 0.65 ± 0.02) and AlphaGenome (CAD accuracy = 0.57 ± 0.03; CKD accuracy = 0.67 ± 0.02) improved accuracy. As multi-omics analyses become more prominent, methods that integrate genomics effectively without requiring large cohorts will become increasingly valuable; here, we highlight two such approaches.
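
Two quantities named in the abstract can be illustrated with a minimal sketch: a polygenic risk score, taken here in its common form as a weighted sum of allele dosages, and a "mean ± standard error" summary over cross-validation folds. The abstract does not say how the PRS weights were obtained or how the standard errors were computed, so the function names, the sample-standard-deviation formula, and all numbers below are assumptions for illustration, not values or methods from the study.

```python
import math
import statistics


def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(g * w for g, w in zip(dosages, weights))


def mean_and_standard_error(fold_accuracies):
    """Mean and standard error of the mean across cross-validation folds.

    Assumes SE = sample standard deviation / sqrt(number of folds).
    """
    n = len(fold_accuracies)
    mean = statistics.fmean(fold_accuracies)
    se = statistics.stdev(fold_accuracies) / math.sqrt(n)
    return mean, se


# Hypothetical inputs, for illustration only (not data from the study).
example_dosages = [0, 1, 2, 1]
example_weights = [0.10, -0.05, 0.20, 0.00]
example_fold_accuracies = [0.70, 0.68, 0.74, 0.72, 0.71]

prs = polygenic_risk_score(example_dosages, example_weights)
mean_acc, se_acc = mean_and_standard_error(example_fold_accuracies)
print(f"PRS = {prs:.2f}")
print(f"accuracy = {mean_acc:.2f} ± {se_acc:.2f}")
```

With five folds the standard error rests on only four degrees of freedom, so small differences between reported accuracies should be read with that in mind.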

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | PLOS Computational Biology | 1633 | Top 0.1% | 44.6% |
| 2 | Bioinformatics | 1061 | Top 5% | 4.3% |
| 3 | Bioinformatics Advances | 184 | Top 0.9% | 4.3% |
| 4 | Nature Communications | 4913 | Top 35% | 4.3% |
| 5 | eBioMedicine | 130 | Top 0.2% | 3.9% |
| 6 | PLOS ONE | 4510 | Top 43% | 2.8% |
| 7 | Communications Biology | 886 | Top 3% | 2.8% |
| 8 | eLife | 5422 | Top 38% | 1.9% |
| 9 | International Journal of Medical Informatics | 25 | Top 0.7% | 1.8% |
| 10 | Genome Medicine | 154 | Top 4% | 1.8% |
| 11 | Journal of the American Medical Informatics Association | 61 | Top 1% | 1.8% |
| 12 | Journal of The Royal Society Interface | 189 | Top 2% | 1.8% |
| 13 | Nature Medicine | 117 | Top 3% | 1.3% |
| 14 | BMC Genomics | 328 | Top 4% | 1.2% |
| 15 | BMC Medical Genomics | 36 | Top 0.9% | 1.0% |
| 16 | Patterns | 70 | Top 2% | 1.0% |
| 17 | JAIDS Journal of Acquired Immune Deficiency Syndromes | 19 | Top 0.3% | 0.8% |
| 18 | BMC Bioinformatics | 383 | Top 6% | 0.8% |
| 19 | BioData Mining | 15 | Top 0.8% | 0.8% |
| 20 | Frontiers in Cell and Developmental Biology | 218 | Top 8% | 0.8% |
| 21 | Briefings in Bioinformatics | 326 | Top 7% | 0.7% |
| 22 | Genetic Epidemiology | 46 | Top 0.9% | 0.7% |
| 23 | PLOS Digital Health | 91 | Top 3% | 0.7% |
| 24 | European Journal of Epidemiology | 40 | Top 0.8% | 0.7% |
| 25 | Database | 51 | Top 1% | 0.5% |
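
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a short sketch. It assumes the cutoff is the smallest number of top-ranked journals whose cumulative probability reaches the threshold; the page does not state this definition, and the function name is hypothetical. The probabilities are the first five values from the ranking above.

```python
from itertools import accumulate

# Predicted probabilities (%) of the five top-ranked journals, copied from the ranking.
top_probabilities = [44.6, 4.3, 4.3, 4.3, 3.9]


def journals_needed(probabilities, threshold):
    """Smallest k such that the top-k probabilities sum to at least threshold.

    Returns None if the threshold is never reached.
    """
    for k, cumulative in enumerate(accumulate(probabilities), start=1):
        if cumulative >= threshold:
            return k
    return None


k = journals_needed(top_probabilities, 50.0)
print(k)
```

Under this assumption the cumulative mass is about 48.9% after two journals and about 53.2% after three, which is consistent with a cutoff at rank 3.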