
Translational bioinformatics and machine learning framework for biomarker discovery, disease prediction, and patient profiling for precision medicine

Ahmed, Z.; Govindareddy, P.; DeGroat, W.; Narayanan, R.; Peker, E.; Zeeshan, S.

2026-05-27 genetic and genomic medicine
10.64898/2026.05.23.26353961 medRxiv

Precision medicine aims to move healthcare from a "one-size-fits-all" approach to personalized and predictive care across diverse populations. It promotes the integration of multi-omics and phenotypic data to understand disease mechanisms and to discover novel biomarkers and risk factors that could be used to predict and prevent critical diseases in individual patients. A precision medicine approach can accelerate our ability to classify patients at higher risk of developing critical diseases, improve diagnostic capabilities, deepen understanding of individual risk, investigate racial differences and demographic characteristics, and find relationships between genetic variants, gene expression, and diseases. This study implements an innovative, data-driven framework of translational bioinformatics and machine learning (ML) techniques to analyze multi-omics data, including RNA-seq and whole-genome sequencing (WGS) data, generated from blood samples of randomly consented patients. First, we used bioinformatics pipelines to identify differentially expressed genes and their pathogenic and likely pathogenic variants for downstream data analysis, annotation, and visualization. We then applied a suite of ML models for multi-omics biomarker discovery, disease prediction, density-based clustering, single-patient profiling, and pathogenicity classification. WGS data analysis supported the exploration of genetic variation and diversity among patients to identify known and novel biomarkers, whereas RNA-seq data analysis improved our understanding of the functional and biological pathways underlying disease states. We classified and clustered pathogenic variants and expression levels across various genes and discovered numerous leading risk factors for disease.
Our results include gene-disease associations and common pathways captured across the broader population, demonstrating a level of sensitivity and accuracy with broad clinical implications. We validated our results against clinical records and the state-of-the-science literature. This study examines the strengths of multi-omics data integration and the capabilities of ML applied to genetically diverse and complex patient cohorts. Our approach has the potential to elucidate complex gene-disease interactions in genetically diverse populations, which can support earlier diagnosis for patients across many disease areas.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Genome Medicine | 154 | Top 0.2% | 17.2%
2 | BMC Medical Genomics | 36 | Top 0.1% | 12.3%
3 | Journal of the American Medical Informatics Association | 61 | Top 0.4% | 8.3%
4 | Cell Genomics | 162 | Top 1% | 3.9%
5 | iScience | 1063 | Top 5% | 3.5%
6 | Journal of Translational Medicine | 46 | Top 0.2% | 3.5%
7 | Briefings in Bioinformatics | 326 | Top 2% | 3.5%
50% of probability mass above
8 | Scientific Reports | 3102 | Top 38% | 3.5%
9 | Frontiers in Genetics | 197 | Top 2% | 3.5%
10 | npj Digital Medicine | 97 | Top 2% | 1.9%
11 | Heliyon | 146 | Top 2% | 1.8%
12 | Bioinformatics Advances | 184 | Top 3% | 1.7%
13 | Journal of Biomedical Informatics | 45 | Top 0.8% | 1.7%
14 | Genomics, Proteomics & Bioinformatics | 171 | Top 4% | 1.5%
15 | BMC Genomics | 328 | Top 3% | 1.3%
16 | Human Genetics and Genomics Advances | 70 | Top 0.4% | 1.3%
17 | Biosensors and Bioelectronics | 52 | Top 1% | 1.2%
18 | Cell Reports Medicine | 140 | Top 6% | 1.2%
19 | eLife | 5422 | Top 52% | 0.9%
20 | International Journal of Molecular Sciences | 453 | Top 13% | 0.9%
21 | Frontiers in Molecular Biosciences | 100 | Top 4% | 0.9%
22 | Bioinformatics | 1061 | Top 9% | 0.9%
23 | Nucleic Acids Research | 1128 | Top 17% | 0.8%
24 | Genome Biology | 555 | Top 8% | 0.7%
25 | PLOS ONE | 4510 | Top 68% | 0.7%
26 | GigaScience | 172 | Top 3% | 0.7%
27 | NAR Genomics and Bioinformatics | 214 | Top 4% | 0.7%
28 | Communications Biology | 886 | Top 27% | 0.7%
29 | Computational and Structural Biotechnology Journal | 216 | Top 11% | 0.6%
30 | BMC Medical Informatics and Decision Making | 39 | Top 3% | 0.6%
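
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked against the probabilities in the list above. The following is a minimal sketch, not the site's actual method: the function name `journals_to_reach` and the choice to copy only the first eight probabilities are assumptions made for illustration.

```python
from itertools import accumulate

# Predicted probabilities (%) of the eight highest-ranked journals,
# copied from the list above (ranks 1 through 8).
probabilities = [17.2, 12.3, 8.3, 3.9, 3.5, 3.5, 3.5, 3.5]


def journals_to_reach(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached.

    Hypothetical helper, introduced only for this illustration.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_to_reach(probabilities))  # prints 7
```

The running total after six journals is 48.7%, and the seventh brings it to 52.2%, which agrees with the cutoff marked in the list.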