
Clinical and Cross-Domain Validation of an LLM-Guided, Literature-Based Gene Prioritization Framework

Khan, T.; A, A.; George, J.; Tomalka, J. A.; Sekaly, R.-P.; Palucka, K.; Chaussabel, D.

2026-01-23 bioinformatics
10.64898/2026.01.22.701191 bioRxiv

Background: We previously published a literature-based pipeline for sepsis gene prioritization (PS3 and candidate genes) using an LLM-enabled retrieval and judging framework. Here, we extend that work to ask whether these prioritized genes show independent clinical validity and whether the same strategy generalizes to a drug-obesity-infection setting.
Methods: Using the original LLM-guided workflow, we evaluated PS3 and the Candidate set in two new settings. First, we tested 28-day mortality prediction in the independent VANISH sepsis trial, benchmarking PS3 and Candidate against two established immune signatures, the Severe-or-Mild (SoM) signature and the Immune Health Metric (IHM), under a uniform logistic-regression framework with clinical covariates. Second, we applied the same genome-wide screening and tiered judging pipeline to GLP-1/obesity/infection biology centered on semaglutide, comparing Tier-1 and Tier-2 gene sets to STEP trial serum proteomics at gene and Hallmark pathway levels. In parallel, we fine-tuned an open-weight GPT-OSS-20B model on curated sepsis justifications to obtain a domain-aware "LLM-as-judge," and compared its scoring behavior with the base model on semaglutide Tier-2 genes.
Results: In the full VANISH cohort, PS3 and the Candidate set showed moderate discrimination, whereas SoM remained the strongest single predictor of 28-day mortality. In the Critical/High APACHE II subgroup, PS3 achieved ROC and precision-recall performance comparable to, or slightly better than, SoM despite its smaller, knowledge-derived composition, indicating that literature-prioritized genes capture mortality-relevant immune dysregulation under severe illness.
In the semaglutide case study, gene-level overlap between LLM-prioritized genes and differentially abundant serum proteins was modest, but Tier-1 genes recapitulated the main semaglutide-responsive metabolic programs from STEP and highlighted additional immune-metabolic pathways relevant to infection, with discordances largely explained by serum proteome coverage. The fine-tuned judge remained moderately concordant with the base GPT-OSS across mechanistic themes, preserving overall ranking while inducing systematic, biologically interpretable shifts in immune and infection-related scores.
Conclusions: An LLM-guided, literature-based gene prioritization framework yields compact gene sets that show independent sepsis mortality signal and pathway-level concordance in a semaglutide/obesity/infection setting, while a sepsis-aware LLM-as-judge provides domain-specific refinements without overturning core rankings. Together, these findings support knowledge-grounded, LLM-derived gene sets and judges as interpretable components for probing immune dysregulation across diseases and therapies.
Short abstract: We previously published a literature-based pipeline for sepsis gene prioritization (Priority Set 3, PS3, and a Candidate set) using an LLM-enabled retrieval and judging framework; here, we test whether these prioritized genes show independent clinical validity and whether the same strategy generalizes to a drug-obesity-infection setting. Using the original workflow, we first evaluated 28-day mortality prediction in the independent VANISH sepsis trial, benchmarking PS3 and the Candidate set against two established immune signatures, the Severe-or-Mild (SoM) signature and the Immune Health Metric (IHM), within a uniform logistic-regression framework adjusted for clinical covariates.
In the full cohort, PS3 and Candidate showed moderate discrimination, while SoM remained the strongest single predictor; however, in the Critical/High APACHE II subgroup, PS3 achieved ROC and precision-recall performance comparable to, or slightly better than, SoM despite its smaller, knowledge-driven composition, indicating that literature-prioritized genes capture mortality-relevant immune dysregulation under severe illness. We then applied the same genome-wide screening and tiered judging pipeline to GLP-1/obesity/infection biology centered on semaglutide, comparing Tier-1 and Tier-2 genes with STEP trial serum proteomics; although gene-level overlap with differentially abundant proteins was modest, Tier-1 genes recapitulated key semaglutide-responsive metabolic programs and highlighted additional immune-metabolic pathways relevant to infection, with discordances largely attributable to serum proteome coverage. Finally, supervised fine-tuning of an open-weight GPT-OSS model on curated sepsis justifications yielded a domain-aware "LLM-as-judge" that remained broadly concordant with the base model while inducing systematic, interpretable shifts in immune and infection-related scores. Together, these results support LLM-guided, literature-based gene sets and judges as compact, mechanistically interpretable components for probing immune dysregulation across diseases and therapies.
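The gene-level comparison mentioned in the abstract (prioritized genes versus differentially abundant serum proteins) can be illustrated with a small overlap calculation. This is a minimal sketch, not the authors' method: the abstract does not say which overlap statistic was used, so the Jaccard index and the hypergeometric enrichment test below are assumptions, as are the function names and the placeholder gene labels (`G1`, `G2`, ...). Both sets are restricted to a background of measurable genes, which is one way to account for the serum proteome coverage issue the abstract raises.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric draw without replacement."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


def overlap_stats(prioritized, measured_hits, background):
    """Compare two gene sets after restricting both to a shared background.

    Genes outside the background (for example, proteins the assay cannot
    measure) are dropped so they do not count as disagreements.
    """
    background = set(background)
    prioritized = set(prioritized) & background
    hits = set(measured_hits) & background
    shared = prioritized & hits
    union = prioritized | hits
    jaccard = len(shared) / len(union) if union else 0.0
    p_value = hypergeom_sf(len(shared), len(background), len(hits), len(prioritized))
    return shared, jaccard, p_value


# Toy data with placeholder labels; these are not genes from the study.
background = {f"G{i}" for i in range(1, 21)}   # 20 measurable genes
prioritized = {"G1", "G2", "G3", "G4", "G99"}  # G99 is not measurable
measured_hits = {"G1", "G2", "G10", "G11"}

shared, jaccard, p_value = overlap_stats(prioritized, measured_hits, background)
print(sorted(shared), round(jaccard, 3), round(p_value, 3))
```

In this toy case two of the four measurable prioritized genes are also hits, and the unmeasurable gene is excluded before any statistic is computed.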

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Cell Reports Medicine | 140 | Top 0.1% | 18.9%
2 | Nature Communications | 4913 | Top 13% | 12.7%
3 | Genome Medicine | 154 | Top 2% | 4.4%
4 | Cell Systems | 167 | Top 3% | 4.2%
5 | PLOS Computational Biology | 1633 | Top 9% | 3.6%
6 | npj Digital Medicine | 97 | Top 1% | 3.6%
7 | Nature Machine Intelligence | 61 | Top 0.9% | 3.6%
(50% of probability mass above)
8 | Nature Medicine | 117 | Top 1% | 3.1%
9 | Bioinformatics | 1061 | Top 6% | 2.1%
10 | Molecular Systems Biology | 142 | Top 0.5% | 1.9%
11 | eBioMedicine | 130 | Top 1% | 1.7%
12 | Patterns | 70 | Top 0.9% | 1.7%
13 | eLife | 5422 | Top 45% | 1.5%
14 | JCI Insight | 241 | Top 4% | 1.5%
15 | Scientific Reports | 3102 | Top 61% | 1.5%
16 | Cell Genomics | 162 | Top 4% | 1.5%
17 | iScience | 1063 | Top 17% | 1.5%
18 | Bioinformatics Advances | 184 | Top 3% | 1.4%
19 | Molecular & Cellular Proteomics | 158 | Top 1% | 1.4%
20 | Proceedings of the National Academy of Sciences | 2130 | Top 36% | 1.4%
21 | PROTEOMICS | 35 | Top 0.6% | 0.9%
22 | Journal of Clinical Investigation | 164 | Top 5% | 0.9%
23 | Advanced Science | 249 | Top 16% | 0.9%
24 | Cell Reports Methods | 141 | Top 5% | 0.8%
25 | Computational and Structural Biotechnology Journal | 216 | Top 9% | 0.8%
26 | GigaScience | 172 | Top 3% | 0.7%
27 | Briefings in Bioinformatics | 326 | Top 7% | 0.7%
28 | Communications Medicine | 85 | Top 1% | 0.7%
29 | npj Systems Biology and Applications | 99 | Top 3% | 0.7%
30 | Communications Biology | 886 | Top 25% | 0.7%
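
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below is illustrative only: the function name `journals_to_reach` is hypothetical, and the values are the rounded percentages shown for the first ten journals, so the totals carry the same rounding.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative total) at the first rank where the
    running sum of probabilities meets or exceeds the threshold."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None  # threshold never reached with the values supplied


# Predicted probabilities (percent) for ranks 1-10, as listed on the page.
top_ten = [18.9, 12.7, 4.4, 4.2, 3.6, 3.6, 3.6, 3.1, 2.1, 1.9]

count, total = journals_to_reach(top_ten, 50.0)
print(count, round(total, 1))
```

With these rounded values the first six journals sum to 47.4% and the seventh brings the total to 51.0%, which agrees with the cut-off shown after rank 7.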