In-Context Learning with Large Language Models for Scalable Glycemic Index Assignment to Food Composition Databases: Development, Validation, and Reproducibility

Della Corte, K. A.; Ebbert, J. L.; Brand-Miller, J.; Atkinson, F.; Della Corte, D.

2026-05-01 nutrition
10.64898/2026.04.23.26351292 medRxiv
Assigning glycemic index (GI) values to food composition databases is a critical bottleneck in nutritional epidemiology. We developed an in-context learning approach using large language models (LLMs), in which a structured knowledge system (termed a skill) loads GI reference databases (~11,000 entries), expert decision rules, and error-correction heuristics into the model's context window (~300,000 tokens). The LLM performs GI assignments without scripted logic, functioning simultaneously as a semantic matching engine, numerical reasoning system, and expert curator. We validated this approach in two experiments. In Validation Study 1, the skill predicted the expert-curated US National GI Database (9,428 foods) using only European reference data, achieving 73.7% agreement within ±10 GI units without manual review, compared with 31.3% retention for a previously published cosine-similarity approach. In Validation Study 2, the skill was augmented with the US GIDB and applied to 1,157 European food descriptions classified using the EFSA FoodEx2 system, achieving ICC = 0.79 with the expert (weighted κ = 0.65; triplicate ICC = 0.88). We then applied the skill prospectively to extend US dietary GI and glycemic load (GL) surveillance to two additional NHANES cycles (2019-2023), identifying a continued decline in energy-adjusted glycemic load. Reproducibility was assessed through triplicate runs (temperature = 0, pinned model version). The skill architecture is described in sufficient detail to inform future applications of in-context learning for nutritional database construction.

STATEMENT OF SIGNIFICANCE

This paper introduces a fundamentally new approach to glycemic index (GI) database construction. Rather than using programmatic text-matching algorithms followed by extensive manual curation, we demonstrate that a large language model (LLM), when loaded with the complete GI reference literature and formalized expert decision rules, can perform one-shot GI assignments at accuracy levels comparable to human expert ratings (ICC = 0.79 with expert, weighted κ = 0.65 for GI category agreement). The approach is validated across two independent food databases spanning US and European food supplies. The method reduces the time required to assign GI values to a new national food database from months of expert labor to hours of computation, while maintaining reproducibility through a structured, versionable skill architecture. This has immediate practical implications for enabling GI-based dietary surveillance and epidemiologic research in countries that currently lack GI databases or need to update existing ones.
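The two agreement statistics quoted above (the share of foods within ±10 of the expert value, and a weighted κ over GI categories) can be illustrated with a minimal sketch. This is not the authors' code. The function names are hypothetical, the low/medium/high cut-offs are the conventional ones (55 and 70) rather than values stated in the abstract, and linear weights are an assumption, since the abstract does not say which weighting scheme was used.

```python
from typing import Sequence


def within_tolerance(pred: Sequence[float], ref: Sequence[float], tol: float = 10.0) -> float:
    """Fraction of items whose predicted GI is within +/- tol of the reference GI."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(pred, ref) if abs(p - r) <= tol)
    return hits / len(pred)


def gi_category(gi: float) -> int:
    """Map a GI value to 0 (low), 1 (medium), or 2 (high).

    Assumed conventional cut-offs: low <= 55, high >= 70, medium in between.
    """
    if gi <= 55:
        return 0
    if gi < 70:
        return 1
    return 2


def weighted_kappa(a: Sequence[int], b: Sequence[int], n_cat: int = 3) -> float:
    """Cohen's kappa with linear disagreement weights |i - j| / (n_cat - 1).

    The linear weighting is an assumption made for this sketch.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and of equal length")
    n = len(a)
    # Confusion matrix of rater a (rows) against rater b (columns).
    obs = [[0] * n_cat for _ in range(n_cat)]
    for i, j in zip(a, b):
        obs[i][j] += 1
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]
    # Observed and chance-expected weighted disagreement.
    obs_d = 0.0
    exp_d = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            w = abs(i - j) / (n_cat - 1)
            obs_d += w * obs[i][j] / n
            exp_d += w * row[i] * col[j] / (n * n)
    if exp_d == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - obs_d / exp_d


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    predicted = [50, 62, 75, 40]
    expert = [55, 75, 72, 41]
    print(within_tolerance(predicted, expert))
    print(weighted_kappa([gi_category(g) for g in predicted],
                         [gi_category(g) for g in expert]))
```

The example values are invented purely to exercise the functions. Swapping in quadratic weights would change only the line that computes `w`.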

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. The American Journal of Clinical Nutrition: 19 papers in training set, Top 0.1%, 12.3%
2. Nature Communications: 4913 papers in training set, Top 25%, 7.2%
3. eLife: 5422 papers in training set, Top 10%, 7.2%
4. Bioinformatics Advances: 184 papers in training set, Top 0.5%, 6.3%
5. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 0.5%, 6.3%
6. Public Health Nutrition: 14 papers in training set, Top 0.2%, 4.0%
7. npj Digital Medicine: 97 papers in training set, Top 1%, 3.9%
8. PLOS Computational Biology: 1633 papers in training set, Top 10%, 3.6%

50% of probability mass above

9. Science Translational Medicine: 111 papers in training set, Top 0.9%, 3.6%
10. PLOS ONE: 4510 papers in training set, Top 43%, 2.9%
11. Scientific Reports: 3102 papers in training set, Top 43%, 2.7%
12. JAMA Network Open: 127 papers in training set, Top 2%, 2.1%
13. Science Advances: 1098 papers in training set, Top 13%, 2.1%
14. Database: 51 papers in training set, Top 0.3%, 1.9%
15. Nature Human Behaviour: 85 papers in training set, Top 2%, 1.8%
16. BMC Medicine: 163 papers in training set, Top 4%, 1.7%
17. Bioinformatics: 1061 papers in training set, Top 8%, 1.3%
18. Current Developments in Nutrition: 15 papers in training set, Top 0.6%, 1.3%
19. Journal of Translational Medicine: 46 papers in training set, Top 1%, 1.3%
20. BMC Bioinformatics: 383 papers in training set, Top 5%, 1.3%
21. Journal of Biomedical Informatics: 45 papers in training set, Top 1%, 0.9%
22. Methods in Ecology and Evolution: 160 papers in training set, Top 2%, 0.9%
23. Journal of Clinical and Translational Science: 11 papers in training set, Top 0.4%, 0.8%
24. Computers in Biology and Medicine: 120 papers in training set, Top 4%, 0.8%
25. Metabolites: 50 papers in training set, Top 1%, 0.8%
26. The Journal of Nutrition: 21 papers in training set, Top 0.6%, 0.7%
27. iScience: 1063 papers in training set, Top 32%, 0.7%
28. Journal of the American Medical Informatics Association: 61 papers in training set, Top 2%, 0.7%
29. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 45%, 0.7%
30. GigaScience: 172 papers in training set, Top 3%, 0.7%