
DentaCoPilot: An LLM-Augmented Next-Procedure Recommender for General Dentistry, Designed for Dentist Augmentation

Rodrigues, C. C.; Rebello, S. D.

2026-05-08 dentistry and oral medicine
10.64898/2026.05.07.26352635 medRxiv

Background: Commercial dental artificial intelligence in 2026 is overwhelmingly diagnostic: caries, calculus, periapical, and bone-level detection on radiographs. The clinically harder question that follows every diagnosis (given a patient's chart and most recent procedure, what should the dentist do next?) remains unsolved at general-dentistry scale. The closest published system, MultiTP (Chen et al., 2024), is a CNN-RNN restricted to partial-edentulism cases and provides no calibrated uncertainty, no structured rationale, and no evaluation that treats the model as decision support rather than as an autonomous classifier.

Methods: We introduce DentaCoPilot, a recommender that, given a structured chart, returns (i) a calibrated top-K probability distribution over Current Dental Terminology (CDT) codes for the next procedure, (ii) a verbalised confidence label, (iii) an explicit abstain flag when context is insufficient, and (iv) a chart-grounded rationale. We compare four classical baselines (frequency bigram, TF-IDF + logistic regression, XGBoost, MultiTP-style CNN-RNN) and six large-language-model (LLM) variants (Claude Haiku, Sonnet + chain-of-thought, Sonnet + retrieval, Opus + chain-of-thought, Sonnet + classical prior, Opus + classical prior) on a synthetic chart corpus of 500 patients (1,284 test examples). All LLM inference is routed through the local Anthropic Claude Code CLI; every call is logged for full audit.

Results: On apples-to-apples evaluation, classical baselines reach 0.567 top-1 / 0.967 top-5; pure LLM variants trail at 0.267-0.467 top-1. Prompt-conditioning a Sonnet LLM on the classical baseline's top-10 candidates (M5) closes the gap: top-5 rises from 0.733 (pure Sonnet + chain-of-thought) to 0.933, matching classical baselines, while preserving rationale and abstention. Upgrading the LLM backbone from Sonnet to Opus does not improve accuracy, with or without priming. Calibration via temperature scaling and coverage-risk analysis is reported for the baselines.

Conclusion: Prompt-conditioning a small LLM on a classical baseline's top-K is the most cost-effective LLM design we tested for next-procedure recommendation, and the design preserves the augmentation features that distinguish the system from an autonomous classifier. A pre-registered clinician-in-the-loop evaluation at the KLE Vishwanath Katti Institute of Dental Sciences (Belgaum, India) and a real-data evaluation on the multi-institutional BigMouth dental data repository are the next stage of work.
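
The prompt-conditioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`fit_bigram`, `top_k`, `build_prompt`), the prompt wording, and the toy CDT-style sequences are all assumptions made for illustration, and no LLM is actually called. The sketch fits a frequency-bigram baseline over procedure sequences, takes its top-K candidates for the most recent procedure, and formats them as a prior inside a prompt that also asks for a rationale and permits abstention.

```python
from collections import Counter, defaultdict


def fit_bigram(sequences):
    """Count how often each code follows each preceding code."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def top_k(counts, last_code, k=10):
    """Return up to k (code, probability) pairs, most frequent first."""
    following = counts.get(last_code, Counter())
    total = sum(following.values())
    if total == 0:
        return []  # unseen context: the baseline offers no candidates
    return [(code, n / total) for code, n in following.most_common(k)]


def build_prompt(chart_summary, last_code, candidates):
    """Format a prompt that presents the baseline candidates as a prior.

    The wording is hypothetical; the source does not publish its prompt.
    """
    if candidates:
        prior = "\n".join(f"- {code}: {p:.2f}" for code, p in candidates)
    else:
        prior = "(no baseline candidates available)"
    return (
        f"Chart summary: {chart_summary}\n"
        f"Most recent procedure: {last_code}\n"
        f"Baseline candidates for the next procedure:\n{prior}\n"
        "Rank the most likely next CDT codes, give a rationale grounded "
        "in the chart, and answer ABSTAIN if the context is insufficient."
    )


# Toy sequences invented for illustration only (not from the paper's corpus).
history = [
    ["D0120", "D1110", "D0274"],
    ["D0120", "D1110", "D2391"],
    ["D0120", "D1110", "D0274"],
]
model = fit_bigram(history)
print(build_prompt("adult patient, routine recall", "D1110",
                   top_k(model, "D1110", k=10)))
```

In this sketch an empty candidate list is passed through to the prompt rather than hidden, so a downstream model has an explicit cue that abstention may be appropriate.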

Matching journals

The top 12 journals account for 50% of the predicted probability mass.

1. Biology Methods and Protocols (53 papers in training set; Top 0.1%; 12.8%)
2. PLOS ONE (4510 papers in training set; Top 26%; 6.5%)
3. Journal of Dental Research (13 papers in training set; Top 0.1%; 4.3%)
4. Biomolecules (95 papers in training set; Top 0.1%; 3.8%)
5. Journal of Biomedical Informatics (45 papers in training set; Top 0.4%; 3.7%)
6. Journal of the American Medical Informatics Association (61 papers in training set; Top 0.7%; 3.7%)
7. Frontiers in Public Health (140 papers in training set; Top 2%; 3.2%)
8. Scientific Reports (3102 papers in training set; Top 42%; 3.0%)
9. International Journal of Medical Informatics (25 papers in training set; Top 0.5%; 2.8%)
10. Journal of Medical Internet Research (85 papers in training set; Top 2%; 2.8%)
11. European Radiology (14 papers in training set; Top 0.3%; 2.7%)
12. Frontiers in Digital Health (20 papers in training set; Top 0.4%; 2.2%)
50% of probability mass above
13. Artificial Intelligence in Medicine (15 papers in training set; Top 0.2%; 2.1%)
14. PLOS Digital Health (91 papers in training set; Top 1%; 1.9%)
15. iScience (1063 papers in training set; Top 12%; 1.8%)
16. BioMed Research International (25 papers in training set; Top 1%; 1.7%)
17. BMC Medical Informatics and Decision Making (39 papers in training set; Top 1%; 1.7%)
18. npj Digital Medicine (97 papers in training set; Top 2%; 1.7%)
19. Healthcare (16 papers in training set; Top 0.6%; 1.7%)
20. Cureus (67 papers in training set; Top 3%; 1.7%)
21. Acta Psychiatrica Scandinavica (10 papers in training set; Top 0.1%; 1.7%)
22. Computers in Biology and Medicine (120 papers in training set; Top 2%; 1.7%)
23. JMIR Medical Informatics (17 papers in training set; Top 0.7%; 1.7%)
24. Infection (15 papers in training set; Top 0.1%; 1.5%)
25. Bioinformatics (1061 papers in training set; Top 8%; 1.4%)
26. Royal Society Open Science (193 papers in training set; Top 3%; 1.4%)
27. Computational and Structural Biotechnology Journal (216 papers in training set; Top 6%; 1.3%)
28. Frontiers in Medicine (113 papers in training set; Top 5%; 0.9%)
29. Bioinformatics Advances (184 papers in training set; Top 4%; 0.9%)
30. Nature Human Behaviour (85 papers in training set; Top 4%; 0.8%)