Keeping human in the loop: A three-phase generative AI workflow for research integrity in data-intensive science. A methodological case study using elite Ethiopian distance-running data

Galko, P.; Yisamaw, A.; Haugen, T.; Seiler, S.

2026-05-29 sports medicine
10.64898/2026.05.29.26354013 medRxiv

Background: Generative AI tools can support data-intensive research by writing code, drafting prose, searching analytical possibilities, and stress-testing claims. They can also produce false citations, drift between statistical specifications, and lose continuity across long investigations. This paper describes a practical workflow for using AI systems in empirical research while keeping discovery, verification, and accountability inspectable.

Methods: We developed and applied a three-phase human-AI workflow to a case study of 14 elite Ethiopian distance runners. The dataset contained 22,605 GPS-segments collected across 97 consecutive days in late 2025, supplemented by venue and athlete metadata collected in the field. Phase 1 used an autonomous data-exploration tool to pre-filter the hypothesis space across five seeded research questions. Phase 2 used an AI system under direct human guidance to construct candidate findings into numerical claims, verification scripts, and draft text. Phase 3 used an independent AI system in an adversarial role to stress-test methods, statistics, prose, figures, and citations. The workflow was informed by Pearl's distinction between association, intervention, and counterfactual reasoning, with human judgement retained for research direction, interpretation, and final claims.

Results: The workflow produced three empirical analyses and a documented correction process. The analyses estimated an altitude-to-sea-level pace correction of +0.10 min/km per 1,000 m at matched heart rate, showed why pooled altitude-surface regression was not identifiable within this venue system, documented method-dependence in heart-rate-based intensity classification, characterised within-venue route variation as a 64/36 path-fixed-to-trail-variable split with the Sululta label resolving into two functionally distinct sub-venues, and reframed the cohort's training through a 3x3x3 prescription lattice grounded in Ethiopian coaching practice. The adversarial phase identified several hallucinated citations, a terminology error between HC1 and cluster-robust standard errors, and several inconsistencies between prose, figures, and computed results. Verification scripts re-derived nearly all numerical claims from the cleaned lap-level data.

Conclusions: The case study shows how researchers can organise AI-assisted empirical work so that candidate discovery, claim construction, independent stress-testing, and final accountability remain separated. The workflow did not remove the need for domain expertise or human judgement. Its value was in making the route from candidate finding to manuscript claim explicit, reproducible, and open to challenge.

Trial registration: Not applicable.
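
As a rough illustration of the altitude correction reported in the Results, the sketch below applies the stated slope of 0.10 min/km per 1,000 m as a simple linear adjustment. The sign convention is an assumption, not something the abstract confirms: it treats the positive slope as meaning that pace at matched heart rate is slower at altitude, so the sea-level equivalent is obtained by subtracting the correction. The function names and the 2,700 m example altitude are hypothetical.

```python
# Slope reported in the abstract: 0.10 min/km per 1,000 m at matched heart rate.
SLOPE_MIN_PER_KM_PER_1000_M = 0.10


def sea_level_equivalent_pace(pace_min_per_km: float, altitude_m: float,
                              slope: float = SLOPE_MIN_PER_KM_PER_1000_M) -> float:
    """Convert a pace recorded at altitude to a sea-level equivalent.

    ASSUMPTION: a positive slope means running is slower at altitude,
    so the correction is subtracted from the observed pace.
    """
    return pace_min_per_km - slope * altitude_m / 1000.0


def format_pace(pace_min_per_km: float) -> str:
    """Render a decimal pace in min/km as m:ss."""
    total_seconds = round(pace_min_per_km * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


# Hypothetical example: 4:00 min/km recorded at 2,700 m.
observed = 4.0
corrected = sea_level_equivalent_pace(observed, altitude_m=2700)
print(format_pace(observed), "->", format_pace(corrected))
```

Under that assumed sign convention, a 4:00 min/km effort at 2,700 m corresponds to roughly 3:44 min/km at sea level. If the authors define the correction in the opposite direction, the subtraction becomes an addition.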

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | PLOS ONE | 4510 | Top 10% | 17.9%
2 | Scientific Reports | 3102 | Top 10% | 8.4%
3 | JAMA Network Open | 127 | Top 0.4% | 6.4%
4 | European Journal of Epidemiology | 40 | Top 0.1% | 3.7%
5 | International Journal of Environmental Research and Public Health | 124 | Top 2% | 3.7%
6 | Nature Human Behaviour | 85 | Top 0.9% | 3.7%
7 | GigaScience | 172 | Top 0.7% | 2.8%
8 | Communications Biology | 886 | Top 3% | 2.8%
9 | BMJ Open | 554 | Top 8% | 2.1%
(50% of probability mass above)
10 | PLOS Digital Health | 91 | Top 1% | 2.1%
11 | eLife | 5422 | Top 35% | 2.1%
12 | PLOS Computational Biology | 1633 | Top 14% | 1.9%
13 | Frontiers in Genetics | 197 | Top 4% | 1.8%
14 | PLOS Biology | 408 | Top 9% | 1.7%
15 | Journal of the American Heart Association | 119 | Top 3% | 1.7%
16 | BMC Nephrology | 13 | Top 0.2% | 1.5%
17 | BMC Medical Research Methodology | 43 | Top 0.7% | 1.4%
18 | BMC Medicine | 163 | Top 5% | 1.1%
19 | Medicine | 30 | Top 2% | 1.0%
20 | Proceedings of the National Academy of Sciences | 2130 | Top 40% | 1.0%
21 | F1000Research | 79 | Top 3% | 0.9%
22 | Research Synthesis Methods | 20 | Top 0.2% | 0.9%
23 | The Lancet Digital Health | 25 | Top 0.8% | 0.9%
24 | Methods in Ecology and Evolution | 160 | Top 2% | 0.9%
25 | European Journal of Applied Physiology | 12 | Top 0.2% | 0.8%
26 | Wellcome Open Research | 57 | Top 2% | 0.8%
27 | FACETS | 11 | Top 0.2% | 0.8%
28 | Frontiers in Behavioral Neuroscience | 46 | Top 1% | 0.7%
29 | Neuroscience & Biobehavioral Reviews | 43 | Top 1% | 0.7%
30 | Peer Community Journal | 254 | Top 4% | 0.7%
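
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below is only an illustration of that arithmetic: the values are copied from the list above, and the function name `journals_needed` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1 to 30, copied from the list.
probabilities = [
    17.9, 8.4, 6.4, 3.7, 3.7, 3.7, 2.8, 2.8, 2.1, 2.1,
    2.1, 1.9, 1.8, 1.7, 1.7, 1.5, 1.4, 1.1, 1.0, 1.0,
    0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7,
]


def journals_needed(probs, threshold):
    """Return (rank, cumulative mass) at the first rank reaching the threshold."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank, total
    return None  # threshold never reached by the listed journals


rank, mass = journals_needed(probabilities, threshold=50.0)
print(f"{rank} journals cover {mass:.1f}% of the predicted probability mass")
```

With the listed values, the running total stands at 49.4% after rank 8 and first exceeds 50% at rank 9, which agrees with the cutoff marked in the list.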