Humans use a dual information-seeking policy to improve noisy inferences outside the explore-exploit tradeoff

Cao, Y.; Almeras, C.; Lee, J. K.; Wyart, V.

2025-11-16 neuroscience

10.1101/2025.10.08.681186 bioRxiv

Everyday decisions aim not only to earn rewards but also to learn about the world. Across three studies (total N = 702), we examined how people gather epistemic information devoid of reward value, and compared their strategy with reward seeking under otherwise matched conditions. Computational modeling of human behavior revealed a two-stage information-seeking policy. Participants first sample each novel option repeatedly, one option at a time, to test provisional hypotheses (a process we call streaking), before transitioning to uncertainty-guided exploration. Artificial neural networks trained to optimize inference accuracy acquired uncertainty-guided exploration but not early streaking; nevertheless, this two-stage policy improves human inference accuracy under noisy belief updating. Streaking and uncertainty-guided exploration tend to be co-expressed in the same individuals but map onto distinct psychological traits. Together, these results offer a novel account of human information seeking, clarifying its motives and benefits in epistemic contexts beyond the reward-centric explore-exploit tradeoff.
