Humans use a dual information-seeking policy to improve noisy inferences outside the explore-exploit tradeoff
Cao, Y.; Almeras, C.; Lee, J. K.; Wyart, V.
Everyday decisions aim not only to earn rewards but also to learn about the world. Across three studies (total N = 702), we examined how people gather epistemic information stripped of rewarding value, and compared their strategy with reward seeking in otherwise matched conditions. Computational modeling of human behavior revealed a two-stage information-seeking policy, where participants first repeatedly sample each novel option in turn to test provisional hypotheses, a process we call streaking, before transitioning to uncertainty-guided exploration. While artificial neural networks trained to optimize inference accuracy acquired uncertainty-guided exploration but not early streaking, this two-stage policy improves human inference accuracy under noisy belief updating. Streaking and uncertainty-guided exploration tend to be co-expressed in the same individuals but map onto distinct psychological traits. Together, these results offer a novel account of human information seeking, clarifying its motives and benefits in epistemic contexts beyond the reward-centric explore-exploit tradeoff.
Matching journals
The top 2 journals account for 50% of the predicted probability mass.