Dual reinforcement-learning network modules for modeling decision-making with multiple strategies
Maeda, H.; Wang, S.; Funamizu, A.
Animals and humans use multiple behavioral strategies to perform tasks. However, the neural implementation of multiple strategies remains elusive: some studies propose distinct pathways, while others observe overlapping brain regions associated with different strategies. We propose a hybrid deep reinforcement learning (H-DRL) method, in which a single network model implements both model-free and inference-based behaviors through synaptic plasticity and recurrent activity. H-DRL uses a single update rule and switches strategies according to task demands without an explicit arbitrator. H-DRL reproduced the mixed strategies of humans in a two-step task. In a mouse perceptual decision-making task, H-DRL adapted its recurrent dynamics through rich learning when the task condition required inference-based behavior, while adopting model-free behavior with lazy learning in a simple condition. The activity of H-DRL units showed condition-dependent maintenance of previous events, consistent with orbitofrontal cortical activity in mice. Our model provides a unified view in which one cortical network automatically determines the strategy in use depending on task conditions.
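The core architectural idea, a single recurrent network trained with one update rule, where the hidden state can carry past events and thereby support inference-like behavior without a separate arbitrator module, can be sketched in a few lines. This is a minimal illustrative sketch only, not the authors' H-DRL implementation; all dimensions, the toy reward, and the REINFORCE-style update are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration (not from the paper).
n_obs, n_hidden, n_act = 4, 16, 2

# One network: these weights are the only learned parameters.
W_in = rng.normal(0, 0.1, (n_hidden, n_obs))
W_rec = rng.normal(0, 0.1, (n_hidden, n_hidden))
W_out = rng.normal(0, 0.1, (n_act, n_hidden))

def step(h, obs):
    """One recurrent step: the hidden state h can maintain previous
    events (inference-like behavior) without a dedicated module."""
    h = np.tanh(W_in @ obs + W_rec @ h)
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return h, e / e.sum()

def run_episode(T=5):
    """Roll out one episode with a toy reward (reward for action 0)."""
    h = np.zeros(n_hidden)
    grads, rewards = [], []
    for _ in range(T):
        obs = rng.normal(size=n_obs)
        h, p = step(h, obs)
        a = rng.choice(n_act, p=p)
        # Gradient of log pi(a) w.r.t. output weights only (sketch).
        dlogits = -p
        dlogits[a] += 1.0
        grads.append(np.outer(dlogits, h))
        rewards.append(1.0 if a == 0 else 0.0)
    return grads, rewards

def update(lr=0.01, gamma=0.9):
    """Single update rule applied in every condition: a policy
    gradient on the same network, with no explicit arbitrator."""
    global W_out
    grads, rewards = run_episode()
    G = 0.0
    for g, r in zip(reversed(grads), reversed(rewards)):
        G = r + gamma * G
        W_out += lr * G * g

update()  # one learning step under the single rule
h0, p = step(np.zeros(n_hidden), np.ones(n_obs))
```

Whether behavior looks model-free or inference-based here would depend only on how much the task forces the recurrent state to be used, which mirrors the abstract's claim that the strategy emerges from task demands rather than from an explicit switch.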