Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning

Liu, D.; Yu, Y.; Wu, Y. N.

2026-05-15 neuroscience
10.64898/2026.05.10.724161 bioRxiv
The success of large language models (LLMs) across diverse NLP tasks has elevated the importance of reasoning chain optimization as a critical step in aligning model behavior with task objectives. Existing reasoning chain tuning methods often rely on black-box heuristics or gradient-free search, which lack interpretability, generalization, and sample efficiency. In this work, we introduce Thoughts-as-Planning, a novel framework that formalizes reasoning chain optimization as a sequential decision-making process over a latent semantic space. We model the LLM as a partially observable environment and learn a latent world model that simulates the effect of reasoning chain edits on downstream outputs. A proximity-preserving embedding space is constructed to encode reasoning chain-response dynamics, enabling planning via gradient descent or reinforcement learning. Our method supports multi-scale abstraction, allowing reasoning chain edits at token, segment, and instruction levels to be integrated into a unified planner. Through extensive experiments on language understanding and generation tasks, we demonstrate that Thoughts-as-Planning outperforms state-of-the-art reasoning chain tuning baselines in efficiency, robustness, and generalization, while offering interpretability through its structured planning trajectory. Our code is available at https://github.com/FastLM/Thoughts-as-Planning.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | Nature Communications | 4913 papers in training set | Top 18% | 10.1%
2 | Nature Biotechnology | 147 papers in training set | Top 0.7% | 10.1%
3 | Nature | 575 papers in training set | Top 4% | 6.8%
4 | Bioinformatics | 1061 papers in training set | Top 4% | 6.8%
5 | Nature Methods | 336 papers in training set | Top 2% | 6.3%
6 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 11% | 6.3%
7 | Nature Human Behaviour | 85 papers in training set | Top 0.4% | 6.3%
50% of probability mass above
8 | Nature Neuroscience | 216 papers in training set | Top 2% | 4.8%
9 | Nature Computational Science | 50 papers in training set | Top 0.2% | 3.6%
10 | PLOS ONE | 4510 papers in training set | Top 39% | 3.6%
11 | PLOS Computational Biology | 1633 papers in training set | Top 12% | 2.7%
12 | Science | 429 papers in training set | Top 12% | 2.4%
13 | Scientific Reports | 3102 papers in training set | Top 58% | 1.7%
14 | Nature Medicine | 117 papers in training set | Top 2% | 1.7%
15 | npj Digital Medicine | 97 papers in training set | Top 2% | 1.5%
16 | Communications Psychology | 20 papers in training set | Top 0.1% | 1.3%
17 | Neuron | 282 papers in training set | Top 7% | 1.3%
18 | Nucleic Acids Research | 1128 papers in training set | Top 14% | 1.2%
19 | Communications Biology | 886 papers in training set | Top 14% | 1.2%
20 | Advanced Science | 249 papers in training set | Top 14% | 1.2%
21 | iScience | 1063 papers in training set | Top 22% | 1.2%
22 | Genome Biology | 555 papers in training set | Top 6% | 0.9%
23 | Cell Systems | 167 papers in training set | Top 11% | 0.8%
24 | Cognition | 44 papers in training set | Top 0.4% | 0.8%
25 | Nature Machine Intelligence | 61 papers in training set | Top 4% | 0.7%
26 | eLife | 5422 papers in training set | Top 58% | 0.7%
27 | The American Journal of Human Genetics | 206 papers in training set | Top 4% | 0.7%
28 | BMC Bioinformatics | 383 papers in training set | Top 8% | 0.6%