Commit 06aee46

polish(pu): add recent mcts-related papers (#324)
1 parent 8099be9 commit 06aee46

2 files changed: +68 -3 lines changed

README.md

+34-1
@@ -28,7 +28,7 @@
 [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
 [![discord badge](https://dcbadge.vercel.app/api/server/dkZS2JF56X?style=flat)](https://discord.gg/dkZS2JF56X)
 
-Updated on 2024.12.10 LightZero-v0.1.0
+Updated on 2025.02.08 LightZero-v0.1.0
 
 English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)
 
@@ -361,6 +361,7 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
 - [2021 Sampled MuZero: Learning and Planning in Complex Action Spaces](https://arxiv.org/abs/2104.06303)
 - [2022 Stochastic MuZero: Planning in Stochastic Environments with A Learned Model](https://openreview.net/pdf?id=X6D9bAHhBQ1)
 - [2022 Gumbel MuZero: Policy Improvement by Planning with Gumbel](https://openreview.net/pdf?id=bERaNdoegnO&)
+- [2024 UniZero: Generalized and Efficient Planning with Scalable Latent World Models](https://arxiv.org/abs/2406.10667)
 
 #### AlphaGo series
 - [2015 _Nature_ AlphaGo Mastering the game of Go with deep neural networks and tree search](https://www.nature.com/articles/nature16961)
@@ -395,6 +396,22 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
 <details><summary>Click to expand</summary>
 
 #### ICML
+- [Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models](https://icml.cc/virtual/2024/poster/33107) 2024
+  - Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang
+  - Key: language models, decision-making, Monte Carlo Tree Search, reasoning, acting, planning
+  - ExpEnv: HumanEval, WebShop, interactive QA, programming, math
+- [Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning](https://proceedings.mlr.press/v235/huang24p.html) 2024
+  - Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng
+  - Key: multi-agent reinforcement learning, hierarchical opponent modeling, Monte Carlo Tree Search, few-shot adaptation, mixed-motive environments
+  - ExpEnv: multi-agent decision-making scenarios, self-play, mixed-motive interactions
+- [Accelerating Look-ahead in Bayesian Optimization: Multilevel Monte Carlo is All you Need](https://openreview.net/forum?id=46vXhZn7lN) 2024
+  - Shangda Yang, Vitaly Zankin, Maximilian Balandat, Stefan Scherer, Kevin Thomas Carlberg, Neil Walton, Kody J. H. Law
+  - Key: Bayesian optimization, multilevel Monte Carlo, nested expectations, acquisition functions
+  - ExpEnv: Benchmark examples
+- [Accelerated Speculative Sampling Based on Tree Monte Carlo](https://openreview.net/forum?id=stMhi1Sn2G) 2024
+  - Zhengmian Hu, Heng Huang
+  - Key: speculative sampling, large language models, tree Monte Carlo, inference acceleration
+  - ExpEnv: Not specified
 - [Scalable Safe Policy Improvement via Monte Carlo Tree Search](https://openreview.net/pdf?id=tevbBSzSfK) 2023
   - Alberto Castellini, Federico Bianchi, Edoardo Zorzi, Thiago D. Simão, Alessandro Farinelli, Matthijs T. J. Spaan
   - Key: safe policy improvement online using a MCTS based strategy, Safe Policy Improvement with Baseline Bootstrapping
@@ -422,6 +439,22 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
   - ExpEnv: USPTO datasets
   - [Code](https://github.com/binghong-ml/retro_star)
 #### ICLR
+- [OptionZero: Planning with Learned Options](https://openreview.net/forum?id=3IFRygQKGL) 2025
+  - Po-Wei Huang, Pei-Chiun Peng, Hung Guei, Ti-Rong Wu
+  - Key: Option, Semi-MDP, MuZero, MCTS, Planning, Reinforcement Learning
+  - ExpEnv: 26 Atari games
+- [Monte Carlo Planning with Large Language Model for Text-Based Games](https://openreview.net/forum?id=r1KcapkzCt) 2025
+  - Zijing Shi, Meng Fang, Ling Chen
+  - Key: Large language model, Monte Carlo tree search, Text-based games
+  - ExpEnv: Jericho benchmark
+- [Epistemic Monte Carlo Tree Search](https://openreview.net/forum?id=Tb8RiXOc3N) 2025
+  - Yaniv Oren, Viliam Vadocz, Matthijs T. J. Spaan, Wendelin Boehmer
+  - Key: model based, epistemic uncertainty, exploration, planning, alphazero, muzero
+  - ExpEnv: SUBLEQ (Assembly language), Deep Sea
+- [Enhancing Software Agents with Monte Carlo Tree Search and Hindsight Feedback](https://openreview.net/forum?id=G7sIFXugTX) 2025
+  - Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, William Yang Wang
+  - Key: agents, LLM, SWE-agents, SWE-bench, search, planning, reasoning, self-improvement, open-ended
+  - ExpEnv: SWE-bench
 - [The Update Equivalence Framework for Decision-Time Planning](https://openreview.net/forum?id=JXGph215fL) 2024
   - Samuel Sokota, Gabriele Farina, David J Wu, Hengyuan Hu, Kevin A. Wang, J Zico Kolter, Noam Brown
   - Key: imperfect-information games, search, decision-time planning, update equivalence

README.zh.md

+34-2
@@ -27,7 +27,7 @@
 [![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
 [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
 
-Last updated on 2024.12.10 LightZero-v0.1.0
+Last updated on 2025.02.08 LightZero-v0.1.0
 
 [English](https://github.com/opendilab/LightZero/blob/main/README.md) | Simplified Chinese | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)
 
@@ -334,7 +334,7 @@ LightZero的文档可以在[这里](https://opendilab.github.io/LightZero/)找
 - [2021 Sampled MuZero: Learning and Planning in Complex Action Spaces](https://arxiv.org/abs/2104.06303)
 - [2022 Stochastic MuZero: Planning in Stochastic Environments with A Learned Model](https://openreview.net/pdf?id=X6D9bAHhBQ1)
 - [2022 Gumbel MuZero: Policy Improvement by Planning with Gumbel](https://openreview.net/pdf?id=bERaNdoegnO&)
-
+- [2024 UniZero: Generalized and Efficient Planning with Scalable Latent World Models](https://arxiv.org/abs/2406.10667)
 
 #### AlphaGo series
 
@@ -371,6 +371,22 @@ LightZero的文档可以在[这里](https://opendilab.github.io/LightZero/)找
 <summary>(Click to expand)</summary>
 
 #### ICML
+- [Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models](https://icml.cc/virtual/2024/poster/33107) 2024
+  - Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang
+  - Key: language models, decision-making, Monte Carlo Tree Search, reasoning, acting, planning
+  - ExpEnv: HumanEval, WebShop, interactive QA, programming, math
+- [Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning](https://proceedings.mlr.press/v235/huang24p.html) 2024
+  - Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng
+  - Key: multi-agent reinforcement learning, hierarchical opponent modeling, Monte Carlo Tree Search, few-shot adaptation, mixed-motive environments
+  - ExpEnv: multi-agent decision-making scenarios, self-play, mixed-motive interactions
+- [Accelerating Look-ahead in Bayesian Optimization: Multilevel Monte Carlo is All you Need](https://openreview.net/forum?id=46vXhZn7lN) 2024
+  - Shangda Yang, Vitaly Zankin, Maximilian Balandat, Stefan Scherer, Kevin Thomas Carlberg, Neil Walton, Kody J. H. Law
+  - Key: Bayesian optimization, multilevel Monte Carlo, nested expectations, acquisition functions
+  - ExpEnv: Benchmark examples
+- [Accelerated Speculative Sampling Based on Tree Monte Carlo](https://openreview.net/forum?id=stMhi1Sn2G) 2024
+  - Zhengmian Hu, Heng Huang
+  - Key: speculative sampling, large language models, tree Monte Carlo, inference acceleration
+  - ExpEnv: Not specified
 - [Scalable Safe Policy Improvement via Monte Carlo Tree Search](https://openreview.net/pdf?id=tevbBSzSfK) 2023
   - Alberto Castellini, Federico Bianchi, Edoardo Zorzi, Thiago D. Simão, Alessandro Farinelli, Matthijs T. J. Spaan
   - Key: safe policy improvement online using a MCTS based strategy, Safe Policy Improvement with Baseline Bootstrapping
@@ -399,6 +415,22 @@ and internal state transition dynamics,
   - ExpEnv: USPTO datasets
   - [Code](https://github.com/binghong-ml/retro_star)
 #### ICLR
+- [OptionZero: Planning with Learned Options](https://openreview.net/forum?id=3IFRygQKGL) 2025
+  - Po-Wei Huang, Pei-Chiun Peng, Hung Guei, Ti-Rong Wu
+  - Key: Option, Semi-MDP, MuZero, MCTS, Planning, Reinforcement Learning
+  - ExpEnv: 26 Atari games
+- [Monte Carlo Planning with Large Language Model for Text-Based Games](https://openreview.net/forum?id=r1KcapkzCt) 2025
+  - Zijing Shi, Meng Fang, Ling Chen
+  - Key: Large language model, Monte Carlo tree search, Text-based games
+  - ExpEnv: Jericho benchmark
+- [Epistemic Monte Carlo Tree Search](https://openreview.net/forum?id=Tb8RiXOc3N) 2025
+  - Yaniv Oren, Viliam Vadocz, Matthijs T. J. Spaan, Wendelin Boehmer
+  - Key: model based, epistemic uncertainty, exploration, planning, alphazero, muzero
+  - ExpEnv: SUBLEQ (Assembly language), Deep Sea
+- [Enhancing Software Agents with Monte Carlo Tree Search and Hindsight Feedback](https://openreview.net/forum?id=G7sIFXugTX) 2025
+  - Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, William Yang Wang
+  - Key: agents, LLM, SWE-agents, SWE-bench, search, planning, reasoning, self-improvement, open-ended
+  - ExpEnv: SWE-bench
 - [The Update Equivalence Framework for Decision-Time Planning](https://openreview.net/forum?id=JXGph215fL) 2024
   - Samuel Sokota, Gabriele Farina, David J Wu, Hengyuan Hu, Kevin A. Wang, J Zico Kolter, Noam Brown
   - Key: imperfect-information games, search, decision-time planning, update equivalence
