|
28 | 28 | [](https://github.com/opendilab/LightZero/blob/master/LICENSE)
|
29 | 29 | [](https://discord.gg/dkZS2JF56X)
|
30 | 30 |
|
31 | | -Updated on 2024.12.10 LightZero-v0.1.0 |
| 31 | +Updated on 2025.02.08 LightZero-v0.1.0 |
32 | 32 |
|
33 | 33 | English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)
|
34 | 34 |
|
@@ -361,6 +361,7 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
|
361 | 361 | - [2021 Sampled MuZero: Learning and Planning in Complex Action Spaces](https://arxiv.org/abs/2104.06303)
|
362 | 362 | - [2022 Stochastic MuZero: Planning in Stochastic Environments with A Learned Model](https://openreview.net/pdf?id=X6D9bAHhBQ1)
|
363 | 363 | - [2022 Gumbel MuZero: Policy Improvement by Planning with Gumbel](https://openreview.net/pdf?id=bERaNdoegnO&)
|
| 364 | +- [2024 UniZero: Generalized and Efficient Planning with Scalable Latent World Models](https://arxiv.org/abs/2406.10667) |
364 | 365 |
|
365 | 366 | #### AlphaGo series
|
366 | 367 | - [2015 _Nature_ AlphaGo Mastering the game of Go with deep neural networks and tree search](https://www.nature.com/articles/nature16961)
|
@@ -395,6 +396,22 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
|
395 | 396 | <details><summary>Click to expand</summary>
|
396 | 397 |
|
397 | 398 | #### ICML
|
| 399 | +- [Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models](https://icml.cc/virtual/2024/poster/33107) 2024 |
| 400 | + - Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang |
| 401 | + - Key: language models, decision-making, Monte Carlo Tree Search, reasoning, acting, planning |
| 402 | + - ExpEnv: HumanEval, WebShop, interactive QA, programming, math |
| 403 | +- [Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning](https://proceedings.mlr.press/v235/huang24p.html) 2024 |
| 404 | + - Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng |
| 405 | + - Key: multi-agent reinforcement learning, hierarchical opponent modeling, Monte Carlo Tree Search, few-shot adaptation, mixed-motive environments |
| 406 | + - ExpEnv: multi-agent decision-making scenarios, self-play, mixed-motive interactions |
| 407 | +- [Accelerating Look-ahead in Bayesian Optimization: Multilevel Monte Carlo is All you Need](https://openreview.net/forum?id=46vXhZn7lN) 2024 |
| 408 | + - Shangda Yang, Vitaly Zankin, Maximilian Balandat, Stefan Scherer, Kevin Thomas Carlberg, Neil Walton, Kody J. H. Law |
| 409 | + - Key: Bayesian optimization, multilevel Monte Carlo, nested expectations, acquisition functions |
| 410 | + - ExpEnv: Benchmark examples |
| 411 | +- [Accelerated Speculative Sampling Based on Tree Monte Carlo](https://openreview.net/forum?id=stMhi1Sn2G) 2024 |
| 412 | + - Zhengmian Hu, Heng Huang |
| 413 | + - Key: speculative sampling, large language models, tree Monte Carlo, inference acceleration |
| 414 | + - ExpEnv: Not specified |
398 | 415 | - [Scalable Safe Policy Improvement via Monte Carlo Tree Search](https://openreview.net/pdf?id=tevbBSzSfK) 2023
|
399 | 416 | - Alberto Castellini, Federico Bianchi, Edoardo Zorzi, Thiago D. Simão, Alessandro Farinelli, Matthijs T. J. Spaan
|
400 | 417 | - Key: safe policy improvement online using a MCTS based strategy, Safe Policy Improvement with Baseline Bootstrapping
|
@@ -422,6 +439,22 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
|
422 | 439 | - ExpEnv: USPTO datasets
|
423 | 440 | - [Code](https://github.com/binghong-ml/retro_star)
|
424 | 441 | #### ICLR
|
| 442 | +- [OptionZero: Planning with Learned Options](https://openreview.net/forum?id=3IFRygQKGL) 2025 |
| 443 | + - Po-Wei Huang, Pei-Chiun Peng, Hung Guei, Ti-Rong Wu |
| 444 | + - Key: Option, Semi-MDP, MuZero, MCTS, Planning, Reinforcement Learning |
| 445 | + - ExpEnv: 26 Atari games |
| 446 | +- [Monte Carlo Planning with Large Language Model for Text-Based Games](https://openreview.net/forum?id=r1KcapkzCt) 2025 |
| 447 | + - Zijing Shi, Meng Fang, Ling Chen |
| 448 | + - Key: Large language model, Monte Carlo tree search, Text-based games |
| 449 | + - ExpEnv: Jericho benchmark |
| 450 | +- [Epistemic Monte Carlo Tree Search](https://openreview.net/forum?id=Tb8RiXOc3N) 2025 |
| 451 | + - Yaniv Oren, Viliam Vadocz, Matthijs T. J. Spaan, Wendelin Boehmer |
| 452 | + - Key: model based, epistemic uncertainty, exploration, planning, alphazero, muzero |
| 453 | + - ExpEnv: SUBLEQ (Assembly language), Deep Sea |
| 454 | +- [Enhancing Software Agents with Monte Carlo Tree Search and Hindsight Feedback](https://openreview.net/forum?id=G7sIFXugTX) 2025 |
| 455 | + - Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, William Yang Wang |
| 456 | + - Key: agents, LLM, SWE-agents, SWE-bench, search, planning, reasoning, self-improvement, open-ended |
| 457 | + - ExpEnv: SWE-bench |
425 | 458 | - [The Update Equivalence Framework for Decision-Time Planning](https://openreview.net/forum?id=JXGph215fL) 2024
|
426 | 459 | - Samuel Sokota, Gabriele Farina, David J Wu, Hengyuan Hu, Kevin A. Wang, J Zico Kolter, Noam Brown
|
427 | 460 | - Key: imperfect-information games, search, decision-time planning, update equivalence
|