
Commit f966d00

📝 Update paper with suggestions of @metaylor
1 parent 60996e8 commit f966d00


1 file changed: +28 -29 lines changed


paper.md

+28 -29
@@ -1,5 +1,5 @@
11
---
2-
title: 'Benchmarking Hierarchical Reasoning with HierarchyCraft'
2+
title: 'HierarchyCraft: A Benchmark builder For Hierarchical Reasoning'
33
tags:
44
- Python
55
- Hierarchy
@@ -20,7 +20,7 @@ authors:
2020
affiliation: "1"
2121
- name: Yuxuan Li
2222
orcid: 0000-0001-5522-312X
23-
affiliation: "1,3"
23+
affiliation: "1, 3"
2424
- name: Matthew E. Taylor
2525
orcid: 0000-0001-8946-0211
2626
affiliation: "1, 3"
@@ -36,9 +36,10 @@ bibliography: paper.bib
3636

3737
---
3838

39+
3940
# Summary
40-
Hierarchical reasoning poses a fundamental challenge in the field of artificial intelligence [@botvinick2014model]. Existing methods may struggle when confronted with hierarchical tasks [@bacon2017option,@heess2016learning,@nachum2018data], yet there is a scarcity of suitable environments or benchmarks designed to comprehend how the structure of the underlying hierarchy influence a task difficulty. Our software represents a crucial initial step in the development of tools aimed at addressing research questions related to hierarchical reasoning.
4141

42+
Hierarchical reasoning poses a fundamental challenge in the field of artificial intelligence [@botvinick2014model]. Existing methods may struggle when confronted with hierarchical tasks [@bacon2017option; @heess2016learning; @nachum2018data], yet there is a scarcity of suitable environments or benchmarks designed to understand how the structure of the underlying hierarchy influences task difficulty. Our software represents an important initial step in the development of tools aimed at addressing research questions related to hierarchical reasoning.
4243
We introduce **HierarchyCraft**, a lightweight environment builder designed for creating hierarchical reasoning tasks that do not necessitate feature extraction, unlike tasks involving pixel images, text, sound, or other data types where deep learning-based feature extraction is commonly employed.
4344
HierarchyCraft serves a dual purpose by offering a set of pre-defined hierarchical environments and simplifying the process of creating customized hierarchical environments.
4445

@@ -47,14 +48,15 @@ HierarchyCraft serves a dual purpose by offering a set of pre-defined hierarchic
4748

4849

4950
# Statement of need
51+
5052
HierarchyCraft is designed as a user-friendly Python library for constructing environments tailored to the study of hierarchical reasoning in the contexts of reinforcement learning, classical planning, and program synthesis as displayed in \autoref{fig:HierachyCraft_domain_position}.
5153

5254
Analysis and quantification of the impacts of diverse hierarchical structures on learning agents are essential for advancing hierarchical reasoning.
5355
However, current hierarchical benchmarks often limit themselves to a single hierarchical structure per benchmark, and their difficulty stems not only from this inherent hierarchical structure but also from the representation learning needed to interpret the inputs.
5456

5557
We argue that arbitrary hierarchical complexity can emerge from simple rules without the need for learning a representation.
56-
To the best of our knowledge, no general frameworks currently exist for constructing environments dedicated to studying the hierarchical structure itself, underscoring the necessity for the development of tools like HierarchyCraft.
57-
We compare five particularly related benchmarks to HierarchyCraft.
58+
To the best of our knowledge, no general frameworks currently exist for constructing environments dedicated to studying the hierarchical structure itself. We next highlight five related benchmarks, underscoring the necessity for the development of a tool like HierarchyCraft.
59+
5860

5961
### GridWorld
6062

@@ -64,31 +66,24 @@ Minigrid [@minigrid] is a user-friendly Python library that not only implements
6466

6567
![Examples of hierarchical structures in Minigrid environments and their relationships. There are only a few possible sub-tasks, and most of them are navigation tasks (in green).\label{fig:MinigridHierarchies}](docs/images/MinigridHierarchies.png){ width=100% }
6668

69+
6770
### Minecraft
6871

6972
An exemplary instance of a hierarchical task is the collection of diamonds in the popular video game Minecraft, as showcased in the MineRL competition [@guss2021minerl2020], where hierarchical reinforcement learning agents have dominated the leaderboard [@milani2020minerl2019].
70-
71-
Due to the sparse rewards, exploration difficulty, and long time horizons in this procedurally generated sandbox environment, DreamerV3 [@dreamerv3] recently became the first algorithm to successfully collect diamonds in Minecraft without prior training or knowledge.
72-
Unfortunately, DreamerV3 required training on an Nvidia V100 GPU for 17 days, gathering around 100 million environmental steps.
73-
Such **substantial computational resources are inaccessible to most researchers**, impeding the overall progress of research on hierarchical reasoning.
74-
73+
Because of sparse rewards, the difficulty of exploration, and long time horizons in this procedurally generated sandbox environment, only recently did DreamerV3 [@dreamerv3] become the first algorithm to successfully collect diamonds in Minecraft without prior training or knowledge.
74+
Unfortunately, DreamerV3 required training on an Nvidia V100 GPU for 17 days, gathering roughly 100 million environmental steps.
75+
Such **substantial computational resources are unavailable to many researchers**, impeding the overall progress of research on hierarchical reasoning.
7576
Moreover, although Minecraft has an undeniably complex hierarchical structure, **this underlying hierarchical structure is fixed** and cannot be modified without modding the game, a complex task for researchers.
7677

7778

78-
79-
8079
### Crafter
8180

8281
Crafter [@hafner2022benchmarking] presents a lightweight grid-based 2D environment with game mechanics akin to Minecraft's, and poses similar challenges, including exploration, representation learning, reward sparsity, and long-term reasoning.
83-
Although Crafter offers 22 different tasks displayed in \autoref{fig:CrafterRequirements}, the **fixed underlying hierarchical structure** restricts how researchers can investigate the impacts of changes in this structure.
84-
85-
Furthermore, the tasks considered by the authors do not include navigation subtasks (e.g., find water, look for a cow, wait for a plant to grow, go back to a table...) or certain optional but useful subtasks (e.g., swords and the skill of dodging arrows contribute to making the task of killing skeletons easier), leading to abrupt drops in success rates in the hierarchy instead of a more gradual increase in difficulty.
86-
82+
Although Crafter offers 22 different tasks displayed in \autoref{fig:CrafterRequirements}, the **underlying hierarchical structure is fixed**, restricting how researchers can investigate the impacts of changes to the hierarchy.
83+
Furthermore, the tasks considered by the authors do not include navigation subtasks (e.g., find water, look for a cow, wait for a plant to grow, go back to a table, etc.) or certain optional but useful subtasks (e.g., swords and the skill of dodging arrows contribute to making the task of killing skeletons easier), leading to abrupt drops in success rates in the hierarchy instead of a more gradual increase in difficulty.
8784
![Partial hierarchical structure of the Crafter environment. Inspired by Figure 4 of [@hafner2022benchmarking].\label{fig:CrafterRequirements}](docs/images/CrafterRequirementsGraph.png){ width=80% }
8885

8986

90-
91-
9287
### PDDLGym
9388

9489
PDDLGym [@PDDLgym] is a Python library that automatically constructs Gym environments from Planning Domain Definition Language (PDDL) domains and problems. PDDL [@PDDL] functions as a problem specification language, facilitating the comparison of different symbolic planners. However, constructing PDDL domains and problems with a hierarchical structure is challenging and time-consuming, especially for researchers unfamiliar with PDDL-like languages.
@@ -100,38 +95,42 @@ Additionally, PDDLGym is **compatible only with PDDL1** and does not support num
10095
The NetHack learning environment [@kuttler2020nethack] is based on the game NetHack, where the observation is a grid composed of hundreds of possible symbols.
10196
Large numbers of items are randomly placed in each level, making NetHack extremely complex and challenging. In fact, NetHack is **too complex for agents to learn** quickly, as they require many environment steps to acquire domain-specific knowledge. For instance, 10B steps were required for the NeurIPS 2021 NetHack challenge [@2021NetHack], which is impractically long for a benchmark. Moreover, the NetHack game also has a **fixed underlying hierarchy** that cannot be easily modified. -->
10297

103-
### Arcade Learning Environment (Atari)
104-
105-
The arcade learning environment [@ALE] stands as a standard benchmark in reinforcement learning, encompassing over 55 Atari games. However, **only a few of these games, such as Montezuma's Revenge and Pitfall, necessitate hierarchical reasoning**. Each Atari games has a **fixed hierarchy that cannot be modified** and agents **demand substantial computational resources** to extract relevant features from pixels or memory, significantly slowing down experiments.
10698

99+
### Arcade Learning Environment (Atari)
100+
The Arcade Learning Environment [@ALE] stands as a standard benchmark in reinforcement learning, encompassing over 55 Atari games. However, **only a few of these games, such as Montezuma’s Revenge and Pitfall, necessitate hierarchical reasoning**. Each Atari game has a **fixed hierarchy that cannot be modified**, and agents **demand substantial computational resources** to extract relevant features from pixels or memory, significantly slowing down experiments.
107101

108102
## Design goals
109103

110104
HierarchyCraft aims to be a fruitful tool for investigating hierarchical reasoning, focusing on achieving the following four design goals.
111105

112-
### 1. Hierarchical by design
113-
The action space of HierarchyCraft environments consists of sub-tasks, referred to as *Transformations*, as opposed to detailed movements and controls. But each of *Transformations* has specific requirements to be valid (e.g. have enough of an item, be in the right place), and these requirements may necessitate the execution of other *Transformations* first, inherently creating a hierarchical structure in HierarchyCraft environments.
114106

115-
This concept is visually represented by the *Requirements graph* depicting the hierarchical relationships within each HierarchyCraft environment.
116-
The *Requirements graph* is directly constructed from the list of *Transformations* composing the environement, as illustrated in \autoref{fig:TransformationToRequirements}.
107+
### 1. Hierarchical by design
117108

109+
The action space of HierarchyCraft environments consists of sub-tasks, referred to as *Transformations*, as opposed to detailed movements and controls. However, each *Transformation* has specific requirements to be valid (e.g., having enough of an item or being in the right place), and these requirements may necessitate the execution of other *Transformations* first, inherently creating a hierarchical structure in HierarchyCraft environments.
110+
This concept is visually represented by the _Requirements graph_ depicting the hierarchical relationships within each HierarchyCraft environment.
111+
The _Requirements graph_ is directly constructed from the list of *Transformations* composing the environment, as illustrated in \autoref{fig:TransformationToRequirements}.
118112
Requirements graphs should be viewed as a generalization of previously observed graphical representations from related works, including \autoref{fig:CrafterRequirements} and \autoref{fig:MinigridHierarchies}.
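As a rough illustration of how a requirements graph can be derived from a list of transformations, here is a minimal Python sketch. It is not HierarchyCraft's API: the `Transformation` class, its `requires`/`produces`/`zone` fields, and the `requirements_graph` and `depth` helpers are hypothetical names chosen for this example. Each item or zone a transformation needs becomes a parent of every item it produces; when several transformations produce the same item, this sketch simply merges their requirements.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Transformation:
    """Hypothetical stand-in for a HierarchyCraft transformation (not the library's API)."""
    name: str
    requires: Dict[str, int] = field(default_factory=dict)  # items that must be in the inventory
    produces: Dict[str, int] = field(default_factory=dict)  # items added to the inventory
    zone: Optional[str] = None  # zone the agent must be in, if any


def requirements_graph(transformations: List[Transformation]) -> Dict[str, Set[str]]:
    """Map each obtainable item to the items and zones needed by transformations producing it."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for t in transformations:
        needs = set(t.requires)
        if t.zone is not None:
            needs.add(f"zone:{t.zone}")
        for item in t.produces:
            graph[item] |= needs  # merge requirements of alternative producers
    return dict(graph)


def depth(node: str, graph: Dict[str, Set[str]]) -> int:
    """Longest requirement chain below `node` in this graph; assumes the graph is acyclic."""
    needs = graph.get(node, set())
    if not needs:
        return 0
    return 1 + max(depth(n, graph) for n in needs)


# Hypothetical Minecraft-like example.
TRANSFORMATIONS = [
    Transformation("collect wood", zone="forest", produces={"wood": 1}),
    Transformation("craft table", requires={"wood": 2}, produces={"table": 1}),
    Transformation("craft pickaxe", requires={"wood": 1, "table": 1}, produces={"wooden_pickaxe": 1}),
    Transformation("mine stone", zone="cave", requires={"wooden_pickaxe": 1}, produces={"stone": 1}),
]

GRAPH = requirements_graph(TRANSFORMATIONS)
for item in GRAPH:
    print(item, sorted(GRAPH[item]), depth(item, GRAPH))
```

Because the graph is computed from the transformation list, adding, removing, or editing a single transformation is enough to change the depth of the hierarchy.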
119113

120114
![How sub-tasks build a hierarchical structure.\label{fig:TransformationToRequirements}](docs/images/TransformationToRequirementsLarge.png){ width=75% }
121115

116+
122117
### 2. No feature extraction needed
123-
In contrast to benchmarks that yield grids, pixel arrays, text, or sound, HierarchyCraft directly provides a low-dimensional representation that does not require the further features extraction, as depicted in Figure \autoref{fig:HierarchyCraftState}.
124-
This not only saves computational time but also enables researchers to concentrate on hierarchical reasoning while additionally allowing for the utilization of classical planning frameworks such as PDDL [@PDDL] or ANML [@ANML].
125118

119+
In contrast to benchmarks that yield grids, pixel arrays, text, or sound, HierarchyCraft directly provides a low-dimensional representation that does not require further feature extraction, as depicted in \autoref{fig:HierarchyCraftState}.
120+
This not only saves computational time but also enables researchers to concentrate on hierarchical reasoning while additionally leveraging classical planning frameworks such as PDDL [@PDDL] or ANML [@ANML].
126121
![HierarchyCraft state is already a compact representation.\label{fig:HierarchyCraftState}](docs/images/HierarchyCraftStateLarge.png){ width=80% }
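To make the idea of a compact, feature-free state concrete, the sketch below shows one hypothetical encoding: inventory counts for a fixed list of items concatenated with a one-hot vector for the agent's current zone. This is an assumption for illustration and not necessarily HierarchyCraft's exact observation layout; `ITEMS`, `ZONES`, and `encode_state` are made-up names.

```python
from typing import Dict, List

# Hypothetical item and zone lists; a real environment would derive them from its transformations.
ITEMS = ["wood", "table", "wooden_pickaxe", "stone"]
ZONES = ["forest", "cave"]


def encode_state(inventory: Dict[str, int], position: str) -> List[int]:
    """Concatenate inventory counts with a one-hot encoding of the current zone."""
    counts = [inventory.get(item, 0) for item in ITEMS]
    one_hot = [1 if zone == position else 0 for zone in ZONES]
    return counts + one_hot


state = encode_state({"wood": 3, "table": 1}, position="forest")
print(state)  # [3, 1, 0, 0, 1, 0]
```

Every entry already has a symbolic meaning, so an agent or planner can consume the vector directly without a learned encoder.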
127122

123+
128124
### 3. Easy to use and customize
125+
129126
HierarchyCraft is a versatile framework enabling the creation of diverse hierarchical environments.
130127
The library is designed to be simple and flexible, allowing researchers to define their own hierarchical environments with detailed guidance provided in the documentation.
131128
To showcase the range of environments possible within HierarchyCraft, multiple examples are provided.
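One way to picture the kind of controlled study that customizable environments allow is to generate a family of hierarchies whose depth and width are parameters. The generator below is a hypothetical sketch, not one of the examples shipped with HierarchyCraft: `make_layered_hierarchy` and `longest_chain` are illustration names, and recipes are plain `(name, requires, produces)` tuples rather than library objects.

```python
from typing import Dict, List, Tuple

Recipe = Tuple[str, Dict[str, int], Dict[str, int]]  # (name, requires, produces)


def make_layered_hierarchy(depth: int, width: int) -> List[Recipe]:
    """Hypothetical generator: each item in layer k requires one of every item in layer k-1."""
    recipes: List[Recipe] = []
    previous: List[str] = []
    for layer in range(depth + 1):
        current = [f"item_{layer}_{j}" for j in range(width)]
        for item in current:
            requires = {p: 1 for p in previous}  # layer-0 items have no requirements
            recipes.append((f"get {item}", requires, {item: 1}))
        previous = current
    return recipes


def longest_chain(recipes: List[Recipe]) -> int:
    """Number of layers above the base items; assumes recipes are listed layer by layer."""
    level: Dict[str, int] = {}
    for _, requires, produces in recipes:
        lvl = 1 + max((level[r] for r in requires), default=-1)
        for item in produces:
            level[item] = lvl
    return max(level.values())


for d in (1, 3, 5):
    recipes = make_layered_hierarchy(d, width=2)
    print(d, len(recipes), longest_chain(recipes))
```

Varying `depth` while holding `width` fixed changes only the length of the requirement chains, which is the kind of targeted change to the hierarchy that benchmarks with a fixed structure do not permit.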
132129

133-
### 4. Compatible with domains frameworks
134-
HierarchyCraft environments are directly compatible with both reinforcement learning through Gymnasium [@gymnasium] and planning through the Unified Planning Framework [@UPF] (see \autoref{fig:HierarchyCraft-pipeline}).
130+
131+
### 4. Compatible with multiple frameworks
132+
133+
HierarchyCraft environments are directly compatible with both reinforcement learning through OpenAI Gym [@gym] and planning through the Unified Planning Framework [@UPF] (see \autoref{fig:HierarchyCraft-pipeline}).
135134
This compatibility facilitates usage by both the reinforcement learning and planning communities.
136135

137136
![HierarchyCraft pipeline into different representations.\label{fig:HierarchyCraft-pipeline}](docs/images/HierarchyCraft_pipeline.png){ width=80% }
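To illustrate why a single list of transformations can serve both communities, the sketch below drives the same transition rules through a Gym-style `reset`/`step` loop (using the classic four-tuple `(obs, reward, done, info)` convention) and through a breadth-first search planner. Everything here is hypothetical and written in plain Python without Gym or the Unified Planning Framework; `ToyEnv`, `bfs_plan`, and the `Transformation` fields are illustration names, not library APIs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Transformation:
    """Hypothetical stand-in for a HierarchyCraft transformation (not the library's API)."""
    name: str
    requires: Dict[str, int] = field(default_factory=dict)  # must be held, kept
    consumes: Dict[str, int] = field(default_factory=dict)  # must be held, removed
    produces: Dict[str, int] = field(default_factory=dict)  # added to the inventory
    zone: Optional[str] = None         # zone the agent must be in, if any
    destination: Optional[str] = None  # zone the agent moves to, if any


def is_valid(t: Transformation, zone: str, inv: Dict[str, int]) -> bool:
    if t.zone is not None and t.zone != zone:
        return False
    needed = dict(t.requires)
    for item, n in t.consumes.items():
        needed[item] = needed.get(item, 0) + n
    return all(inv.get(item, 0) >= n for item, n in needed.items())


def apply_transformation(t: Transformation, zone: str, inv: Dict[str, int]) -> Tuple[str, Dict[str, int]]:
    new_inv = dict(inv)
    for item, n in t.consumes.items():
        new_inv[item] -= n
    for item, n in t.produces.items():
        new_inv[item] = new_inv.get(item, 0) + n
    new_zone = t.destination if t.destination is not None else zone
    return new_zone, new_inv


def state_key(zone: str, inv: Dict[str, int]):
    """Hashable state: current zone plus the sorted non-zero inventory."""
    return zone, tuple(sorted((i, n) for i, n in inv.items() if n > 0))


class ToyEnv:
    """Gym-style loop: reset() -> obs, step(action) -> (obs, reward, done, info)."""

    def __init__(self, transformations, start_zone, goal_item):
        self.transformations, self.start_zone, self.goal_item = transformations, start_zone, goal_item
        self.reset()

    def reset(self):
        self.zone, self.inv = self.start_zone, {}
        return state_key(self.zone, self.inv)

    def step(self, action: int):
        t = self.transformations[action]
        if is_valid(t, self.zone, self.inv):  # invalid actions leave the state unchanged
            self.zone, self.inv = apply_transformation(t, self.zone, self.inv)
        done = self.inv.get(self.goal_item, 0) > 0
        return state_key(self.zone, self.inv), float(done), done, {}


def bfs_plan(transformations, start_zone, goal_item, max_depth=20) -> Optional[List[str]]:
    """Breadth-first search for a shortest sequence of transformation names reaching goal_item."""
    frontier = deque([(start_zone, {}, [])])
    seen = {state_key(start_zone, {})}
    while frontier:
        zone, inv, plan = frontier.popleft()
        if inv.get(goal_item, 0) > 0:
            return plan
        if len(plan) >= max_depth:
            continue
        for t in transformations:
            if is_valid(t, zone, inv):
                new_zone, new_inv = apply_transformation(t, zone, inv)
                key = state_key(new_zone, new_inv)
                if key not in seen:
                    seen.add(key)
                    frontier.append((new_zone, new_inv, plan + [t.name]))
    return None


TRANSFORMATIONS = [
    Transformation("go to forest", destination="forest"),
    Transformation("go to base", destination="base"),
    Transformation("collect wood", zone="forest", produces={"wood": 1}),
    Transformation("craft table", consumes={"wood": 2}, produces={"table": 1}),
    Transformation("craft pickaxe", requires={"table": 1}, consumes={"wood": 1},
                   produces={"wooden_pickaxe": 1}),
]

plan = bfs_plan(TRANSFORMATIONS, start_zone="base", goal_item="wooden_pickaxe")
print(plan)

# Replay the planner's output as actions in the Gym-style environment.
env = ToyEnv(TRANSFORMATIONS, start_zone="base", goal_item="wooden_pickaxe")
env.reset()
index = {t.name: i for i, t in enumerate(TRANSFORMATIONS)}
for name in plan:
    obs, reward, done, info = env.step(index[name])
print(done, reward)  # True 1.0
```

Breadth-first search returns a shortest plan here because every transformation costs one step. A planner reached through the Unified Planning Framework would typically work from a symbolic problem description rather than Python callbacks, but it would search over the same transition rules.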
