paper.md
+28-29
@@ -1,5 +1,5 @@
1
1
---
2
-
title: 'Benchmarking Hierarchical Reasoning with HierarchyCraft'
2
+
title: 'HierarchyCraft: A Benchmark Builder for Hierarchical Reasoning'
3
3
tags:
4
4
- Python
5
5
- Hierarchy
@@ -20,7 +20,7 @@ authors:
20
20
affiliation: "1"
21
21
- name: Yuxuan Li
22
22
orcid: 0000-0001-5522-312X
23
-
affiliation: "1,3"
23
+
affiliation: "1,3"
24
24
- name: Matthew E. Taylor
25
25
orcid: 0000-0001-8946-0211
26
26
affiliation: "1, 3"
@@ -36,9 +36,10 @@ bibliography: paper.bib
36
36
37
37
---
38
38
39
+
39
40
# Summary
40
-
Hierarchical reasoning poses a fundamental challenge in the field of artificial intelligence [@botvinick2014model]. Existing methods may struggle when confronted with hierarchical tasks [@bacon2017option,@heess2016learning,@nachum2018data], yet there is a scarcity of suitable environments or benchmarks designed to comprehend how the structure of the underlying hierarchy influence a task difficulty. Our software represents a crucial initial step in the development of tools aimed at addressing research questions related to hierarchical reasoning.
41
41
42
+
Hierarchical reasoning poses a fundamental challenge in the field of artificial intelligence [@botvinick2014model]. Existing methods may struggle when confronted with hierarchical tasks [@bacon2017option,@heess2016learning,@nachum2018data], yet there is a scarcity of suitable environments or benchmarks designed to comprehend how the structure of the underlying hierarchy influences a task's difficulty. Our software represents an important initial step in the development of tools aimed at addressing research questions related to hierarchical reasoning.
42
43
We introduce **HierarchyCraft**, a lightweight environment builder designed for creating hierarchical reasoning tasks that do not necessitate feature extraction, in contrast to tasks involving pixel images, text, sound, or other data types where deep learning-based feature extraction is commonly employed.
43
44
HierarchyCraft serves a dual purpose by offering a set of pre-defined hierarchical environments and simplifying the process of creating customized hierarchical environments.
44
45
@@ -47,14 +48,15 @@ HierarchyCraft serves a dual purpose by offering a set of pre-defined hierarchic
47
48
48
49
49
50
# Statement of need
51
+
50
52
HierarchyCraft is designed as a user-friendly Python library for constructing environments tailored to the study of hierarchical reasoning in the contexts of reinforcement learning, classical planning, and program synthesis as displayed in \autoref{fig:HierachyCraft_domain_position}.
51
53
52
54
Analysis and quantification of the impacts of diverse hierarchical structures on learning agents is essential for advancing hierarchical reasoning.
53
55
However, current hierarchical benchmarks often limit themselves to a single hierarchical structure per benchmark, and present challenges not only due to this inherent hierarchical structure but also because of the necessary representation learning to interpret the inputs.
54
56
55
57
We argue that arbitrary hierarchical complexity can emerge from simple rules without the need for learning a representation.
56
-
To the best of our knowledge, no general frameworks currently exist for constructing environments dedicated to studying the hierarchical structure itself, underscoring the necessity for the development of tools like HierarchyCraft.
57
-
We compare five particularly related benchmarks to HierarchyCraft.
58
+
To the best of our knowledge, no general frameworks currently exist for constructing environments dedicated to studying the hierarchical structure itself. We next highlight five related benchmarks, underscoring the necessity for the development of a tool like HierarchyCraft.
59
+
58
60
59
61
### GridWorld
60
62
@@ -64,31 +66,24 @@ Minigrid [@minigrid] is a user-friendly Python library that not only implements
64
66
65
67
{ width=100% }
66
68
69
+
67
70
### Minecraft
68
71
69
72
An exemplary instance of a hierarchical task is the collection of diamonds in the popular video game Minecraft, as showcased in the MineRL competition [@guss2021minerl2020], where hierarchical reinforcement learning agents have dominated the leaderboard [@milani2020minerl2019].
70
-
71
-
Due to the sparse rewards, exploration difficulty, and long time horizons in this procedurally generated sandbox environment, DreamerV3 [@dreamerv3] recently became the first algorithm to successfully collect diamonds in Minecraft without prior training or knowledge.
72
-
Unfortunately, DreamerV3 required training on an Nvidia V100 GPU for 17 days, gathering around 100 million environmental steps.
73
-
Such **substantial computational resources are inaccessible to most researchers**, impeding the overall progress of research on hierarchical reasoning.
74
-
73
+
Due to sparse rewards, the difficulty of exploration, and long time horizons in this procedurally generated sandbox environment, DreamerV3 [@dreamerv3] recently became the first algorithm to successfully collect diamonds in Minecraft without prior training or knowledge.
74
+
Unfortunately, DreamerV3 required training on an Nvidia V100 GPU for 17 days, gathering roughly 100 million environmental steps.
75
+
Such **substantial computational resources are unavailable to many researchers**, impeding the overall progress of research on hierarchical reasoning.
75
76
Moreover, although Minecraft has an undeniably complex hierarchical structure, **this underlying hierarchical structure is fixed** and cannot be modified without modding the game, a complex task for researchers.
76
77
77
78
78
-
79
-
80
79
### Crafter
81
80
82
81
Crafter [@hafner2022benchmarking] presents a lightweight grid-based 2D environment with game mechanics akin to Minecraft, and poses similar challenges including exploration, representation learning, reward sparsity, and long-term reasoning.
83
-
Although Crafter offers 22 different tasks displayed in \autoref{fig:CrafterRequirements}, the **fixed underlying hierarchical structure** restricts how researchers can investigate the impacts of changes in this structure.
84
-
85
-
Furthermore, the tasks considered by the authors do not include navigation subtasks (e.g., find water, look for a cow, wait for a plant to grow, go back to a table...) or certain optional but useful subtasks (e.g., swords and the skill of dodging arrows contribute to making the task of killing skeletons easier), leading to abrupt drops in success rates in the hierarchy instead of a more gradual increase in difficulty.
86
-
82
+
Although Crafter offers 22 different tasks displayed in \autoref{fig:CrafterRequirements}, the **underlying hierarchical structure is fixed**, restricting how researchers can investigate the impacts of changes to the hierarchy.
83
+
Furthermore, the tasks considered by the authors do not include navigation subtasks (e.g., find water, look for a cow, wait for a plant to grow, go back to a table, etc.) or certain optional but useful subtasks (e.g., swords and the skill of dodging arrows contribute to making the task of killing skeletons easier), leading to abrupt drops in success rates in the hierarchy instead of a more gradual increase in difficulty.
87
84
![Partial hierarchical structure of the Crafter environment. Inspired by Figure 4 of [@hafner2022benchmarking]\label{fig:CrafterRequirements}](docs/images/CrafterRequirementsGraph.png){ width=80% }
88
85
89
86
90
-
91
-
92
87
### PDDLGym
93
88
94
89
PDDLGym [@PDDLgym] is a Python library that automatically constructs Gym environments from Planning Domain Definition Language (PDDL) domains and problems. PDDL [@PDDL] functions as a problem specification language, facilitating the comparison of different symbolic planners. However, constructing PDDL domains and problems with a hierarchical structure is challenging and time-consuming, especially for researchers unfamiliar with PDDL-like languages.
@@ -100,38 +95,42 @@ Additionally, PDDLGym is **compatible only with PDDL1** and does not support num
100
95
The NetHack learning environment [@kuttler2020nethack] is based on the game NetHack, where the observation is a grid composed of hundreds of possible symbols.
101
96
Large numbers of items are randomly placed in each level, making NetHack extremely complex and challenging. In fact, NetHack is **too complex for agents to learn**: it requires many environment steps for agents to acquire domain-specific knowledge. 10B steps were required for the NeurIPS 2021 NetHack challenge [@2021NetHack], making it impractically long for a benchmark. Moreover, the NetHack game also has a **fixed underlying hierarchy** that cannot be easily modified. -->
102
97
103
-
### Arcade Learning Environment (Atari)
104
-
105
-
The arcade learning environment [@ALE] stands as a standard benchmark in reinforcement learning, encompassing over 55 Atari games. However, **only a few of these games, such as Montezuma's Revenge and Pitfall, necessitate hierarchical reasoning**. Each Atari games has a **fixed hierarchy that cannot be modified** and agents **demand substantial computational resources** to extract relevant features from pixels or memory, significantly slowing down experiments.
106
98
99
+
### Arcade Learning Environment (Atari)
100
+
The arcade learning environment [@ALE] stands as a standard benchmark in reinforcement learning, encompassing over 55 Atari games. However, **only a few of these games, such as Montezuma’s Revenge and Pitfall, necessitate hierarchical reasoning**. Each Atari game has a **fixed hierarchy that cannot be modified** and agents **demand substantial computational resources** to extract relevant features from pixels or memory, significantly slowing down experiments.
107
101
108
102
## Design goals
109
103
110
104
HierarchyCraft aims to be a fruitful tool for investigating hierarchical reasoning, focusing on achieving the following four design goals.
111
105
112
-
### 1. Hierarchical by design
113
-
The action space of HierarchyCraft environments consists of sub-tasks, referred to as *Transformations*, as opposed to detailed movements and controls. But each of *Transformations* has specific requirements to be valid (e.g. have enough of an item, be in the right place), and these requirements may necessitate the execution of other *Transformations* first, inherently creating a hierarchical structure in HierarchyCraft environments.
114
106
115
-
This concept is visually represented by the *Requirements graph* depicting the hierarchical relationships within each HierarchyCraft environment.
116
-
The *Requirements graph* is directly constructed from the list of *Transformations* composing the environement, as illustrated in \autoref{fig:TransformationToRequirements}.
107
+
### 1. Hierarchical by design
117
108
109
+
The action space of HierarchyCraft environments consists of sub-tasks, referred to as *Transformations*, as opposed to detailed movements and controls. However, each *Transformation* has specific requirements to be valid (e.g., have enough of an item, be in the right place), and these requirements may necessitate the execution of other *Transformations* first, inherently creating a hierarchical structure in HierarchyCraft environments.
110
+
This concept is visually represented by the _Requirements graph_ depicting the hierarchical relationships within each HierarchyCraft environment.
111
+
The _Requirements graph_ is directly constructed from the list of *Transformations* composing the environment, as illustrated in \autoref{fig:TransformationToRequirements}.
118
112
Requirements graphs should be viewed as a generalization of previously observed graphical representations from related works, including \autoref{fig:CrafterRequirements} and \autoref{fig:MinigridHierarchies}.
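
As a rough illustration of how a requirements graph can be derived from a list of transformations, the sketch below uses a hypothetical `Transformation` dataclass and made-up item names. It is not the HierarchyCraft API; the field names (`consumes`, `requires`, `produces`) and the `depth` helper are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Transformation:
    """Hypothetical stand-in for a sub-task; not the HierarchyCraft class."""

    name: str
    consumes: dict = field(default_factory=dict)  # item -> quantity removed
    requires: dict = field(default_factory=dict)  # item -> quantity needed but kept
    produces: dict = field(default_factory=dict)  # item -> quantity added


def requirements_graph(transformations):
    """Map each produced item to the set of items needed to obtain it."""
    graph = {}
    for t in transformations:
        needed = set(t.consumes) | set(t.requires)
        for item in t.produces:
            graph.setdefault(item, set()).update(needed)
    return graph


def depth(item, graph, _seen=frozenset()):
    """Length of the longest requirement chain below an item (0 for raw items)."""
    if item in _seen:
        raise ValueError(f"cyclic requirement involving {item!r}")
    parents = graph.get(item, set())
    if not parents:
        return 0
    return 1 + max(depth(p, graph, _seen | {item}) for p in parents)


# Illustrative rules only; the items and quantities are invented.
TRANSFORMATIONS = [
    Transformation("gather_wood", produces={"wood": 1}),
    Transformation("craft_plank", consumes={"wood": 1}, produces={"plank": 4}),
    Transformation("craft_table", consumes={"plank": 4}, produces={"table": 1}),
    Transformation(
        "craft_pickaxe",
        consumes={"plank": 3},
        requires={"table": 1},
        produces={"pickaxe": 1},
    ),
]

GRAPH = requirements_graph(TRANSFORMATIONS)
```

Under these assumed rules, changing a single entry of `TRANSFORMATIONS` changes the shape of `GRAPH`, which is the kind of structural variation a fixed-hierarchy benchmark cannot offer.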
119
113
120
114
{ width=75% }
121
115
116
+
122
117
### 2. No feature extraction needed
123
-
In contrast to benchmarks that yield grids, pixel arrays, text, or sound, HierarchyCraft directly provides a low-dimensional representation that does not require the further features extraction, as depicted in Figure \autoref{fig:HierarchyCraftState}.
124
-
This not only saves computational time but also enables researchers to concentrate on hierarchical reasoning while additionally allowing for the utilization of classical planning frameworks such as PDDL [@PDDL] or ANML [@ANML].
125
118
119
+
In contrast to benchmarks that yield grids, pixel arrays, text, or sound, HierarchyCraft directly provides a low-dimensional representation that does not require further feature extraction, as depicted in \autoref{fig:HierarchyCraftState}.
120
+
This not only saves computational time but also enables researchers to concentrate on hierarchical reasoning while additionally leveraging classical planning frameworks such as PDDL [@PDDL] or ANML [@ANML].
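
To make the idea of a low-dimensional state concrete, here is a minimal sketch that encodes a state as a flat list of integers. The layout (item counts followed by a one-hot zone indicator), the item and zone names, and the `encode_state` helper are all assumptions for illustration, not the representation HierarchyCraft actually emits.

```python
# Hypothetical vocabulary; a real environment would define its own.
ITEMS = ["wood", "plank", "table"]
ZONES = ["forest", "village"]


def encode_state(inventory, current_zone):
    """Concatenate item counts with a one-hot indicator of the current zone."""
    counts = [inventory.get(item, 0) for item in ITEMS]
    one_hot = [1 if zone == current_zone else 0 for zone in ZONES]
    return counts + one_hot


STATE = encode_state({"wood": 2, "table": 1}, "village")
print(STATE)  # [2, 0, 1, 0, 1]
```

A vector like this can be fed directly to a small policy network or translated into numeric fluents for a planner, with no convolutional or language encoder in between.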
126
121
{ width=80% }
127
122
123
+
128
124
### 3. Easy to use and customize
125
+
129
126
HierarchyCraft is a versatile framework enabling the creation of diverse hierarchical environments.
130
127
The library is designed to be simple and flexible, allowing researchers to define their own hierarchical environments with detailed guidance provided in the documentation.
131
128
To showcase the range of environments possible within HierarchyCraft, multiple examples are provided.
132
129
133
-
### 4. Compatible with domains frameworks
134
-
HierarchyCraft environments are directly compatible with both reinforcement learning through Gymnasium [@gymnasium] and planning through the Unified Planning Framework [@UPF] (see \autoref{fig:HierarchyCraft-pipeline}).
130
+
131
+
### 4. Compatible with multiple frameworks
132
+
133
+
HierarchyCraft environments are directly compatible with both reinforcement learning through OpenAI Gym [@gym] and planning through the Unified Planning Framework [@UPF] (see \autoref{fig:HierarchyCraft-pipeline}).
135
134
This compatibility facilitates usage by both the reinforcement learning and planning communities.
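
The sketch below illustrates why a single set of rules can serve both communities: the same hypothetical `RULES` table drives a Gym-style `reset`/`step` loop and a breadth-first planner. It imports neither Gym nor the Unified Planning Framework; `ToyEnv`, `apply`, and `plan` are invented names, and the classic four-value `step` return is assumed only to mirror the Gym convention.

```python
from collections import deque

# Hypothetical rules: name -> (items consumed, items produced).
RULES = {
    "gather_wood": ({}, {"wood": 1}),
    "craft_plank": ({"wood": 1}, {"plank": 2}),
    "craft_table": ({"plank": 4}, {"table": 1}),
}
ACTIONS = list(RULES)


def apply(inventory, action):
    """Return the next inventory, or None when the requirements are not met."""
    consumed, produced = RULES[action]
    if any(inventory.get(item, 0) < qty for item, qty in consumed.items()):
        return None
    nxt = dict(inventory)
    for item, qty in consumed.items():
        nxt[item] -= qty
    for item, qty in produced.items():
        nxt[item] = nxt.get(item, 0) + qty
    return nxt


class ToyEnv:
    """Gym-style interface (reset/step), written without importing Gym."""

    def __init__(self, goal="table"):
        self.goal = goal
        self.inventory = {}

    def reset(self):
        self.inventory = {}
        return dict(self.inventory)

    def step(self, action_index):
        nxt = apply(self.inventory, ACTIONS[action_index])
        if nxt is not None:  # invalid actions leave the state unchanged
            self.inventory = nxt
        done = self.inventory.get(self.goal, 0) > 0
        return dict(self.inventory), float(done), done, {}


def plan(goal="table", max_depth=10):
    """Breadth-first search for a shortest action sequence reaching the goal."""
    queue = deque([({}, [])])
    seen = {frozenset()}
    while queue:
        inventory, path = queue.popleft()
        if inventory.get(goal, 0) > 0:
            return path
        if len(path) >= max_depth:
            continue
        for action in ACTIONS:
            nxt = apply(inventory, action)
            if nxt is None:
                continue
            key = frozenset((i, q) for i, q in nxt.items() if q)
            if key not in seen:
                seen.add(key)
                queue.append((nxt, path + [action]))
    return None
```

A plan returned by `plan()` can be replayed through `ToyEnv.step`, which is one simple way to cross-check a planner against a learning environment built from the same rules.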
136
135
137
136
{ width=80% }