## Inverse Constraint Learning
Paper: [Learning Soft Constraints From Constrained Expert Demonstrations, Gaurav et al. (2023)](https://openreview.net/forum?id=8sSnD78NqTN)
This repository contains the code for the ICL paper. After you run any command, the results are logged to TensorBoard.
## How does it work?
Constrained RL (CRL) takes in a reward and one or more constraints, and produces an optimal constrained policy.
<img src="images/crl.png" width=400>
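As a toy illustration of CRL (not the repository's training code, which runs full constrained policy optimization on Mujoco and ExiD environments), Lagrangian dual ascent on a hypothetical two-action bandit shows how a constraint shapes the optimal policy; all numbers below are made up for the sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical two-action bandit: action 1 earns more reward but incurs cost.
reward = np.array([1.0, 2.0])
cost = np.array([0.0, 1.0])
budget = 0.5   # constraint: expected cost must stay <= 0.5
tau = 0.1      # entropy temperature, keeps the policy stochastic
lam = 0.0      # Lagrange multiplier for the constraint

for _ in range(200):
    # Policy step: soft-optimal policy for the Lagrangian reward r - lam * c.
    adv = (reward[1] - lam * cost[1]) - (reward[0] - lam * cost[0])
    p = sigmoid(adv / tau)  # probability of taking action 1
    # Dual step: raise lam while the constraint is violated, lower it otherwise.
    expected_cost = p * cost[1] + (1 - p) * cost[0]
    lam = max(0.0, lam + 0.05 * (expected_cost - budget))
```

Without the constraint the policy would always pick action 1; with it, the multiplier settles at the point where the policy mixes the two actions just enough to meet the cost budget exactly.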
The inverse problem, i.e. Inverse Constrained RL, takes in a dataset of trajectories sampled from an optimal expert and produces a reward and constraint(s) that reproduce the expert policy when CRL is performed with them.
<img src="images/icrl.png" width=400>
Due to unidentifiability, Inverse Constrained RL is a difficult problem. Hence, we solve a simplified problem: we assume the reward is known and that only a single constraint needs to be learned.
<img src="images/icl.png" width=400>
The idea is inspired by the IRL template, which alternates between policy optimization and reward adjustment. In our case, we alternate between constrained policy optimization and constraint function adjustment.
<img src="images/template.png" width=400>
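The alternation can be sketched on a hypothetical two-action bandit (nothing like the paper's actual neural-network constraint functions; all values are invented for illustration): an inner step solves CRL for the current constraint, and an outer step adjusts the constraint wherever the resulting policy visits more often than the expert does:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Known reward; the expert picks the high-reward action only 25% of the
# time, hinting at a hidden cost on it that we want to recover.
reward = np.array([1.0, 2.0])
p_expert = 0.25            # expert's visitation frequency of action 1
tau = 0.1                  # entropy temperature of the soft-optimal policy
c = np.array([0.0, 0.0])   # learned constraint value per action

for _ in range(300):
    # Inner step: (soft) constrained policy optimization under reward - c.
    logits = (reward - c) / tau
    p = sigmoid(logits[1] - logits[0])  # policy's visitation of action 1
    # Outer step: raise the cost where the policy over-visits relative to
    # the expert, lower it where it under-visits.
    c[1] += 0.05 * (p - p_expert)
```

At convergence the learned cost on action 1 is exactly large enough that the constrained policy matches the expert's visitation frequency.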
For further details regarding the optimization and algorithm, please see the [paper](https://openreview.net/forum?id=8sSnD78NqTN).
We conduct several experiments across synthetic environments, robotics environments, and real-world highway environments. The steps to run these experiments are detailed further in this README.
## Setup
* Install OpenMPI and Mujoco 2.1.0
* Update the constants in `tools/__init__.py` to point to the correct directories for the ExiD dataset.
* Install `tools` package by running `pip install .` in the root directory.
## High level workflow
data in `tools/assets/exiD`, already provided, which was generated using `prepare_exid_data.py`)
* Generate for other environments: `python3 -B expert.py -c configs/ENV.json`