Missing evaluation code for Simpler-env (Subject/Physical/Semantics Generalization) #25

Hi, thank you for the great work and for open-sourcing the code!

In the paper, four types of evaluations are conducted in the Simpler-env benchmark:  
- **In-Domain**  
- **Subject Generalization**  
- **Physical Generalization**  
- **Semantics Generalization**

However, the evaluation scripts provided in the repository appear to cover **In-Domain** evaluation only: all four available tasks fall into that category. I could not find any code or instructions for reproducing the other three generalization results.

Could you kindly clarify:
1. Are there existing scripts for **Subject**, **Physical**, and **Semantics Generalization** evaluations?
2. If not, could you provide guidance on how to reproduce these generalization results?

Thanks in advance!
