The results of the pi0 checkpoints are worse than expected

Thanks for your work!

I evaluated the `VLABench/pi0-base-primitive` checkpoint you released on Hugging Face (10 episodes per task), but the results are much worse than expected:

Metric | select_painting | select_book | select_drink | select_chemistry_tube | select_poker | select_mahjong | select_toy | select_fruit | add_condiment | insert_flower
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
`SR` | 0.4 | 0.0 | 0.0 | 0.1 | 0.2 | 0.22 | 0.5 | 0.4 | 0.0 | 0.0
`PS` | 0.4 | 0.0 | 0.0 | 0.05 | 0.2 | 0.28 | 0.75 | 0.65 | 0.03 | 0.1
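
For a quick summary, the per-task numbers above can be averaged across the ten tasks. This is only a minimal sketch: the values are copied by hand from the table, and the variable and function names are my own, not part of VLABench.

```python
# Per-task values copied by hand from the table above (same column order).
tasks = [
    "select_painting", "select_book", "select_drink", "select_chemistry_tube",
    "select_poker", "select_mahjong", "select_toy", "select_fruit",
    "add_condiment", "insert_flower",
]
sr = [0.4, 0.0, 0.0, 0.1, 0.2, 0.22, 0.5, 0.4, 0.0, 0.0]
ps = [0.4, 0.0, 0.0, 0.05, 0.2, 0.28, 0.75, 0.65, 0.03, 0.1]


def mean(values):
    """Unweighted average over tasks."""
    return sum(values) / len(values)


mean_sr = mean(sr)
mean_ps = mean(ps)

# Tasks where no episode succeeded at all.
zero_sr_tasks = [task for task, rate in zip(tasks, sr) if rate == 0.0]

print(f"mean SR = {mean_sr:.3f}")  # mean SR = 0.182
print(f"mean PS = {mean_ps:.3f}")  # mean PS = 0.246
print("tasks with SR = 0:", ", ".join(zero_sr_tasks))
```

So the checkpoint averages roughly 0.18 `SR` and 0.25 `PS`, and four of the ten tasks (select_book, select_drink, add_condiment, insert_flower) never succeed.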

Here are the commands I used.

Start the policy server:
uv run scripts/serve_policy.py --env VLABENCH policy:checkpoint --policy.config=pi0_vlabench_primitive_lora --policy.dir=checkpoints/pi0-base-primitive

Run the pi0 evaluation:
python examples/vlabench/eval.py --args.host=0.0.0.0 --args.tasks="select_painting select_book select_drink select_chemistry_tube select_poker select_mahjong select_toy select_fruit add_condiment insert_flower" --args.episode_config_path="/data2/shujunyang/VLABench/VLABench/configs/evaluation/tracks/track_1_in_distribution.json" --args.save_dir="data/vlabench/pi0_base_lora/track_1" --args.n_episode=10

Looking forward to your reply.

The results of the pi0 checkpoints are worse than expected #46
