unknown PR #2639
Conversation
first draft, need to add usage and thumbnails
assets/192_pi0/fast.png
Outdated
We store images (except the thumbnail) in this dataset, to keep this repo as lean as possible. Would it be possible to move this one there?
pi0.md
Outdated
We have ported the first **robot foundation models** to **Hugging Face LeRobot**! Both **π0 and π0-FAST**, developed by Physical Intelligence, are now available in the **LeRobot repository**, bringing generalist robotic intelligence to the Hugging Face ecosystem. If you are curious about how Vision-Language-Action (VLA) models differ from Vision-Language Models (VLMs) and how actions are represented, dive into this blog post to find out!

[Huggingface collection of Pi0 models](https://huggingface.co/collections/lerobot/pi0-models-67a0f92dc62e52ace7220eba) | [LeRobot] (link)
Still not up, is it?
not yet! 30min
---

## 🔍 What is π0?
I think there are many bold terms in this section. Personally I find it somewhat distracting, but it's totally your call.
pi0.md
Outdated
π0 is trained on data from **7 robotic platforms** and **68 unique tasks**, demonstrating strong **zero-shot** and **fine-tuned performance** on complex, real-world tasks such as **laundry folding, table bussing, grocery bagging, box assembly, and object retrieval**.

Unlike standard robotic policies, **π0 employs flow matching** to produce **smooth, real-time action trajectories at 50Hz**, making it highly **efficient, precise, and adaptable** for real-world deployment.
We don't need to explain how flow matching works, but perhaps it'd be helpful to contextualize a bit as one recent technique behind some great quality improvements in diffusion models.
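To give a concrete picture of the sampling loop behind that claim, here is a minimal sketch of flow-matching inference: starting from Gaussian noise, a learned velocity field is integrated toward a clean action chunk with a few Euler steps. The `velocity_net` name, its signature, and all shapes are illustrative assumptions, not LeRobot's actual implementation.

```python
import torch

def sample_actions(velocity_net, obs_embedding, horizon=50, action_dim=7, steps=10):
    """Illustrative flow-matching sampler (hypothetical `velocity_net`).

    Integrates a learned velocity field from pure noise (t=0) toward a
    clean action chunk (t=1) with simple Euler steps.
    """
    x = torch.randn(1, horizon, action_dim)      # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)             # current flow time in [0, 1)
        v = velocity_net(x, t, obs_embedding)    # predicted velocity dx/dt
        x = x + dt * v                           # Euler integration step
    return x                                     # (1, horizon, action_dim) trajectory
```

A handful of integration steps over a whole action chunk is what makes it feasible to serve smooth trajectories at high control rates.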
## How to Use π0 in LeRobot?

### Fine-tuning the π0 Pretrained Model
Do we always need to fine-tune before use? Otherwise, we could explain how to use a fine-tuned model, and then the benefits of fine-tuning.
### Fine-tuning the π0 Pretrained Model

To fine-tune the **π0** model using the `pi0_base` checkpoint from `openpi`, run the following command:
Links here would be nice :)
```bash
--dataset.repo_id=danaaubakirova/koch_test
```
To fine-tune the π0 neural network with PaliGemma and Expert Gemma, which were pretrained using VLM default parameters before π0 fine-tuning, execute:
Why do we need this?
explained above in suggestion!
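To make the distinction concrete, here is a rough sketch of instantiating the π0 architecture from its config, so PaliGemma and the Gemma action expert keep their default VLM initialization instead of loading the `pi0_base` robot checkpoint. The module paths and class names follow the lerobot repo layout but should be treated as assumptions:

```python
# Sketch: build the π0 architecture without the robot-pretrained checkpoint.
# Module paths / class names are assumptions based on the lerobot layout.
from lerobot.common.policies.pi0.configuration_pi0 import PI0Config
from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy

config = PI0Config()        # default architecture hyperparameters
policy = PI0Policy(config)  # weights start from the underlying VLM init,
                            # not from the pi0_base robot checkpoint
```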
pi0.md
Outdated
### **Handling 2D Attention Masks**

The resulting **2D causal mask** exhibits strong **block sparsity**, but defining the boundaries of each block—especially in a batch of samples—is a bit tricky. We are used to causal masks with triangular structures for autoregressive modeling, but this is not one of these cases.

As you can see in this example below: the image (first element) has some padding tokens, representing empty cameras. Then, text tokens are added, with text tokens as well. This "prefix" part forms a fully noncausal attention, as in PaliGemma. Then, the suffix (state + action/time tokens) has a causal-block structure. The eager naive implementation matmuls and softmaxes over all of this, which is quite inefficient.
Suggested change, from:

As you can see in this example below: the image (first element) has some padding tokens, representing empty cameras. Then, text tokens are added, with text tokens as well. This "prefix" part forms a fully noncausal attention, as in PaliGemma. Then, the suffix (state + action/time tokens) has a causal-block structure. The eager naive implementation matmuls and softmaxes over all of this, which is quite inefficient.

to:

As you can see in this example below: the image (first element) has some padding tokens, representing empty cameras. Then, text tokens are added, with state tokens as well. This "prefix" part forms a fully noncausal attention, as in PaliGemma. Then, the suffix (state + action/time tokens) has a causal-block structure. The eager naive implementation matmuls and softmaxes over all of this, which is quite inefficient.
I guess? Or should we remove?
I think it's missing 'text padding tokens' rather
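For readers following along, here is a small sketch of how such a block-sparse 2D mask could be assembled: the prefix (image + text) attends bidirectionally, each suffix block sees the prefix and all blocks up to and including itself, and padding is masked out everywhere. The block layout and shapes are illustrative assumptions, not the exact lerobot code.

```python
import torch

def make_pi0_style_mask(prefix_len, block_lens, pad_mask):
    """Illustrative 2D attention mask: noncausal prefix + causal suffix blocks.

    prefix_len: number of image/text tokens (fully bidirectional, as in PaliGemma)
    block_lens: sizes of the suffix blocks (e.g. [state, action/time tokens])
    pad_mask:   bool tensor (seq_len,), True where the token is real (not padding)
    """
    seq_len = prefix_len + sum(block_lens)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)

    # Prefix tokens attend to the whole prefix, bidirectionally.
    mask[:prefix_len, :prefix_len] = True

    # Each suffix block sees the prefix and every block up to and including itself.
    start = prefix_len
    for length in block_lens:
        end = start + length
        mask[start:end, :end] = True
        start = end

    # Padding tokens neither attend nor get attended to.
    mask &= pad_mask.unsqueeze(0) & pad_mask.unsqueeze(1)
    return mask

# e.g. a 6-token prefix with one padded slot, then 2 state + 4 action/time tokens
pad = torch.tensor([1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1], dtype=torch.bool)
attn = make_pi0_style_mask(prefix_len=6, block_lens=[2, 4], pad_mask=pad)
```

A fused attention kernel can exploit this block sparsity instead of materializing the dense mask the way the eager implementation does.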
Unlike standard robotic policies, **π0 employs flow matching** to produce **smooth, real-time action trajectories at 50Hz**, making it highly **efficient, precise, and adaptable** for real-world deployment.

## How to Use π0 in LeRobot?
In line with @pcuenca I'd add something like this (needs the snippet @Cadene )
First of all, you need to upgrade your lerobot install, which now leverages `transformers` as a dependency! After a git clone, simply run:
```bash
pip install -e ".[pi0]"
```
π0 models are foundational models that, much like PaliGemma, are made to be adapted to a variety of frameworks, environments, and scene inputs. The base models here are usable as-is, in particular π0.
Inference on π0 pretrained model
add python snippet...
However, performance is reduced, since it's a conversion from JAX to PyTorch and from a specific environment. We recommend fine-tuning your own π0 on your own environment, like below.
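Until the official snippet lands, here is a minimal sketch of what inference could look like with the ported checkpoint. The class name and `select_action` call follow lerobot's usual policy API, but the import path, batch keys, and shapes are assumptions that depend on your robot config:

```python
import torch
from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy  # path is an assumption

# Load the ported π0 checkpoint from the Hub.
policy = PI0Policy.from_pretrained("lerobot/pi0")
policy.eval()

# Dummy observation; real keys and shapes depend on your robot's config.
batch = {
    "observation.images.top": torch.zeros(1, 3, 224, 224),  # camera frame
    "observation.state": torch.zeros(1, 14),                 # proprioceptive state
    "task": ["pick up the cube"],                            # language instruction
}
with torch.no_grad():
    action = policy.select_action(batch)  # next action from the predicted chunk
```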
suggestions are broken with code snippets 💀 but seems fine
One approach is **semantic action representation**, where actions are described as **high-level concepts** like sub-tasks or keypoints. While this allows for few-shot and zero-shot learning, it often relies on hand-designed low-level controllers, limiting flexibility across different robots. In contrast, low-level control representations map actions directly to motor commands, enabling precise movements but making training **less stable and harder to scale**.

Most existing VLAs use **discrete action tokenization**, converting continuous actions into discrete tokens generated autoregressively. The most common method—per-dimension, per-timestep binning—struggles with high-frequency control tasks, leading to lossy representations and **inefficient training**. Alternatives like vector quantization (VQ) and time-series compression help, but **VQ is sensitive to hyperparameters**, making it less reliable for diverse robot designs.
Suggested change, from:

Most existing VLAs use **discrete action tokenization**, converting continuous actions into discrete tokens generated autoregressively. The most common method—per-dimension, per-timestep binning—struggles with high-frequency control tasks, leading to lossy representations and **inefficient training**. Alternatives like vector quantization (VQ) and time-series compression help, but **VQ is sensitive to hyperparameters**, making it less reliable for diverse robot designs.

to:

Most existing VLAs use **discrete action tokenization**, converting continuous actions into discrete tokens generated autoregressively. The most common method—meaning per-dimension and per-timestep binning—struggles with high-frequency control tasks, leading to lossy representations and **inefficient training**. Alternatives like vector quantization (VQ) and time-series compression help, but **VQ is sensitive to hyperparameters**, making it less reliable for diverse robot designs.
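For reference, here is a tiny sketch of the per-dimension, per-timestep binning scheme criticized above, which shows both the token blow-up at high control rates and where the lossiness comes from; the bin count and value range are arbitrary choices:

```python
import numpy as np

def tokenize_actions(actions, n_bins=256, low=-1.0, high=1.0):
    """Per-dimension, per-timestep binning: every scalar becomes one token."""
    clipped = np.clip(actions, low, high)
    ids = np.round((clipped - low) / (high - low) * (n_bins - 1)).astype(int)
    return ids.reshape(-1)  # one token per (timestep, dim): long sequences at 50Hz

def detokenize_actions(ids, horizon, dim, n_bins=256, low=-1.0, high=1.0):
    """Inverse map; the quantization error here is the 'lossy representation'."""
    return low + ids.reshape(horizon, dim) / (n_bins - 1) * (high - low)

actions = np.random.uniform(-1, 1, size=(50, 7))    # a 1 s chunk at 50Hz, 7-DoF arm
tokens = tokenize_actions(actions)                  # 350 tokens for one second of control
recon = detokenize_actions(tokens, horizon=50, dim=7)
print(tokens.shape, np.abs(actions - recon).max())  # error is at most half a bin width
```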
This reverts commit 0029bfa.
Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.
Preparing the Article
You're not quite done yet, though. Please make sure to follow this process (as documented here):
Add metadata (such as authors) to your md file. You can also specify `guest` or `org` for the authors. Here is an example of a complete PR: #2382
Getting a Review
Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.
Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.