
Conversation


@jbarnes850 jbarnes850 commented Jan 8, 2026

This PR adds an OSWorld VLM training cookbook and reference integration for multi‑turn desktop automation in Slime. It is written for a reviewer who has not seen OSWorld before and explains how Slime connects to a VM‑backed GUI environment, why an HTTP bridge is required, and how training is run end‑to‑end without touching Slime core code.

Architecture

OSWorld depends on desktop‑env (torch 2.5.1), while Slime’s training stack uses torch 2.9+, so the environment runs on the host and training runs in the container.
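For context, here is a minimal sketch of what the host-side bridge looks like: a stateless HTTP server that runs with the desktop-env dependencies and proxies reset/step calls into the VM. The endpoint names, DesktopEnv constructor arguments, and JSON layout are illustrative assumptions, not the exact interface in examples/osworld/.

```python
# Host-side bridge sketch (runs on the host with torch 2.5.1 / desktop-env installed).
# Endpoint names and DesktopEnv arguments below are assumptions for illustration.
from flask import Flask, jsonify, request
from desktop_env.desktop_env import DesktopEnv  # assumed import path

app = Flask(__name__)
env = None  # one VM-backed environment per bridge process


@app.route("/reset", methods=["POST"])
def reset():
    """Reset the VM to a task's initial state and return the first observation."""
    global env
    task_config = request.get_json()
    if env is None:
        env = DesktopEnv(action_space="pyautogui")  # assumed constructor args
    obs = env.reset(task_config=task_config)
    return jsonify({
        "screenshot": obs["screenshot"].hex(),  # bytes -> hex so it fits in JSON
        "instruction": task_config.get("instruction", ""),
    })


@app.route("/step", methods=["POST"])
def step():
    """Execute one agent action in the VM and return the next observation and reward."""
    payload = request.get_json()
    obs, reward, done, info = env.step(payload["action"])
    return jsonify({
        "screenshot": obs["screenshot"].hex(),
        "reward": reward,
        "done": done,
        "info": info,
    })


if __name__ == "__main__":
    # Bind on the host so the training container can reach it over the network.
    app.run(host="0.0.0.0", port=8000)
```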

The cookbook makes that split explicit, proxies environment calls through a stateless HTTP server, and drives a multi‑turn VLM rollout loop that produces Qwen3‑VL multimodal tokens and aligned loss masks. Experience replay is used to avoid advantage collapse in sparse reward tasks, and reward shaping provides partial credit for valid actions, execution, and UI changes.
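As a rough illustration of the shaping described above, the per-step reward can combine the environment's sparse task score with small bonuses for a parseable action, successful execution, and an observed screen change. The signal names and weights here are assumptions for the sketch, not the cookbook's exact values.

```python
def shaped_reward(task_score: float,
                  action_parsed: bool,
                  action_executed: bool,
                  ui_changed: bool) -> float:
    """Partial-credit shaping for sparse OSWorld rewards.

    task_score is the environment's sparse success signal (0.0 or 1.0);
    the bonus weights below are illustrative, not the cookbook's values.
    """
    reward = task_score
    if action_parsed:      # the model emitted a syntactically valid action
        reward += 0.05
    if action_executed:    # the bridge executed it without an error
        reward += 0.05
    if ui_changed:         # the screenshot changed after the action
        reward += 0.10
    return reward
```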

Reproducible artifacts are linked so the pipeline can be validated without hunting for external dependencies: the task registry and replay buffer are in Jarrodbarnes/osworld-union-v1, the SFT warmup checkpoint is Jarrodbarnes/osworld-vlm-sft-step25, and the GSPO checkpoint is Jarrodbarnes/osworld-vlm-gspo-step25. Metrics for the training runs are tracked in W&B at jbarnes850-near-protocol/osworld-grpo.

The README documents the OSWorld subset used for training (Ubuntu only), how to launch the host VM, how the HTTP bridge is wired, and how to run training. Verification was done by running pre‑commit and compiling the example module.
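To make the wiring concrete, a hedged sketch of how the container-side rollout loop might talk to the bridge over HTTP during a multi-turn episode. BRIDGE_URL, the endpoint names, and the generate_action helper are placeholders, not the cookbook's actual interface.

```python
import requests

# Placeholder address; the actual host/port depends on how the VM host and container are networked.
BRIDGE_URL = "http://host.docker.internal:8000"


def run_episode(task_config: dict, max_turns: int = 15) -> list[dict]:
    """Drive one multi-turn rollout against the host bridge and collect turns for training."""
    obs = requests.post(f"{BRIDGE_URL}/reset", json=task_config, timeout=300).json()
    turns = []
    for _ in range(max_turns):
        action = generate_action(obs)  # hypothetical helper: VLM inference on the current screenshot
        result = requests.post(f"{BRIDGE_URL}/step", json={"action": action}, timeout=300).json()
        turns.append({"observation": obs, "action": action, "reward": result["reward"]})
        if result["done"]:
            break
        obs = result
    return turns
```

Because the bridge is stateless from the trainer's point of view, each rollout worker only needs the URL, which keeps the torch 2.9+ training container fully decoupled from the torch 2.5.1 environment stack on the host.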

This PR keeps all code confined to examples/osworld/ and does not modify Slime core paths.

@jbarnes850 jbarnes850 marked this pull request as ready for review January 8, 2026 18:46
@jbarnes850 jbarnes850 force-pushed the feature/osworld-vlm-cookbook branch from edff4f7 to 983cd40 Compare January 8, 2026 23:10
@GuanxingLu (Contributor)

This looks awesome. Any chance you could share some training logs (e.g., reward curve) or a quick demo?

@jbarnes850 jbarnes850 marked this pull request as draft January 12, 2026 01:20