2 changes: 1 addition & 1 deletion deployments/charts/router/README.md
@@ -123,7 +123,7 @@ helm upgrade my-router ./router -f my-values.yaml

| Parameter | Description | Default |
|-----------|-------------|---------|
| `targetSchema` | pgroll schema version for search_path (e.g., `public_v6_2_0`). Leave empty to use the default `public` schema. | `""` |
| `targetSchema` | Database schema for search_path. Leave empty to use the default `public` schema. | `""` |
| `services.postgres.serviceName` | PostgreSQL service name | `postgres` |
| `services.postgres.port` | PostgreSQL service port | `5432` |
| `services.postgres.db` | PostgreSQL database name | `osmo` |
11 changes: 1 addition & 10 deletions deployments/charts/service/README.md
@@ -52,7 +52,7 @@ This Helm chart deploys the OSMO platform with its core services and required si
| Parameter | Description | Default |
|-----------|-------------|---------|
| `services.migration.enabled` | Enable the pgroll migration Job (Helm pre-upgrade hook) | `false` |
| `services.migration.targetSchema` | Target pgroll schema version. Convention: `public_v{MAJOR}_{MINOR}_{PATCH}`. Updated per chart release. Set to `public_v{MAJOR}_{MINOR}_{PATCH}` to use versioned schemas. Versioned schemas ensure zero downtime between pod roll over. Setting to `public` will cause temporary disruption to existing pods as there could be database operations that are incompatible between versions. | `public` |
| `services.migration.targetSchema` | Target pgroll schema. Use `public` (the default). | `public` |
| `services.migration.image` | Container image for the migration Job | `postgres:15-alpine` |
| `services.migration.pgrollVersion` | pgroll release version to download | `v0.16.1` |
| `services.migration.serviceAccountName` | Service account name (defaults to global if empty) | `""` |
@@ -68,15 +68,6 @@ This Helm chart deploys the OSMO platform with its core services and required si

To add new migrations for future releases, drop JSON files into the chart's `migrations/` directory. They are automatically included via `.Files.Glob`.

#### Choosing `targetSchema`

| Scenario | `targetSchema` value | What happens |
|----------|---------------------|--------------|
| **Zero-downtime upgrades with multiple versions coexisting** | `public_v6_2_0` (default) | Creates a versioned schema with views. Old pods use `public`, new pods use versioned views. Both run simultaneously during gradual rollout. Re-deploys of the same version are instant no-ops (schema already exists). |
| **Simple migration without versioned schemas** | `public` | Applies migrations directly to `public`. No views created. Simpler but old and new pods cannot coexist safely if schema changes are breaking. |

When using the versioned schema (`public_v6_2_0`), also set `targetSchema` in the router chart and `OSMO_SCHEMA_VERSION` will be automatically injected into all service pods when `migration.enabled` is true.

### PostgreSQL Settings

| Parameter | Description | Default |
13 changes: 2 additions & 11 deletions deployments/upgrades/6_0_to_6_2_upgrade.md
@@ -22,7 +22,7 @@ SPDX-License-Identifier: Apache-2.0

- **New authentication architecture** — oauth2Proxy sidecar + authz sidecar replace the old Envoy-native oauth2Filter
- **RBAC system** — new database tables for users, roles, and role mappings managed by the authz sidecar
- **pgroll database migrations** — zero-downtime schema changes via versioned schemas
- **pgroll database migrations** — automated schema changes
- **Backend operator tokens must be recreated** — the RBAC migration deletes old `SERVICE` type access tokens; new tokens must be created before upgrading backend deployment charts

## Before you start
@@ -40,7 +40,7 @@ Depending on your deployment, follow the relevant sections:

### How pgroll works

OSMO 6.2 uses [pgroll](https://github.com/xataio/pgroll) for zero-downtime database schema migrations. pgroll applies migrations to the `public` schema and optionally creates a versioned schema (e.g., `public_v6_2_0`) containing views over all tables. Services set their PostgreSQL `search_path` to this versioned schema, allowing old and new versions to coexist during a rolling upgrade.
OSMO 6.2 uses [pgroll](https://github.com/xataio/pgroll) for database schema migrations. pgroll applies migrations directly to the `public` schema.

### Running migrations

@@ -50,7 +50,6 @@ Enable the migration job in the service chart values:
services:
migration:
enabled: true
targetSchema: public_v6_2_0
```

The migration runs as a Helm pre-upgrade hook before pods are updated. For ArgoCD, add:
@@ -65,14 +64,6 @@ services:

The database password is read from `OSMO_POSTGRES_PASSWORD` env var, or from the `postgres_password:` field in the file at `OSMO_CONFIG_FILE`.

### Choosing your upgrade path

**Direct upgrade (simpler, requires downtime):**
Set `targetSchema: public`. Migrations apply directly to the `public` schema. All services must be on 6.2 after the upgrade.

**Versioned schema (zero-downtime):**
Set `targetSchema: public_v6_2_0`. Both 6.0 and 6.2 services can run simultaneously. The router chart also needs `targetSchema: public_v6_2_0` set at the top level.

The migration script is idempotent — safe to run multiple times.

### Schema changes in 6.2
22 changes: 22 additions & 0 deletions docs/user_guide/getting_started/install/index.rst
@@ -71,3 +71,25 @@ After successful authentication, you are logged in. Welcome to OSMO.
:class: no-copybutton

Successfully logged in. Welcome <Your Full Name>.

Agent Skill
-----------

OSMO provides an agent skill that enables AI agents to interact with the OSMO CLI on your behalf.
Once installed, agents in tools such as Claude Code, Cursor and Codex can check GPU resources,
generate and submit workflows, monitor progress, diagnose failures, and orchestrate end-to-end
Physical AI workloads through natural language.

The skill follows the `Agent Skills <https://agentskills.io>`_ open standard and is compatible with
`30+ agent tools <https://skills.sh/>`_.

To install:

.. code-block:: bash

$ npx skills add NVIDIA/osmo

.. seealso::

See the `skills/README <https://github.com/NVIDIA/osmo/tree/main/skills>`_ for detailed
installation options and usage examples.
82 changes: 82 additions & 0 deletions skills/README.md
@@ -0,0 +1,82 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

SPDX-License-Identifier: Apache-2.0
-->

# Agent Skills

Agent skills for the OSMO platform, built on the [Agent Skills](https://agentskills.io) open standard. Enables AI
agents to check GPU resources, generate and submit workflows, monitor progress, diagnose failures, and orchestrate
end-to-end Physical AI workloads.

Compatible with Claude Code, Cursor, Codex, GitHub Copilot, Gemini CLI, and [30+ other agent tools](https://skills.sh/).

## Prerequisites

The OSMO CLI must be installed and authenticated before using the skill. See the [Getting Started](https://nvidia.github.io/OSMO/main/user_guide/getting_started/install/index.html) guide for instructions.

## Installation

To install:

```bash
npx skills add NVIDIA/osmo
```

To update an existing installation:

```bash
npx skills update
```

To uninstall:

```bash
npx skills remove osmo-agent
```

## Usage

Once installed, the skill activates automatically when the agent detects relevant requests. Example prompts:

| Category | Example |
|----------|---------|
| Resource availability | "What GPUs are available?" |
| Workflow submission | "Submit workflow.yaml to available pool" |
| Monitoring | "What's the status of my last workflow?" |
| Failure diagnosis | "My workflow failed — figure out why and resubmit" |
| End-to-end orchestration | "Create an SDG workflow with Isaac Sim, submit and monitor it, and download results when done" |

For complex workflows, the skill spawns specialized sub-agents that autonomously handle resource selection, YAML generation, submission, monitoring, log fetching, failure diagnosis, and retries.

## Skill Contents

```
skills/osmo-agent/
├── SKILL.md # Main skill instructions
├── LICENSE # Apache-2.0
├── agents/
│ ├── workflow-expert.md # Sub-agent: workflow creation, submission, diagnosis
│ └── logs-reader.md # Sub-agent: log fetching and summarization
└── references/
├── cookbook.md # 40+ real-world workflow templates
├── workflow-patterns.md # Multi-task, parallel, data dependency patterns
└── advanced-patterns.md # Checkpointing, retry logic, node exclusion
```

## License

Apache-2.0 — see [osmo-agent/LICENSE](osmo-agent/LICENSE).
8 changes: 7 additions & 1 deletion skills/osmo-agent/SKILL.md
@@ -1,5 +1,5 @@
---
name: osmo
name: osmo-agent
description: >
How to use the OSMO CLI to manage cloud compute resources for robotics development.
Use this skill whenever the user asks about available resources, nodes, pools, GPUs,
@@ -9,6 +9,12 @@ description: >
check the status or logs of a running/completed workflow, list or browse recent
workflow submissions, want to understand what a specific workflow does or is
configured to do, or want to create an OSMO app from a workflow.
license: Apache-2.0
compatibility: >
Requires osmo CLI installed and authenticated (osmo login).
metadata:
author: nvidia
version: "1.0.0"
---

# OSMO CLI Use Cases
1 change: 0 additions & 1 deletion src/cli/BUILD
@@ -93,7 +93,6 @@ osmo_py_binary(
srcs = ["cli_builder.py"],
deps = [
":cli_lib",
requirement("backports-tarfile"),
requirement("pyinstaller"),
requirement("shtab"),
],
4 changes: 2 additions & 2 deletions src/cli/login.py
@@ -127,8 +127,8 @@ class UrlValidator(pydantic.BaseModel):
token = args.token
else:
raise osmo_errors.OSMOUserError('Must provide token file with --token_file or --token')
refresh_url = login.construct_token_refresh_url(url, token)
service_client.login_manager.token_login(url, refresh_url)
refresh_url = login.construct_token_refresh_url(url)
service_client.login_manager.token_login(url, refresh_url, token)

# For developers, simply send username as a header
else:
6 changes: 3 additions & 3 deletions src/lib/utils/client.py
@@ -226,8 +226,8 @@ def dev_login(self, url: str, username: str):
self._login_storage = login.dev_login(url, username)
self._save_login_info(self._login_storage, welcome=True)

def token_login(self, url: str, access_token: str):
self._login_storage = login.token_login(url, access_token, self.user_agent)
def token_login(self, url: str, refresh_url: str, refresh_token: str):
self._login_storage = login.token_login(url, refresh_url, refresh_token, self.user_agent)
self._save_login_info(self._login_storage, welcome=True)

def logout(self):
@@ -274,7 +274,7 @@ def get_access_token(self) -> str | None:
raise osmo_errors.OSMOUserError('Must login first with "login" command')
if self._login_storage.token_login is None:
raise osmo_errors.OSMOUserError('Must login first with token')
return login.fetch_token_from_refresh_url(self._login_storage.token_login.refresh_url or '')
return self._login_storage.token_login.refresh_token


class ServiceClient():
22 changes: 11 additions & 11 deletions src/lib/utils/login.py
@@ -21,7 +21,6 @@
import os
import time
from typing import List, Literal
from urllib.parse import urlencode, urlparse

import pydantic
import requests # type: ignore
@@ -208,17 +207,19 @@ def owner_password_login(config: LoginConfig,
)


def construct_token_refresh_url(url: str, token: str) -> str:
return os.path.join(url, f'api/auth/jwt/access_token?{urlencode({"access_token": token})}')
def construct_token_refresh_url(url: str) -> str:
return os.path.join(url, 'api/auth/jwt/access_token')


def token_login(url: str,
refresh_url: str,
user_agent: str| None) -> LoginStorage:
refresh_token: str,
user_agent: str | None) -> LoginStorage:
headers = {}
if user_agent:
headers['User-Agent'] = user_agent
result = requests.get(refresh_url, timeout=TIMEOUT, headers=headers)
result = requests.post(refresh_url, json={'token': refresh_token},
timeout=TIMEOUT, headers=headers)
if result.status_code >= 300:
raise osmo_errors.OSMOServerError('Unable to refresh login token (status code ' \
f'{result.status_code}): {result.text}\n' \
@@ -228,7 +229,8 @@ def token_login(url: str,
url=url,
token_login=TokenLoginStorage(
id_token=result['token'],
refresh_url=refresh_url
refresh_url=refresh_url,
refresh_token=refresh_token
),
osmo_token=True
)
@@ -258,7 +260,9 @@ def refresh_id_token(config: LoginConfig, user_agent: str | None,
headers['User-Agent'] = user_agent

if osmo_token:
result = requests.get(token_login_storage.refresh_url, timeout=TIMEOUT, headers=headers)
result = requests.post(token_login_storage.refresh_url,
json={'token': token_login_storage.refresh_token},
timeout=TIMEOUT, headers=headers)
else:
result = requests.post(token_endpoint, data={
'grant_type': 'refresh_token',
@@ -289,7 +293,3 @@ def parse_allowed_pools(allowed_pools_header: str | None) -> List[str]:
return [pool.strip() for pool in allowed_pools_header.split(',') if pool.strip()]


def fetch_token_from_refresh_url(refresh_url: str) -> str | None:
parsed = urlparse(refresh_url)
query_params = dict(param.split('=') for param in parsed.query.split('&'))
return query_params.get('access_token', None)
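Reviewer sketch (not part of the change set): the `login.py` hunks above move the token out of the refresh URL's query string and into a JSON body sent with `requests.post`. The standard-library example below contrasts the two shapes without making a network call. The helper names `old_style_refresh_url` and `new_style_refresh_request`, the host `osmo.example.com`, and the token value are hypothetical and chosen only for illustration; the endpoint path and the `{'token': ...}` body key are taken from the diff.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = 'api/auth/jwt/access_token'  # path as it appears in the diff


def old_style_refresh_url(url: str, token: str) -> str:
    """Pre-change shape: the token travels in the query string."""
    query = urlencode({'access_token': token})
    return f"{url.rstrip('/')}/{ENDPOINT}?{query}"


def new_style_refresh_request(url: str, token: str) -> tuple[str, str]:
    """Post-change shape: a token-free URL plus a JSON body for the POST."""
    refresh_url = f"{url.rstrip('/')}/{ENDPOINT}"
    return refresh_url, json.dumps({'token': token})


# Hypothetical values, for illustration only.
service_url = 'https://osmo.example.com'
token = 'abc=='

old_url = old_style_refresh_url(service_url, token)
new_url, body = new_style_refresh_request(service_url, token)

# Splitting on '&' and '=' by hand, as the removed helper did, yields the
# percent-encoded value; parse_qs decodes it back to the original token.
naive = dict(p.split('=') for p in urlparse(old_url).query.split('&'))
decoded = parse_qs(urlparse(old_url).query)['access_token'][0]
```

With these inputs the hand-split value is the percent-encoded form of the token, while `parse_qs` recovers the original. The new shape avoids the round trip entirely, because the token is stored in its own `refresh_token` field and never appears in a URL that might be written to logs.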
34 changes: 16 additions & 18 deletions src/locked_requirements.txt
@@ -52,10 +52,6 @@ azure-storage-blob==12.26.0 \
--hash=sha256:5dd7d7824224f7de00bfeb032753601c982655173061e242f13be6e26d78d71f \
--hash=sha256:8c5631b8b22b4f53ec5fff2f3bededf34cfef111e2af613ad42c9e6de00a77fe
# via -r requirements.txt
backports-tarfile==1.2.0 \
--hash=sha256:77e284d754527b01fb1e6fa8a1afe577858ebe4e9dad8919e34c862cb399bc34 \
--hash=sha256:d75e02c268746e1b8144c278978b6e98e85de6ad16f8e4b0844a154557eca991
# via -r requirements.txt
boto3==1.38.0 \
--hash=sha256:8b6544eca17e31d1bfd538e5d152b96a68d6c92950352a0cd9679f89d217d53a \
--hash=sha256:96898facb164b47859d40a4271007824a0a791c3811a7079ce52459d753d4474
@@ -602,7 +598,9 @@ lark==1.1.5 \
macholib==1.16.3 \
--hash=sha256:07ae9e15e8e4cd9a788013d81f5908b3609aa76f9b1421bae9c4d7606ec86a30 \
--hash=sha256:0e315d7583d38b8c77e815b1ecbdbf504a8258d8b3e17b61165c6feb60d18f2c
# via -r requirements.txt
# via
# -r requirements.txt
# pyinstaller
markupsafe==3.0.3 \
--hash=sha256:0303439a41979d9e74d18ff5e2dd8c43ed6c6001fd40e5bf2e43f7bd9bbc523f \
--hash=sha256:068f375c472b3e7acbe2d5318dea141359e6900156b5b2ba06a30b169086b91a \
@@ -937,19 +935,19 @@ pydantic==1.10.26 \
# via
# -r requirements.txt
# fastapi
pyinstaller==6.12.0 \
--hash=sha256:0c271896a3a168f4f91827145702543db9c5427f4c7372a6df8c75925a3ac18a \
--hash=sha256:0e62d3906309248409f215b386f33afec845214e69cc0f296b93222b26a88f43 \
--hash=sha256:138856a5a503bb69c066377e0a22671b0db063e9cc14d5cf5c798a53561200d3 \
--hash=sha256:1834797be48ce1b26015af68bdeb3c61a6c7500136f04e0fc65e468115dec777 \
--hash=sha256:68f1e4cecf88a6272063977fa2a2c69ad37cf568e5901769d7206d0314c74f47 \
--hash=sha256:83c7f3bde9871b4a6aa71c66a96e8ba5c21668ce711ed97f510b9382d10aac6c \
--hash=sha256:8e92e9873a616547bbabbb5a3a9843d5f2ab40c3d8b26810acdf0fe257bee4cf \
--hash=sha256:a2abf5fde31a8b38b6df7939bcef8ac1d0c51e97e25317ce3555cd675259750f \
--hash=sha256:a69818815c6e0711c727edc30680cb1f81c691b59de35db81a2d9e0ae26a9ef1 \
--hash=sha256:aefe502d55c9cf6aeaed7feba80b5f8491ce43f8f2b5fe2d9aadca3ee5a05bc4 \
--hash=sha256:dac8a27988dbc33cdc34f2046803258bc3f6829de24de52745a5daa22bdba0f1 \
--hash=sha256:fea76fc9b55ffa730fcf90beb897cce4399938460b0b6f40507fbebfc752c753
pyinstaller==6.19.0 \
--hash=sha256:1ec54ef967996ca61dacba676227e2b23219878ccce5ee9d6f3aada7b8ed8abf \
--hash=sha256:3c5c251054fe4cfaa04c34a363dcfbf811545438cb7198304cd444756bc2edd2 \
--hash=sha256:4190e76b74f0c4b5c5f11ac360928cd2e36ec8e3194d437bf6b8648c7bc0c134 \
--hash=sha256:481a909c8e60c8692fc60fcb1344d984b44b943f8bc9682f2fcdae305ad297e6 \
--hash=sha256:4ab2bb52e58448e14ddf9450601bdedd66800465043501c1d8f1cab87b60b122 \
--hash=sha256:8bd68abd812d8a6ba33b9f1810e91fee0f325969733721b78151f0065319ca11 \
--hash=sha256:a0fc5f6b3c55aa54353f0c74ffa59b1115433c1850c6f655d62b461a2ed6cbbe \
--hash=sha256:b5bb6536c6560330d364d91522250f254b107cf69129d9cbcd0e6727c570be33 \
--hash=sha256:c2d5a539b0bfe6159d5522c8c70e1c0e487f22c2badae0f97d45246223b798ea \
--hash=sha256:da6d5c6391ccefe73554b9fa29b86001c8e378e0f20c2a4004f836ba537eff63 \
--hash=sha256:e649ba6bd1b0b89b210ad92adb5fbdc8a42dd2c5ca4f72ef3a0bfec83a424b83 \
--hash=sha256:ec73aeb8bd9b7f2f1240d328a4542e90b3c6e6fbc106014778431c616592a865
# via -r requirements.txt
pyinstaller-hooks-contrib==2026.1 \
--hash=sha256:66ad4888ba67de6f3cfd7ef554f9dd1a4389e2eb19f84d7129a5a6818e3f2180 \
3 changes: 2 additions & 1 deletion src/operator/utils/login.py
@@ -66,7 +66,8 @@ def get_login_info(
raise osmo_errors.OSMOUserError('Must provide token')
return login.token_login(
config.service_url,
login.construct_token_refresh_url(config.service_url, token),
login.construct_token_refresh_url(config.service_url),
token,
user_agent=user_agent,
)
else:
4 changes: 1 addition & 3 deletions src/requirements.txt
@@ -58,9 +58,7 @@ cryptography==46.0.5
jwcrypto==1.5.6

# Pyinstaller
pyinstaller==6.12.0
# backports-tarfile: required by jaraco.context (vendored in setuptools 82+) on Python < 3.12
backports-tarfile==1.2.0
pyinstaller==6.19.0

# Yaml
pyyaml==6.0.3