Merged
92 changes: 0 additions & 92 deletions .claude/skills/osmo-skill/agents/workflow-expert.md

This file was deleted.

6 changes: 3 additions & 3 deletions docs/deployment_guide/appendix/keycloak_setup.rst
@@ -18,7 +18,7 @@
.. _keycloak_setup:

================================================
Keycloak as an Identity Provider for OSMO
Keycloak as a sample IdP
================================================

This guide describes how to deploy `Keycloak <https://www.keycloak.org/>`_ and configure it as the identity provider (IdP) for OSMO. Keycloak acts as an authentication broker, allowing OSMO to authenticate users through various identity providers (LDAP, SAML, social logins) while providing centralized group and role management.
@@ -396,7 +396,7 @@ The typical workflow for setting up access control is:
2. Create groups in Keycloak
3. Assign roles to groups
4. Add users to groups (manually or via identity provider mappings)
5. Create matching pools in OSMO
5. Create matching pools
6. Verify access

.. _keycloak_create_roles:
@@ -551,7 +551,7 @@ User Cannot Access Pool

**Solutions**:

1. **Verify Role Policy in OSMO**: Ensure the corresponding role has been created in OSMO. Follow the steps in :ref:`troubleshooting_roles_policies`.
1. **Verify Role Policy**: Ensure the corresponding role has been created. Follow the steps in :ref:`troubleshooting_roles_policies`.

2. **Verify Role Names**: Pool access roles must start with ``osmo-`` prefix (see :ref:`role_naming_for_pools`). Pool names must match the role suffix. Example: Role ``osmo-team1`` will make pools named ``team1*`` visible.

File renamed without changes.
155 changes: 94 additions & 61 deletions .claude/skills/osmo-skill/SKILL.md → skills/osmo-agent/SKILL.md
@@ -20,8 +20,8 @@ common OSMO CLI use cases.

The `agents/` directory contains instructions for specialized subagents. Read them when you need to spawn the relevant subagent.

- `agents/workflow-expert.md` — expert for workflow generation, resource check, submission, failure diagnosis
- `agents/logs-reader.md` - expert for fetching and reading logs, extracting important information for monitoring and failure diagnosis.
- `agents/workflow-expert.md` — workflow generation, resource check, submission, failure diagnosis
- `agents/logs-reader.md` — log fetching and summarization for monitoring and failure diagnosis

The `references/` directory has additional documentation:

@@ -31,6 +31,18 @@ The `references/` directory has additional documentation:

---

## Intent Routing

- Asks about resources, pools, GPUs, or quota → Check Available Resources
- Wants to submit a job (simple, no monitoring) → Generate and Submit a Workflow
- Wants to submit + monitor + handle failures → Orchestrate a Workflow End-to-End
- Asks about a workflow's status or logs → Check Workflow Status
- Wants to list recent workflows → List Workflows
- Asks what a workflow does → Explain What a Workflow Does
- Wants to publish a workflow as an app → Create an App

---

## Use Case: Check Available Resources

**When to use:** The user asks what resources, nodes, GPUs, or pools are available
@@ -115,7 +127,8 @@ Derive GPU type from pool names when possible:
**When to use:** The user wants to submit a job to run on OSMO (e.g. "submit a workflow
to run SDG", "run RL training for me", "submit this yaml to OSMO").

Evaluate the complexity of the user's request: if user also wants monitoring, debugging workflows, reporting results, or the workflow complexity is too high, refer to `Orchestrate a Workflow End-to-End` use case to delegate this to a sub-agent instead.
If the user also wants monitoring, debugging, or reporting results, use the
"Orchestrate a Workflow End-to-End" use case instead.

### Steps

@@ -234,7 +247,8 @@
## Use Case: Check Workflow Status

**When to use:** The user asks about the status or logs of a workflow (e.g. "what's the
status of workflow abc-123?", "is my workflow done?", "show me the logs for xyz").
status of workflow abc-123?", "is my workflow done?", "show me the logs for xyz",
"show me the resource usage for my workflow", "give me the Kubernetes dashboard link").
Also used as the polling step when monitoring a workflow during end-to-end orchestration.

### Steps
@@ -243,6 +257,9 @@
```
osmo workflow query <workflow name> --format-type json
```
**Cache the JSON result for the rest of the conversation.** If you have already queried
this workflow with `osmo workflow query` earlier in the conversation, reuse that JSON
— do not query again just to extract a field.
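A minimal sketch of this caching pattern, assuming a POSIX shell and `python3` for JSON field extraction. The heredoc payload and cache path are hypothetical stand-ins for real `osmo workflow query` output:

```shell
# Hypothetical sketch: query once, cache the JSON to a file, reuse it later.
# The heredoc simulates `osmo workflow query <name> --format-type json` output.
CACHE="${TMPDIR:-/tmp}/osmo-wf-cache.json"
cat > "$CACHE" <<'EOF'
{"status": "RUNNING", "grafana_url": "https://grafana.example.com/d/abc", "kubernetes_dashboard": ""}
EOF

# Later in the conversation, read fields from the cache instead of re-querying:
status=$(python3 -c "import json, sys; print(json.load(open(sys.argv[1]))['status'])" "$CACHE")
echo "Cached status: $status"
```

The same cache serves every later field lookup (`grafana_url`, `kubernetes_dashboard`, output datasets), so the CLI is hit once per status check at most.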

2. **Get recent logs** — Choose the log-fetching method based on task count
(this rule applies everywhere logs are needed — monitoring, failure diagnosis, etc.):
@@ -256,82 +273,98 @@
   - Concisely summarize what the logs show — what stage the job is at, any errors,
     or whether it completed successfully
- If the workflow failed, highlight the error and suggest next steps if possible
- **If the workflow is COMPLETED and has output datasets, you MUST ask this
explicit question before ending your response:**
`Would you like me to download the output dataset now?`
Also ask whether they want a specific output folder (default to `~/` if not).
Then run the download yourself:
- **Resource usage / Grafana link:** If the user asks about resource usage, GPU
utilization, or metrics for this workflow, extract `grafana_url` from the query
JSON. If present, render it as a clickable link:
`[View resource usage in Grafana](<grafana_url>)`
If the field is empty or null, tell the user: "The Grafana resource usage link is
not available for this workflow."
- **Kubernetes dashboard link:** If the user asks for the Kubernetes dashboard,
pod details, or a k8s link, extract `kubernetes_dashboard` from the query JSON.
If present, render it as a clickable link:
`[Open Kubernetes dashboard](<kubernetes_dashboard>)`
If the field is empty or null, tell the user: "The Kubernetes dashboard link is
not available for this workflow."
- Proactively include both links in any detailed status report (e.g. when the
workflow is RUNNING or has just COMPLETED) — users often want them without
explicitly asking. If a field is empty or null, note it as not available rather
than silently omitting it.
- **If PENDING** (or the user asks why it isn't scheduling), run:
```
osmo dataset download <dataset_name> <path>
osmo workflow events <workflow name>
```
Use `~/` as the output path if the user doesn't specify one.

- **After the dataset download question above**, if the workflow is COMPLETED,
also ask if the user would like to create an
OSMO app for it. Suggest a name derived from the workflow name (e.g. workflow
`sdg-run-42` → app name `sdg-run-42`) and generate a one-sentence description
based on what the workflow does. If the user agrees (or provides their own name),
follow the "Create an App" use case below.
- **When monitoring multiple workflows** that all complete from the same spec, offer
app creation once (not per workflow) after all workflows reach a terminal state.
Since they share the same YAML, a single app covers all runs. Do not skip this
offer just because you were in a batch monitoring loop.

**If the workflow is PENDING** (or the user asks why it isn't scheduling), run:
```
osmo workflow events <workflow name>
```
These are Kubernetes pod conditions and cluster events — translate them into plain
language without Kubernetes jargon (e.g. "there aren't enough free GPUs in the pool
to schedule your job" rather than "Insufficient nvidia.com/gpu"). Also direct the
user to check resource availability in the pool their workflow is waiting in:
Translate Kubernetes events into plain language (e.g. "there aren't enough free
GPUs in the pool" rather than "Insufficient nvidia.com/gpu"). Also check:
```
osmo resource list -p <pool>
```
- If COMPLETED, proceed to Step 4.
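The Grafana and Kubernetes dashboard link-rendering rule can be sketched as a small helper. This is illustrative only — in practice the URL values come from the cached query JSON, and the example URL is hypothetical:

```shell
# Hypothetical helper: emit a markdown link when the field has a value,
# otherwise report the link as not available (never silently omit it).
render_link() {
  label="$1"; url="$2"
  if [ -n "$url" ] && [ "$url" != "null" ]; then
    printf '[%s](%s)\n' "$label" "$url"
  else
    printf 'The %s link is not available for this workflow.\n' "$label"
  fi
}

render_link "View resource usage in Grafana" "https://grafana.example.com/d/abc"
render_link "Open Kubernetes dashboard" ""
```

Treating empty and `null` values the same way keeps the fallback message consistent regardless of how the backend serializes a missing link.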

4. **Handle completed workflows:**

Offer the output dataset for download:
`Would you like me to download the output dataset now?`
Ask whether they want a specific output folder (default to `~/`). Then run:
```
osmo resource list -p <pool>
osmo dataset download <dataset_name> <path>
```

Also offer to create an OSMO app. Suggest a name derived from the workflow name
(e.g. `sdg-run-42` → app name `sdg-run-42`) and generate a one-sentence description.
If the user agrees, follow the "Create an App" use case.

When monitoring multiple workflows from the same spec, offer app creation once
(not per workflow) after all reach a terminal state. Do not skip this offer
just because you were in a batch monitoring loop.

---

## Use Case: Orchestrate a Workflow End-to-End

**When to use:** The user wants to create workflow, submit and monitor it to completion,
or requests an autonomous workflow cycle (e.g. "train GR00T on my data", "create a SDG workflow and run it",
"submit and monitor my workflow", "run end-to-end training", "submit this and
tell me when it's done").

### Phase-Split Pattern
**When to use:** The user wants to create a workflow, submit it, and monitor it to
completion (e.g. "train GR00T on my data", "submit and monitor my workflow",
"run end-to-end training", "submit this and tell me when it's done").

The lifecycle is split between the `/agents/workflow-expert.md` subagent (workflow generation creation, resource check, submission, failure diagnosis) and **you** (live monitoring so the user sees real-time updates). Follow these steps exactly:
### Steps

#### Step 1: Spawn a `/agents/workflow-expert.md` subagent for setup and submission
The lifecycle is split between the `workflow-expert` subagent (workflow generation,
resource check, submission, failure diagnosis) and **you** (live monitoring so the
user sees real-time updates).

Spawn the `/agents/workflow-expert.md` subagent. Ask it to **write workflow YAML if needed, check resources and submit the workflow only**. Do NOT ask it to monitor, poll status, or report results — that is your job.
1. **Spawn the workflow-expert subagent for setup and submission.**

Example prompt:
> Create a workflow based on user's request, if any. Check resources first, then submit the workflow to an available resource pool. Return the workflow ID when done.
Ask it to **write workflow YAML if needed, check resources, and submit only**.
Do NOT ask it to monitor, poll status, or report results — that is your job.

The subagent returns: workflow ID, pool name, and OSMO Web link.
Example prompt:
> Create a workflow based on user's request, if any. Check resources first,
> then submit the workflow to an available resource pool. Return the workflow
> ID when done.

#### Step 2: Monitor the workflow inline (you do this — user sees live updates)
The subagent returns: workflow ID, pool name, and OSMO Web link.

After getting the workflow ID, use the "Check Workflow Status" use case to
poll and report. Repeat until a terminal state is reached.
2. **Monitor the workflow inline (you do this — user sees live updates).**

Report each state transition to the user:
- `Status: SCHEDULING (queued 15s)`
- `Workflow transitioned: SCHEDULING → RUNNING`
- `Status: RUNNING (task "train" active, 2m elapsed)`
Use the "Check Workflow Status" use case to poll and report. Repeat until a
terminal state is reached. Adjust the polling interval based on how long you
expect the workflow to take — poll more frequently for short jobs (every 10-15s)
and less frequently for long training runs (every 30-60s). Report each state
transition to the user:
- `Status: SCHEDULING (queued 15s)`
- `Workflow transitioned: SCHEDULING → RUNNING`
- `Status: RUNNING (task "train" active, 2m elapsed)`
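   The polling loop above can be sketched as follows; `fetch_status` is a stub standing in for `osmo workflow query` plus field extraction, so the sketch runs without a live cluster:

   ```shell
   # Hypothetical monitoring loop. fetch_status stubs the real status query;
   # it walks a canned SCHEDULING -> RUNNING -> COMPLETED sequence.
   i=0
   fetch_status() {
     i=$((i + 1))
     case "$i" in
       1) STATUS="SCHEDULING" ;;
       2) STATUS="RUNNING" ;;
       *) STATUS="COMPLETED" ;;
     esac
   }

   prev=""
   while :; do
     fetch_status
     if [ -n "$prev" ] && [ "$STATUS" != "$prev" ]; then
       echo "Workflow transitioned: $prev -> $STATUS"
     else
       echo "Status: $STATUS"
     fi
     prev="$STATUS"
     case "$STATUS" in COMPLETED|FAILED) break ;; esac
     # A real loop would `sleep 15` for short jobs or `sleep 60` for long runs.
   done
   ```

   Note that `fetch_status` sets a variable rather than echoing inside `$(...)`, so the call counter survives between polls.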

#### Step 3: Handle the outcome
3. **Handle the outcome.**

**If COMPLETED:** Report results — workflow ID, OSMO Web link, output datasets.
In the same completion message, ask: `Would you like me to download the output dataset now?`
Then follow the COMPLETED handling in "Check Workflow Status".
**If COMPLETED:** Report results — workflow ID, OSMO Web link, output datasets.
Then follow Step 4 of "Check Workflow Status" (download offer + app creation).

**If FAILED:** First, fetch logs using the log-fetching rule from "Check Workflow Status"
Step 2 (1 task = inline, 2+ tasks = delegate to logs-reader subagents). Then resume the
`workflow-expert` subagent (use the `resume` parameter with the agent ID from Step 1)
and pass the logs summary: "Workflow <id> FAILED. Here is the logs summary: <summary>.
Diagnose and fix." It returns a new workflow ID. Resume monitoring from Step 2. Max 3
retries before asking the user for guidance.
**If FAILED:** First, fetch logs using the log-fetching rule from "Check Workflow
Status" Step 2 (1 task = inline, 2+ tasks = delegate to logs-reader subagents).
Then resume the `workflow-expert` subagent (use the `resume` parameter with the
agent ID from Step 1) and pass the logs summary: "Workflow <id> FAILED. Here is
the logs summary: <summary>. Diagnose and fix." It returns a new workflow ID.
Resume monitoring from Step 2. Max 3 retries before asking the user for guidance.
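   The retry cap can be sketched like this; `diagnose_and_resubmit` is a hypothetical stub for resuming the workflow-expert subagent, wired to keep failing so the cap is exercised:

   ```shell
   # Hypothetical retry-cap sketch: at most 3 automatic resubmissions,
   # then hand control back to the user.
   MAX_RETRIES=3
   retries=0
   result="FAILED"

   diagnose_and_resubmit() {
     # Stub: a real run would resume the workflow-expert subagent, receive a
     # new workflow ID, and re-monitor it. Here it always fails.
     result="FAILED"
   }

   while [ "$result" = "FAILED" ] && [ "$retries" -lt "$MAX_RETRIES" ]; do
     retries=$((retries + 1))
     echo "Retry $retries of $MAX_RETRIES: diagnosing and resubmitting..."
     diagnose_and_resubmit
   done

   if [ "$result" = "FAILED" ]; then
     echo "Still failing after $MAX_RETRIES retries; asking the user for guidance."
   fi
   ```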

---

@@ -1,5 +1,7 @@
# OSMO Logs Reader Agent

> Spawn a general-purpose subagent and pass these instructions as the prompt.

You are a subagent invoked by the main OSMO agent. Your sole job is to fetch
and summarize logs for a specific workflow, then return a concise digest that
the main agent can use without holding large raw logs in context.