
Kubernetes BatchSandbox lifecycle may report "Allocated" after execd health checks already pass #505

@ninan-nn

Summary

When using the Kubernetes runtime with the batchsandbox workload provider, a sandbox can already pass execd health checks and be usable while the lifecycle API still reports status.state = "Allocated" instead of "Running".

Observed behavior

In our real Kubernetes E2E flow:

  1. Sandbox.create() / CodeInterpreter.create() succeeds
  2. execd health checks already pass
  3. sandbox endpoints are reachable and usable
  4. but GET /sandboxes/{id} may still return:
{
  "status": {
    "state": "Allocated"
  }
}

This creates a mismatch between "sandbox is already usable" and "lifecycle state is not Running yet".
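The mismatch can be written down as the check an E2E test effectively performs. This is a minimal sketch: `is_lifecycle_mismatch` is a hypothetical helper, not part of any SDK. `health_ok` stands in for whatever execd health check the test already runs, and `status_body` is the raw JSON body returned by `GET /sandboxes/{id}`.

```python
import json


def is_lifecycle_mismatch(health_ok: bool, status_body: str) -> bool:
    """Return True when execd reports healthy but the lifecycle state is not Running.

    Hypothetical helper: `health_ok` is the result of the test's own execd
    health check; `status_body` is the JSON body of GET /sandboxes/{id}.
    """
    state = json.loads(status_body)["status"]["state"]
    return health_ok and state != "Running"


# The response shape observed in the Kubernetes E2E flow.
observed = '{"status": {"state": "Allocated"}}'
print(is_lifecycle_mismatch(True, observed))  # True: usable, but not "Running"
```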

Why this is problematic

From the public lifecycle schema and SDK model docs, the documented states are:

  • Pending
  • Running
  • Pausing
  • Paused
  • Stopping
  • Terminated
  • Failed

Allocated does not appear to be part of the documented public lifecycle contract, but it is observable from the Kubernetes runtime implementation.
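A client that validates responses strictly against the documented schema would reject such a response outright. A minimal sketch, assuming the seven documented states are transmitted as plain strings; `SandboxState` and `parse_state` are illustrative names, not the SDK's actual model classes.

```python
from enum import Enum


class SandboxState(str, Enum):
    """Lifecycle states listed in the public schema (illustrative enum)."""
    PENDING = "Pending"
    RUNNING = "Running"
    PAUSING = "Pausing"
    PAUSED = "Paused"
    STOPPING = "Stopping"
    TERMINATED = "Terminated"
    FAILED = "Failed"


def parse_state(raw: str) -> SandboxState:
    """Map a raw `status.state` string onto the documented enum.

    Raises ValueError for any value outside the documented contract,
    such as "Allocated".
    """
    try:
        return SandboxState(raw)
    except ValueError:
        raise ValueError(f"undocumented sandbox state: {raw!r}") from None


print(parse_state("Running").value)  # Running
try:
    parse_state("Allocated")
except ValueError as exc:
    print(exc)  # undocumented sandbox state: 'Allocated'
```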

As a result, clients and E2E tests that correctly expect Running after readiness/health checks may still observe Allocated.
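Until the runtime changes, an E2E test could tolerate the transient state by polling for Running with a deadline instead of asserting immediately after the health check. A minimal sketch: `fetch_state` is a hypothetical callable wrapping `GET /sandboxes/{id}` and returning `status.state`. The timeout and interval defaults are arbitrary, and treating Failed and Terminated as terminal is an assumption.

```python
import time
from typing import Callable


def wait_for_running(
    fetch_state: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_state` until it returns "Running" or the deadline passes.

    `fetch_state` is a hypothetical wrapper around GET /sandboxes/{id}.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        state = fetch_state()
        if state == "Running":
            return state
        # Assumed terminal states: no point in waiting further.
        if state in ("Failed", "Terminated"):
            raise RuntimeError(f"sandbox reached terminal state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"sandbox still {state!r} after {timeout}s")
        sleep(interval)


# Simulated responses: Allocated twice, then Running.
states = iter(["Allocated", "Allocated", "Running"])
print(wait_for_running(lambda: next(states), sleep=lambda _: None))  # Running
```

This only hides the symptom on the client side; the response still carries a state outside the documented contract while polling.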

Expected behavior

The Kubernetes runtime should not surface Allocated to clients once the sandbox is considered usable (ready); it should return Running instead.
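One way the runtime could do this is to normalize its internal state before building the API response. A minimal sketch, assuming the runtime has an internal state string (which may be "Allocated") and a readiness flag derived from the same signal the execd health checks use. `to_public_state`, the `ready` flag, reporting a not-yet-ready Allocated sandbox as Pending, and the fallback for unknown states are all hypothetical choices; the real BatchSandbox status handling may be structured differently.

```python
PUBLIC_STATES = frozenset(
    {"Pending", "Running", "Pausing", "Paused", "Stopping", "Terminated", "Failed"}
)


def to_public_state(internal_state: str, ready: bool) -> str:
    """Translate an internal runtime state into a documented public state.

    Hypothetical mapping: "Allocated" is reported as "Running" once the
    sandbox is ready, and as "Pending" before that.
    """
    if internal_state == "Allocated":
        return "Running" if ready else "Pending"
    if internal_state in PUBLIC_STATES:
        return internal_state
    # Hypothetical fallback: never leak an undocumented state to clients.
    return "Pending"


print(to_public_state("Allocated", ready=True))   # Running
print(to_public_state("Allocated", ready=False))  # Pending
```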
