Merged
3 changes: 3 additions & 0 deletions .github/workflows/real-e2e.yml
@@ -67,6 +67,9 @@ jobs:
network_mode = "bridge"
[storage]
allowed_host_paths = ["/tmp/opensandbox-e2e"]
[renew_intent]
enabled = true
min_interval_seconds = 60
EOF

./scripts/python-e2e.sh
2 changes: 1 addition & 1 deletion components/ingress/README.md
@@ -76,7 +76,7 @@ wss://ingress.opensandbox.io/my-sandbox/8080/ws

When enabled, the ingress publishes **renew-intent** events to a Redis list on each proxied request (after resolving the sandbox). The OpenSandbox server consumes these events and may extend sandbox expiration for sandboxes that opted in at creation time. See [OSEP-0009](https://github.com/alibaba/opensandbox/blob/main/oseps/0009-auto-renew-sandbox-on-ingress-access.md) for the full design.

**Requirements:** The server must have auto-renew and Redis consumer enabled; the sandbox must be created with `extensions["auto_renew_on_access"]="true"`. This feature is best-effort and disabled by default.
**Requirements:** The server must have `renew_intent` enabled (plus the Redis consumer in ingress mode); the sandbox must opt in via `extensions["access.renew.extend.seconds"]` (a decimal integer string between **300** and **86400** seconds; see OSEP-0009). This feature is best-effort and disabled by default.

| Flag | Default | Description |
|------|---------|-------------|
76 changes: 37 additions & 39 deletions oseps/0009-auto-renew-sandbox-on-ingress-access.md
@@ -3,8 +3,8 @@ title: Auto-Renew Sandbox on Ingress Access
authors:
- "@Pangjiping"
creation-date: 2026-03-15
last-updated: 2026-03-19
status: implementing
last-updated: 2026-03-23
status: implemented
---

# OSEP-0009: Auto-Renew Sandbox on Ingress Access
@@ -74,7 +74,7 @@ An access-driven renewal mechanism is needed, but it must be strongly rate-contr
- The implementation must work with existing lifecycle API and runtime providers.
- Reverse proxy traffic must be the only trigger source for this proposal.
- Auto-renew must be disabled unless all three conditions are met:
- server supports and enables auto-renew-on-access,
- server supports and enables `renew_intent`,
- ingress supports and enables renew-intent signaling (for ingress mode),
- sandbox creation request explicitly opts in via `extensions`.
- Renewal requests must be bounded by deduplication and throttling controls.
@@ -88,7 +88,7 @@ Add an "access renew controller" that converts proxy access signals into control

- In server proxy mode, the server path handling proxied traffic submits local renew intents and performs internal renewal calls.
- In ingress gateway mode, ingress publishes renew intents into Redis; OpenSandbox server consumes and executes controlled renewals.
- Both modes share the same renewal gate logic: opt-in check, eligibility window, cooldown, and per-sandbox in-flight deduplication.
- Both modes share the same renewal gate logic: opt-in check, sandbox state, server-side validity for each renew attempt, cooldown, and per-sandbox in-flight deduplication.

At a high level, access traffic indicates activity, but only eligible events produce actual `renew-expiration` operations.

@@ -103,10 +103,10 @@ At a high level, access traffic indicates activity, but only eligible events pro

| Risk | Mitigation |
| --- | --- |
| Renewal storms under high ingress QPS | Multi-stage gating: renew-window check + cooldown + in-flight dedupe |
| Renewal storms under high ingress QPS | Multi-stage gating: validity checks + cooldown + in-flight dedupe |
| Duplicate renewals across server replicas | Redis lock keys for distributed dedupe in ingress mode; local dedupe in server proxy path |
| Redis backlog growth in traffic spikes | Queue TTL, bounded consumer concurrency, and drop-on-overload policy |
| False negatives (active sandbox not renewed) | Configurable renew window and cooldown; metrics/alerts for missed renew opportunities |
| False negatives (active sandbox not renewed) | Server-side eligibility rules and cooldown; metrics/alerts for missed renew opportunities |
| Added operational complexity | Feature flag rollout, default-off mode, and explicit docs/runbooks |

## Design Details
@@ -133,41 +133,41 @@ Explicitly unsupported:
This feature uses explicit "three-party handshake" activation.

1. **Server-side capability switch**
- `server.auto_renew_on_access.enabled = true` must be set (stored under `ServerConfig`).
- `renew_intent.enabled = true` must be set (top-level TOML section `[renew_intent]`, model field on root `AppConfig`).
2. **Ingress-side capability switch** (ingress mode only)
- ingress must be configured to publish renew-intents (`server.auto_renew_on_access.redis.enabled = true` and ingress integration enabled).
- ingress must be configured to publish renew-intents (`renew_intent.redis.enabled = true` and ingress integration enabled).
3. **Sandbox-level opt-in and duration**
- sandbox must declare in `CreateSandboxRequest.extensions` how long each automatic renewal extends expiration (see below). Presence of a valid value opts the sandbox in.

If any condition is missing, access events are ignored for renewal.

Given current API schema (`extensions: Dict[str, str]`), this OSEP proposes:

- `extensions["access.renew.extend.seconds"]` = positive integer string (e.g. `"1800"`)
- `extensions["access.renew.extend.seconds"]` = decimal integer **string** in the inclusive range **300–86400** seconds (**5 minutes** to **24 hours**), e.g. `"1800"`.

**Meaning:** When auto-renew on access is triggered for this sandbox, each renewal extends expiration by this many seconds. The key thus both opts the sandbox in and defines the per-renewal extension duration.

**Behavior rules:**

- Missing key or invalid value (non-positive integer string) means no auto-renew on access for that sandbox.
- Valid value (e.g. `"1800"`) enables auto-renew subject to policy gating; each successful renewal uses `new_expires_at = now + (value of access.renew.extend.seconds)`.
- Invalid values are rejected at sandbox creation time with 4xx validation error.
- Missing key means no renew-on-access for that sandbox.
- If the key is present, the value must parse as an integer in **300–86400**; otherwise the create request fails with **400** (validated in the HTTP API layer via `validate_extensions` in `src/extensions/validation.py` before the runtime service runs).
- Valid value enables auto-renew subject to policy gating; each successful renewal uses `new_expires_at = now + (value of access.renew.extend.seconds)`.
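
The behavior rules above can be sketched in Python as follows. This is a minimal illustration, not the project's `validate_extensions`: the function name `parse_renew_extension` and the constants are hypothetical, and rejecting forms such as `"+1800"` or `" 1800"` is an assumption about what "decimal integer string" means.

```python
from typing import Mapping, Optional

RENEW_KEY = "access.renew.extend.seconds"
MIN_EXTEND_SECONDS = 300     # 5 minutes
MAX_EXTEND_SECONDS = 86400   # 24 hours


def parse_renew_extension(extensions: Optional[Mapping[str, str]]) -> Optional[int]:
    """Return the per-renewal extension in seconds, or None if not opted in.

    Raises ValueError when the key is present but invalid; an HTTP layer
    would translate that into a 400 response (hypothetical helper).
    """
    if not extensions or RENEW_KEY not in extensions:
        return None  # missing key: no renew-on-access for this sandbox

    raw = extensions[RENEW_KEY]
    # Assumption: only plain ASCII digits are accepted. int() alone would
    # also accept "+1800", " 1800" or "1_800".
    if not isinstance(raw, str) or not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"{RENEW_KEY} must be a decimal integer string")

    seconds = int(raw)
    if not MIN_EXTEND_SECONDS <= seconds <= MAX_EXTEND_SECONDS:
        raise ValueError(
            f"{RENEW_KEY} must be between {MIN_EXTEND_SECONDS} and {MAX_EXTEND_SECONDS}"
        )
    return seconds
```

Returning `None` for a missing key keeps "not opted in" distinct from "opted in with a bad value", which is the distinction the rules draw between silently skipping renewal and failing the create request.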

### Control Strategy to Prevent Renewal Storms

Both modes share the same strict control policy. An access event triggers renewal only when all checks pass:

1. **Opt-in check**: sandbox has a valid positive `access.renew.extend.seconds` in extensions.
1. **Opt-in check**: sandbox has `access.renew.extend.seconds` in extensions within **300–86400** (validated at creation).
2. **Sandbox state check**: sandbox must be `Running`.
3. **Renew window check**: remaining TTL must be below `before_expiration_seconds`.
3. **Validity check**: server decides whether the renewal attempt should proceed (e.g. `new_expires_at` meaningfully extends current expiration, lifecycle rules). There is **no** separate configurable “remaining TTL must be below N seconds” knob in server config.
4. **Cooldown check**: no successful renewal for this sandbox within `min_interval_seconds`.
5. **In-flight dedupe**: at most one renewal task per sandbox at a time.

If any check fails, the event is acknowledged and dropped without a renewal call.

Renew target time:

- `new_expires_at = now + (value of extensions["access.renew.extend.seconds"])`; server may enforce a cap or default.
- `new_expires_at = now + (value of extensions["access.renew.extend.seconds"])`; the extension duration is taken only from the sandbox `extensions` (no server-side override or default for this value).
- must also satisfy `new_expires_at > current_expires_at` before calling renew API

This guarantees bounded renewal frequency even for very hot sandboxes.
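
The five checks and the target-time rule can be sketched as a single pure function. This is an illustration under stated assumptions: `SandboxView`, `renew_target`, and the in-memory `in_flight` set are hypothetical names, and the validity check is reduced to the one condition the text names explicitly (`new_expires_at > current_expires_at`).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SandboxView:
    """Hypothetical snapshot of the fields the gate needs."""
    sandbox_id: str
    state: str                        # e.g. "Running"
    expires_at: float                 # current expiration, epoch seconds
    extend_seconds: Optional[int]     # parsed extension value; None if not opted in
    last_renewed_at: Optional[float] = None


def renew_target(
    sandbox: SandboxView,
    now: float,
    min_interval_seconds: int,
    in_flight: Set[str],
) -> Optional[float]:
    """Return new_expires_at if a renewal should proceed, else None."""
    if sandbox.extend_seconds is None:                 # 1. opt-in check
        return None
    if sandbox.state != "Running":                     # 2. sandbox state check
        return None
    new_expires_at = now + sandbox.extend_seconds
    if new_expires_at <= sandbox.expires_at:           # 3. validity check
        return None
    if (
        sandbox.last_renewed_at is not None
        and now - sandbox.last_renewed_at < min_interval_seconds
    ):                                                 # 4. cooldown check
        return None
    if sandbox.sandbox_id in in_flight:                # 5. in-flight dedupe
        return None
    return new_expires_at
```

Keeping the gate free of I/O makes it easy to share between the server proxy path and the Redis consumer, and easy to unit test with fixed timestamps.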
@@ -270,46 +270,44 @@ Producer (ingress):
Consumer (server):

- One or more workers block with `BRPOP opensandbox:renew:intent <timeout>`.
- On pop: parse payload; if `now - observed_at > event_ttl_seconds`, drop and continue.
- Acquire lock: `SET opensandbox:renew:lock:{sandbox_id} <value> NX EX lock_ttl_seconds`.
- If lock acquired: run gate checks (opt-in, state, window, cooldown) and maybe renew; then lock expires by TTL.
- On pop: parse payload; if the intent is older than a short implementation-defined max age (vs `observed_at`), drop and continue.
- Acquire lock: `SET opensandbox:renew:lock:{sandbox_id} <value> NX EX <ttl>` using a short implementation-defined lock TTL.
- If lock acquired: run gate checks (opt-in, state, validity, cooldown) and maybe renew; then lock expires by TTL.
- If lock not acquired: treat as in-flight dedupe, drop.
- No ack or requeue: if the worker crashes after pop, that intent is lost (best-effort).

Notes:

- Lock TTL must be short and greater than the renew critical section.
- Lock TTL and intent staleness thresholds are fixed in code (not Redis config); lock TTL must be short and greater than the renew critical section.
- Implementations must use Redis List; this LPUSH/BRPOP + lock flow is the only specified processing model.
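
One consumer iteration of the flow above might look like the sketch below. It is not the server's implementation: the function name `consume_one`, the JSON payload shape (`sandbox_id`, `observed_at` as epoch seconds), and both threshold values are assumptions for illustration. The `client` argument is assumed to expose `brpop` and `set(..., nx=True, ex=...)` with the shapes redis-py provides.

```python
import json
import time
import uuid
from typing import Callable

QUEUE_KEY = "opensandbox:renew:intent"
LOCK_KEY_PREFIX = "opensandbox:renew:lock:"
MAX_INTENT_AGE_SECONDS = 30   # assumption: the real threshold is fixed in code
LOCK_TTL_SECONDS = 10         # assumption: short, but longer than the renew critical section


def consume_one(
    client,
    renew: Callable[[str], None],
    *,
    now: Callable[[], float] = time.time,
    block_timeout: int = 5,
) -> str:
    """Pop at most one intent and process it; return a short outcome label."""
    popped = client.brpop(QUEUE_KEY, timeout=block_timeout)
    if popped is None:
        return "idle"                      # nothing arrived within the timeout

    _key, raw = popped
    try:
        intent = json.loads(raw)
        sandbox_id = str(intent["sandbox_id"])
        observed_at = float(intent["observed_at"])
    except (ValueError, KeyError, TypeError):
        return "malformed"                 # best-effort: drop and continue

    if now() - observed_at > MAX_INTENT_AGE_SECONDS:
        return "stale"                     # intent too old, drop

    # SET key value NX EX ttl: succeeds only if no lock exists for this sandbox.
    acquired = client.set(
        LOCK_KEY_PREFIX + sandbox_id, uuid.uuid4().hex, nx=True, ex=LOCK_TTL_SECONDS
    )
    if not acquired:
        return "deduped"                   # another worker holds the lock

    renew(sandbox_id)                      # gate checks and, maybe, the renewal
    return "processed"                     # lock is left to expire by TTL
```

A worker would call `consume_one` in a loop. Because the intent is removed by `BRPOP` before processing, a crash between the pop and the renewal loses that intent, which matches the best-effort contract described above.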

### Configuration

Use `server` configuration namespace; no independent top-level config block is required:
Use the root config file: lifecycle API settings stay under `[server]`; renew-on-access is a **separate top-level section** `[renew_intent]` (not nested under `[server]`), alongside `[runtime]`, `[docker]`, etc.

```toml
[server]
auto_renew_on_access.enabled = false
auto_renew_on_access.before_expiration_seconds = 300
auto_renew_on_access.extension_seconds = 1800
auto_renew_on_access.min_interval_seconds = 60

# auto-detected by request path:
# - server-proxy path uses local trigger
# - ingress path uses redis trigger

auto_renew_on_access.redis.enabled = false
auto_renew_on_access.redis.url = "redis://127.0.0.1:6379/0"
auto_renew_on_access.redis.queue_key = "opensandbox:renew:intent"
auto_renew_on_access.redis.lock_ttl_seconds = 10
auto_renew_on_access.redis.event_ttl_seconds = 30
auto_renew_on_access.redis.consumer_concurrency = 8
# ... host, port, etc.

# Auto-detected by request path:
# - server-proxy path uses local trigger (no Redis required)
# - ingress path uses Redis consumer when renew_intent.redis is enabled

[renew_intent]
enabled = false
min_interval_seconds = 60
redis.enabled = false
redis.dsn = "redis://127.0.0.1:6379/0"
redis.queue_key = "opensandbox:renew:intent"
redis.consumer_concurrency = 8
```

Configuration rules:

- `server.auto_renew_on_access.enabled=false` means feature fully disabled.
- `renew_intent.enabled=false` means feature fully disabled.
- Ingress path renewal requires Redis block enabled and reachable on the server; the **ingress component** uses its own config (e.g. CLI flags: `--renew-intent-enabled`, `--renew-intent-redis-dsn`, `--renew-intent-queue-key`, `--renew-intent-queue-max-len`, `--renew-intent-min-interval`) to connect to Redis and publish intents. Queue key and default list name should match what the server consumer expects (e.g. `opensandbox:renew:intent`).
- Server proxy path can run without Redis.
- Feature is applied per sandbox only when `extensions["access.renew.extend.seconds"]` is present and a valid positive integer string.
- Per-renewal extension duration is **not** a server setting: it comes only from sandbox `extensions["access.renew.extend.seconds"]` (set at creation to **300–86400** seconds or creation fails with **400**). Omit the key to disable renew-on-access for that sandbox.
- Docker runtime direct mode remains unsupported regardless of this config.
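
As a rough illustration of the rules above, the function below reports which trigger paths a parsed configuration would allow. The name `active_renew_paths` and the path labels are hypothetical, and treating a missing `redis.dsn` as "ingress path unavailable" is an assumption standing in for the "enabled and reachable" requirement.

```python
from typing import Any, Mapping, Set


def active_renew_paths(config: Mapping[str, Any]) -> Set[str]:
    """Return the trigger paths that can produce renewals for a parsed config."""
    renew = config.get("renew_intent", {})
    if not renew.get("enabled", False):
        return set()                       # feature fully disabled

    paths = {"server-proxy"}               # local trigger, no Redis required

    redis_cfg = renew.get("redis", {})
    # Assumption: the ingress path needs both the switch and a DSN to connect to.
    if redis_cfg.get("enabled", False) and redis_cfg.get("dsn"):
        paths.add("ingress")
    return paths
```

The input is a nested dict of the kind a TOML parser such as `tomllib` (Python 3.11+) produces, where dotted keys like `redis.enabled` under `[renew_intent]` become a nested `redis` table.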

Create request example:
@@ -329,7 +327,7 @@ Create request example:

- **Unit Tests**
- Extension validation for auto-renew opt-in keys and values
- Renew eligibility function (window/cooldown/state checks)
- Renew eligibility function (validity/cooldown/state checks)
- In-flight dedupe behavior under concurrent signals
- Renew target time calculation and monotonicity checks
- **Integration Tests (Server Proxy)**
@@ -369,5 +367,5 @@ Success criteria:
2. Enable in server proxy path for canary validation.
3. Enable ingress + Redis path progressively.
- Rollback:
- Disable `server.auto_renew_on_access.enabled` (and `server.auto_renew_on_access.redis.enabled` for ingress mode).
- Disable `renew_intent.enabled` (and `renew_intent.redis.enabled` for ingress mode).
- Existing manual renewal flow remains unchanged.
2 changes: 1 addition & 1 deletion oseps/README.md
@@ -14,4 +14,4 @@ This is the complete list of OpenSandbox Enhancement Proposals:
| [OSEP-0006](0006-developer-console.md) | Developer Console for Sandbox Operations | implementable | 2026-03-06 |
| [OSEP-0007](0007-fast-sandbox-runtime-support.md) | Fast Sandbox Runtime Support | provisional | 2026-02-08 |
| [OSEP-0008](0008-pause-resume-rootfs-snapshot.md) | Pause and Resume via Rootfs Snapshot | draft | 2026-03-13 |
| [OSEP-0009](0009-auto-renew-sandbox-on-ingress-access.md) | Auto-Renew Sandbox on Ingress Access | implementing | 2026-03-18 |
| [OSEP-0009](0009-auto-renew-sandbox-on-ingress-access.md) | Auto-Renew Sandbox on Ingress Access | implemented | 2026-03-23 |
31 changes: 31 additions & 0 deletions server/README.md
@@ -149,6 +149,8 @@ The returned endpoint is rewritten to the server proxy route:
Reference runtime compose file:
- `server/docker-compose.example.yaml`

For **experimental** lifecycle options (e.g. auto-renew on access), see [Experimental features](#experimental-features) (after [Configuration reference](#configuration-reference)).

**Sandbox TTL configuration**

- `timeout` requests must be at least 60 seconds.
@@ -564,6 +566,35 @@ curl -X DELETE \
| `DOCKER_HOST` | Docker daemon URL (e.g., `unix:///var/run/docker.sock`) |
| `PENDING_FAILURE_TTL` | TTL for failed pending sandboxes in seconds (default: 3600) |

## Experimental features

Optional **🧪 experimental** capabilities; **off by default** in `server/example.config.toml` and `example.config.*.toml`. Check the release notes before enabling them in production.

### Auto-renew on access

Extends sandbox TTL when access is observed (via the lifecycle **server proxy** and/or **ingress**). Architecture, data flow, and tuning are in **[OSEP-0009](../oseps/0009-auto-renew-sandbox-on-ingress-access.md)**.

**Server on/off**

| Goal | What to do |
|------|------------|
| **Off (default)** | Keep `[renew_intent] enabled = false` in `~/.sandbox.toml` (see `example.config.toml`). |
| **On** | Set `[renew_intent] enabled = true`. For **ingress + Redis** mode, set `redis.enabled = true` and `redis.dsn` in the same `[renew_intent]` table (see OSEP-0009). |
| **Other keys** | `min_interval_seconds`, `redis.queue_key`, `redis.consumer_concurrency`; see OSEP-0009 and `[renew_intent]` in `example.config.toml`. |

**Per sandbox**

On **create**, set `extensions["access.renew.extend.seconds"]` to a string integer between **300** and **86400** (seconds). Omit the key to opt that sandbox out of renew-on-access (or leave `renew_intent` disabled globally).

**Clients (SDK / HTTP)**

- **Use the lifecycle server as proxy** so traffic goes to `/v1/sandboxes/{id}/proxy/{port}/...`:
- **REST**: request endpoints with `use_server_proxy=true`, e.g. `GET /v1/sandboxes/{id}/endpoints/{port}?use_server_proxy=true`.
- **SDK**: `ConnectionConfig(use_server_proxy=True)` or `ConnectionConfigSync(use_server_proxy=True)` (see SDK docs for `use_server_proxy`).
- **Ingress / gateway** path: deploy and route per OSEP-0009; clients use the gateway as usual.
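
The REST and per-sandbox settings above can be put together in a short client-side sketch. The helper names `endpoint_url` and `renew_extensions` and the base URL used in the usage note are hypothetical; only the route, the `use_server_proxy=true` query parameter, and the extension key come from this document.

```python
from urllib.parse import quote, urlencode


def endpoint_url(base_url: str, sandbox_id: str, port: int, use_server_proxy: bool = True) -> str:
    """Build the endpoint lookup URL, optionally asking for the server proxy route."""
    path = f"/v1/sandboxes/{quote(sandbox_id, safe='')}/endpoints/{int(port)}"
    query = urlencode({"use_server_proxy": "true"}) if use_server_proxy else ""
    return base_url.rstrip("/") + path + (f"?{query}" if query else "")


def renew_extensions(seconds: int) -> dict:
    """Build the `extensions` entry that opts a sandbox in to renew-on-access."""
    if not 300 <= seconds <= 86400:
        raise ValueError("extension must be between 300 and 86400 seconds")
    # The API schema uses string values, so the integer is sent as a string.
    return {"access.renew.extend.seconds": str(seconds)}
```

For example, `endpoint_url("http://localhost:8080", "sb-1", 8080)` yields a URL ending in `/v1/sandboxes/sb-1/endpoints/8080?use_server_proxy=true`, and `renew_extensions(1800)` can be merged into the `extensions` field of a create request.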

**Further reading**: [OSEP-0009](../oseps/0009-auto-renew-sandbox-on-ingress-access.md); sample keys under `[renew_intent]` in `server/example.config.toml`.

## Development

### Code quality
31 changes: 31 additions & 0 deletions server/README_zh.md
@@ -143,6 +143,8 @@ curl -H "OPEN-SANDBOX-API-KEY: your-secret-api-key" \
Reference runtime compose file:
- `server/docker-compose.example.yaml`

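For **experimental** lifecycle capabilities (e.g. auto-renew on access), see the [Experimental features](#实验性功能) section at the end of this document (after [Configuration reference](#配置参考)).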

**Security hardening (applies to all Docker modes)**
```toml
[docker]
@@ -539,6 +541,35 @@ curl -X DELETE \
| `DOCKER_HOST` | Docker daemon URL (e.g., `unix:///var/run/docker.sock`) |
| `PENDING_FAILURE_TTL` | TTL for failed pending sandboxes in seconds (default: 3600) |

## Experimental features

The following are **optional**, **🧪 experimental** capabilities; they are **off by default** in `server/example.config.toml` and in each `example.config.*.toml`. Before enabling them in production, read **[OSEP-0009](../oseps/0009-auto-renew-sandbox-on-ingress-access.md)** and the release notes.

### Auto-renew on access

Extends sandbox TTL when access is observed (via the lifecycle **server proxy** and/or **ingress**). Design, data flow, and tuning are described in **[OSEP-0009](../oseps/0009-auto-renew-sandbox-on-ingress-access.md)**.

**Server on/off**

| Goal | What to do |
|------|------|
| **Off (default)** | Keep `[renew_intent] enabled = false` in `~/.sandbox.toml` (see `example.config.zh.toml`). |
| **On** | Set `[renew_intent] enabled = true`. For **Ingress + Redis** mode, set `redis.enabled = true` and `redis.dsn` in the same `[renew_intent]` table (see the OSEP). |
| **Other settings** | `min_interval_seconds`, `redis.queue_key`, `redis.consumer_concurrency`, and so on; see the OSEP and `[renew_intent]` in `example.config.zh.toml`. |

**Per sandbox**

When **creating** a sandbox, set `access.renew.extend.seconds` in `extensions` to a **string** integer between **300 and 86400** (seconds). If the key is omitted (or `renew_intent` is not enabled), the sandbox is not renewed on access.

**Clients (SDK / HTTP)**

- **Use the lifecycle server as proxy** so that requests go through `/v1/sandboxes/{id}/proxy/{port}/...`:
- **REST**: add `use_server_proxy=true` when requesting endpoints, e.g. `GET /v1/sandboxes/{id}/endpoints/{port}?use_server_proxy=true`.
- **SDK**: `ConnectionConfig(use_server_proxy=True)` or `ConnectionConfigSync(use_server_proxy=True)` (see `use_server_proxy` in the SDK docs).
- **Ingress / gateway** mode: deploy the gateway and routing per the OSEP; clients access sandboxes through the gateway as usual.

**Further reading**: [OSEP-0009](../oseps/0009-auto-renew-sandbox-on-ingress-access.md); sample configuration under `[renew_intent]` in `server/example.config.zh.toml`.

## 开发

### 代码质量
9 changes: 9 additions & 0 deletions server/example.config.k8s.toml
@@ -28,6 +28,15 @@ port = 8080
log_level = "INFO"
# api_key = "your-secret-api-key" # Optional: Uncomment to enable API key authentication

# 🧪 [EXPERIMENTAL] Renew-on-access. Off by default — see server/README.md.
[renew_intent]
enabled = false
min_interval_seconds = 60
redis.enabled = false
# redis.dsn = "redis://127.0.0.1:6379/0"
redis.queue_key = "opensandbox:renew:intent"
redis.consumer_concurrency = 8

[runtime]
type = "kubernetes"
execd_image = "opensandbox/execd:v1.0.7"
9 changes: 9 additions & 0 deletions server/example.config.k8s.zh.toml
@@ -28,6 +28,15 @@ port = 8080
log_level = "INFO"
# api_key = "your-secret-api-key" # Optional: Uncomment to enable API key authentication

# 🧪 [EXPERIMENTAL] Renew-on-access. Off by default; see server/README_zh.md.
[renew_intent]
enabled = false
min_interval_seconds = 60
redis.enabled = false
# redis.dsn = "redis://127.0.0.1:6379/0"
redis.queue_key = "opensandbox:renew:intent"
redis.consumer_concurrency = 8

[runtime]
type = "kubernetes"
execd_image = "sandbox-registry.cn-zhangjiakou.cr.aliyuncs.com/opensandbox/execd:v1.0.7"
9 changes: 9 additions & 0 deletions server/example.config.toml
@@ -27,6 +27,15 @@ log_level = "INFO"
# Maximum TTL for sandboxes that specify timeout. Comment out this line to disable the upper bound.
max_sandbox_timeout_seconds = 86400

# 🧪 [EXPERIMENTAL] Renew-on-access (OSEP-0009). Off by default — see server/README.md.
[renew_intent]
enabled = false
min_interval_seconds = 60
redis.enabled = false
# redis.dsn = "redis://127.0.0.1:6379/0"
redis.queue_key = "opensandbox:renew:intent"
redis.consumer_concurrency = 8

[runtime]
# Runtime selection (docker | kubernetes)
# -----------------------------------------------------------------