-
Notifications
You must be signed in to change notification settings - Fork 28
feat: ECS refactor #785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Glorf
wants to merge
8
commits into
main
Choose a base branch
from
mbien/ecs-refactor
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
feat: ECS refactor #785
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
efd593f
ECS refactor
Glorf 575b916
Pass pk as value for sandbox
Glorf 12ccd41
Applied comments
Glorf 0e8944d
Fix tests
Glorf 1abc89a
Merge remote-tracking branch 'origin' into mbien/ecs-refactor
Glorf e586675
Updated docs
Glorf 4aba0c6
Merge remote-tracking branch 'origin' into mbien/ecs-refactor
Glorf de3f7a2
Post-review fixes
Glorf File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,55 +1,81 @@ | ||
| ``nemo_evaluator.sandbox`` | ||
| ====================================== | ||
|
|
||
| Sandbox implementations used by evaluation harnesses that need a tmux-like interactive session. | ||
| Sandbox implementations used by evaluation harnesses that need isolated | ||
| container environments for command execution, file transfer, and agent hosting. | ||
|
|
||
| This module is designed to keep dependencies **optional**: | ||
|
|
||
| - The ECS Fargate implementation only imports AWS SDKs (``boto3``/``botocore``) when actually used. | ||
| - Using the ECS sandbox also requires the AWS CLI (``aws``) and ``session-manager-plugin`` on the host. | ||
| - Transport is SSH-based; no AWS CLI or session-manager-plugin required on the host. | ||
|
|
||
| Usage (ECS Fargate) | ||
| ------------------- | ||
| Sandbox Protocol | ||
| ---------------- | ||
|
|
||
| Typical usage is: | ||
| All sandbox backends implement the :class:`~nemo_evaluator.sandbox.base.Sandbox` | ||
| protocol, so harnesses can be written backend-agnostically: | ||
|
|
||
| - configure :class:`~nemo_evaluator.sandbox.ecs_fargate.EcsFargateConfig` | ||
| - :meth:`~nemo_evaluator.sandbox.ecs_fargate.EcsFargateSandbox.spin_up` a sandbox context | ||
| - create an interactive :class:`~nemo_evaluator.sandbox.base.NemoSandboxSession` | ||
| - ``start()`` / ``stop()`` — lifecycle | ||
| - ``exec(command)`` — run a shell command | ||
| - ``upload(local_path, remote_path)`` / ``download(remote_path, local_path)`` — file transfer | ||
| - ``is_running`` — health check | ||
| - Context manager (``with sandbox: ...``) for automatic cleanup | ||
|
|
||
| Example:: | ||
| Usage (ECS Fargate — exec-server mode) | ||
| -------------------------------------- | ||
|
|
||
| from nemo_evaluator.sandbox import EcsFargateConfig, EcsFargateSandbox | ||
| For harnesses that drive execution from the orchestrator (e.g. terminal-bench):: | ||
|
|
||
| from nemo_evaluator.sandbox import EcsFargateConfig, EcsFargateSandbox, SshSidecarConfig | ||
|
|
||
| cfg = EcsFargateConfig( | ||
| region="us-west-2", | ||
| cluster="my-ecs-cluster", | ||
| task_definition="my-task-def:1", | ||
| container_name="eval", | ||
| subnets=["subnet-abc"], | ||
| security_groups=["sg-xyz"], | ||
| image_template="123456789.dkr.ecr.us-west-2.amazonaws.com/my-repo:{task_id}", | ||
| s3_bucket="my-staging-bucket", | ||
| ssh_sidecar=SshSidecarConfig( | ||
| public_key_secret_arn="arn:aws:secretsmanager:...:my-pubkey", | ||
| private_key_secret_arn="arn:aws:secretsmanager:...:my-privkey", | ||
| exec_server_port=19542, | ||
| ), | ||
| ) | ||
|
|
||
| with EcsFargateSandbox(cfg, task_id="task-001", run_id="run-001") as sandbox: | ||
| sandbox.start() | ||
| result = sandbox.exec("echo hello") | ||
| print(result.stdout) | ||
|
|
||
| Usage (ECS Fargate — agent-server mode) | ||
| --------------------------------------- | ||
|
|
||
| For harnesses that host an agent inside the container (e.g. openhands), | ||
| omit ``exec_server_port`` and configure reverse/forward tunnels:: | ||
|
|
||
| cfg = EcsFargateConfig( | ||
| ... | ||
| ssh_sidecar=SshSidecarConfig( | ||
| public_key_secret_arn="arn:aws:secretsmanager:...:my-pubkey", | ||
| private_key_secret_arn="arn:aws:secretsmanager:...:my-privkey", | ||
| target_url_env="MODEL_URL", | ||
| ), | ||
| ) | ||
|
|
||
| with EcsFargateSandbox.spin_up( | ||
| cfg=cfg, | ||
| task_id="task-001", | ||
| trial_name="trial-0001", | ||
| run_id="run-2026-01-12", | ||
| ) as sandbox: | ||
| session = sandbox.create_session("main") | ||
| session.send_keys(["echo hello", "Enter"], block=True) | ||
| print(session.capture_pane()) | ||
| with EcsFargateSandbox(cfg, task_id="task-001", run_id="run-001") as sandbox: | ||
| sandbox.start() | ||
| # The agent inside the container can now reach the model via the reverse tunnel. | ||
| # The orchestrator can reach the agent's API via sandbox.local_port. | ||
|
|
||
| Prerequisites / Notes | ||
| --------------------- | ||
|
|
||
| - The harness host must have **AWS CLI** and **session-manager-plugin** installed. | ||
| - If you use S3-based fallbacks (large uploads / long commands), configure ``s3_bucket``. | ||
| - SSH keys must be **pre-provisioned** in AWS Secrets Manager. | ||
| - If you use S3-based file staging (large uploads / downloads), configure ``s3_bucket``. | ||
| - Docker image building via AWS CodeBuild is available through :class:`~nemo_evaluator.sandbox.ecs_fargate.ImageBuilder`. | ||
|
|
||
| .. automodule:: nemo_evaluator.sandbox | ||
| :members: | ||
| :undoc-members: | ||
| :member-order: bysource | ||
|
|
||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Power Review]⚠️ Sandbox protocol claims universal contract but EcsFargateSandbox violates it in agent-server mode · 85% confidence
Per the knowledge document (Inconsistent Error Contract): 'A function documents that it raises ValueError on bad input but actually raises TypeError, or an HTTP endpoint documents 404 but returns 400.' The
Sandboxprotocol docstring says 'Minimal contract every sandbox backend must satisfy' and definesexec(),upload(),download(). However,EcsFargateSandboxraisesRuntimeErrorfor all three in agent-server mode (line 2227-2232). Callers programming to theSandboxprotocol cannot reliably call these methods.💡 Suggestion: Either split the protocol into
Sandbox(lifecycle only) andExecSandbox(Sandbox)(with exec/upload/download), or document clearly in the protocol docstring that exec/upload/download are optional and may raiseNotImplementedErrororRuntimeErrordepending on mode.