Conversation
Signed-off-by: Michal Bien <mbien@nvidia.com>
|
@CodeRabbit full review |
✅ Actions performedFull review triggered. |
WalkthroughThis refactoring replaces the sandbox module's older abstract base classes (NemoEvaluatorSandbox, NemoSandboxCommand, NemoSandboxSession) with a simpler Protocol-based Sandbox abstraction and an ExecResult dataclass. Associated error classes are removed, while new utility types from ecs_fargate are newly exported. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/nemo-evaluator/src/nemo_evaluator/sandbox/base.py`:
- Line 26: Replace the use of typing.Self with typing_extensions.Self to support
Python 3.10: update the import statement that currently reads "from typing
import Protocol, Self" so that Protocol comes from typing and Self is imported
from typing_extensions (e.g., import Protocol from typing and import Self from
typing_extensions); also ensure the symbol Self is used unchanged elsewhere
(e.g., in any classes or function signatures referencing Self such as those in
this module) and add typing_extensions to project dependencies if not already
present.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
packages/nemo-evaluator/src/nemo_evaluator/sandbox/__init__.pypackages/nemo-evaluator/src/nemo_evaluator/sandbox/base.pypackages/nemo-evaluator/src/nemo_evaluator/sandbox/ecs_fargate.py
| @abstractmethod | ||
| def stop(self) -> None: | ||
| raise NotImplementedError | ||
| from typing import Protocol, Self |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Python version constraints in packaging/config files =="
fd -HI 'pyproject.toml|setup.cfg|setup.py|tox.ini|\.python-version|\.tool-versions|Pipfile' | while read -r f; do
echo "--- $f ---"
rg -n "requires-python|python_requires|python_version|target-version|3\.10|3\.11|3\.12" "$f" || true
done
echo
echo "== Self imports across repository =="
rg -n "from typing import .*\\bSelf\\b|typing\\.Self|from typing_extensions import .*\\bSelf\\b" || trueRepository: NVIDIA-NeMo/Evaluator
Length of output: 667
Fix Python 3.10 compatibility: use typing_extensions.Self instead of typing.Self.
The package declares support for Python 3.10+ (requires-python = ">=3.10,<3.14"), but typing.Self is only available in Python 3.11+. This import will fail at runtime on Python 3.10. Use typing_extensions.Self instead to maintain compatibility.
Required fix
-from typing import Protocol, Self
+from typing import Protocol
+from typing_extensions import Self📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from typing import Protocol, Self | |
| from typing import Protocol | |
| from typing_extensions import Self |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/nemo-evaluator/src/nemo_evaluator/sandbox/base.py` at line 26,
Replace the use of typing.Self with typing_extensions.Self to support Python
3.10: update the import statement that currently reads "from typing import
Protocol, Self" so that Protocol comes from typing and Self is imported from
typing_extensions (e.g., import Protocol from typing and import Self from
typing_extensions); also ensure the symbol Self is used unchanged elsewhere
(e.g., in any classes or function signatures referencing Self such as those in
this module) and add typing_extensions to project dependencies if not already
present.
There was a problem hiding this comment.
Power Review
Reviewed 3 files, left 25 suggestions: 4 errors, 18 warnings, 3 suggestions.
Dimensions Analyzed
| Dimension | 🔴 | 💡 | |
|---|---|---|---|
| anti_patterns | 0 | 3 | 1 |
| architecture | 0 | 5 | 1 |
| hallucination-detection | 1 | 1 | 0 |
| patterns | 3 | 5 | 0 |
| security-risks | 0 | 3 | 0 |
| testing-quality | 0 | 1 | 1 |
| Total | 4 | 18 | 3 |
New logic lands in ecs refactor space,
Our agents checked each file and trace.
Yet automated eyes can overlook a flaw,
So review these notes — that’s the golden law.
Automated review by Power Review — findings are advisory.
packages/nemo-evaluator/src/nemo_evaluator/sandbox/ecs_fargate.py
Outdated
Show resolved
Hide resolved
packages/nemo-evaluator/src/nemo_evaluator/sandbox/ecs_fargate.py
Outdated
Show resolved
Hide resolved
packages/nemo-evaluator/src/nemo_evaluator/sandbox/ecs_fargate.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Michal Bien <mbien@nvidia.com>
Signed-off-by: Michal Bien <mbien@nvidia.com>
|
/ok to test 12ccd41 |
|
/ok to test 1abc89a |
packages/nemo-evaluator/tests/unit_tests/sandbox/test_sandbox_module.py
Outdated
Show resolved
Hide resolved
packages/nemo-evaluator/src/nemo_evaluator/sandbox/ecs_fargate.py
Outdated
Show resolved
Hide resolved
| # ===================================================================== | ||
|
|
||
|
|
||
| def _coerce_list(value: Any, name: str) -> list[str]: |
There was a problem hiding this comment.
name is not needed here at all, raised value will have it's stack trace and taken to extreme, every operator should have name of the variable passed to it :D
x = add_two_integers(y, z, "y", "z")
packages/nemo-evaluator/src/nemo_evaluator/sandbox/ecs_fargate.py
Outdated
Show resolved
Hide resolved
packages/nemo-evaluator/src/nemo_evaluator/sandbox/ecs_fargate.py
Outdated
Show resolved
Hide resolved
| raise | ||
| attempt += 1 | ||
| if 0 < max_retries <= attempt: | ||
| log.error( |
There was a problem hiding this comment.
suggestion: the codebase uses structlog, ask the agent to be good citizen here
packages/nemo-evaluator/src/nemo_evaluator/sandbox/ecs_fargate.py
Outdated
Show resolved
Hide resolved
| code = "" | ||
| if hasattr(exc, "response"): | ||
| code = (exc.response.get("Error") or {}).get("Code", "") # type: ignore[union-attr] | ||
| return code in _RETRYABLE_CODES or any(m in msg for m in _RETRYABLE_MESSAGES) |
There was a problem hiding this comment.
bug: _RETRYABLE_MESSAGES above is not lowercased whily the msg is lowercased in line 326, this is a bug
Signed-off-by: Michal Bien <mbien@nvidia.com>
Signed-off-by: Michal Bien <mbien@nvidia.com>
|
/ok to test de3f7a2 |
pribalta
left a comment
There was a problem hiding this comment.
Power Review
Reviewed 5 files, left 26 suggestions: 1 error, 20 warnings, 5 suggestions.
Dimensions Analyzed
| Dimension | 🔴 | 💡 | |
|---|---|---|---|
| api-contracts | 0 | 2 | 1 |
| boundary-conditions | 0 | 5 | 1 |
| concurrency | 1 | 2 | 1 |
| data-integrity | 0 | 3 | 0 |
| error-handling | 0 | 2 | 1 |
| logic-correctness | 0 | 5 | 1 |
| resource-lifecycle | 0 | 1 | 0 |
| Total | 1 | 20 | 5 |
New logic lands in ecs refactor space,
Our agents checked each file and trace.
Yet automated eyes can overlook a flaw,
So review these notes — that’s the golden law.
⏱️ Reviewed in 4.6 min (est. manual review: ~60 min, 13x faster)
Automated review by Power Review — findings are advisory. Learn more about how it works.
| raise ValueError("container_dir is required") | ||
| if isinstance(paths, Path): | ||
| paths = [paths] | ||
| def _emergency_cleanup() -> None: |
There was a problem hiding this comment.
[Power Review] 🔴 Self-deadlock: _emergency_cleanup holds non-reentrant lock then re-enters it via stop() · 97% confidence
The knowledge document flags deadlock-prone lock ordering as an ERROR (CWE-362). _cleanup_lock is a non-reentrant threading.Lock() (line 1352). _emergency_cleanup() acquires _cleanup_lock (line 1359), then calls sb.stop() (line 1362), which calls _unregister_from_cleanup() (line 1490), which tries to acquire _cleanup_lock again (line 2287). Since threading.Lock is non-reentrant, the same thread will deadlock on itself. This means the atexit handler will hang forever, preventing graceful process exit and leaving orphaned ECS tasks running.
💡 Suggestion: Either (a) change _cleanup_lock = threading.RLock() to allow reentrant acquisition, or (b) refactor _emergency_cleanup to call sb._cleanup() directly instead of sb.stop(), bypassing _unregister_from_cleanup() since all sandboxes are being cleaned up anyway and the dict is about to be discarded.
| ) | ||
| raise EcsExecError( | ||
| f"ECS Exec failed: rc={effective_rc}\nOUTPUT:\n{text_out}" | ||
| cls._build_semaphore = threading.Semaphore(cfg.build_parallelism) |
There was a problem hiding this comment.
[Power Review]
Missing Input Validation (CWE-20): build_parallelism from user config is passed directly to threading.Semaphore() without a lower-bound check. Semaphore(0) starts with an internal counter of 0, so the subsequent .acquire() at line 1161 will block forever, causing a deadlock. Negative values also produce undefined/broken behavior. The knowledge document states: 'validate shape, type, and range at the entry point; reject or sanitise before use.'
💡 Suggestion: Add validation in EcsFargateConfig.from_dict() or at the point of use: build_parallelism = max(1, int(raw.get('build_parallelism', 50))). Alternatively, validate in ensure_image_built before creating the semaphore.
| try: | ||
| ecr.describe_images(repositoryName=repo_name, imageIds=[{"imageTag": tag}]) | ||
| return True | ||
| except ClientError: |
There was a problem hiding this comment.
[Power Review]
Per the knowledge document (Over-Broad Exception Catch — WARNING): catching a parent type when only a specific error is expected masks unrelated bugs. image_exists_in_ecr catches all ClientError and returns False, but only ImageNotFoundException (or similar 404 response) means the image is absent. Throttling errors (ThrottlingException), access-denied errors, or network issues would also return False, causing unnecessary image rebuilds and potentially hiding permission problems.
💡 Suggestion: Inspect e.response['Error']['Code'] and only return False for image-not-found codes (e.g., ImageNotFoundException). Re-raise or propagate other ClientError subtypes.
| ) | ||
| time.sleep(sleep_sec) | ||
| time.sleep(delay) | ||
| raise RuntimeError("register_task_definition failed after max retries") |
There was a problem hiding this comment.
[Power Review]
Per the knowledge document's 'Unreachable or Dead Branch' rule: the raise RuntimeError at line 1944 can never be reached. The for-loop iterates range(1, max_retries + 1). On each iteration, either: (a) success → return arn, (b) non-retryable error → raise RuntimeError(...), or (c) last-attempt retryable error → attempt >= max_retries is True → raise RuntimeError(...). The same issue exists at lines 2017-2019 in _run_task.
💡 Suggestion: Remove the unreachable raise after the loop, or restructure using a while-loop pattern if the intent is to have a fallback error.
| continue | ||
| else: | ||
| time.sleep(delay) | ||
|
|
||
| raise RuntimeError( |
There was a problem hiding this comment.
[Power Review]
Per the knowledge document's 'Unreachable or Dead Branch' rule: the raise RuntimeError at line 2017 is unreachable. In the last loop iteration, either the inner _retry_with_backoff succeeds (then capacity failures or success is handled), or an exception is raised (re-raised by the except clause since attempt >= cfg.run_task_max_retries), or capacity failures trigger a raise (since attempt >= cfg.run_task_max_retries). The loop cannot exit normally.
💡 Suggestion: Remove the unreachable raise after the loop. If defensive coding is desired, add a comment explaining it's a safety net, or restructure with a while-loop.
| "ThrottlingException" in combined | ||
| or "TooManyRequestsException" in combined | ||
| or "Rate exceeded" in combined | ||
| def health(self) -> bool: |
There was a problem hiding this comment.
[Power Review] 💡 Tautological is not None check on bytes return · 85% confidence
Per the knowledge document's 'Redundant or Tautological Condition' rule: ExecClient.health() checks self._request(...) is not None. The _request method always returns bytes on success and raises on failure — it never returns None. The is not None comparison is always True when no exception occurs, making the expression tautological.
💡 Suggestion: Simplify to: self._request(...); return True inside the try block, relying on the except to return False.
| resp = self._ecs.describe_task_definition( | ||
| taskDefinition=cfg.task_definition | ||
| ) | ||
| base = resp["taskDefinition"] |
There was a problem hiding this comment.
[Power Review] 💡 Bare except Exception when describing base task definition hides root cause · 80% confidence
Per the knowledge document (Over-Broad Exception Catch — WARNING): catching Exception when only a specific error is expected masks unrelated bugs. At line 1785, the code catches any Exception from describe_task_definition and falls through to base = None, causing the code to register from scratch. This hides throttling, credential, or network errors that should be surfaced rather than silently triggering a fallback code path that may fail differently.
💡 Suggestion: Catch only ClientError and optionally check for the specific error code (e.g., task definition not found). Let unexpected exceptions propagate.
| ) | ||
| def _wait_for_running(self) -> None: | ||
| cfg = self._cfg | ||
| start = time.time() |
There was a problem hiding this comment.
[Power Review] 💡 time.time() used for elapsed-time checks instead of time.monotonic() · 80% confidence
Date/Time Boundary Error: _wait_for_running and _wait_for_ssh_ready use time.time() for deadline/elapsed calculations. time.time() is subject to NTP adjustments and system clock changes (including backwards jumps), which can cause premature timeouts or infinite waits. The knowledge document cautions against time-related edge cases that cause silent wrong results.
💡 Suggestion: Replace time.time() with time.monotonic() in _wait_for_running (line 2023) and _wait_for_ssh_ready (line 2120) to be immune to clock adjustments.
| subnets = _coerce_list(raw.get("subnets")) | ||
| sgs = _coerce_list(raw.get("security_groups")) | ||
| has_sidecar = isinstance(raw.get("ssh_sidecar"), Mapping) | ||
| assign_public_ip = bool(raw.get("assign_public_ip", False)) or has_sidecar |
There was a problem hiding this comment.
[Power Review] 💡 Undocumented side effect: from_dict silently forces assign_public_ip=True when SSH sidecar is present · 80% confidence
Per the knowledge document (Undocumented Side Effect): 'A function performs a side effect that is not obvious from its name or signature.' EcsFargateConfig.from_dict() silently overrides assign_public_ip to True when an ssh_sidecar mapping is present, even if the caller explicitly sets assign_public_ip: false. This could surprise users in VPC-only environments where public IPs are prohibited by policy.
💡 Suggestion: Either document this override behavior clearly in the SshSidecarConfig docstring, or raise an error/warning when the user explicitly sets assign_public_ip=False while providing an SSH sidecar config, rather than silently overriding.
| """ | ||
| if self._previous_buffer is None: | ||
| return None | ||
| def _free_port() -> int: |
There was a problem hiding this comment.
[Power Review] 💡 TOCTOU in _free_port(): port freed before actual use · 75% confidence
The knowledge document flags TOCTOU as an ERROR (CWE-367). _free_port() binds to port 0 to get an ephemeral port, then closes the socket (exiting the with block) before the port is actually used by the SSH tunnel. Another process can claim the port in the gap. The SSH tunnel retry logic in SshTunnel.open() (line 480: re-allocates on each attempt) partially mitigates this, but it's still a known race window.
💡 Suggestion: Consider using SO_REUSEADDR on the socket and passing the bound socket directly, or accept the TOCTOU and document that retries handle port collisions (which the current code already does).
Summary by CodeRabbit