Conversation

@CodeRabbit full review

✅ Actions performed: Full review triggered.
Walkthrough

This PR introduces a new GymHarness abstraction framework for nemo-gym integration, including an abstract base class for benchmark harnesses, configuration generation utilities, subprocess management for the resource server, and corresponding unit tests across five new modules.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant ResourceServerProcess
    participant ResourceServer
    participant GymHarness
    Client->>ResourceServerProcess: __init__(eval_type, framework_config)
    ResourceServerProcess->>ResourceServerProcess: Extract harness_package & config
    Client->>ResourceServerProcess: __enter__()
    ResourceServerProcess->>ResourceServer: spawn subprocess with config
    ResourceServer->>GymHarness: load Harness implementation
    GymHarness->>GymHarness: get_dataset()
    ResourceServer-->>ResourceServerProcess: ready (health check)
    ResourceServerProcess-->>Client: return self with url
    Client->>Client: generate_gym_config(resource_server_host, ...)
    Client->>Client: write_gym_config(config, output_path)
    Client->>ResourceServerProcess: __exit__()
    ResourceServerProcess->>ResourceServer: terminate subprocess
    ResourceServer-->>ResourceServerProcess: shutdown complete
```
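The `__enter__`/`__exit__` lifecycle in the diagram can be sketched as a minimal, self-contained context manager. This is illustrative only — the class name and cleanup timeout mirror the diagram and review comments, not the PR's actual implementation:

```python
import subprocess

class ResourceServerProcessSketch:
    """Illustrative stand-in for the PR's ResourceServerProcess context manager."""

    def __init__(self, cmd, timeout=5):
        self.cmd = cmd          # command that would spawn the resource server
        self.timeout = timeout  # grace period before force-kill
        self._process = None

    def __enter__(self):
        # The real implementation would also poll a health endpoint before returning.
        self._process = subprocess.Popen(self.cmd)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._process.terminate()
        try:
            self._process.wait(timeout=self.timeout)
        except subprocess.TimeoutExpired:
            self._process.kill()  # force-kill if graceful shutdown hangs
            self._process.wait()

# A long-running child process stands in for the resource server.
with ResourceServerProcessSketch(["sleep", "30"]) as srv:
    was_running = srv._process.poll() is None  # child alive inside the block
```

After the `with` block exits, the child has been terminated (or killed) and reaped, which is the guarantee the review comments below probe.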
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 4
🧹 Nitpick comments (1)
packages/nemo-evaluator/tests/unit_tests/core/test_resource_server_process.py (1)
30-82: Add lifecycle tests for `__enter__` failure and cleanup paths.

Current coverage is good for config parsing, but it misses subprocess startup-failure behavior (terminate/timeout/kill), which is where the highest reliability risk sits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nemo-evaluator/tests/unit_tests/core/test_resource_server_process.py` around lines 30 - 82, Add unit tests for ResourceServerProcess lifecycle to cover failure and cleanup paths: create tests that exercise ResourceServerProcess.__enter__ when the subprocess fails to start (simulate subprocess creation/startup raising or the process exiting immediately) and assert that ResourceServerProcess calls terminate/kill and enforces a timeout; also add tests for the normal __exit__/cleanup path to verify terminate is called and that a forced kill occurs if terminate times out. Use the ResourceServerProcess class and its __enter__, __exit__, terminate, and any internal subprocess handling (mock subprocess.Popen or the method that starts the process) to simulate failures and assert the expected calls and behavior.
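A test of that failure path could take roughly this shape, mocking `subprocess.Popen`. Here `start_server` is a hypothetical stand-in for the real startup method, so this is a sketch of the pattern rather than a drop-in test:

```python
import subprocess
from unittest import mock

def start_server(popen=subprocess.Popen):
    """Stand-in for the startup path under test: spawn, fail health check, clean up."""
    process = popen(["resource-server"])
    healthy = False  # simulate a failed health check
    if not healthy:
        process.terminate()
        try:
            process.wait(timeout=5)
        except subprocess.TimeoutExpired:
            process.kill()  # escalate when graceful termination hangs
            process.wait()
        raise RuntimeError("Resource server failed to start")
    return process

fake = mock.Mock()
# First wait() hangs (raises TimeoutExpired), forcing the kill path; second succeeds.
fake.wait.side_effect = [subprocess.TimeoutExpired(cmd="resource-server", timeout=5), None]
try:
    start_server(popen=mock.Mock(return_value=fake))
    raised = False
except RuntimeError:
    raised = True
```

The assertions then check that `terminate`, `kill`, and the final `wait` were all invoked before the error propagated — exactly the orphan-prevention behavior the review asks to cover.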
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- packages/nemo-evaluator/src/nemo_evaluator/__init__.py
- packages/nemo-evaluator/src/nemo_evaluator/core/gym_config_generator.py
- packages/nemo-evaluator/src/nemo_evaluator/core/resource_server_process.py
- packages/nemo-evaluator/src/nemo_evaluator/gym_harness.py
- packages/nemo-evaluator/tests/unit_tests/core/test_gym_config_generator.py
- packages/nemo-evaluator/tests/unit_tests/core/test_resource_server_process.py
```python
self.port = port or _find_free_port(host)
self.data_dir = data_dir
```
Auto-selected port can race and cause flaky startup.
Choosing a port first and binding later in another process is a TOCTOU race; another process can claim that port before the resource server starts.
Consider retrying startup with a newly selected port when port was auto-assigned and startup fails.
Also applies to: 118-138
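The suggested retry shape can be sketched with stdlib sockets, where binding stands in for the real server startup. `_find_free_port` is the helper named in the snippet above; its body here is a common implementation, not necessarily the PR's:

```python
import socket
import time

def _find_free_port(host="127.0.0.1"):
    # Ask the OS for an ephemeral port; another process can reclaim it
    # before we reuse it -- that gap is the TOCTOU window.
    with socket.socket() as s:
        s.bind((host, 0))
        return s.getsockname()[1]

def start_with_retry(host="127.0.0.1", port=None, attempts=3, backoff=0.1):
    auto_assigned = port is None
    chosen = port or _find_free_port(host)
    for attempt in range(attempts):
        server = socket.socket()
        try:
            server.bind((host, chosen))  # stands in for launching the real server
            server.listen()
            return server, chosen
        except OSError:
            server.close()
            # Only retry with a fresh port when the caller did not pin one.
            if not auto_assigned or attempt == attempts - 1:
                raise
            chosen = _find_free_port(host)
            time.sleep(backoff)

server, chosen_port = start_with_retry()
server.close()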
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/nemo-evaluator/src/nemo_evaluator/core/resource_server_process.py`
around lines 82 - 83, The auto-selected port assignment (self.port = port or
_find_free_port(host)) is susceptible to a TOCTOU race: when port is not
provided, retry binding with a freshly chosen port if the server startup fails;
modify the startup logic in the ResourceServerProcess (the code paths that call
self.port and launch the server—e.g., the constructor assignment and the method
that actually binds/starts the server) to detect bind/start failures and, if the
original port was auto-assigned (port argument was None), loop a few times: pick
a new port via _find_free_port(host), set self.port, and retry startup with a
short backoff before finally raising the error; ensure this retry logic is used
in both the initial constructor/startup and the alternate startup path
referenced around lines 118–138.
```python
if self.harness_kwargs:
    cmd += ["--harness_kwargs_json", json.dumps(self.harness_kwargs)]

logger.info("Spawning resource server: %s", " ".join(cmd))
self._process = subprocess.Popen(cmd)
```
Avoid logging the full spawn command with raw harness kwargs.
The command log can include --harness_kwargs_json and expose sensitive data (for example credentials embedded in URLs).
🔒 Proposed hardening

```diff
-logger.info("Spawning resource server: %s", " ".join(cmd))
+logger.info(
+    "Spawning resource server (exe=%s, host=%s, port=%s, harness_package=%s, has_harness_kwargs=%s)",
+    exe,
+    self.host,
+    self.port,
+    self.harness_package,
+    bool(self.harness_kwargs),
+)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/nemo-evaluator/src/nemo_evaluator/core/resource_server_process.py`
around lines 126 - 130, The current logger.info call logs the full spawn command
(cmd) and can expose sensitive data in self.harness_kwargs; before logging,
sanitize cmd by removing or redacting the value passed to
"--harness_kwargs_json" (or omit that flag entirely) to produce a safe_cmd, then
log safe_cmd instead of cmd; keep subprocess.Popen(cmd) unchanged so the process
still receives the real arguments.
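The `safe_cmd` approach the prompt describes might look like this sketch; the flag name is taken from the snippet above, everything else is illustrative:

```python
def sanitize_cmd(cmd):
    """Return a copy of cmd with the value after --harness_kwargs_json redacted."""
    safe_cmd = []
    redact_next = False
    for arg in cmd:
        safe_cmd.append("<redacted>" if redact_next else arg)
        redact_next = arg == "--harness_kwargs_json"
    return safe_cmd

cmd = ["server", "--port", "8080", "--harness_kwargs_json", '{"token": "secret"}']
safe = sanitize_cmd(cmd)
# safe == ["server", "--port", "8080", "--harness_kwargs_json", "<redacted>"]
```

`subprocess.Popen(cmd)` keeps receiving the real arguments; only the log line switches to the sanitized copy.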
```python
self._process.terminate()
self._process.wait(timeout=5)
raise RuntimeError(f"Resource server failed to start at {self.url}")
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify the startup-failure cleanup path lacks TimeoutExpired handling
rg -n -C5 'self\._process\.terminate\(\)' packages/nemo-evaluator/src/nemo_evaluator/core/resource_server_process.py
```

Repository: NVIDIA-NeMo/Evaluator
Length of output: 798

🏁 Script executed:

```shell
#!/bin/bash
# Check imports in the file
head -50 packages/nemo-evaluator/src/nemo_evaluator/core/resource_server_process.py
```

Repository: NVIDIA-NeMo/Evaluator
Length of output: 1579

🏁 Script executed:

```shell
#!/bin/bash
# Look for TimeoutExpired handling anywhere in the file
rg -n 'TimeoutExpired' packages/nemo-evaluator/src/nemo_evaluator/core/resource_server_process.py
```

Repository: NVIDIA-NeMo/Evaluator
Length of output: 114
Handle TimeoutExpired in startup failure cleanup to prevent orphaned processes.
Lines 136–138 call wait(timeout=5) without catching subprocess.TimeoutExpired, which can leave the child process running if termination hangs. The __exit__ method (lines 143–147) already implements the correct pattern with force-kill; apply the same approach here.
Proposed fix

```diff
 self._process.terminate()
-self._process.wait(timeout=5)
+try:
+    self._process.wait(timeout=5)
+except subprocess.TimeoutExpired:
+    logger.warning("Force-killing resource server (pid=%s)", self._process.pid)
+    self._process.kill()
+    self._process.wait()
 raise RuntimeError(f"Resource server failed to start at {self.url}")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
self._process.terminate()
try:
    self._process.wait(timeout=5)
except subprocess.TimeoutExpired:
    logger.warning("Force-killing resource server (pid=%s)", self._process.pid)
    self._process.kill()
    self._process.wait()
raise RuntimeError(f"Resource server failed to start at {self.url}")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/nemo-evaluator/src/nemo_evaluator/core/resource_server_process.py`
around lines 136 - 138, The startup failure path currently calls
self._process.terminate() and self._process.wait(timeout=5) without handling
subprocess.TimeoutExpired, which can leave the child running; update the failure
cleanup in the method that raises RuntimeError(f"Resource server failed to start
at {self.url}") to mirror the __exit__ pattern: call self._process.terminate(),
catch subprocess.TimeoutExpired from self._process.wait(timeout=5), then call
self._process.kill() and self._process.wait() in the except block before
re-raising the RuntimeError so the process cannot be orphaned (refer to
self._process, the RuntimeError raising location, and the __exit__ cleanup
logic).
```python
def test_auto_generates_path(self):
    config = {"test": "value"}
    path = write_gym_config(config)
    assert path.endswith(".yaml")
    with open(path) as f:
        loaded = yaml.safe_load(f)
    assert loaded == {"test": "value"}
```
test_auto_generates_path should clean up the generated temp file.
This test leaves artifacts in /tmp, which can accumulate across runs.
🧹 Proposed fix

```diff
+import os
 import yaml
@@
     def test_auto_generates_path(self):
         config = {"test": "value"}
         path = write_gym_config(config)
-        assert path.endswith(".yaml")
-        with open(path) as f:
-            loaded = yaml.safe_load(f)
-        assert loaded == {"test": "value"}
+        try:
+            assert path.endswith(".yaml")
+            with open(path) as f:
+                loaded = yaml.safe_load(f)
+            assert loaded == {"test": "value"}
+        finally:
+            if os.path.exists(path):
+                os.remove(path)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
import os
import yaml

def test_auto_generates_path(self):
    config = {"test": "value"}
    path = write_gym_config(config)
    try:
        assert path.endswith(".yaml")
        with open(path) as f:
            loaded = yaml.safe_load(f)
        assert loaded == {"test": "value"}
    finally:
        if os.path.exists(path):
            os.remove(path)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/nemo-evaluator/tests/unit_tests/core/test_gym_config_generator.py`
around lines 61 - 67, The test test_auto_generates_path leaves a temp YAML file
in /tmp; update it to clean up after itself by deleting the generated file
(os.remove(path)) at the end of the test or switch to using pytest's tmp_path
fixture and write_gym_config so the file is created in a managed temp directory;
reference test_auto_generates_path and write_gym_config to locate the code and
ensure the cleanup (or tmp_path usage) runs even on assertion failures (e.g., in
a finally block or via tmp_path).
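The `tmp_path` variant the prompt mentions could take this shape. `write_gym_config_stub` is a hypothetical stand-in (the real `write_gym_config` signature is assumed, and a minimal flat-dict dump replaces real YAML serialization to keep the sketch dependency-free):

```python
import pathlib
import tempfile

def write_gym_config_stub(config, output_path):
    """Hypothetical stand-in for write_gym_config with an explicit output path."""
    text = "\n".join(f"{k}: {v}" for k, v in config.items())  # flat string dicts only
    pathlib.Path(output_path).write_text(text + "\n")
    return str(output_path)

def test_auto_generates_path(tmp_path):
    # pytest's tmp_path fixture provides a per-test directory that is
    # cleaned up automatically, so no finally block is needed.
    path = write_gym_config_stub({"test": "value"}, tmp_path / "gym.yaml")
    assert path.endswith(".yaml")
    assert pathlib.Path(path).read_text() == "test: value\n"

# Standalone demonstration (pytest would inject tmp_path itself):
with tempfile.TemporaryDirectory() as d:
    test_auto_generates_path(pathlib.Path(d))
```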
Signed-off-by: Michal Bien <mbien@nvidia.com>

Force-pushed d3552df to 5492b11
Summary by CodeRabbit
New Features
Tests