feat: add worker initialization timing collection #1873

yashaswikarnati wants to merge 10 commits into main
Conversation
Add config flag and collection code to aggregate and save worker init timing to JSON
📝 Walkthrough

This pull request adds initialization timing collection infrastructure throughout the worker initialization pipeline. It introduces Timer utility methods for persistence and aggregation, instruments worker initialization in vLLM and Megatron components, enables collection across distributed worker groups, and integrates optional logging of aggregated timings in the GRPO algorithm.
Sequence Diagram(s)

sequenceDiagram
participant GRPO as GRPO Algorithm
participant RayWG as RayWorkerGroup
participant Worker as MegatronPolicyWorker
participant Timer as Timer
participant Logger as Logger
participant FileSystem as FileSystem
GRPO->>RayWG: collect_init_timing()
RayWG->>Worker: get_init_timing.remote()
Note over Worker: Returns cached<br/>init_timer
Worker-->>RayWG: init_timer
RayWG->>Timer: aggregate_max(timers)
Timer-->>RayWG: aggregated timings dict
RayWG-->>GRPO: aggregated timings
alt logger.collect_worker_init_timing enabled
GRPO->>Timer: save_to_json(filepath, metadata)
Timer->>FileSystem: create directories
Timer->>FileSystem: write JSON file
FileSystem-->>Timer: success
GRPO->>Logger: print confirmation
end
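
Read top to bottom, the diagram maps onto a controller-side flow roughly like the sketch below. The method names come from this PR; the config lookup, output path, and metadata are illustrative assumptions, not the exact diff.

# Sketch of the controller-side flow (names from this review; path/metadata are illustrative)
timings = policy.worker_group.collect_init_timing()  # dict: label -> seconds (max across workers)

if timings and master_config["logger"].get("collect_worker_init_timing", False):
    aggregated_timer = Timer()
    aggregated_timer._timers = {k: [v] for k, v in timings.items()}  # same pattern as grpo.py:673-674
    aggregated_timer.save_to_json(
        "logs/worker_init_timing.json",  # assumed output path
        metadata={"num_workers": len(policy.worker_group.workers)},
    )
    print("Saved worker init timing to logs/worker_init_timing.json")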
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ❌ 2 failed checks (2 warnings) | ✅ 2 passed checks
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_rl/utils/logger.py (1)
77-90: ⚠️ Potential issue | 🟠 Major

Document the collect_worker_init_timing config key and add a YAML default.

Line 90 introduces a new LoggerConfig key without documenting its purpose, valid values, or recommended default. Additionally, the exemplar YAML files under examples/configs/ do not include this key. Per coding guidelines, document the key's purpose and type (currently unclear; it appears to control worker initialization timing collection), specify the recommended default (suggest false based on current usage), and add it to exemplar YAMLs like examples/configs/grpo_math_1B.yaml.
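
For illustration, a sketch of what the documented default could look like in examples/configs/grpo_math_1B.yaml; the exact placement under the logger section is an assumption:

logger:
  # Collect and save aggregated worker initialization timings to JSON (default: false)
  collect_worker_init_timing: false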
🤖 Fix all issues with AI agents
In `@nemo_rl/algorithms/grpo.py`:
- Around line 658-663: The code assumes every worker in
policy.worker_group.workers implements get_init_timing; change the collection to
first filter workers using hasattr(worker, "get_init_timing") and only call
get_init_timing.remote() on those that have it (e.g., build the list
comprehension from workers where hasattr is true), then call ray.get on that
list and assign to worker_timers; also handle the case where the filtered list
is empty by setting worker_timers to an empty list to avoid passing an
empty/invalid remote call list.
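
A minimal Python sketch of the filtered collection described above; variable names follow the prompt, and error handling is elided:

# Only query workers that actually expose get_init_timing
timing_futures = [
    worker.get_init_timing.remote()
    for worker in policy.worker_group.workers
    if hasattr(worker, "get_init_timing")
]
worker_timers = ray.get(timing_futures) if timing_futures else []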
In `@nemo_rl/distributed/worker_groups.py`:
- Around line 1041-1065: The current broad except in collect_init_timing hides
Ray and typing failures; narrow it to specific errors (e.g.,
ray.exceptions.RayError and relevant subclasses like RayActorError/RayTaskError
plus TypeError/ValueError) around the ray.get/remote and Timer.aggregate_max
calls, log the caught exception (using self._logger.exception(...) if a logger
is available or logging.exception(...)) and then return {} as before; reference
collect_init_timing, worker.get_init_timing.remote(), ray.get(...), and
Timer.aggregate_max(...) when making the change.
In `@nemo_rl/models/generation/vllm/vllm_worker.py`:
- Around line 441-447: get_init_timing currently returns a dict from
self.init_timer.get_timing_metrics(reduction_op="sum") but upstream aggregation
expects Timer objects; change get_init_timing to return the Timer instance
(self.init_timer) instead of a dict so Timer.aggregate_max and related
aggregation in worker_groups.py/grpo.py work correctly. Locate the
get_init_timing method in vllm_worker.py and return self.init_timer (same
pattern as in megatron_policy_worker.py) so callers receive a Timer object
compatible with Timer.aggregate_max.
In `@nemo_rl/models/policy/workers/megatron_policy_worker.py`:
- Around line 15-19: Rename the module-level timing globals to use the G_ prefix
and upper snake_case (e.g., change _module_import_start_time to
G_MODULE_IMPORT_START_TIME) and apply the same convention to the other timing
globals referenced around lines 112-114 and 203-205; update every use of these
names within the module to the new identifiers to avoid breakage and ensure they
remain module-scoped globals.
In `@nemo_rl/utils/timer.py`:
- Around line 237-271: The JSON dump fails because get_timing_metrics returns
NumPy scalar types (e.g., np.float64) which json.dump cannot serialize; in
save_to_json convert all timing metric values to native Python types before
writing: locate save_to_json and after timing_metrics =
self.get_timing_metrics(...) walk the timing_metrics dict (and any nested
structures) and coerce NumPy floats to float(), NumPy ints/counts to int(), and
NumPy arrays/ndarrays to lists (or pick the appropriate scalar conversion) so
the resulting output dict contains only built-in types before calling json.dump.
🧹 Nitpick comments (1)
nemo_rl/algorithms/grpo.py (1)
673-675: Consider adding a public API to Timer for constructing from aggregated data.

Directly assigning to aggregated_timer._timers at line 674 bypasses the Timer class's public interface. While this pattern is used elsewhere in the codebase (e.g., megatron_policy_worker.py), a public factory method would improve encapsulation. Consider adding a classmethod to Timer:

@classmethod
def from_aggregated(cls, timing_dict: dict) -> "Timer":
    """Create a Timer from pre-aggregated timing data."""
    timer = cls()
    timer._timers = {k: [v] for k, v in timing_dict.items()}
    return timer

Then replace lines 673-674 with:

aggregated_timer = Timer.from_aggregated(max_timing)
nemo_rl/distributed/worker_groups.py (Outdated)
try:
    # Check if workers support get_init_timing
    if not self._workers:
        return {}

    # Collect timing from all workers
    timing_futures = []
    for worker in self._workers:
        if hasattr(worker, "get_init_timing"):
            timing_futures.append(worker.get_init_timing.remote())

    if not timing_futures:
        return {}

    # Get all timers
    timers = ray.get(timing_futures)

    # Aggregate using max across workers, sum within each worker
    aggregated = Timer.aggregate_max(timers, reduction_op="sum")

    return aggregated

except Exception:
    # Return empty dict on any error
    return {}
Narrow the exception handling in collect_init_timing.
Catching bare Exception hides Ray failures and makes debugging hard. Please limit to likely Ray/typing errors and log the exception.
🔧 Suggested adjustment
-    except Exception:
-        # Return empty dict on any error
-        return {}
+    except (ray.exceptions.RayError, AttributeError, TypeError) as exc:
+        print(f"Error collecting init timing: {exc}")
+        return {}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
try:
    # Check if workers support get_init_timing
    if not self._workers:
        return {}

    # Collect timing from all workers
    timing_futures = []
    for worker in self._workers:
        if hasattr(worker, "get_init_timing"):
            timing_futures.append(worker.get_init_timing.remote())

    if not timing_futures:
        return {}

    # Get all timers
    timers = ray.get(timing_futures)

    # Aggregate using max across workers, sum within each worker
    aggregated = Timer.aggregate_max(timers, reduction_op="sum")

    return aggregated

except (ray.exceptions.RayError, AttributeError, TypeError) as exc:
    print(f"Error collecting init timing: {exc}")
    return {}
🤖 Prompt for AI Agents
In `@nemo_rl/distributed/worker_groups.py` around lines 1041 - 1065, The current
broad except in collect_init_timing hides Ray and typing failures; narrow it to
specific errors (e.g., ray.exceptions.RayError and relevant subclasses like
RayActorError/RayTaskError plus TypeError/ValueError) around the ray.get/remote
and Timer.aggregate_max calls, log the caught exception (using
self._logger.exception(...) if a logger is available or logging.exception(...))
and then return {} as before; reference collect_init_timing,
worker.get_init_timing.remote(), ray.get(...), and Timer.aggregate_max(...) when
making the change.
def get_init_timing(self) -> dict[str, float]:
    """Get initialization timing metrics.

    Returns:
        Dictionary mapping timing labels to elapsed times in seconds
    """
    return self.init_timer.get_timing_metrics(reduction_op="sum")
🧩 Analysis chain
🏁 Script executed:
# First, check the current state of the method
cat -n nemo_rl/models/generation/vllm/vllm_worker.py | sed -n '441,447p'

Repository: NVIDIA-NeMo/RL
Length of output: 376
🏁 Script executed:
# Search for all usages of get_init_timing
rg -n "get_init_timing" -C 3Repository: NVIDIA-NeMo/RL
Length of output: 2943
🏁 Script executed:
# Search for collect_init_timing and aggregation patterns
rg -n "collect_init_timing|aggregate_max" -C 3Repository: NVIDIA-NeMo/RL
Length of output: 6943
🏁 Script executed:
# Check the Timer class definition and its methods
rg -n "class Timer" -C 5Repository: NVIDIA-NeMo/RL
Length of output: 646
🏁 Script executed:
# Search for how init_timer is aggregated across workers
rg -n "init_timer" -C 2Repository: NVIDIA-NeMo/RL
Length of output: 6504
Return Timer object instead of dict from get_init_timing.
Line 447 returns a dict from get_timing_metrics(), but the aggregation pipeline in worker_groups.py and grpo.py expects Timer objects. The call to Timer.aggregate_max(timers, ...) will fail when receiving dicts instead of Timer instances. The megatron_policy_worker.py correctly returns the Timer object directly; vllm_worker.py should do the same.
Proposed fix
-    def get_init_timing(self) -> dict[str, float]:
-        """Get initialization timing metrics.
-
-        Returns:
-            Dictionary mapping timing labels to elapsed times in seconds
-        """
-        return self.init_timer.get_timing_metrics(reduction_op="sum")
+    def get_init_timing(self) -> Timer:
+        """Return init timing for controller aggregation."""
+        return self.init_timer

🤖 Prompt for AI Agents
In `@nemo_rl/models/generation/vllm/vllm_worker.py` around lines 441 - 447,
get_init_timing currently returns a dict from
self.init_timer.get_timing_metrics(reduction_op="sum") but upstream aggregation
expects Timer objects; change get_init_timing to return the Timer instance
(self.init_timer) instead of a dict so Timer.aggregate_max and related
aggregation in worker_groups.py/grpo.py work correctly. Locate the
get_init_timing method in vllm_worker.py and return self.init_timer (same
pattern as in megatron_policy_worker.py) so callers receive a Timer object
compatible with Timer.aggregate_max.
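
To make the aggregation contract concrete, here is a small sketch using Timer's internal _timers layout (label → list of seconds) as shown elsewhere in this review; the labels and values are made up:

t1, t2 = Timer(), Timer()
t1._timers = {"model_load": [12.0, 0.5]}  # two measurements on worker 1
t2._timers = {"model_load": [10.0]}       # one measurement on worker 2

# Sum within each worker first (12.5 vs 10.0), then take the max across workers.
aggregated = Timer.aggregate_max([t1, t2], reduction_op="sum")
# aggregated == {"model_load": 12.5}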
# Measure module import time (import time first for measurement)
import time

_module_import_start_time = time.perf_counter()
Rename module-level timing globals to G_* upper snake_case.
Module-level globals should use the G_ prefix and upper snake_case.
🔤 Suggested rename
-_module_import_start_time = time.perf_counter()
+G_MODULE_IMPORT_START_TIME = time.perf_counter()
...
-_module_import_duration = time.perf_counter() - _module_import_start_time
+G_MODULE_IMPORT_DURATION = time.perf_counter() - G_MODULE_IMPORT_START_TIME
...
-        if "_module_import_duration" in globals():
-            self.init_timer._timers["module_import"] = [_module_import_duration]
+        if "G_MODULE_IMPORT_DURATION" in globals():
+            self.init_timer._timers["module_import"] = [G_MODULE_IMPORT_DURATION]
+ self.init_timer._timers["module_import"] = [G_MODULE_IMPORT_DURATION]Also applies to: 112-114, 203-205
🤖 Prompt for AI Agents
In `@nemo_rl/models/policy/workers/megatron_policy_worker.py` around lines 15 -
19, Rename the module-level timing globals to use the G_ prefix and upper
snake_case (e.g., change _module_import_start_time to
G_MODULE_IMPORT_START_TIME) and apply the same convention to the other timing
globals referenced around lines 112-114 and 203-205; update every use of these
names within the module to the new identifiers to avoid breakage and ensure they
remain module-scoped globals.
nemo_rl/utils/timer.py (Outdated)
def save_to_json(
    self,
    filepath: Union[str, Path],
    reduction_op: str = "sum",
    metadata: Optional[dict] = None,
) -> None:
    """Save timing measurements to a JSON file.

    Args:
        filepath: Path where the JSON file will be saved
        reduction_op: Reduction operation to apply to all timing measurements.
            Valid options are: "mean", "median", "min", "max", "std", "sum", "count"
        metadata: Optional dictionary of metadata to include in the JSON file

    Raises:
        ValueError: If an invalid reduction operation is provided
    """
    filepath = Path(filepath)

    # Get timing metrics with the specified reduction
    timing_metrics = self.get_timing_metrics(reduction_op=reduction_op)

    # Build the output dictionary
    output = {
        "timings": timing_metrics,
        "reduction_op": reduction_op,
    }

    if metadata is not None:
        output["metadata"] = metadata

    # Write to JSON file
    filepath.parent.mkdir(parents=True, exist_ok=True)
    with open(filepath, "w") as f:
        json.dump(output, f, indent=2)
Convert NumPy scalars before JSON serialization.
get_timing_metrics() uses NumPy reductions, which return NumPy scalar values. json.dump can't reliably serialize these and may raise TypeError at runtime. Coerce to built-in types before dumping.
🛠️ Proposed fix
     # Get timing metrics with the specified reduction
     timing_metrics = self.get_timing_metrics(reduction_op=reduction_op)

+    def _to_jsonable(value: object) -> object:
+        if isinstance(value, np.generic):
+            return value.item()
+        if isinstance(value, list):
+            return [_to_jsonable(v) for v in value]
+        return value
+
+    timing_metrics = {k: _to_jsonable(v) for k, v in timing_metrics.items()}
+
     # Build the output dictionary
     output = {
         "timings": timing_metrics,
         "reduction_op": reduction_op,
     }

🤖 Prompt for AI Agents
In `@nemo_rl/utils/timer.py` around lines 237 - 271, The JSON dump fails because
get_timing_metrics returns NumPy scalar types (e.g., np.float64) which json.dump
cannot serialize; in save_to_json convert all timing metric values to native
Python types before writing: locate save_to_json and after timing_metrics =
self.get_timing_metrics(...) walk the timing_metrics dict (and any nested
structures) and coerce NumPy floats to float(), NumPy ints/counts to int(), and
NumPy arrays/ndarrays to lists (or pick the appropriate scalar conversion) so
the resulting output dict contains only built-in types before calling json.dump.
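
A quick usage sketch of the save_to_json API shown in this diff; the path and metadata values are illustrative:

timer = Timer()
# ... after recording some measurements ...
timer.save_to_json(
    "logs/init_timing.json",            # parent directories are created automatically
    reduction_op="sum",                 # one of: "mean", "median", "min", "max", "std", "sum", "count"
    metadata={"run_id": "grpo-debug"},  # optional; stored under the "metadata" key
)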
ℹ️ File Consistency Check

Check based on commit: b678d6d (PR #1873)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR. Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
Force-pushed from b678d6d to d45b2a5
ℹ️ File Consistency Check (commit d45b2a5): ✅ DTensor Policy Worker Synchronization Check, same notice as above.
ℹ️ File Consistency Check (commit 59c2193): ✅ DTensor Policy Worker Synchronization Check, same notice as above.
Signed-off-by: ykarnati <[email protected]>
ℹ️ File Consistency Check (commit 5233a32): ✅ DTensor Policy Worker Synchronization Check, same notice as above.
Signed-off-by: ykarnati <[email protected]>
ℹ️ File Consistency Check (commit a50defa): ✅ DTensor Policy Worker Synchronization Check, same notice as above.
Add config flag and collection code to aggregate and save worker init timing to JSON
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
New Features
Tests