
Add nemo-skills-core subpackage for lightweight installs #1229

Open

gwarmstrong wants to merge 25 commits into main from georgea/refactor-separable-pipeline

Conversation

gwarmstrong (Collaborator) commented Feb 10, 2026

Adds a lightweight nemo-skills-core subpackage (core/ subdirectory)
with only inference, evaluation, and tool calling deps. Default
pip install nemo-skills is unchanged (installs everything).

Changes

  • core/pyproject.toml + core/requirements.txt: New subpackage installable via pip install ./core or git URL with #subdirectory=core. Single source of truth for core deps, referenced by both core and root pyproject.toml.
  • nemo_skills/pipeline/__init__.py: Import guard using importlib.metadata -- importing pipeline modules with only core installed raises a clear ImportError instead of a cryptic ModuleNotFoundError.
  • nemo_skills/_cli_stub.py: Stub ns CLI entry point for core-only installs that prints a helpful message.
  • nemo_skills/evaluation/evaluator/__init__.py: Lazy evaluator registry using string paths instead of eager imports, so core-only installs don't fail on benchmark-specific deps (faiss, func_timeout, etc.).
  • nemo_skills/dataset/utils.py + nemo_skills/pipeline/dataset.py: Moved cluster-dependent dataset logic into pipeline module to keep core free of nemo_run imports.
  • requirements/pipeline.txt: New requirements file for pipeline-only deps (nemo_run, typer, etc.).
  • .github/workflows/tests.yml: Install uv in CI for use in installation testing.
  • Docs: Added installation guide, updated CONTRIBUTING.md with dependency placement guidance.
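The import guard described in the bullets above might look roughly like this. This is a hedged sketch: the distribution name, helper name, and error text are assumptions, not the PR's actual code.

```python
# Hypothetical sketch of a pipeline import guard using importlib.metadata.
# The "nemo-skills" distribution name and message wording are assumptions.
import importlib.metadata


def check_pipeline_installed(dist_name: str = "nemo-skills") -> None:
    """Raise a clear ImportError when only the core subpackage is installed."""
    try:
        importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        raise ImportError(
            f"'{dist_name}' is not installed -- this looks like a core-only "
            "install. Run `pip install nemo-skills` to use pipeline modules."
        ) from None
```

The point of the guard is that a user who only installed the core subpackage sees an actionable message instead of a cryptic ModuleNotFoundError deep inside a pipeline import.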

Summary by CodeRabbit

  • New Features

    • Lightweight core package with a minimal CLI message and local-first dataset loading (cluster workflows remain available via the pipeline).
    • Evaluators now load lazily on demand to reduce startup cost.
  • Documentation

    • Expanded installation & dependency guide explaining full vs core installs, extras, and the Core/Pipeline boundary.
  • Chores

    • Packaging split into core/pipeline/dev dependency groups; added pipeline dependencies and core requirements.
    • CI workflow adds a UV setup step.

gwarmstrong force-pushed the georgea/refactor-separable-pipeline branch from 8fa5c7d to 4e2fad9 (February 10, 2026 22:16)
gwarmstrong changed the title from "maint: separate dependencies for different Skills components" to "Add nemo-skills-core subpackage and separate core/pipeline dependencies" (Feb 12, 2026)
gwarmstrong force-pushed the georgea/refactor-separable-pipeline branch from d22246e to 76c2a18 (February 12, 2026 21:46)
Signed-off-by: George Armstrong <georgea@nvidia.com>
gwarmstrong changed the title from "Add nemo-skills-core subpackage and separate core/pipeline dependencies" to "Add nemo-skills-core subpackage for lightweight installs" (Feb 13, 2026)
gwarmstrong force-pushed the georgea/refactor-separable-pipeline branch from a2751f3 to f0eb8d0 (February 13, 2026 00:38)
gwarmstrong marked this pull request as ready for review (February 13, 2026 00:58)
gwarmstrong requested review from Kipok (February 13, 2026 00:58)
greptile-apps bot (Contributor) left a comment

13 files reviewed, 1 comment

Comment on lines +106 to +107
_EVALUATOR_MAP_PATHS[eval_type] = None
_resolved_evaluator_map[eval_type] = eval_fn

Setting _EVALUATOR_MAP_PATHS[eval_type] = None creates a fragile state. If _resolved_evaluator_map is ever cleared or doesn't contain the eval_type, _get_evaluator_fn will call _resolve(None) and crash.

Suggested change
_EVALUATOR_MAP_PATHS[eval_type] = None
_resolved_evaluator_map[eval_type] = eval_fn
# Store function directly, bypassing the lazy resolution path
_resolved_evaluator_map[eval_type] = eval_fn

gwarmstrong (Author) replied:

Good catch, switched to a "<dynamically-registered>" sentinel to be safe.

gwarmstrong (Author) replied:

Actually, reverting this back to None. The _resolved_evaluator_map cache is internal and never cleared, so this scenario cannot happen in practice. Per our project guidelines: "Don't add error handling, fallbacks, or validation for scenarios that can't happen." If the cache were somehow corrupted, crashing is the correct signal.


coderabbitai bot commented Feb 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a lightweight core package and requirements, documents installation and the Core/Pipeline dependency boundary, reorganizes optional extras in packaging, adds a CI step for uv, implements lazy evaluator resolution, refactors dataset loading to prefer local modules and delegates cluster handling to a new pipeline dataset module, and adds a runtime guard for pipeline imports.

Changes

Cohort / File(s) Summary
CI Workflow
.github/workflows/tests.yml
Adds an "Install uv" step (astral-sh/setup-uv@v4) before dependency installation.
Docs & Contribution Guidance
CONTRIBUTING.md, docs/basics/installation.md, mkdocs.yml
Adds installation docs and a new "Respect the Core / Pipeline dependency boundary" section; registers new docs nav entry.
Core package metadata & deps
core/pyproject.toml, core/requirements.txt
Adds pyproject for nemo-skills-core and a core requirements file listing core runtime dependencies.
Root packaging & extras
pyproject.toml, requirements/pipeline.txt, requirements/main.txt
Splits optional-dependencies into core, pipeline, dev; adds pipeline deps (nemo_run, wandb, typer, constrained click) and adds rich to main requirements.
CLI stub
nemo_skills/_cli_stub.py
Adds minimal CLI stub main() that prints a lightweight-mode message and exits.
Dataset loading (core)
nemo_skills/dataset/utils.py
Reworks get_dataset_module to prefer local imports, remove prior cluster helpers, add extra_datasets handling, and emit DeprecationWarning + delegate to pipeline loader when cluster_config is provided.
Dataset loading (pipeline)
nemo_skills/pipeline/dataset.py
Adds pipeline-aware dataset loader supporting local, local-unmounted, and remote cluster downloads; exposes get_dataset_module for cluster-enabled resolution.
Evaluator lazy loading
nemo_skills/evaluation/evaluator/__init__.py
Replaces eager evaluator imports with path-based maps and runtime import resolution (lazy loading) for evaluator functions/classes; adds resolver utilities and caches.
Pipeline import guard
nemo_skills/pipeline/__init__.py
Adds runtime guard using package metadata to raise ImportError when pipeline is imported but only core is installed.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Core as nemo_skills.dataset.utils
    participant Pipeline as nemo_skills.pipeline.dataset
    participant Cluster

    rect rgba(100, 150, 200, 0.5)
    Note over User,Core: Local-only flow (default)
    User->>Core: get_dataset_module(dataset, data_dir=None)
    Core->>Core: import from nemo_skills.dataset or local path
    Core-->>User: return dataset module
    end

    rect rgba(200, 100, 150, 0.5)
    Note over User,Cluster: Cluster flow (deprecated in Core)
    User->>Core: get_dataset_module(dataset, cluster_config=...)
    Core->>Core: emit DeprecationWarning
    Core->>Pipeline: delegate get_dataset_module(...)
    Pipeline->>Cluster: fetch / download cluster module (remote)
    Cluster-->>Pipeline: module content / init.py
    Pipeline->>Core: imported module
    Core-->>User: return dataset module
    end
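The deprecation-and-delegate flow in the diagram above could be sketched like this. Function names follow the walkthrough, but the local loader is a stand-in, and where the real code delegates to the pipeline loader this sketch falls through to the local path to stay self-contained.

```python
# Hedged sketch of core's get_dataset_module emitting a DeprecationWarning
# when cluster_config is passed, then delegating. Names are illustrative.
import warnings


def _local_get_dataset_module(dataset, data_dir=None):
    # Stand-in for the core local import logic.
    return f"<module {dataset}>"


def get_dataset_module(dataset, data_dir=None, cluster_config=None):
    if cluster_config is not None:
        warnings.warn(
            "Passing cluster_config to the core loader is deprecated; "
            "use nemo_skills.pipeline.dataset.get_dataset_module instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # The real PR delegates to the pipeline loader here; this sketch
        # falls through to the local path to stay runnable without it.
    return _local_get_dataset_module(dataset, data_dir)
```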

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • Kipok
  • activatedgeek
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 64.29%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3)
  • Description Check ✅: Skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅: The PR title clearly and concisely summarizes the main change: adding a lightweight nemo-skills-core subpackage for installations that don't require the full pipeline dependencies.
  • Merge Conflict Detection ✅: No merge conflicts detected when merging into main.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@nemo_skills/evaluation/evaluator/__init__.py`:
- Around line 113-117: The error message incorrectly labels
`_EVALUATOR_MAP_PATHS.keys()` as "All supported types" when it only contains
function-based evaluators; update the ValueError text in the raise block (the
code that references eval_type) to clearly distinguish class-based vs
function-based types by either listing both maps together (combine
`_EVALUATOR_CLASS_MAP_PATHS.keys()` and `_EVALUATOR_MAP_PATHS.keys()`) or
renaming the second label to "Function-based evaluator types" so users see
accurate descriptions of `_EVALUATOR_CLASS_MAP_PATHS` and
`_EVALUATOR_MAP_PATHS`.
- Around line 106-107: register_evaluator currently stores None into
_EVALUATOR_MAP_PATHS[eval_type], which will cause AttributeError when code later
iterates or calls _resolve expecting a path string; change register_evaluator so
it stores a sentinel string (e.g. "<dynamic>") into
_EVALUATOR_MAP_PATHS[eval_type] instead of None, and ensure any
resolution/display logic in _resolve/_get_evaluator_fn treats that sentinel as a
dynamic entry (or filters it out) so rsplit is only called on real path strings;
update references to _EVALUATOR_MAP_PATHS, register_evaluator,
_resolved_evaluator_map, _get_evaluator_fn, and _resolve accordingly.
- Line 137: Remove the leftover debug print statement print(f"evaluator:
{evaluator}") from the module (it should not be in production code); either
delete that line or replace it with an appropriate logger.debug call using the
module logger (e.g., logger.debug("evaluator: %s", evaluator)) so diagnostics
use the configured logging system and not stdout—locate the print by searching
for the exact string and update in the __init__ module where the evaluator
variable is in scope.
- Around line 93-94: Remove the debug print statement print(f"evaluator:
{evaluator}") from the module so it no longer emits debug output; locate the
temporary print in the evaluator initialization block near where EVALUATOR_MAP
and EVALUATOR_CLASS_MAP are set and delete that single line, leaving the maps
and the helper functions (_get_evaluator_fn, _get_evaluator_cls, evaluate,
get_evaluator_class) intact so iteration via EVALUATOR_MAP/EVALUATOR_CLASS_MAP
still works per the documented design.

In `@nemo_skills/pipeline/dataset.py`:
- Around line 60-62: The check uses cluster_config.get("executor") which masks a
missing-key error; change it to access the key directly
(cluster_config["executor"]) so missing executor raises immediately, and keep
the logic that if cluster_config is None or cluster_config["executor"] in (None,
"none") then return _get_local_dataset_module(dataset, data_dir); update any
related code paths that assume executor exists (e.g., the code around
get_unmounted_path in nemo_skills/pipeline/utils/mounts.py) to rely on the same
direct-access semantics to fail fast on misconfiguration.
🧹 Nitpick comments (6)
CONTRIBUTING.md (1)

56-59: Fenced code block missing language specifier.

Minor nit from markdownlint — adding a language (e.g., text) would silence MD040.

Proposed fix
-```
+```text
 Pipeline can import from Core.
 Core CANNOT import from Pipeline.
-```
+```
core/requirements.txt (1)

17-27: Section label "math evaluation" is misleading — several packages below it aren't math-specific.

mcp, numpy, openai, requests, rich, tqdm, and transformers are general-purpose dependencies, not math evaluation specific. Consider either reorganizing sections or using a broader label like # --- general / shared ---.

nemo_skills/pipeline/dataset.py (3)

39-51: Imported module outlives its backing file.

import_from_path is called inside a TemporaryDirectory context manager. Once the with block exits, the downloaded __init__.py is deleted, but the module object (and its __file__ attribute) still references the now-removed path. This works at runtime because CPython keeps the already-executed module in memory, but it can cause confusing errors if any downstream code inspects module.__file__ or attempts a reload.

Consider moving the temp directory lifecycle to the caller or keeping it alive longer if module introspection is needed.


44-50: Chain the re-raised exception for clearer tracebacks.

Per the static analysis hint (B904), raise ... from err preserves the original traceback context.

Proposed fix
         try:
             cluster_download_file(cluster_config, cluster_dataset_path, tmp_path)
-        except FileNotFoundError:
-            raise RuntimeError(
+        except FileNotFoundError as err:
+            raise RuntimeError(
                 f"Init file {mounted_path} not found on the cluster. "
                 f"Please check the dataset name you're using. Did you forget to run prepare data commands?"
-            )
+            ) from err

109-113: Chain the re-raised RuntimeError for clearer tracebacks.

Same B904 pattern — add from err to preserve the original ModuleNotFoundError context.

Proposed fix
-        except ModuleNotFoundError:
-            raise RuntimeError(
+        except ModuleNotFoundError as err:
+            raise RuntimeError(
                 f"Dataset {dataset} not found in any of the searched locations: "
                 f"{data_dir if data_dir else 'nemo_skills.dataset'}, {extra_datasets}"
-            )
+            ) from err
nemo_skills/dataset/utils.py (1)

116-135: Chain re-raised exceptions for clearer tracebacks.

Same pattern as flagged in pipeline/dataset.py — the raise RuntimeError(...) statements at Lines 120 and 126 inside except clauses should use from to preserve the original exception context.

Proposed fix
     except ModuleNotFoundError:
         dataset = dataset.replace(".", "/")
         extra_datasets = extra_datasets or os.environ.get("NEMO_SKILLS_EXTRA_DATASETS")
         if extra_datasets is None:
-            raise RuntimeError(f"Dataset {dataset} not found in {data_dir if data_dir else 'nemo_skills.dataset'}")
+            raise RuntimeError(
+                f"Dataset {dataset} not found in {data_dir if data_dir else 'nemo_skills.dataset'}"
+            ) from None
         if extra_datasets_type == ExtraDatasetType.local or extra_datasets_type is None:
             with add_to_path(extra_datasets):
                 try:
                     dataset_module = importlib.import_module(dataset)
-                except ModuleNotFoundError:
-                    raise RuntimeError(
+                except ModuleNotFoundError as err:
+                    raise RuntimeError(
                         f"Dataset {dataset} not found in any of the searched locations: "
                         f"{data_dir if data_dir else 'nemo_skills.dataset'}, {extra_datasets}"
-                    )
+                    ) from err

greptile-apps bot (Contributor) left a comment

13 files reviewed, 2 comments


Comment on lines +106 to +107
_EVALUATOR_MAP_PATHS[eval_type] = None
_resolved_evaluator_map[eval_type] = eval_fn

Setting _EVALUATOR_MAP_PATHS[eval_type] = None is fragile. If _resolved_evaluator_map gets cleared or doesn't contain eval_type, _get_evaluator_fn will call _resolve(None) and crash with ValueError: not enough values to unpack.

The current implementation works only because the function is immediately added to _resolved_evaluator_map, but this implicit dependency is error-prone. Consider either:

  1. Not setting _EVALUATOR_MAP_PATHS[eval_type] at all (just use _resolved_evaluator_map)
  2. Setting it to a sentinel string that provides a better error message if accidentally resolved

gwarmstrong (Author) replied:

Good catch, switched to a "<dynamically-registered>" sentinel to be safe.

gwarmstrong (Author) replied:

Actually, reverting this back to None. The _resolved_evaluator_map cache is internal and never cleared, so this scenario cannot happen in practice. Per our project guidelines: "Don't add error handling, fallbacks, or validation for scenarios that can't happen." If the cache were somehow corrupted, crashing is the correct signal.


greptile-apps bot commented Feb 13, 2026

Additional Comments (1)

requirements/main.txt
rich is in core/requirements.txt but missing from requirements/main.txt, violating the rule that "all core and pipeline deps must also appear in requirements/main.txt"

rich

Signed-off-by: George Armstrong <georgea@nvidia.com>
greptile-apps bot (Contributor) left a comment

14 files reviewed, no comments


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@nemo_skills/pipeline/dataset.py`:
- Around line 54-62: The function _get_default_dataset_module currently drops
extra_datasets and extra_datasets_type when cluster_config is None by delegating
to _get_local_dataset_module(dataset, data_dir); update
_get_default_dataset_module to forward extra_datasets and extra_datasets_type
into the local call (e.g., call _get_local_dataset_module(dataset, data_dir,
extra_datasets=..., extra_datasets_type=...)) so get_dataset_module's outer
ModuleNotFoundError path remains reachable and callers' extra_datasets are
honored; ensure the function signature for _get_default_dataset_module accepts
the extra_* params and that _get_local_dataset_module is invoked with those
parameters.
🧹 Nitpick comments (3)
nemo_skills/pipeline/dataset.py (2)

39-51: Chain exception context with from when re-raising.

Static analysis (B904) correctly flags that re-raising inside except without from loses the original traceback context. This applies here and at lines 109-113.

Proposed fix
-        except FileNotFoundError:
-            raise RuntimeError(
+        except FileNotFoundError as exc:
+            raise RuntimeError(
                 f"Init file {mounted_path} not found on the cluster. "
                 f"Please check the dataset name you're using. Did you forget to run prepare data commands?"
-            )
+            ) from exc

91-113: Chain the inner RuntimeError re-raise with from.

Same B904 issue as above — preserve context for debugging.

Proposed fix
-        except ModuleNotFoundError:
-            raise RuntimeError(
+        except ModuleNotFoundError as exc:
+            raise RuntimeError(
                 f"Dataset {dataset} not found in any of the searched locations: "
                 f"{data_dir if data_dir else 'nemo_skills.dataset'}, {extra_datasets}"
-            )
+            ) from exc
nemo_skills/evaluation/evaluator/__init__.py (1)

93-94: Semantic change in EVALUATOR_MAP / EVALUATOR_CLASS_MAP values.

These aliases now expose dotted-path strings instead of resolved callables/classes. Any downstream code (external plugins, scripts) that iterates .values() expecting callables will break silently. The comment on lines 90-92 documents the intent, and the repo itself only uses these for key enumeration, so this is safe internally. Just worth noting for external consumers if this is a public API.

gwarmstrong (Author):

Added rich to requirements/main.txt, thanks.

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
# No cluster orchestration deps (nemo_run, typer, etc.)


# --- code evaluation ---
Collaborator:

are you sure this covers all benchmarks? Generally, we should move to keeping this reqs really simple and move most benchmark-specific requirements to install at runtime, but for now probably we might need some more packages here? E.g. datasets is almost certainly needed and then other benchmark specific things, like sacrebleu, etc.

gwarmstrong (Author) replied:

I revisited the separation. This should contain all the reqs not needed for cluster orchestration now.

nemo-evaluator-launcher<0.1.47
nemo_run @ git+https://github.com/NVIDIA-NeMo/Run
typer >= 0.13
wandb
Collaborator:

this is actually a core dependency, it's being used in summarize-results, which is required for core functionality. Currently summarize-results is kind of in a weird half-pipeline state, but we should fix it to cleanly separate it into pipeline and non-pipeline components via #779 (comment)

gwarmstrong (Author) replied:

fixed

CONTRIBUTING.md Outdated
| CLI commands, cluster orchestration, experiment tracking | `requirements/pipeline.txt` |
| Everything else (dataset-specific deps, benchmark-specific packages) | `requirements/main.txt` only |

Dependencies in `core/requirements.txt` should be things that a typical `GenerationTask` run with PythonTool would need. Dataset-specific or benchmark-specific packages (e.g., `faiss-cpu`, `sacrebleu`, `func-timeout`) go only in `requirements/main.txt`.
Collaborator:

this part I don't fully understand - I think benchmark-specific packages should go to core for now as otherwise the code will fail when those benchmarks are used e.g. in evaluator. Eventually we should migrate to jit install, but it's not done yet, so I'd put those into core

gwarmstrong (Author) replied (Feb 14, 2026):

Fair. My original scope was pretty PythonTool specific, but I think we can come up with something that makes a little more sense in terms of aligning the core code with core dependencies.

gwarmstrong (Author) replied:

yeah it's now in core and there is a clearer description of what's in pipeline vs core

CONTRIBUTING.md Outdated

Dependencies in `core/requirements.txt` should be things that a typical `GenerationTask` run with PythonTool would need. Dataset-specific or benchmark-specific packages (e.g., `faiss-cpu`, `sacrebleu`, `func-timeout`) go only in `requirements/main.txt`.

All core and pipeline deps must also appear in `requirements/main.txt` (the monolithic file used for default installs).
Collaborator:

can we not link multiple requirements listed in pyproject.toml? We duplicate?

gwarmstrong (Author) replied:

we should be able to do that. It should be implemented that way now--I updated this at one point so it links against the file rather than duplicating. Will fix.

gwarmstrong (Author) replied:

done

CONTRIBUTING.md Outdated
**When writing new core code:**

- If you need something from `nemo_skills.pipeline`, your code probably belongs in pipeline, not core. Move it.
- If you have a function that works locally but *also* needs a cluster variant, put the local version in core and a cluster-aware wrapper in `nemo_skills/pipeline/` (see `pipeline/dataset.py` for the pattern).
Collaborator:

I actually think that if we have a case like this, it means we need to redesign something. Ideally separation should be clean, and we shouldn't need to duplicate functionality. E.g. the dataset module part is a bit messy and there is probably a way to do it better, such that there is a pipeline level that only manages pulling from cluster and then there is a local level that always assumes things are present locally and is being called inside pipeline directly

gwarmstrong (Author) replied:

Makes sense, updated the docs here to reflect that and made the implementation more consistent with the guidance here/there.

Per review feedback: all benchmark-specific packages should go to core
for now since JIT install is not yet implemented. Previously only
PythonTool-specific deps were in core while benchmark deps like datasets,
sacrebleu, faiss-cpu, etc. were only in main.txt. This led to an
inconsistent boundary where math grader deps were in core but BFCL deps
were not, despite both being benchmark-specific.

Addresses review comments #1, #4, #6 on PR #1229.

Signed-off-by: George Armstrong <georgea@nvidia.com>
pyproject.toml now composes default dependencies from
core/requirements.txt + requirements/pipeline.txt instead of
maintaining a separate monolithic main.txt that duplicated both.

This ensures a single source of truth for each dependency: it lives
in exactly one requirements file, and pyproject.toml references both.

Addresses review comment #5 on PR #1229.

Signed-off-by: George Armstrong <georgea@nvidia.com>
Creates the test file referenced in docs/basics/installation.md that
verifies the core/pipeline dependency boundary. Tests import each
core module in a subprocess where nemo_run and nemo_skills.pipeline
are blocked, ensuring core has no top-level pipeline dependencies.

Addresses review comment #2 on PR #1229.

Signed-off-by: George Armstrong <georgea@nvidia.com>
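The import-blocking idea in this commit message might be sketched as below. This is an assumption about the mechanism: the real test runs each import in a subprocess, while this in-process sketch temporarily poisons sys.modules and then restores it.

```python
# Sketch: make specific module names unimportable while importing another
# module, so any hidden dependency on a blocked name fails immediately.
# A None entry in sys.modules causes `import name` to raise ImportError.
import importlib
import sys


def import_with_blocked(module_name, blocked):
    """Import module_name while the names in `blocked` are made unimportable."""
    saved = {name: sys.modules.get(name) for name in blocked}
    for name in blocked:
        sys.modules[name] = None  # poison the import cache for this name
    try:
        return importlib.import_module(module_name)
    finally:
        # Restore the previous cache state so later imports are unaffected.
        for name, mod in saved.items():
            if mod is None:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = mod
```

In the described test, the blocked names would be nemo_run and nemo_skills.pipeline, so every core module proves it has no top-level pipeline dependency.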
Rewrite the dependency boundary section to:
- Define core as "everything needed for inference + evaluation" (not
  just PythonTool-specific deps)
- Remove references to deleted requirements/main.txt
- Clarify that all benchmark evaluator deps go to core until JIT
  install is implemented
- Improve dataset module separation guidance (pipeline = cluster I/O
  only, core = all local logic)
- Add note about summarize-results refactor (issue #779)

Addresses review comments #3, #4, #6, #7 on PR #1229.

Signed-off-by: George Armstrong <georgea@nvidia.com>
Refactor pipeline/dataset.py so it ONLY handles cluster I/O (SSH
downloads, mount path resolution) and delegates all local
import/resolution logic to core's dataset/utils.py.

Key changes:
- Extract cluster-specific loading into _get_cluster_dataset_module()
- For local extra_datasets fallback, delegate to core instead of
  reimplementing add_to_path + import_module
- For non-cluster cases, delegate entirely to core from the start
- Remove duplicated local import logic that was parallel to core

Addresses review comment #7 on PR #1229.

Signed-off-by: George Armstrong <georgea@nvidia.com>
The section labels (agent runtime, math evaluation, code evaluation,
benchmark evaluator deps) were misleading since many deps span
multiple categories. Keep it as a flat alphabetical list.

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Pipeline no longer calls importlib.import_module or add_to_path
directly — all import/module-resolution logic lives in core.

Pipeline's only responsibilities are now:
- Local executor: unmount paths via get_unmounted_path, then
  delegate to core, then map returned paths back to mounted form
- Remote executor: SSH download via cluster_download_file for
  custom data_dir or cluster-type extra_datasets

Addresses review comment #7 on PR #1229.

Signed-off-by: George Armstrong <georgea@nvidia.com>
Collapse 3 helper functions into one download helper + one main
function. Pipeline only does two things: unmount paths (local
executor) and SSH download (remote executor). All import logic
delegates to core.

140 -> 78 lines.

Signed-off-by: George Armstrong <georgea@nvidia.com>
wandb is used by summarize-results which is core functionality,
not just pipeline/orchestration. Move it to core/requirements.txt.

Signed-off-by: George Armstrong <georgea@nvidia.com>
The Dockerfile referenced the deleted requirements/main.txt.
Update to install from core/requirements.txt + pipeline.txt,
matching how pyproject.toml now composes dependencies.

Signed-off-by: George Armstrong <georgea@nvidia.com>
The Docker container installs deps from requirements files without
running `pip install .`, so package metadata is not available.
Checking for nemo_run import instead correctly detects whether
pipeline deps are installed.

Signed-off-by: George Armstrong <georgea@nvidia.com>
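A hedged sketch of the import-based detection this commit describes; the helper name and signature are hypothetical, and only the nemo_run module name comes from the message above.

```python
# When dependencies are installed from requirements files without
# `pip install .`, distribution metadata is absent, so probing for the
# module itself is a more robust signal that pipeline deps are present.
import importlib.util


def deps_installed(module_name: str = "nemo_run") -> bool:
    """True if `module_name` is importable without actually importing it."""
    return importlib.util.find_spec(module_name) is not None
```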
@@ -0,0 +1,39 @@
# Core dependencies for inference, evaluation, tool calling, and all benchmark evaluators.
Collaborator:

can we keep this inside requirements/core.txt? Would be simpler for people to only look in a single folder for all reqs

Author:

It has to be in the same directory as the pyproject.toml for the install of nemo-skills-core to work. But I can put a symlink in the requirements/ directory so it is at least clear that this file exists if someone goes looking in requirements/ for it?
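For context, the co-location constraint comes from file-based dynamic dependencies: a minimal sketch of what core/pyproject.toml might look like (names and version are assumed), where the requirements file must sit next to the pyproject.toml for the file reference to resolve at install time:

```toml
[project]
name = "nemo-skills-core"
version = "0.1.0"
dynamic = ["dependencies"]

[tool.setuptools.dynamic]
dependencies = { file = ["requirements.txt"] }
```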

@@ -0,0 +1,7 @@
# Pipeline/orchestration dependencies (CLI, cluster management, experiment tracking).
# These are additional to core.txt.
Collaborator:

comment should be updated or we should move core/requirements.txt into core.txt in this folder
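A minimal sketch (assumed field values) of how the root pyproject.toml could compose the two files so that core/requirements.txt remains the single source of truth for core deps:

```toml
[project]
name = "nemo-skills"
dynamic = ["dependencies"]

[tool.setuptools.dynamic]
dependencies = { file = ["core/requirements.txt", "requirements/pipeline.txt"] }
```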

import pytest

# Core modules that must be importable without nemo_run / pipeline
CORE_MODULES = [
Collaborator:

can we dynamically find everything inside nemo_skills except pipeline subfolder?

Author:

done
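The "dynamically find everything except one subpackage" approach can be sketched with `pkgutil`. The demo below uses the stdlib `email` package as a stand-in for `nemo_skills`, with `email.mime` playing the role of the excluded `pipeline` subfolder:

```python
import pkgutil

import email  # stands in for nemo_skills in this demo

# Walk every importable submodule of the package, skipping one
# subpackage the way nemo_skills.pipeline would be skipped.
modules = [
    info.name
    for info in pkgutil.walk_packages(email.__path__, prefix="email.")
    if not info.name.startswith("email.mime")
]
print(modules)
```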

the dataset into slurm tests. This is the most comprehensive test we can do by running full
evaluation on cluster with arbitrary model and check that results are as expected.

### Respect the Core / Pipeline dependency boundary
Collaborator:

can we maybe keep the part in here brief, just summarize the logic in a few sentences / bullet points. And then the full description we move to another .md file? I think the full description is quite helpful, but it's a bit too detailed for the guidelines section, which I hope we can keep relatively short

Author (gwarmstrong, Feb 20, 2026):

sure. made it brief and moved the bulk of the guidelines to core/README.md, which is referenced here now

with:
python-version: "3.10"
cache: pip
- name: Install uv
Collaborator:

do we use it anywhere?

Author:

good catch, it was from an old testing strategy, removed

from core, but core modules must not import from pipeline.

This boundary is enforced by `tests/test_dependency_isolation.py` which creates
fresh virtualenvs and verifies that core modules import successfully without
Collaborator:

is this true? I don't think we create fresh envs in tests?

Author:

yeah this is outdated from old tests, good catch

pip install -e ".[dev]"
```

## Core / Pipeline architecture boundary
Collaborator:

I'd maybe move this part somewhere else (e.g. only keep in contributing.md or better in a new .md where extra details from contributing can go as well). It's helpful, but probably a bit too dense for the "basics" part of the docs. More oriented towards people who'd need to modify our code

Author:

okay, put it in the core/README.md

The BFCL eval venv uses --system-site-packages and pins
huggingface_hub<1, which downgrades the system's huggingface_hub 1.x
to 0.x. This breaks transformers (installed as a system package),
which needs is_offline_mode, available only in huggingface_hub>=1.0.

Gorilla's own BFCL does not pin huggingface_hub, so removing the
constraint is safe.

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
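For illustration, --system-site-packages is what lets a venv see system packages in the first place; packages pinned and installed inside the venv then shadow the system copies on import. A minimal demo (paths made up):

```shell
# Create a venv that can also see system site-packages; anything pip
# installs inside it shadows the system copy of the same package.
python3 -m venv --system-site-packages /tmp/demo-venv
grep "include-system-site-packages" /tmp/demo-venv/pyvenv.cfg
```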
Move detailed core/pipeline boundary docs from CONTRIBUTING.md and
installation.md into docs/core-pipeline-boundary.md. Add symlink at
requirements/core.txt pointing to core/requirements.txt for discoverability.

Signed-off-by: George Armstrong <georgea@nvidia.com>
…rable-pipeline

Signed-off-by: George Armstrong <georgea@nvidia.com>

# Conflicts:
#	nemo_skills/dataset/utils.py
#	nemo_skills/evaluation/evaluator/__init__.py
Signed-off-by: George Armstrong <georgea@nvidia.com>
Directories with hyphens (e.g., answer-judge, math-500, llama3-instruct)
cannot be imported via the `import` statement. Use importlib.import_module(),
which handles arbitrary module names correctly.

Signed-off-by: George Armstrong <georgea@nvidia.com>
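The hyphenated-name point can be demonstrated in isolation (the temp file and module name below are made up for the demo):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A module file whose name is not a valid Python identifier cannot be
# imported with the `import` statement, but importlib.import_module
# accepts an arbitrary module-name string and resolves it by filename.
tmp = Path(tempfile.mkdtemp())
(tmp / "math-500.py").write_text("PROMPT = 'solve it'\n")
sys.path.insert(0, str(tmp))

mod = importlib.import_module("math-500")
print(mod.PROMPT)
```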