Merged
29 commits
f0eb8d0
Add nemo-skills-core subpackage for lightweight installs
gwarmstrong Feb 13, 2026
9adc401
Merge branch 'main' into georgea/refactor-separable-pipeline
gwarmstrong Feb 13, 2026
0a4f056
Suggestions from code review addressed
gwarmstrong Feb 13, 2026
e2361e6
Revert sentinel string back to None in register_evaluator
gwarmstrong Feb 13, 2026
cc70501
Fix extra_datasets silently ignored when cluster_config is None
gwarmstrong Feb 13, 2026
a0a2aa7
Add all benchmark evaluator deps to core/requirements.txt
gwarmstrong Feb 14, 2026
7cf5fb6
Eliminate requirements/main.txt duplication
gwarmstrong Feb 14, 2026
9e84d39
Add tests/test_dependency_isolation.py
gwarmstrong Feb 14, 2026
ae191da
Revise CONTRIBUTING.md dependency boundary guidance
gwarmstrong Feb 14, 2026
8118b5a
Simplify pipeline/dataset.py to eliminate local logic duplication
gwarmstrong Feb 14, 2026
fdcfdb1
Remove section comments from core/requirements.txt
gwarmstrong Feb 17, 2026
49aed5e
Remove summarize-results note from CONTRIBUTING.md
gwarmstrong Feb 17, 2026
d1c4195
Properly eliminate duplicated import logic from pipeline/dataset.py
gwarmstrong Feb 17, 2026
b7e258f
Simplify pipeline/dataset.py to a single thin function
gwarmstrong Feb 17, 2026
c5d12cf
Move wandb from pipeline to core
gwarmstrong Feb 17, 2026
3b025ec
Fix Dockerfile.nemo-skills to use core + pipeline requirements
gwarmstrong Feb 17, 2026
836160f
fix: use nemo_run import guard instead of package metadata
gwarmstrong Feb 18, 2026
cf4b94c
fix: remove huggingface_hub<1 pin from BFCL requirements
gwarmstrong Feb 19, 2026
6255b1a
address feedback from code review
gwarmstrong Feb 20, 2026
b76b8bd
add dependency boundary guide and requirements symlink
gwarmstrong Feb 20, 2026
49f3cde
Merge remote-tracking branch 'origin/main' into georgea/refactor-sepa…
gwarmstrong Feb 20, 2026
305cde7
maint: add back core dependency guidelines and adjust links
gwarmstrong Feb 20, 2026
28de048
fix dataset resolution logic after merge
gwarmstrong Feb 20, 2026
ded4a2c
remove boundary arch mention completely
gwarmstrong Feb 20, 2026
d173121
fix: use importlib for hyphenated module names in isolation test
gwarmstrong Feb 20, 2026
d91add1
Merge origin/main: add critpt/dsbench evaluators and pandas deps
gwarmstrong Feb 24, 2026
97bac87
fix: add torchcodec to core requirements for numb3rs dataset
gwarmstrong Feb 24, 2026
cc4118c
bump CACHEBUST to rebuild container with torchcodec
gwarmstrong Feb 24, 2026
1c0eb66
Merge branch 'main' into georgea/refactor-separable-pipeline
gwarmstrong Feb 25, 2026
11 changes: 11 additions & 0 deletions CONTRIBUTING.md
@@ -49,6 +49,17 @@ The following things are required when adding new benchmarks
the dataset into slurm tests. This is the most comprehensive test we can do, running the full
evaluation on a cluster with an arbitrary model and checking that results are as expected.

### Respect the Core / Pipeline dependency boundary

NeMo Skills is split into **Core** (inference, evaluation, tools, benchmarks) and **Pipeline** (CLI, cluster orchestration). The one-way rule:

- **Pipeline** can import from **Core**
- **Core** CANNOT import from **Pipeline** (no `nemo_run`, no `nemo_skills.pipeline`)

When adding dependencies: inference/evaluation/benchmark deps go in `core/requirements.txt`; orchestration deps go in `requirements/pipeline.txt`. This boundary is enforced by `tests/test_dependency_isolation.py`.

For full details (examples, common patterns, what to avoid), see [Dependency Boundary Guide](core/README.md).

### Keep the code elegant
When adding new features, try to keep the code simple and elegant.
- Can you reuse / extend an existing functionality?
64 changes: 64 additions & 0 deletions core/README.md
@@ -0,0 +1,64 @@
# Core / Pipeline Dependency Boundary

NeMo Skills is split into **Core** (agent runtime) and **Pipeline** (orchestration). The rule is simple:

```
Pipeline can import from Core.
Core CANNOT import from Pipeline.
```

Core modules are everything under `nemo_skills/` **except** `nemo_skills/pipeline/`. They must never have top-level imports from `nemo_skills.pipeline` or `nemo_run`. This boundary is enforced by `tests/test_dependency_isolation.py`, which verifies that core modules import successfully when `nemo_run` is blocked.
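For intuition, an enforcement test of this kind can be sketched as follows. This is a minimal illustration, not the actual contents of `tests/test_dependency_isolation.py`: it blocks `nemo_run` via a `None` entry in `sys.modules` (which makes any `import nemo_run` raise `ModuleNotFoundError`) and then checks that a module still imports cleanly.

```python
import importlib
import sys

_MISSING = object()


def assert_imports_without_nemo_run(module_name):
    """Import module_name while 'import nemo_run' is forced to fail."""
    # A None entry in sys.modules makes any "import nemo_run" raise
    # ModuleNotFoundError, simulating an environment without the package.
    saved = sys.modules.get("nemo_run", _MISSING)
    sys.modules["nemo_run"] = None
    try:
        sys.modules.pop(module_name, None)  # force a fresh import
        importlib.import_module(module_name)
    finally:
        if saved is _MISSING:
            del sys.modules["nemo_run"]
        else:
            sys.modules["nemo_run"] = saved


# A module with no nemo_run dependency imports cleanly under the block:
assert_imports_without_nemo_run("json")
```

A module whose top level does `import nemo_run` would raise inside `importlib.import_module` and fail such a test.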

## Dependency placement

When adding a new dependency, put it in the right requirements file:

| If the dependency is needed for... | Add it to |
|---|---|
| Inference, evaluation, tool calling, any benchmark evaluator | `core/requirements.txt` |
| CLI commands (`ns`), cluster orchestration, experiment tracking | `requirements/pipeline.txt` |

There is no separate `main.txt` — `pyproject.toml` composes the default install from `core/requirements.txt` + `requirements/pipeline.txt`. Each dependency lives in exactly one file.
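As an illustration of that composition, the root `pyproject.toml` might declare something like the following (a sketch only; the exact keys in the repository may differ):

```toml
[tool.setuptools.dynamic]
# The default install is the union of both requirement files;
# each dependency is listed in exactly one of them.
dependencies = { file = ["core/requirements.txt", "requirements/pipeline.txt"] }
```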

**Boundary definition:**

- **Core** = everything needed to run inference + evaluation locally (including all benchmark evaluator deps)
- **Pipeline** = orchestration-only deps (`nemo_run`, `typer`, `click`, `nemo-evaluator-launcher`)

All benchmark-specific dependencies (e.g., `faiss-cpu`, `sacrebleu`, `datasets`, `func-timeout`) go in `core/requirements.txt`. Eventually these should migrate to JIT (just-in-time) install so that benchmark deps are installed on demand at runtime, but until that is implemented, they must be in core so evaluators do not crash at runtime.

## Examples of correct placement

- `httpx` -> `core/requirements.txt` (used by model inference clients)
- `sympy` -> `core/requirements.txt` (used by math graders)
- `sacrebleu` -> `core/requirements.txt` (used by translation benchmark evaluator)
- `faiss-cpu` -> `core/requirements.txt` (used by BFCL benchmark evaluator)
- `nemo_run` -> `requirements/pipeline.txt` (cluster job orchestration)
- `wandb` -> `core/requirements.txt` (used by summarize-results)

## Examples of mistakes to avoid

- Adding `nemo_run` to `core/requirements.txt` -- it is a pipeline/orchestration dependency; core must not depend on it.
- Adding `typer` to `core/requirements.txt` -- it is the CLI framework, only used by the pipeline layer.

## Writing new core code

- If you need something from `nemo_skills.pipeline`, your code probably belongs in pipeline, not core. Move it.
- If you have a function that works locally but *also* needs a cluster variant, keep both paths in the same function but use a **lazy import** for the pipeline code inside the branch that needs it (see `dataset/utils.py:get_dataset_module` for the pattern). Never add a top-level import.
- The pipeline layer (`nemo_skills/pipeline/`) can provide thin wrappers or re-exports for convenience (see `pipeline/dataset.py`), but all local logic should live in core.
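The lazy-import rule above can be sketched like this. The function and helper names are illustrative (see `dataset/utils.py:get_dataset_module` for the real pattern); the point is that the pipeline import lives inside the cluster branch, so core-only users never need `nemo_run` installed.

```python
def resolve_data_path(path, cluster_config=None):
    """Illustrative function with a local branch and an optional cluster branch."""
    if cluster_config is None:
        # Local branch: pure core logic, no pipeline imports.
        return path
    # Cluster branch: import the pipeline helper lazily, inside the branch,
    # so merely importing this module never touches nemo_skills.pipeline.
    from nemo_skills.pipeline.utils import get_unmounted_path

    return get_unmounted_path(cluster_config, path)
```

Calling it without `cluster_config` never executes the lazy import, which is exactly what the isolation test relies on.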

## Dataset loading example

The boundary shows up concretely in dataset loading:

```python
# Core: local-only dataset loading (no cluster deps)
from nemo_skills.dataset.utils import get_dataset_module
module, data_path = get_dataset_module("gsm8k")

# Pipeline: cluster-aware wrapper (SSH downloads, mount resolution)
from nemo_skills.pipeline.dataset import get_dataset_module
module, data_path = get_dataset_module("gsm8k", cluster_config=cfg)
```

The core version has zero pipeline imports. The pipeline wrapper delegates to core for local resolution and only adds cluster-specific logic (mount-path unmounting, SSH file downloads) when needed.
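That delegation can be sketched as follows. This is a hedged sketch of the wrapper's shape, not the exact `pipeline/dataset.py` source; the cluster branch is elided.

```python
def get_dataset_module(dataset, data_dir=None, cluster_config=None, extra_benchmark_map=None):
    """Sketch of the cluster-aware pipeline wrapper: delegate local resolution to core."""
    if cluster_config is None or cluster_config.get("executor") == "none":
        # Purely local: the core implementation handles everything,
        # and no cluster-only dependency is ever imported.
        from nemo_skills.dataset.utils import get_dataset_module as core_get_dataset_module

        return core_get_dataset_module(dataset, data_dir=data_dir, extra_benchmark_map=extra_benchmark_map)
    # Cluster case: resolve mounted paths and download the dataset init file
    # via pipeline-only helpers (mount resolution, SSH download) -- elided here.
    raise NotImplementedError("cluster branch elided in this sketch")
```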
53 changes: 53 additions & 0 deletions core/pyproject.toml
@@ -0,0 +1,53 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[build-system]
requires = [
"setuptools",
"wheel"
]
build-backend = "setuptools.build_meta"

[project]
dynamic = ["version", "dependencies"]

name = "nemo-skills-core"
description = "NeMo Skills core runtime -- inference, evaluation, and tool calling"
readme = {text = "NeMo Skills core runtime for inference, evaluation, and tool calling. See https://nvidia-nemo.github.io/Skills for full documentation.", content-type = "text/plain"}
classifiers = [
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"License :: OSI Approved :: Apache Software License",
"Operating System :: OS Independent",
]
requires-python = ">=3.10"

[project.urls]
homepage = "https://nvidia-nemo.github.io/Skills"
source = "https://github.com/NVIDIA-NeMo/Skills"
issues = "https://github.com/NVIDIA-NeMo/Skills/issues"

[project.scripts]
ns = "nemo_skills._cli_stub:main"

[tool.setuptools]
include-package-data = true

[tool.setuptools.packages.find]
where = [".."]
exclude = ["tests", "tests.*", "core", "core.*"]

[tool.setuptools.dynamic]
version = { attr = "nemo_skills.version.__version__" }
dependencies = {file = ["requirements.txt"]}
43 changes: 43 additions & 0 deletions core/requirements.txt
@@ -0,0 +1,43 @@
# Core dependencies for inference, evaluation, tool calling, and all benchmark evaluators.
# No cluster orchestration deps (nemo_run, typer, etc.)
# NOTE: benchmark-specific deps are included here because JIT install is not yet implemented.
# Once JIT install is ready, benchmark deps can be moved to per-benchmark extras.

bs4
compute-eval @ git+https://github.com/NVIDIA/compute-eval.git@2d14770
datasets
editdistance
evalplus @ git+https://github.com/evalplus/evalplus@c91370f
faiss-cpu
fire
flask
func-timeout
gradio
httpx
huggingface_hub
hydra-core
ipython
iso639-lang
langcodes
language-data
litellm[caching]
math-verify[antlr4_9_3]
mcp
numpy
openai
openpyxl>=3.1.0
pandas>=2.0.0
pyxlsb>=1.0.10
pyyaml
rank_bm25
requests
rich
sacrebleu
scikit-learn
sentence_transformers
serpapi
sympy
torchcodec
tqdm
transformers
wandb
7 changes: 4 additions & 3 deletions dockerfiles/Dockerfile.nemo-skills
@@ -55,12 +55,13 @@ RUN pip install langdetect absl-py immutabledict nltk ipython && \

# we aren't copying main nemo_skills folder as it will always be mounted from host
# but we do want to install all requirements in the container directly
RUN mkdir -p /opt/NeMo-Skills/requirements
RUN mkdir -p /opt/NeMo-Skills/requirements /opt/NeMo-Skills/core
COPY pyproject.toml README.md /opt/NeMo-Skills/
COPY requirements /opt/NeMo-Skills/requirements/
COPY core/requirements.txt /opt/NeMo-Skills/core/requirements.txt
# installing sdp in container only
RUN pip install git+https://github.com/NVIDIA/NeMo-speech-data-processor@29b9b1ec0ceaf3ffa441c1d01297371b3f8e11d2
ARG CACHEBUST=3
RUN pip install --no-cache-dir -r /opt/NeMo-Skills/requirements/main.txt
ARG CACHEBUST=4
RUN pip install --no-cache-dir -r /opt/NeMo-Skills/core/requirements.txt -r /opt/NeMo-Skills/requirements/pipeline.txt
# Fix http mismatch between lepton and dggs by manually downloading dggs here
RUN pip install ddgs
48 changes: 48 additions & 0 deletions docs/basics/installation.md
@@ -0,0 +1,48 @@
# Installation & Dependency Groups

NeMo Skills provides two installable packages:

- **`nemo-skills`** (root) -- full install with CLI, cluster orchestration, all benchmarks
- **`nemo-skills-core`** (`core/` subdirectory) -- lightweight runtime only

## Default installation

`pip install nemo-skills` gives you **everything** (inference, evaluation, CLI,
cluster orchestration, benchmarks):

```bash
pip install git+https://github.com/NVIDIA-NeMo/Skills.git
# or, from a local clone:
pip install -e .
```

## Lightweight installation

If you only need inference, evaluation, and tool calling (no cluster orchestration):

```bash
pip install "nemo-skills-core @ git+https://github.com/NVIDIA-NeMo/Skills.git#subdirectory=core"
# or, from a local clone:
pip install -e core/
```

## Extras (dependency groups)

| Extra | Requirements file | What it provides |
|-------|-------------------|------------------|
| `core` | `core/requirements.txt` | Agent runtime: inference, evaluation, tool calling (MCP), prompt formatting, math/code grading. No cluster orchestration. |
| `pipeline` | `requirements/pipeline.txt` | CLI (`ns` command), cluster management, experiment tracking (`nemo_run`, `typer`, `wandb`). |
| `dev` | `requirements/common-tests.txt`, `requirements/common-dev.txt` | Development and testing tools (`pytest`, `ruff`, `pre-commit`). |

### Examples

```bash
# Full install (default)
pip install -e .

# Core only -- lightweight runtime for downstream integrations
pip install -e core/

# Development (everything + dev tools)
pip install -e ".[dev]"
```
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -56,6 +56,7 @@ nav:
- Nemo-Skills: index.md
- Getting started:
- basics/index.md
- Installation & Dependencies: basics/installation.md
- Cluster configs: basics/cluster-configs.md
- Code packaging: basics/code-packaging.md
- Prompt format: basics/prompt-format.md
20 changes: 20 additions & 0 deletions nemo_skills/_cli_stub.py
@@ -0,0 +1,20 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys


def main():
print("nemo-skills-core is installed (lightweight mode).\nFor the full ns CLI, run: pip install nemo-skills")
sys.exit(1)
38 changes: 8 additions & 30 deletions nemo_skills/dataset/utils.py
@@ -17,15 +17,13 @@
import json
import os
import sys
import tempfile
import time
import urllib.request
from pathlib import Path
from typing import Dict
from urllib.error import URLError

from nemo_skills.evaluation.math_grader import extract_answer
from nemo_skills.pipeline.utils import cluster_download_file, get_unmounted_path


def locate(path):
Expand Down Expand Up @@ -92,20 +90,6 @@ def add_to_path(p):
sys.path = old_path


def _get_dataset_module_from_cluster(cluster_config, mounted_path):
with tempfile.TemporaryDirectory() as tmpdir:
tmp_path = str(Path(tmpdir) / "init.py")
cluster_dataset_path = get_unmounted_path(cluster_config, mounted_path)
try:
cluster_download_file(cluster_config, cluster_dataset_path, tmp_path)
except FileNotFoundError:
raise RuntimeError(
f"Init file {mounted_path} not found on the cluster. "
f"Please check the dataset name you're using. Did you forget to run prepare data commands?"
)
return import_from_path(tmp_path)


def get_dataset_name(dataset):
"""Extract the canonical dataset name from a dataset identifier (short name or path)."""
if "/" in dataset:
Expand Down Expand Up @@ -181,7 +165,7 @@ def get_default_dataset_module(dataset):
return dataset_module, data_path


def get_dataset_module(dataset, data_dir=None, cluster_config=None, extra_benchmark_map=None):
def get_dataset_module(dataset, data_dir=None, extra_benchmark_map=None):
"""Get dataset module from nemo_skills.dataset, extra benchmark map, or a directory path.

Resolution order:
@@ -191,7 +175,10 @@ def get_dataset_module(dataset, data_dir=None, extra_benchm
- If found in exactly one, use it.
- If found in neither, fall back to data_dir if provided.
3. If data_dir is provided and previous resolution failed, try to load the module
from data_dir (locally or by downloading from cluster).
from data_dir locally.

For cluster-aware loading (SSH downloads, mount resolution), use
nemo_skills.pipeline.dataset.get_dataset_module instead.

Args:
extra_benchmark_map: Either a dict mapping short names to directory paths,
Expand Down Expand Up @@ -224,19 +211,10 @@ def get_dataset_module(dataset, data_dir=None, cluster_config=None, extra_benchm
if found_builtin:
return dataset_module, data_path

# Fall back to data_dir if provided
# Fall back to data_dir if provided (local only)
if data_dir:
dataset_as_path = dataset.replace(".", "/")
if cluster_config is None or cluster_config["executor"] == "none":
with add_to_path(data_dir):
dataset_module = importlib.import_module(dataset)
elif cluster_config["executor"] == "local":
with add_to_path(get_unmounted_path(cluster_config, data_dir)):
dataset_module = importlib.import_module(dataset)
else:
dataset_module = _get_dataset_module_from_cluster(
cluster_config, f"{data_dir}/{dataset_as_path}/__init__.py"
)
with add_to_path(data_dir):
dataset_module = importlib.import_module(dataset)
return dataset_module, data_dir

map_path = (