Add nemo-skills-core subpackage for lightweight installs #1229
Merged

29 commits by gwarmstrong:

- `f0eb8d0` Add nemo-skills-core subpackage for lightweight installs
- `9adc401` Merge branch 'main' into georgea/refactor-separable-pipeline
- `0a4f056` Suggestions from code review addressed
- `e2361e6` Revert sentinel string back to None in register_evaluator
- `cc70501` Fix extra_datasets silently ignored when cluster_config is None
- `a0a2aa7` Add all benchmark evaluator deps to core/requirements.txt
- `7cf5fb6` Eliminate requirements/main.txt duplication
- `9e84d39` Add tests/test_dependency_isolation.py
- `ae191da` Revise CONTRIBUTING.md dependency boundary guidance
- `8118b5a` Simplify pipeline/dataset.py to eliminate local logic duplication
- `fdcfdb1` Remove section comments from core/requirements.txt
- `49aed5e` Remove summarize-results note from CONTRIBUTING.md
- `d1c4195` Properly eliminate duplicated import logic from pipeline/dataset.py
- `b7e258f` Simplify pipeline/dataset.py to a single thin function
- `c5d12cf` Move wandb from pipeline to core
- `3b025ec` Fix Dockerfile.nemo-skills to use core + pipeline requirements
- `836160f` fix: use nemo_run import guard instead of package metadata
- `cf4b94c` fix: remove huggingface_hub<1 pin from BFCL requirements
- `6255b1a` address feedback from code review
- `b76b8bd` add dependency boundary guide and requirements symlink
- `49f3cde` Merge remote-tracking branch 'origin/main' into georgea/refactor-sepa…
- `305cde7` maint: add back core dependency guidelines and adjust links
- `28de048` fix dataset resolution logic after merge
- `ded4a2c` remove boundary arch mention completely
- `d173121` fix: use importlib for hyphenated module names in isolation test
- `d91add1` Merge origin/main: add critpt/dsbench evaluators and pandas deps
- `97bac87` fix: add torchcodec to core requirements for numb3rs dataset
- `cc4118c` bump CACHEBUST to rebuild container with torchcodec
- `1c0eb66` Merge branch 'main' into georgea/refactor-separable-pipeline

`@@ -0,0 +1,64 @@`

# Core / Pipeline Dependency Boundary

NeMo Skills is split into **Core** (agent runtime) and **Pipeline** (orchestration). The rule is simple:

```
Pipeline can import from Core.
Core CANNOT import from Pipeline.
```

Core modules are everything under `nemo_skills/` **except** `nemo_skills/pipeline/`. They must never have top-level imports from `nemo_skills.pipeline` or `nemo_run`. This boundary is enforced by `tests/test_dependency_isolation.py`, which verifies that core modules import successfully when `nemo_run` is blocked.
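
One way to implement such a check, sketched here under assumptions (the PR's actual test may work differently, for example by importing each module in a fresh subprocess), is to place `None` in `sys.modules` for the blocked package: Python then raises `ModuleNotFoundError` on any attempt to import it. The helper names `block_imports` and `imports_without` are hypothetical, not part of NeMo Skills.

```python
import importlib
import sys
from contextlib import contextmanager

_MISSING = object()


@contextmanager
def block_imports(*names):
    """Temporarily make the given top-level packages unimportable.

    A None entry in sys.modules makes `import name` (and `import name.sub`)
    raise ModuleNotFoundError, simulating an install without those packages.
    """
    saved = {name: sys.modules.get(name, _MISSING) for name in names}
    for name in names:
        sys.modules[name] = None
    try:
        yield
    finally:
        # Restore whatever was there before (or nothing at all).
        for name, module in saved.items():
            if module is _MISSING:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module


def imports_without(module_name, blocked=("nemo_run",)):
    """Return True if `module_name` imports while `blocked` packages are unavailable."""
    # Drop any cached copy so a forbidden import inside it is actually executed.
    sys.modules.pop(module_name, None)
    with block_imports(*blocked):
        try:
            importlib.import_module(module_name)
        except ImportError:
            return False
    return True
```

The sketch assumes the blocked package's submodules are not already loaded in the process; running each import in its own subprocess gives stronger isolation.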

## Dependency placement

When adding a new dependency, put it in the right requirements file:

| If the dependency is needed for... | Add it to |
|---|---|
| Inference, evaluation, tool calling, any benchmark evaluator | `core/requirements.txt` |
| CLI commands (`ns`), cluster orchestration, experiment tracking | `requirements/pipeline.txt` |

There is no separate `main.txt`: `pyproject.toml` composes the default install from `core/requirements.txt` + `requirements/pipeline.txt`. Each dependency lives in exactly one file.

**Boundary definition:**

- **Core** = everything needed to run inference + evaluation locally (including all benchmark evaluator deps)
- **Pipeline** = orchestration-only deps (`nemo_run`, `typer`, `click`, `nemo-evaluator-launcher`)

All benchmark-specific dependencies (e.g., `faiss-cpu`, `sacrebleu`, `datasets`, `func-timeout`) go in `core/requirements.txt`. Eventually these should move to JIT (just-in-time) installation, so that benchmark deps are installed on demand when a benchmark runs. Until that is implemented, they must stay in core so that evaluators do not crash at runtime.
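
Because each dependency should live in exactly one file, a small consistency check can flag names that appear in both `core/requirements.txt` and `requirements/pipeline.txt`. The sketch below is hypothetical (the PR does not say whether such a check exists): it normalizes names per PEP 503 and uses a deliberately simple regex, whereas `packaging.requirements.Requirement` is the more robust parser for individual requirement lines.

```python
import re

# PEP 503: names compare case-insensitively, and runs of "-", "_", "." are equivalent.
_SEPARATORS = re.compile(r"[-_.]+")
# The distribution name is the leading run of letters, digits, ".", "_", "-".
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(text):
    """Return normalized distribution names from requirements.txt-style text."""
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        match = _NAME.match(line)
        # Blank lines, comments, and option lines such as "-r other.txt" do not match.
        if match:
            names.add(_SEPARATORS.sub("-", match.group(1)).lower())
    return names


def find_duplicates(*files):
    """Map each name found in more than one (label, text) pair to the labels containing it."""
    seen = {}
    for label, text in files:
        for name in requirement_names(text):
            seen.setdefault(name, []).append(label)
    return {name: labels for name, labels in seen.items() if len(labels) > 1}


core = "httpx\nwandb\nlitellm[caching]\n"
pipeline = "nemo_run\ntyper\nWandB>=0.16\n"
print(find_duplicates(("core/requirements.txt", core), ("requirements/pipeline.txt", pipeline)))
# {'wandb': ['core/requirements.txt', 'requirements/pipeline.txt']}
```

Extras (`[caching]`), version specifiers, and `name @ url` direct references all stop the regex before the name ends, so only the bare distribution name is compared.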

## Examples of correct placement

- `httpx` -> `core/requirements.txt` (used by model inference clients)
- `sympy` -> `core/requirements.txt` (used by math graders)
- `sacrebleu` -> `core/requirements.txt` (used by the translation benchmark evaluator)
- `faiss-cpu` -> `core/requirements.txt` (used by the BFCL benchmark evaluator)
- `nemo_run` -> `requirements/pipeline.txt` (cluster job orchestration)
- `wandb` -> `core/requirements.txt` (used by summarize-results)

## Examples of mistakes to avoid

- Adding `nemo_run` to `core/requirements.txt` -- it is a pipeline/orchestration dependency, and core must not depend on it.
- Adding `typer` to `core/requirements.txt` -- it is the CLI framework, used only by the pipeline layer.

## Writing new core code

- If you need something from `nemo_skills.pipeline`, your code probably belongs in pipeline, not core. Move it.
- If you have a function that works locally but *also* needs a cluster variant, keep both paths in the same function but use a **lazy import** for the pipeline code inside the branch that needs it (see `dataset/utils.py:get_dataset_module` for the pattern). Never add a top-level import.
- The pipeline layer (`nemo_skills/pipeline/`) can provide thin wrappers or re-exports for convenience (see `pipeline/dataset.py`), but all local logic should live in core.
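
The lazy-import rule can be illustrated with a minimal sketch. Everything here is hypothetical: `get_dataset_path`, `LOCAL_DATA_ROOT`, and the module `nemo_skills.pipeline.remote_data` with its `fetch_dataset` helper are invented for illustration and are not the real `get_dataset_module` implementation. The point is only where the import sits: inside the cluster branch, so importing the module or calling it locally never touches the pipeline layer.

```python
from pathlib import Path

# Hypothetical location of locally prepared datasets (an assumption for this sketch).
LOCAL_DATA_ROOT = Path("/tmp/nemo_skills_datasets")


def get_dataset_path(name, cluster_config=None):
    """Resolve a dataset directory; only the cluster branch needs pipeline code."""
    if cluster_config is None:
        # Local branch: pure core logic, nothing from the pipeline layer.
        return LOCAL_DATA_ROOT / name

    # Cluster branch: this import executes only when the branch runs, so a
    # core-only install can import this module and use the local path freely.
    from nemo_skills.pipeline.remote_data import fetch_dataset  # hypothetical module and helper

    return fetch_dataset(name, cluster_config)
```

In a core-only install, the cluster branch fails with `ImportError` only when someone actually asks for cluster behavior, which is the failure mode the boundary intends.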

## Dataset loading example

The boundary shows up concretely in dataset loading:

```python
# Core: local-only dataset loading (no cluster deps)
from nemo_skills.dataset.utils import get_dataset_module
module, data_path = get_dataset_module("gsm8k")

# Pipeline: cluster-aware wrapper (SSH downloads, mount resolution)
from nemo_skills.pipeline.dataset import get_dataset_module
module, data_path = get_dataset_module("gsm8k", cluster_config=cfg)  # cfg: a previously loaded cluster config
```

The core version has zero pipeline imports. The pipeline wrapper delegates to core for local resolution and only adds cluster-specific logic (mount-path unmounting, SSH file downloads) when needed.
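
The thin-wrapper shape described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `pipeline/dataset.py`: the function names are hypothetical, the default data directory is invented, mounts are assumed to be `"host:container"` strings under a `"mounts"` key of the cluster config, and only the unmounting step is shown (no SSH download).

```python
from pathlib import PurePosixPath


def core_dataset_path(name, data_dir="/data/datasets"):
    """Core layer (hypothetical): resolve the dataset path locally, no cluster knowledge."""
    return str(PurePosixPath(data_dir) / name)


def unmount_path(path, mounts):
    """Translate a container path to a host path using "host:container" mount specs."""
    for spec in mounts:
        host, container = spec.split(":", 1)
        container = container.rstrip("/")
        if path == container or path.startswith(container + "/"):
            return host.rstrip("/") + path[len(container):]
    return path  # not under any mount; leave unchanged


def pipeline_dataset_path(name, cluster_config=None, data_dir="/data/datasets"):
    """Pipeline layer (hypothetical): delegate to core, then add cluster-only logic."""
    path = core_dataset_path(name, data_dir)
    if cluster_config is None:
        return path  # local use: the wrapper adds nothing
    return unmount_path(path, cluster_config.get("mounts", []))


print(pipeline_dataset_path("gsm8k", cluster_config={"mounts": ["/lustre/me/data:/data"]}))
# /lustre/me/data/datasets/gsm8k
```

Keeping resolution in the core function means the wrapper cannot drift from local behavior; it only post-processes the core result.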

`@@ -0,0 +1,53 @@`

```toml
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[build-system]
requires = [
    "setuptools",
    "wheel"
]
build-backend = "setuptools.build_meta"

[project]
dynamic = ["version", "dependencies"]

name = "nemo-skills-core"
description = "NeMo Skills core runtime -- inference, evaluation, and tool calling"
readme = {text = "NeMo Skills core runtime for inference, evaluation, and tool calling. See https://nvidia-nemo.github.io/Skills for full documentation.", content-type = "text/plain"}
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "License :: OSI Approved :: Apache Software License",
    "Operating System :: OS Independent",
]
requires-python = ">=3.10"

[project.urls]
homepage = "https://nvidia-nemo.github.io/Skills"
source = "https://github.com/NVIDIA-NeMo/Skills"
issues = "https://github.com/NVIDIA-NeMo/Skills/issues"

[project.scripts]
ns = "nemo_skills._cli_stub:main"

[tool.setuptools]
include-package-data = true

[tool.setuptools.packages.find]
where = [".."]
exclude = ["tests", "tests.*", "core", "core.*"]

[tool.setuptools.dynamic]
version = { attr = "nemo_skills.version.__version__" }
dependencies = {file = ["requirements.txt"]}
```

`@@ -0,0 +1,43 @@`

```text
# Core dependencies for inference, evaluation, tool calling, and all benchmark evaluators.
# No cluster orchestration deps (nemo_run, typer, etc.)
# NOTE: benchmark-specific deps are included here because JIT install is not yet implemented.
# Once JIT install is ready, benchmark deps can be moved to per-benchmark extras.

bs4
compute-eval @ git+https://github.com/NVIDIA/compute-eval.git@2d14770
datasets
editdistance
evalplus @ git+https://github.com/evalplus/evalplus@c91370f
faiss-cpu
fire
flask
func-timeout
gradio
httpx
huggingface_hub
hydra-core
ipython
iso639-lang
langcodes
language-data
litellm[caching]
math-verify[antlr4_9_3]
mcp
numpy
openai
openpyxl>=3.1.0
pandas>=2.0.0
pyxlsb>=1.0.10
pyyaml
rank_bm25
requests
rich
sacrebleu
scikit-learn
sentence_transformers
serpapi
sympy
torchcodec
tqdm
transformers
wandb
```
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| # Installation & Dependency Groups | ||
|
|
||
| NeMo Skills provides two installable packages: | ||
|
|
||
| - **`nemo-skills`** (root) -- full install with CLI, cluster orchestration, all benchmarks | ||
| - **`nemo-skills-core`** (`core/` subdirectory) -- lightweight runtime only | ||
|
|
||
| ## Default installation | ||
|
|
||
| `pip install nemo-skills` gives you **everything** (inference, evaluation, CLI, | ||
| cluster orchestration, benchmarks): | ||
|
|
||
| ```bash | ||
| pip install git+https://github.com/NVIDIA-NeMo/Skills.git | ||
| # or, from a local clone: | ||
| pip install -e . | ||
| ``` | ||
|
|
||
| ## Lightweight installation | ||
|
|
||
| If you only need inference, evaluation, and tool calling (no cluster orchestration): | ||
|
|
||
| ```bash | ||
| pip install "nemo-skills-core @ git+https://github.com/NVIDIA-NeMo/Skills.git#subdirectory=core" | ||
| # or, from a local clone: | ||
| pip install -e core/ | ||
| ``` | ||
|
|
||
| ## Extras (dependency groups) | ||
|
|
||
| | Extra | Requirements file | What it provides | | ||
| |-------|-------------------|------------------| | ||
| | `core` | `core/requirements.txt` | Agent runtime: inference, evaluation, tool calling (MCP), prompt formatting, math/code grading. No cluster orchestration. | | ||
| `pipeline` | `requirements/pipeline.txt` | CLI (`ns` command), cluster management, experiment tracking (`nemo_run`, `typer`). |
| `dev` | `requirements/common-tests.txt`, `requirements/common-dev.txt` | Development and testing tools (`pytest`, `ruff`, `pre-commit`). |

### Examples

```bash
# Full install (default)
pip install -e .

# Core only -- lightweight runtime for downstream integrations
pip install -e core/

# Development (everything + dev tools)
pip install -e ".[dev]"
```
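
Downstream code sometimes needs to know at runtime which of the two installs it is running under. A minimal sketch, in the spirit of the PR's switch to a `nemo_run` import guard instead of package metadata, is to ask whether `nemo_run` is importable; `pipeline_available` is a hypothetical helper name, not a NeMo Skills API. Checking importability answers the question that matters at runtime, regardless of which distribution provided the module.

```python
import importlib.util


def pipeline_available(module="nemo_run"):
    """Return True if the given orchestration package is importable.

    importlib.util.find_spec locates a top-level module without executing it
    and returns None when the module is not installed.
    """
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    if pipeline_available():
        print("Full install: pipeline dependencies are present.")
    else:
        print("Lightweight install: pipeline dependencies are missing.")
```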

`@@ -0,0 +1,20 @@`

```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys


def main():
    print("nemo-skills-core is installed (lightweight mode).\nFor the full ns CLI, run: pip install nemo-skills")
    sys.exit(1)
```
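
The stub's contract is simple: print a hint and exit with status 1. One way to check that contract without installing the `ns` console script is to call `main()` directly and catch `SystemExit`. The sketch below copies the stub's `main()` so it runs standalone; `run_stub` is a hypothetical helper, not part of the PR.

```python
import contextlib
import io
import sys


def main():
    # Same behavior as the stub: explain, then exit with a non-zero status.
    print("nemo-skills-core is installed (lightweight mode).\nFor the full ns CLI, run: pip install nemo-skills")
    sys.exit(1)


def run_stub():
    """Call main() and return (exit_code, captured_stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            main()
        except SystemExit as exc:
            return exc.code, buffer.getvalue()
    return 0, buffer.getvalue()  # main() returned without exiting
```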