# feat: Maarten/aarch64 spark support #268

base: main
Changes from all commits: 25c634f, 27db97f, 3e2d052, c7f1cf7, 2688050, 84a5419, 83089d1
---

**containers/Dockerfile.cuda-aarch64** (new file, +32 lines):

```dockerfile
# Dockerfile for NeMo Safe Synthesizer on DGX Spark (aarch64)
#
# Base: NVIDIA vLLM container with torch + vLLM + Triton pre-installed
#
# Build:
#   docker build -f containers/Dockerfile.cuda-aarch64 -t nss-spark .
#
# Run:
#   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
#     -it nss-spark
#
FROM nvcr.io/nvidia/vllm:26.02-py3

ENV TRITON_CACHE_DIR=/workspace/.triton_cache
ENV BNB_CUDA_VERSION=130

WORKDIR /workspace/Safe-Synthesizer
COPY . .

# 1. Install NSS package (no deps — we manage them explicitly)
RUN pip install --no-deps -e .

# 2. Torch-dependent packages — --no-deps preserves the container's torch/CUDA
RUN pip install --no-deps \
    peft accelerate bitsandbytes datasets==4.3.0 trl==0.26.1 \
    hf_transfer unsloth unsloth_zoo \
    opacus sentence-transformers gliner kernels

# 3. Remaining deps (safe to resolve normally)
RUN pip install -e ".[engine,cuda-aarch64]"

ENTRYPOINT ["/usr/bin/bash"]
```

(mvansegbroeck marked this conversation as resolved.)
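The staged `--no-deps` installs above rely on the base image's pinned torch/CUDA surviving each layer. One way to sanity-check that after a build is to read installed versions from package metadata; a sketch, where `stack_report` is a hypothetical helper, not part of this PR:

```python
# Sketch: report installed versions of key packages without importing the
# (heavy) packages themselves, using importlib.metadata.
from importlib.metadata import PackageNotFoundError, version


def stack_report(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Inside the container these should all resolve to the base image's pins.
    print(stack_report(["torch", "vllm", "triton", "bitsandbytes"]))
```

Running this inside the container before and after the `pip install` layers would show whether any pin was clobbered.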
---

**New file** (+107 lines):

<!-- SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->

# NeMo Safe Synthesizer on DGX Spark

Run NeMo Safe Synthesizer on DGX Spark (aarch64 / GB10) using a pre-built container with the correct Triton, vLLM, and PyTorch versions.

## Quick Start

### 1. Build and launch the container

```bash
git clone https://github.com/NVIDIA-NeMo/Safe-Synthesizer.git
cd Safe-Synthesizer
docker build -f containers/Dockerfile.cuda-aarch64 -t nss-spark .
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it nss-spark
```

### 2. Run

```bash
python -c "
import pandas as pd, numpy as np
from nemo_safe_synthesizer.sdk.library_builder import SafeSynthesizer

# Sample data — replace with your own CSV or DataFrame
np.random.seed(42)
df = pd.DataFrame({
    'age': np.random.randint(18, 85, 500),
    'income': np.random.lognormal(10.5, 0.8, 500).astype(int),
    'credit_score': np.random.randint(300, 850, 500),
    'default': np.random.choice(['yes', 'no'], 500, p=[0.15, 0.85]),
})

builder = (
    SafeSynthesizer()
    .with_data_source(df)
    .with_replace_pii()
    .with_generate(num_records=500)
    .with_evaluate()
)
builder.run()

s = builder.results.summary
print(f'Quality (SQS): {s.synthetic_data_quality_score}/10')
print(f'Privacy (DPS): {s.data_privacy_score}/10')
builder.save_results()
"
```

Expected: SQS ~8-9, DPS ~9-10.

> **First run is slower.** Model weights (~6 GB) download from HuggingFace and Triton
> JIT-compiles LoRA kernels for the GB10. Subsequent runs reuse cached weights and kernels.

## Use Your Own Data

```python
from nemo_safe_synthesizer.sdk.library_builder import SafeSynthesizer

builder = (
    SafeSynthesizer()
    .with_data_source("your_data.csv")  # or pass a DataFrame
    .with_replace_pii()
    .with_generate(num_records=1000)
    .with_evaluate()
)
builder.run()
builder.save_results()
```

Outputs are saved to `safe-synthesizer-artifacts/` — synthetic CSV and an HTML evaluation report.

## Optional: Improve PII Detection

Set a NIM API key for LLM-based column classification (more accurate than NER-only):

```bash
export NIM_ENDPOINT_URL="https://integrate.api.nvidia.com/v1"
export NIM_API_KEY="<your-api-key>"  # pragma: allowlist secret # get one at build.nvidia.com/settings/api-keys
```

## Optional: Differential Privacy

```python
builder = (
    SafeSynthesizer()
    .with_data_source(df)
    .with_replace_pii()
    .with_generate(num_records=1000)
    .with_differential_privacy(dp_enabled=True, epsilon=8.0)
    .with_evaluate()
)
```

## Troubleshooting

**Slow first generation batch?** Triton JIT-compiles LoRA kernels for the GB10 on first use. This is normal and only happens once per container session.

**Memory issues between runs?** Flush the page cache:

```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
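To see whether dropping caches is actually needed, the reclaimable cache can be read from `/proc/meminfo` before and after. A small sketch; the `meminfo_kb` helper is hypothetical, not part of this PR:

```python
# Sketch: parse /proc/meminfo so you can compare Cached/MemAvailable
# before and after dropping caches (Linux only).
import os


def meminfo_kb(path="/proc/meminfo"):
    """Return a {field: kilobytes} dict parsed from a meminfo-style file."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            fields = rest.split()
            if fields:
                info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    info = meminfo_kb()
    print("Cached:", info.get("Cached"), "kB")
    print("MemAvailable:", info.get("MemAvailable"), "kB")
```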
**Why a container?** DGX Spark's CUDA 13 + aarch64 combination requires specific Triton, vLLM, and PyTorch versions. The container (`nvcr.io/nvidia/vllm:26.02-py3`) provides a tested stack where Unsloth training and vLLM generation work natively.

**Full documentation:** [Safe Synthesizer User Guide](https://nvidia-nemo.github.io/Safe-Synthesizer/user-guide/getting-started/)
---

**pyproject.toml**

```diff
@@ -148,26 +148,29 @@ cu128 = [
     "kernels>=0.12.1",
     "nvidia-cublas-cu12; sys_platform == 'linux'",
     "nvidia-ml-py; sys_platform == 'linux'",
     "onnxruntime",
     "opacus",
     "opt_einsum",
     "peft",
     "sentence-transformers",
     "torch==2.9.1+cu128; sys_platform == 'linux'",
     "torch-c-dlpack-ext",
-    "torchvision==0.24.1+cu128; sys_platform == 'linux'",
-    "torchao==0.15.0; sys_platform == 'linux'",
+    "torchvision==0.24.1+cu128; sys_platform == 'linux' and platform_machine == 'x86_64'",
+    "torchvision==0.24.1; sys_platform == 'linux' and platform_machine == 'aarch64'",
+    "torchao==0.15.0; sys_platform == 'linux' and platform_machine == 'x86_64'",
     "transformers==4.57.3",
     "triton>=2.0.0; sys_platform == 'linux'",
     "trl>=0.23.0",
-    "unsloth[cu128-torch291]==2025.12.4; sys_platform == 'linux'",
-    "unsloth_zoo==2025.12.4; sys_platform == 'linux'",
-    "vllm==0.15.0; sys_platform == 'linux'",
-    "xformers==v0.0.33.post2; sys_platform == 'linux'",
+    "unsloth[cu128-torch291]==2025.12.4",
+    "unsloth_zoo==2025.12.4",
+    "vllm==0.15.0",
+    "xformers==v0.0.33.post2; sys_platform == 'linux' and platform_machine == 'x86_64'",
 ]
```
**Collaborator:** this is probably going to be a whole thing, but we should make a new dep group or do better platform resolution for cross-platform deps.

**Author:** Agree this needs a proper spike. For now the platform markers are minimal and don't break x86_64, right?

**Collaborator:** hmmm, do we want to keep the sys_platform markers? locking is borked in CI right now and I've been fighting it by putting linux markers in.

**Collaborator:** We really will need a Spark and/or Station as part of our GitHub runners so we make sure this continues to work.

**Author:** Let me know what to change here.
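The marker behavior under discussion can be sanity-checked offline with the `packaging` library, which evaluates PEP 508 markers against an explicit environment. A sketch, not part of the PR:

```python
# Sketch: evaluate a PEP 508 marker like the ones above against explicit
# environments instead of the current interpreter's platform.
from packaging.markers import Marker

marker = Marker("sys_platform == 'linux' and platform_machine == 'x86_64'")

spark = {"sys_platform": "linux", "platform_machine": "aarch64"}
x86_linux = {"sys_platform": "linux", "platform_machine": "x86_64"}

assert not marker.evaluate(spark)  # excluded on DGX Spark (aarch64)
assert marker.evaluate(x86_linux)  # still selected on x86_64 Linux
print("markers behave as intended")
```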
```diff
 
 # at some point, do per-subpackage dependencies
 
 [tool.uv]
-required-version = ">=0.9.14, <0.11.0" # Allow current 0.10.x line while staying below the next minor
+required-version = ">=0.9.14, <0.12.0" # Allow current 0.11.x line while staying below the next minor
 cache-keys = [
     { file = "pyproject.toml" }, { git = { commit = true, tags = true } },
     { file = "uv.lock" }
```
```diff
@@ -187,8 +190,8 @@ dependency-metadata = [
 
 override-dependencies = [
-    "flashinfer-python==0.6.1; sys_platform != 'darwin'", # uv locking won't find the matching versions of flashinfer-python and -cubin without overriding
-    "flashinfer-cubin==0.6.1; sys_platform != 'darwin'", # perhaps because the published wheels have some wrong metadata
+    "flashinfer-python==0.6.1; sys_platform != 'darwin' and platform_machine != 'aarch64'", # uv locking won't find the matching versions of flashinfer-python and -cubin without overriding
+    "flashinfer-cubin==0.6.1; sys_platform != 'darwin' and platform_machine != 'aarch64'", # perhaps because the published wheels have some wrong metadata
     "xgrammar>=0.1.32,<1.0.0", # CVE-2026-25048: override vllm's pin on 0.1.29
 ]
```
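The xgrammar override range can be checked against the vulnerable pin with the `packaging` library; a sketch, assuming `packaging` is available:

```python
# Sketch: confirm the override range excludes vllm's CVE-affected pin (0.1.29)
# while still admitting patched 0.1.x releases and staying below 1.0.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=0.1.32,<1.0.0")

assert Version("0.1.29") not in spec  # the pinned, CVE-affected version
assert Version("0.1.32") in spec      # lowest allowed patched version
assert Version("1.0.0") not in spec   # next major stays excluded
print("override range OK:", spec)
```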
**Collaborator:** 🤔 should we add an `extras` group for them? mostly to not have these versions be different than what's in the lockfile

**Author:** the `--no-deps` install is intentional to preserve the container's torch/CUDA