
feat: Maarten/aarch64 spark support#268

Open
mvansegbroeck wants to merge 5 commits into main from maarten/aarch64-spark-support

Conversation

@mvansegbroeck
Collaborator

@mvansegbroeck mvansegbroeck commented Mar 19, 2026

Summary

Add DGX Spark (aarch64) support for Safe Synthesizer:

  • Dockerfile: New containers/Dockerfile.cuda-aarch64 for building on aarch64/Spark
  • Docs: docs/DGX_SPARK.md — quickstart guide for running NSS on DGX Spark
  • Dependencies: Platform-specific dependency guards for aarch64 (torchvision, torchao, xformers); add missing onnxruntime and opt_einsum to cu128 extra
  • Training config: Auto-detect aarch64 and override incompatible defaults (flash attention, quantization)
  • Secret scanning: Add pragma: allowlist secret for API key placeholder in docs

Pre-Review Checklist

  • make format && make check or via prek validation.
  • make test passes locally
  • make test-e2e passes locally
  • make test-ci-container passes locally (recommended)
  • GPU CI status check passes -- comment /sync on this PR to trigger a run (auto-triggers on ready-for-review)

Pre-Merge Checklist

  • New or updated tests for any fix or new behavior
  • Updated documentation for new features and behaviors, including docstrings for API docs.

@mvansegbroeck mvansegbroeck requested review from a team as code owners March 19, 2026 17:37
Collaborator

@binaryaaron binaryaaron left a comment


claude was a liar about unsloth and aarch64. also, pull in this change and test again

@mvansegbroeck mvansegbroeck force-pushed the maarten/aarch64-spark-support branch 6 times, most recently from 59d13ec to 83fff62 Compare March 24, 2026 01:22
Collaborator

@binaryaaron binaryaaron left a comment


if we want to support dgx spark quickly, the different install path in the container is fine but it's not proper support. let's make a follow-up for this - we'll need to do a proper spike on cross-platform support + cuda version support.

structured_outputs_config=structured_outputs_config,
enforce_eager=enforce_eager,
)
# attention_config was added in vLLM 0.12+ but removed in some builds.
Collaborator


"some builds?"

Collaborator Author


NGC container nvcr.io/nvidia/vllm:26.02-py3 doesn't have attention_config. PyPI vLLM 0.15.0 does.
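This kind of version drift can be handled defensively by filtering keyword arguments against the target callable's signature before calling it. A minimal sketch of the idea (the helper and the stand-in engine factory are hypothetical, not the PR's actual code):

```python
import inspect

def filter_supported_kwargs(fn, kwargs):
    """Drop kwargs the callable does not accept, e.g. attention_config
    on vLLM builds that removed it. Hypothetical helper for illustration."""
    params = inspect.signature(fn).parameters
    # If fn accepts **kwargs, everything is supported; pass it all through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}

def make_engine(model, enforce_eager=False):
    """Stand-in for a vLLM engine constructor without attention_config."""
    return {"model": model, "enforce_eager": enforce_eager}

args = filter_supported_kwargs(
    make_engine,
    {"model": "m", "attention_config": {"backend": "sdpa"}},
)
engine = make_engine(**args)  # attention_config silently dropped
```

A try/except TypeError around the constructor call would also work, but signature inspection avoids masking unrelated TypeErrors.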

Comment on lines +20 to +35
# Torch-dependent packages — install with --no-deps to preserve container's torch/CUDA
RUN pip install --no-deps \
    peft accelerate bitsandbytes datasets==4.3.0 trl==0.26.1 \
    hf_transfer unsloth unsloth_zoo \
    opacus sentence-transformers gliner kernels

# Safe Synthesizer + remaining deps
RUN pip install --no-deps -e . && \
    pip install \
    faker 'pydantic[email]>=2.12.5' pydantic-settings pyyaml jsonschema rich structlog \
    colorama 'huggingface-hub>=0.34.4,<1' anyascii pycountry betterproto flashtext \
    cached-property category-encoders dython dateparser langchain-core json-repair \
    matplotlib 'outlines>=1.0.0' plotly prv-accountant 'smart-open==7.0.5' python-stdnum \
    'pandas>=2.1.3' ratelimit 'sqlfluff==3.2.0' 'range_regex>=0.1.0' 'tenacity==9.1.2' \
    'tiktoken>=0.7.0' tldextract 'wandb==0.23.1' python-dotenv patsy \
    pyarrow multiprocess onnxruntime opt_einsum dill==0.3.8 faiss-cpu
Collaborator


these should move to pyproject.toml and follow patterns from the cuda dockerfile. can call this one dockerfile.cuda-spark or cuda-aarch64.

Collaborator Author


Makes sense. I can move the deps into a spark or aarch64 extra in pyproject.toml and have the Dockerfile just do pip install -e ".[cuda-aarch64]".
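A sketch of what that extra could look like in pyproject.toml (the extra name and pin set are illustrative, loosely taken from the Dockerfile above; not the PR's actual diff):

```toml
[project.optional-dependencies]
# Hypothetical extra for the aarch64/Spark container. Packages installed
# with --no-deps stay in the Dockerfile; the rest resolve normally here.
cuda-aarch64 = [
    "onnxruntime",
    "opt_einsum",
    "faiss-cpu",
    # aarch64-incompatible packages are excluded via environment markers:
    "xformers==0.0.33.post2; platform_machine == 'x86_64'",
]
```

The Dockerfile then shrinks to `pip install -e ".[cuda-aarch64]"`, keeping versions in one place.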

"vllm==0.15.0",
"xformers==v0.0.33.post2; sys_platform == 'linux'",
"xformers==v0.0.33.post2; sys_platform == 'linux' and platform_machine == 'x86_64'",
]
Collaborator


this is probably going to be a whole thing but we should make a new dep group or do better platform resolution for cross-platform deps.

Collaborator Author


Agree this needs a proper spike. For now the platform markers are minimal and don't break x86_64, right?

Collaborator


hmmm do we want to keep the sys_platform markers? locking is borked in CI right now and I've been fighting it by putting linux markers in

Collaborator


We really will need a spark and/or station as part of our github runners so we make sure this continues to work.

binaryaaron previously approved these changes Mar 24, 2026
Collaborator

@binaryaaron binaryaaron left a comment


approving conditionally so i don't block while i'm out

@binaryaaron
Collaborator

We've also got to publish the container somewhere in the future, same with the cuda one.

@mvansegbroeck mvansegbroeck force-pushed the maarten/aarch64-spark-support branch 5 times, most recently from a113a39 to 655ca6c Compare March 24, 2026 21:22
@mvansegbroeck mvansegbroeck changed the title Maarten/aarch64 spark support feat: Maarten/aarch64 spark support Mar 24, 2026
@mvansegbroeck mvansegbroeck force-pushed the maarten/aarch64-spark-support branch 2 times, most recently from 8408cca to 819bdb8 Compare March 26, 2026 00:05
mvansegbroeck and others added 4 commits March 31, 2026 10:03
- containers/Dockerfile.spark: container-based install using nvcr.io/nvidia/vllm:26.02-py3
- docs/DGX_SPARK.md: quick start guide (build + run in 2 steps)
- pyproject.toml: platform markers for aarch64-incompatible packages
  (faiss-gpu-cu12, torchvision+cu128, torchao, xformers)
- config/training.py: auto-fallback Flash Attention 3 to sdpa on aarch64
- vllm_backend.py: handle vllm versions without attention_config kwarg

Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>
onnxruntime is required by gliner but not always resolved
transitively on aarch64. opt_einsum is directly imported in
dp_transformers/linear.py but was never declared.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>
Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>
@mvansegbroeck mvansegbroeck force-pushed the maarten/aarch64-spark-support branch from f3b9988 to c7f1cf7 Compare March 31, 2026 17:04
…lockfile

Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

# 2. Torch-dependent packages — --no-deps preserves the container's torch/CUDA
RUN pip install --no-deps \
    peft accelerate bitsandbytes datasets==4.3.0 trl==0.26.1 \
Collaborator


🤔 should we add an extras group for them? mostly to not have these versions be different than what's in the lockfile


# NeMo Safe Synthesizer on DGX Spark

Generate synthetic tabular data with quality and privacy guarantees — train, generate, and evaluate in one command.
Collaborator


should this comment be specific to the spark setup?

Comment on lines +279 to +280
if platform.machine() == "aarch64":
if self.attn_implementation == "kernels-community/vllm-flash-attn3":
Collaborator


Suggested change
if platform.machine() == "aarch64":
if self.attn_implementation == "kernels-community/vllm-flash-attn3":
if platform.machine() == "aarch64" and self.attn_implementation == "kernels-community/vllm-flash-attn3":
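The suggested single-condition form can be pulled into a small helper; a sketch of the fallback logic under discussion (function and constant names are hypothetical, not the PR's actual code):

```python
import platform

# Attention implementation with no aarch64 kernels, per the PR discussion.
FLASH_ATTN_3 = "kernels-community/vllm-flash-attn3"

def resolve_attn_implementation(requested: str) -> str:
    """Return a usable attention implementation, falling back to PyTorch's
    sdpa on aarch64, where Flash Attention 3 is unavailable."""
    if platform.machine() == "aarch64" and requested == FLASH_ATTN_3:
        return "sdpa"
    return requested
```

On x86_64 the requested implementation passes through unchanged, so the override only affects Spark.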


Collaborator


Is there a reason not to put this under the user or developer guide and make it a full part of our docs? A top-level file in docs/ that isn't rendered to github pages isn't the layout I expect.

If we don't want it in the github pages yet, then it should move to the top level of the repo.


> If HTTPS clone fails with authentication errors, use SSH:
> `git clone git@github.com:NVIDIA-NeMo/Safe-Synthesizer.git`
Collaborator


suggestion: Now that the repo is public, this should not be an issue.


**Why a container?** DGX Spark's CUDA 13 + aarch64 requires specific Triton, vLLM, and PyTorch versions. The container (`nvcr.io/nvidia/vllm:26.02-py3`) provides a tested stack where Unsloth training and vLLM generation work natively.

**Full documentation:** [Safe Synthesizer User Guide](https://github.com/NVIDIA-NeMo/Safe-Synthesizer/blob/main/docs/user-guide/getting-started.md)
Collaborator


suggestion: We should link to the github pages location at https://nvidia-nemo.github.io/Safe-Synthesizer/user-guide/getting-started/ (now that it's public).

Or if we move this fully into github pages, use a relative link so mkdocs will render it correctly.

