
feat: Maarten/aarch64 spark support#268

Open
mvansegbroeck wants to merge 5 commits into main from maarten/aarch64-spark-support

Conversation

@mvansegbroeck
Collaborator

@mvansegbroeck mvansegbroeck commented Mar 19, 2026

Summary

Add DGX Spark (aarch64) support for Safe Synthesizer:

  • Dockerfile: New containers/Dockerfile.cuda-aarch64 for building on aarch64/Spark
  • Docs: docs/DGX_SPARK.md — quickstart guide for running NSS on DGX Spark
  • Dependencies: Platform-specific dependency guards for aarch64 (torchvision, torchao, xformers); add missing onnxruntime and opt_einsum to cu128 extra
  • Training config: Auto-detect aarch64 and override incompatible defaults (flash attention, quantization)
  • Secret scanning: Add pragma: allowlist secret for API key placeholder in docs

Pre-Review Checklist

  • make format && make check or via prek validation.
  • make test passes locally
  • make test-e2e passes locally
  • make test-ci-container passes locally (recommended)
  • GPU CI status check passes -- comment /sync on this PR to trigger a run (auto-triggers on ready-for-review)

Pre-Merge Checklist

  • New or updated tests for any fix or new behavior
  • Updated documentation for new features and behaviors, including docstrings for API docs.

@mvansegbroeck mvansegbroeck requested review from a team as code owners March 19, 2026 17:37
Collaborator

@binaryaaron binaryaaron left a comment


claude was a liar about unsloth and aarch64. also, pull in this change and test again

@mvansegbroeck mvansegbroeck force-pushed the maarten/aarch64-spark-support branch 6 times, most recently from 59d13ec to 83fff62 Compare March 24, 2026 01:22
Collaborator

@binaryaaron binaryaaron left a comment


if we want to support dgx spark quickly, the different install path in the container is fine but it's not proper support. let's make a follow-up for this - we'll need to do a proper spike on cross-platform support + cuda version support.

structured_outputs_config=structured_outputs_config,
enforce_eager=enforce_eager,
)
# attention_config was added in vLLM 0.12+ but removed in some builds.
Collaborator


"some builds?"

Collaborator Author


NGC container nvcr.io/nvidia/vllm:26.02-py3 doesn't have attention_config. PyPI vLLM 0.15.0 does.
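This kind of version drift can be handled defensively by filtering keyword arguments against the target callable's signature before calling it. A minimal sketch of the idea (the helper and the stand-in engine factory are hypothetical, not the PR's actual code):

```python
import inspect

def filter_supported_kwargs(fn, kwargs):
    """Drop kwargs the callable does not accept, e.g. attention_config
    on vLLM builds that removed it. Hypothetical helper for illustration."""
    params = inspect.signature(fn).parameters
    # If fn accepts **kwargs, everything is supported; pass it all through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}

def make_engine(model, enforce_eager=False):
    """Stand-in for a vLLM engine constructor without attention_config."""
    return {"model": model, "enforce_eager": enforce_eager}

args = filter_supported_kwargs(
    make_engine,
    {"model": "m", "attention_config": {"backend": "sdpa"}},
)
engine = make_engine(**args)  # attention_config silently dropped
```

A try/except TypeError around the constructor call would also work, but signature inspection avoids masking unrelated TypeErrors.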

Comment on lines +20 to +35
# Torch-dependent packages — install with --no-deps to preserve container's torch/CUDA
RUN pip install --no-deps \
    peft accelerate bitsandbytes datasets==4.3.0 trl==0.26.1 \
    hf_transfer unsloth unsloth_zoo \
    opacus sentence-transformers gliner kernels

# Safe Synthesizer + remaining deps
RUN pip install --no-deps -e . && \
    pip install \
    faker 'pydantic[email]>=2.12.5' pydantic-settings pyyaml jsonschema rich structlog \
    colorama 'huggingface-hub>=0.34.4,<1' anyascii pycountry betterproto flashtext \
    cached-property category-encoders dython dateparser langchain-core json-repair \
    matplotlib 'outlines>=1.0.0' plotly prv-accountant 'smart-open==7.0.5' python-stdnum \
    'pandas>=2.1.3' ratelimit 'sqlfluff==3.2.0' 'range_regex>=0.1.0' 'tenacity==9.1.2' \
    'tiktoken>=0.7.0' tldextract 'wandb==0.23.1' python-dotenv patsy \
    pyarrow multiprocess onnxruntime opt_einsum dill==0.3.8 faiss-cpu
Collaborator


these should move to pyproject.toml and follow patterns from the cuda dockerfile. can call this one dockerfile.cuda-spark or cuda-aarch64.

Collaborator Author


Makes sense. I can move the deps into a spark or aarch64 extra in pyproject.toml and have the Dockerfile just do pip install -e ".[cuda-aarch64]".
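A sketch of what that extra could look like in pyproject.toml (the extra name and pin set are illustrative, loosely taken from the Dockerfile above; not the PR's actual diff):

```toml
[project.optional-dependencies]
# Hypothetical extra for the aarch64/Spark container. Packages installed
# with --no-deps stay in the Dockerfile; the rest resolve normally here.
cuda-aarch64 = [
    "onnxruntime",
    "opt_einsum",
    "faiss-cpu",
    # aarch64-incompatible packages are excluded via environment markers:
    "xformers==0.0.33.post2; platform_machine == 'x86_64'",
]
```

The Dockerfile then shrinks to `pip install -e ".[cuda-aarch64]"`, keeping versions in one place.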

"vllm==0.15.0",
"xformers==v0.0.33.post2; sys_platform == 'linux'",
"xformers==v0.0.33.post2; sys_platform == 'linux' and platform_machine == 'x86_64'",
]
Collaborator


this is probably going to be a whole thing but we should make a new dep group or do better platform resolution for cross-platform deps.

Collaborator Author


Agree this needs a proper spike. For now the platform markers are minimal and don't break x86_64, right?

Collaborator


hmmm do we want to keep the sys_platform markers? locking is borked in CI right now and I've been fighting it by putting linux markers in

Collaborator


We really will need a spark and/or station as part of our github runners so we make sure this continues to work.

binaryaaron previously approved these changes Mar 24, 2026
Collaborator

@binaryaaron binaryaaron left a comment


approving conditionally so i don't block while i'm out

@binaryaaron
Collaborator

We've also got to publish the container somewhere in the future, same with the cuda one.

@mvansegbroeck mvansegbroeck force-pushed the maarten/aarch64-spark-support branch 5 times, most recently from a113a39 to 655ca6c Compare March 24, 2026 21:22
@mvansegbroeck mvansegbroeck changed the title Maarten/aarch64 spark support feat: Maarten/aarch64 spark support Mar 24, 2026
@mvansegbroeck mvansegbroeck force-pushed the maarten/aarch64-spark-support branch 2 times, most recently from 8408cca to 819bdb8 Compare March 26, 2026 00:05
mvansegbroeck and others added 4 commits March 31, 2026 10:03
- containers/Dockerfile.spark: container-based install using nvcr.io/nvidia/vllm:26.02-py3
- docs/DGX_SPARK.md: quick start guide (build + run in 2 steps)
- pyproject.toml: platform markers for aarch64-incompatible packages
  (faiss-gpu-cu12, torchvision+cu128, torchao, xformers)
- config/training.py: auto-fallback Flash Attention 3 to sdpa on aarch64
- vllm_backend.py: handle vllm versions without attention_config kwarg

Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>
onnxruntime is required by gliner but not always resolved
transitively on aarch64. opt_einsum is directly imported in
dp_transformers/linear.py but was never declared.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>
Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>
@mvansegbroeck mvansegbroeck force-pushed the maarten/aarch64-spark-support branch from f3b9988 to c7f1cf7 Compare March 31, 2026 17:04
…lockfile

Signed-off-by: mvansegbroeck <mvansegbroeck@gmail.com>

# 2. Torch-dependent packages — --no-deps preserves the container's torch/CUDA
RUN pip install --no-deps \
    peft accelerate bitsandbytes datasets==4.3.0 trl==0.26.1 \
Collaborator


🤔 should we add an extras group for them? mostly to not have these versions be different than what's in the lockfile


# NeMo Safe Synthesizer on DGX Spark

Generate synthetic tabular data with quality and privacy guarantees — train, generate, and evaluate in one command.
Collaborator


should this comment be specific to the spark setup?

Comment on lines +279 to +280
if platform.machine() == "aarch64":
if self.attn_implementation == "kernels-community/vllm-flash-attn3":
Collaborator


Suggested change
if platform.machine() == "aarch64":
if self.attn_implementation == "kernels-community/vllm-flash-attn3":
if platform.machine() == "aarch64" and self.attn_implementation == "kernels-community/vllm-flash-attn3":
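The suggested single-condition form can be pulled into a small helper; a sketch of the fallback logic under discussion (function and constant names are hypothetical, not the PR's actual code):

```python
import platform

# Attention implementation with no aarch64 kernels, per the PR discussion.
FLASH_ATTN_3 = "kernels-community/vllm-flash-attn3"

def resolve_attn_implementation(requested: str) -> str:
    """Return a usable attention implementation, falling back to PyTorch's
    sdpa on aarch64, where Flash Attention 3 is unavailable."""
    if platform.machine() == "aarch64" and requested == FLASH_ATTN_3:
        return "sdpa"
    return requested
```

On x86_64 the requested implementation passes through unchanged, so the override only affects Spark.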


Collaborator


Is there a reason not to put this under the user or developer guide and make it a full part of our docs? A top-level file in docs/ that isn't rendered to github pages isn't the layout I expect.

If we don't want it in the github pages yet, then it should move to the top level of the repo.


> If HTTPS clone fails with authentication errors, use SSH:
> `git clone git@github.com:NVIDIA-NeMo/Safe-Synthesizer.git`
Collaborator


suggestion: Now that the repo is public, this should not be an issue.


**Why a container?** DGX Spark's CUDA 13 + aarch64 requires specific Triton, vLLM, and PyTorch versions. The container (`nvcr.io/nvidia/vllm:26.02-py3`) provides a tested stack where Unsloth training and vLLM generation work natively.

**Full documentation:** [Safe Synthesizer User Guide](https://github.com/NVIDIA-NeMo/Safe-Synthesizer/blob/main/docs/user-guide/getting-started.md)
Collaborator


suggestion: We should link to the github pages location at https://nvidia-nemo.github.io/Safe-Synthesizer/user-guide/getting-started/ (now that it's public).

Or if we move this fully into github pages, use a relative link so mkdocs will render it correctly.

