
Sync upstream #17

Merged

BBuf merged 23 commits into sgl-release from sync-upstream on Feb 7, 2026

Conversation

@BBuf (Collaborator) commented Feb 7, 2026

No description provided.

jasl and others added 23 commits October 9, 2025 09:09
* build: Minor tweaks for wheel build

Signed-off-by: oliver könig <okoenig@nvidia.com>

* ci: Workflows for wheel build

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix

Signed-off-by: oliver könig <okoenig@nvidia.com>

* build: Add CachedWheel

Signed-off-by: oliver könig <okoenig@nvidia.com>

* add version to init

Signed-off-by: oliver könig <okoenig@nvidia.com>

* revert

Signed-off-by: oliver könig <okoenig@nvidia.com>

* revert

Signed-off-by: oliver könig <okoenig@nvidia.com>

* revert

Signed-off-by: oliver könig <okoenig@nvidia.com>

* v2

Signed-off-by: oliver könig <okoenig@nvidia.com>

* update

Signed-off-by: oliver könig <okoenig@nvidia.com>

* test

Signed-off-by: oliver könig <okoenig@nvidia.com>

* from packaging.version import parse

Signed-off-by: oliver könig <okoenig@nvidia.com>

* local version

Signed-off-by: oliver könig <okoenig@nvidia.com>

* remove file

Signed-off-by: oliver könig <okoenig@nvidia.com>

* revert

Signed-off-by: oliver könig <okoenig@nvidia.com>

* Updates and lint

* revert missing cudaextension args

Signed-off-by: oliver könig <okoenig@nvidia.com>

* Add timeout

* fix DG settings

Signed-off-by: oliver könig <okoenig@nvidia.com>

* DG_USE_LOCAL_VERSION

Signed-off-by: oliver könig <okoenig@nvidia.com>

* Update version

* Detect local changes

* Minor fix

* Revert CUTLASS

* Unify options

---------

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>

* build: Allow NGC builds

Signed-off-by: oliver könig <okoenig@nvidia.com>

* reduce grid

Signed-off-by: oliver könig <okoenig@nvidia.com>

* update grid

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix

Signed-off-by: oliver könig <okoenig@nvidia.com>

* upgrade cuda action

Signed-off-by: oliver könig <okoenig@nvidia.com>

* remove test

Signed-off-by: oliver könig <okoenig@nvidia.com>

* py3.8

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix

Signed-off-by: oliver könig <okoenig@nvidia.com>

* exclude

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix

Signed-off-by: oliver könig <okoenig@nvidia.com>

* torch-version

Signed-off-by: oliver könig <okoenig@nvidia.com>

* py3.8/torch2.1/cuda12.3

Signed-off-by: oliver könig <okoenig@nvidia.com>

* Update publish.yml

* fix grid

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix

Signed-off-by: oliver könig <okoenig@nvidia.com>

* cuda11.8

Signed-off-by: oliver könig <okoenig@nvidia.com>

* no hopper for 118

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix

Signed-off-by: oliver könig <okoenig@nvidia.com>

---------

Signed-off-by: oliver könig <okoenig@nvidia.com>

* py3.8

Signed-off-by: oliver könig <okoenig@nvidia.com>

* chore: Rename from `deep_geem` to `deepgemm`

Signed-off-by: oliver könig <okoenig@nvidia.com>

---------

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Ko3n1g/chore/revert name change

fix: use SM90ArchSpec instead of SM100ArchSpec in sm90_bf16_k_grouped_gemm

The function sm90_bf16_k_grouped_gemm was incorrectly using SM100ArchSpec
to calculate TMA descriptor block sizes. Since this file is the SM90
implementation, it should consistently use SM90ArchSpec like the other
functions in this file (sm90_bf16_gemm, sm90_m_grouped_bf16_gemm_contiguous,
etc.).

This fixes a copy-paste error that could cause incorrect block size
calculations on SM90 (Hopper) GPUs.

Fixes deepseek-ai#242

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
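For context on the fix above, here is a minimal, hedged sketch of why the spec choice matters. The struct members, helper name, and alignment constants below are invented for illustration (the real definitions live under csrc/jit_kernels/heuristics/); only the pattern, instantiating the SM90 spec rather than the SM100 spec inside the SM90 file, mirrors the actual change.

```cpp
// Hedged, self-contained sketch: DeepGEMM's real SM90ArchSpec / SM100ArchSpec
// expose their own interfaces; the members and the helper below are hypothetical
// stand-ins chosen only to show how picking the wrong arch spec changes the
// computed TMA block sizes.
#include <cstdio>

struct SM90ArchSpec  { static constexpr int smem_alignment = 128;  };  // assumed Hopper-style constant
struct SM100ArchSpec { static constexpr int smem_alignment = 1024; };  // assumed Blackwell-style constant

// Hypothetical helper: rounds a TMA block's byte footprint up to the arch's alignment.
template <typename ArchSpec>
constexpr int tma_block_bytes(int block_m, int block_k, int elem_bytes) {
    const int raw = block_m * block_k * elem_bytes;
    return (raw + ArchSpec::smem_alignment - 1) / ArchSpec::smem_alignment * ArchSpec::smem_alignment;
}

int main() {
    // The SM90 k-grouped BF16 path previously instantiated the SM100 spec here,
    // so the same logical block yielded a size tuned for the wrong architecture,
    // inconsistent with the rest of the SM90 implementation.
    std::printf("SM100ArchSpec (wrong on Hopper): %d bytes\n", tma_block_bytes<SM100ArchSpec>(48, 112, 2));
    std::printf("SM90ArchSpec  (correct):         %d bytes\n", tma_block_bytes<SM90ArchSpec>(48, 112, 2));
    return 0;
}
```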
# Conflicts:
#	csrc/apis/attention.hpp
#	csrc/apis/einsum.hpp
#	csrc/apis/gemm.hpp
#	csrc/apis/layout.hpp
#	csrc/apis/runtime.hpp
#	csrc/jit/device_runtime.hpp
#	csrc/jit_kernels/heuristics/common.hpp
#	csrc/jit_kernels/heuristics/sm100.hpp
#	csrc/jit_kernels/heuristics/sm90.hpp
#	csrc/jit_kernels/impls/runtime_utils.hpp
#	csrc/jit_kernels/impls/sm100_fp8_gemm_1d1d.hpp
#	csrc/jit_kernels/impls/sm100_fp8_gemm_1d2d.hpp
#	csrc/jit_kernels/impls/sm90_fp8_gemm_1d2d.hpp
#	csrc/python_api.cpp
#	deep_gemm/__init__.py
#	deep_gemm/utils/layout.py
#	tests/generators.py
#	tests/test_einsum.py
#	tests/test_fp8.py
BBuf merged commit 63e845f into sgl-release on Feb 7, 2026
2 of 3 checks passed
