Fixes build for MXFP8 quantize #2781
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2781
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit c2ae2eb with merge base 69e71d9. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This is odd: we run microbenchmarks and e2e training benchmarks using mxfp8 with the dim1 cast CUDA kernel, it builds successfully, and we have never seen that error. To double check, I just did a fresh build of the latest commit on main with the command I'm using. Torch is on CUDA 12.8; here are my env vars: https://www.internalfb.com/phabricator/paste/view/P1906186247
I have a question about this: we use this in .cuh files where the actual kernels are implemented. Does nvcc process preprocessor directives on the CPU first, before compiling the resulting source code for the device?
@@ -634,6 +634,10 @@ def get_extensions():
    mxfp8_src_files_exist = all(os.path.exists(f) for f in mxfp8_sources)
    if mxfp8_src_files_exist and build_for_sm100a:
        print("Building mxfp8_cuda extension")
        arch_flags = [
In your compile logs from before this change in the PR description, I already see the proper gencodes:
...
-DTORCH_EXTENSION_NAME=mxfp8_cuda -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_100a,code=sm_100a -gencode=arch=compute_120,code=compute_120 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_120a,code=sm_120a -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90a,code=sm_90a
The proper gencodes are there, but the kernel compilation was still getting skipped. If you remove the `__CUDA_ARCH__` check, you'll see that we get other errors, like instructions not being available on SM90, etc., since all of these gencodes are being passed. That's why I needed to conditionally add the gencodes through setup.py for specific source files, just like it's done for other files in setup.py.
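For illustration, here is a rough sketch of that approach, not the actual setup.py change (the source path, module name, condition, and flag list are placeholders): give the mxfp8 sources their own extension with sm_100a-specific nvcc gencode flags, so that the other sources keep the default arch list.

```python
# Sketch only: attach sm_100a gencode flags to just the mxfp8 sources by
# building them as a separate CUDAExtension (names/paths are illustrative).
import os

from torch.utils.cpp_extension import CUDAExtension

mxfp8_sources = [
    "torchao/csrc/cuda/mx_kernels/mxfp8_quantize.cu",  # placeholder path
]
mxfp8_src_files_exist = all(os.path.exists(f) for f in mxfp8_sources)
# Placeholder condition: the real logic would inspect TORCH_CUDA_ARCH_LIST
# and the available nvcc version.
build_for_sm100a = "10.0a" in os.getenv("TORCH_CUDA_ARCH_LIST", "")

extensions = []
if mxfp8_src_files_exist and build_for_sm100a:
    # Only this extension gets the sm_100a gencode; other extensions keep
    # the globally derived arch flags and still build for older GPUs.
    arch_flags = ["-gencode=arch=compute_100a,code=sm_100a"]
    extensions.append(
        CUDAExtension(
            name="torchao.prototype.mxfp8_cuda",  # illustrative module name
            sources=mxfp8_sources,
            extra_compile_args={"cxx": ["-O3"], "nvcc": ["-O3", *arch_flags]},
        )
    )
```

The key point is that the gencode restriction is applied per source file in setup.py, rather than relying on `__CUDA_ARCH__` checks in host code.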
This warning doesn't impact functionality, but it is still worth fixing; I will submit a PR:
I took a look and the function incorrectly has a ...
FYI, I just made #2782 to fix the other warning I see in your build log.
I forgot to mention this was with CUDA 13. Maybe with CUDA 12.8 we don't see this error (I'll verify soon). Regardless, the ...
`__CUDA_ARCH__` has undefined behavior in host code and should only be used in device code. Without this PR, we run into the following error:

Essentially, the `mxfp8_quantize_kernel` kernels are not launched.

Compilation command:
TORCH_CUDA_ARCH_LIST="9.0a 10.0a 12.0a 7.5 8.0 8.6 9.0 10.0 12.0+PTX" pip install --no-build-isolation . -vvv
Before this PR, the compilation on B200 looks like:
And after:
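As a side note, a quick way to confirm what the local GPU reports before choosing a TORCH_CUDA_ARCH_LIST; this is a small sketch using standard torch APIs (treating compute capability 10.0 as the B200-class target is an assumption):

```python
# Sketch: print each visible GPU's compute capability so the arch list
# passed to the build can be checked against the actual hardware.
import torch

def report_device_archs() -> None:
    if not torch.cuda.is_available():
        print("CUDA is not available in this environment")
        return
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        # Assumption: sm_100 (compute capability 10.0) is the B200-class
        # architecture that the 10.0a build targets.
        status = "matches" if (major, minor) == (10, 0) else "does not match"
        print(f"cuda:{i} {name}: sm_{major}{minor} ({status} the 10.0a target)")

if __name__ == "__main__":
    report_device_archs()
```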