unittest: fix failed unittest on hopper #1952
base: main
Conversation
Summary of Changes
Hello @yzh119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses and resolves an issue where unit tests were failing on the Hopper platform. The root cause was identified as invalid configurations being generated during JIT warmup, particularly when attempting to use the fa3 backend with mixed-precision attention.
Walkthrough
A targeted control-flow guard was added to gen_prefill_attention_modules so the JIT warmup skips mixed-precision configurations for the fa3 backend.
Code Review
This pull request fixes a failing unit test on Hopper by preventing the JIT warmup function gen_prefill_attention_modules from generating configurations for mixed-precision attention with the fa3 backend, which does not support it. The change is correct for the test helper function. However, the review identifies a potential issue where the root cause in the main library code is not addressed. The function determine_attention_backend can still incorrectly select the fa3 backend for mixed-precision cases, which could lead to runtime errors when using the public APIs. It is strongly recommended to fix this at the source by updating is_fa3_backend_supported.
```python
if q_dtype != kv_dtype:
    continue  # fa3 template does not support mixed precision
```
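For context, here is a minimal sketch of how such a guard can sit inside a warmup loop that enumerates dtype combinations. The function name enumerate_prefill_configs, the dtype lists, and the backend tuple are illustrative assumptions for this sketch, not the actual gen_prefill_attention_modules code.

```python
import itertools

import torch

# Illustrative dtype sets for JIT warmup; the real helper enumerates its own.
q_dtypes = [torch.float16, torch.bfloat16, torch.float8_e4m3fn]
kv_dtypes = [torch.float16, torch.bfloat16, torch.float8_e4m3fn]


def enumerate_prefill_configs(backends=("fa2", "fa3")):
    """Yield (backend, q_dtype, kv_dtype) combinations to warm up."""
    for backend, q_dtype, kv_dtype in itertools.product(backends, q_dtypes, kv_dtypes):
        if backend == "fa3" and q_dtype != kv_dtype:
            # fa3 templates assume q and kv share a dtype; skip the invalid
            # combination instead of compiling a module that cannot run.
            continue
        yield backend, q_dtype, kv_dtype
```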
This correctly prevents generating an invalid configuration for the fa3 backend in this test helper. However, this only addresses the symptom in the JIT warmup. The root cause appears to be in flashinfer.utils.determine_attention_backend, which can still select the fa3 backend for mixed-precision cases because is_fa3_backend_supported doesn't perform this check. This could lead to runtime errors in user-facing APIs like single_prefill_with_kv_cache. A more robust fix would be to add the mixed-precision check to is_fa3_backend_supported to prevent incorrect backend selection throughout the library.
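A hedged sketch of the stricter support check suggested above. The function name, signature, and body here are assumptions for illustration; the real is_fa3_backend_supported in flashinfer.utils may take different parameters and already includes other conditions.

```python
import torch


def is_fa3_backend_supported_sketch(q_dtype: torch.dtype, kv_dtype: torch.dtype) -> bool:
    """Hypothetical variant of the support check proposed in the review."""
    # Reject mixed precision up front so backend selection (the review's
    # determine_attention_backend) never picks fa3 for q_dtype != kv_dtype
    # and public APIs fall back to a backend that supports it.
    if q_dtype != kv_dtype:
        return False
    # The real check would keep its existing conditions here
    # (GPU architecture, head dimension, dtype whitelist, ...).
    return True
```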
📌 Description
Some invalid configurations (mixed precision with the fa3 backend) were generated by the JIT warmup function gen_prefill_attention_modules.

🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed pre-commit by running pip install pre-commit (or used your preferred method).
- I have installed the hooks with pre-commit install.
- I have run the hooks before committing with pre-commit run --all-files and fixed any reported issues.

🧪 Tests
- All tests are passing (unittest, etc.).

Reviewer Notes