⚡️ Speed up function is_gfx95_supported by 34%
#473
📄 34% (0.34x) speedup for `is_gfx95_supported` in `python/sglang/srt/utils/common.py`
⏱️ Runtime: 963 microseconds → 718 microseconds (best of 59 runs)
📝 Explanation and details
The optimization replaces `any(gfx in gcn_arch for gfx in ["gfx95"])` with a direct substring check, `"gfx95" in gcn_arch`, achieving a 34% speedup.

Key optimization:
- The original code builds a generator over the single-element list `["gfx95"]`, then uses `any()` to evaluate it. The optimized version performs a direct substring search.
- This eliminates the `any()` function call and the generator expression overhead.
- Python's `in` operator for substring checking is implemented in C and is faster than iterating over a list with one element.

Performance characteristics:
- The check runs against `gcnArchName` strings, where substring search efficiency matters most.
- `@lru_cache(maxsize=1)` means the optimization benefit is realized on the first call; subsequent calls return the cached result and are near-instantaneous.

Impact on workloads:
Since this function checks GPU architecture support, it's likely called during initialization or capability detection phases. The 34% improvement reduces latency in GPU setup paths, particularly beneficial in scenarios where multiple architecture checks occur or in environments with frequent re-initialization.
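The two forms of the check can be compared in isolation with a minimal sketch like the one below. It is not the code from `common.py`: the helper names `check_original`, `check_optimized`, and `compare_timings` are hypothetical, the architecture string is passed in as a plain argument instead of being read from the device, and the sample strings are illustrative values only. The `@lru_cache` wrapper is left out so that the check itself is what gets timed.

```python
import timeit


def check_original(gcn_arch: str) -> bool:
    # Original form: a generator over a one-element list, consumed by any().
    return any(gfx in gcn_arch for gfx in ["gfx95"])


def check_optimized(gcn_arch: str) -> bool:
    # Optimized form: a single substring test.
    return "gfx95" in gcn_arch


def compare_timings(gcn_arch: str, number: int = 100_000) -> tuple[float, float]:
    # Returns total seconds for `number` calls of each form.
    t_orig = timeit.timeit(lambda: check_original(gcn_arch), number=number)
    t_opt = timeit.timeit(lambda: check_optimized(gcn_arch), number=number)
    return t_orig, t_opt


if __name__ == "__main__":
    # Illustrative architecture strings, not taken from the PR.
    for arch in ["gfx950:sramecc+:xnack-", "gfx942:sramecc+:xnack-", ""]:
        assert check_original(arch) == check_optimized(arch)
    t_orig, t_opt = compare_timings("gfx950:sramecc+:xnack-")
    print(f"original:  {t_orig:.4f} s")
    print(f"optimized: {t_opt:.4f} s")
```

Absolute timings depend on the interpreter and machine, so the sketch prints them without asserting a particular ratio. The equivalence assertions are the part that matters for correctness: both forms return the same result for every input string.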
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-is_gfx95_supported-mijuthfe` and push.