
Conversation


@codeflash-ai codeflash-ai bot commented Nov 29, 2025

📄 34% (0.34x) speedup for is_gfx95_supported in python/sglang/srt/utils/common.py

⏱️ Runtime: 963 microseconds → 718 microseconds (best of 59 runs)

📝 Explanation and details

The optimization replaces `any(gfx in gcn_arch for gfx in ["gfx95"])` with a direct substring check `"gfx95" in gcn_arch`, achieving a 34% speedup; a before/after sketch follows the list below.

Key optimization:

  • Eliminated unnecessary iteration: The original code creates a generator that iterates over a single-item list ["gfx95"], then uses any() to evaluate it. The optimized version performs a direct substring search.
  • Reduced function call overhead: Removes the any() function call and generator expression overhead.
  • More efficient string search: Python's in operator for substring checking is highly optimized in C and faster than iterating over a list with one element.
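A minimal before/after sketch of the change, assuming (as the generated tests imply) that the surrounding function gates on `torch.version.hip` and reads `gcnArchName` from device 0; this illustrates the shape of the edit, not the verbatim source:

```python
from functools import lru_cache

import torch


@lru_cache(maxsize=1)
def is_gfx95_supported_before() -> bool:
    # Original shape: iterate a one-element list inside a generator and feed it to any().
    if not torch.version.hip:
        return False
    gcn_arch = torch.cuda.get_device_properties(0).gcnArchName
    return any(gfx in gcn_arch for gfx in ["gfx95"])


@lru_cache(maxsize=1)
def is_gfx95_supported_after() -> bool:
    # Optimized shape: a single C-level substring search, no generator or any() call.
    if not torch.version.hip:
        return False
    gcn_arch = torch.cuda.get_device_properties(0).gcnArchName
    return "gfx95" in gcn_arch
```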

Performance characteristics:

  • Best case scenarios: Test results show 16-78% improvements when the HIP check passes and device properties are accessed, particularly for cases with longer gcnArchName strings where the substring search efficiency matters most.
  • Marginal overhead in some edge cases: A few tests show slight slowdowns (1-16%), likely due to test setup variance, but the overall pattern shows consistent improvement.
  • Cache effectiveness maintained: The `@lru_cache(maxsize=1)` decorator means the full check runs only on the first call, with subsequent calls returning the cached result near-instantaneously (see the small illustration below).
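A small self-contained illustration of that caching behavior; `cached_probe` is a hypothetical stand-in for `is_gfx95_supported`, and `cache_clear()` is the same hook the generated tests use to force re-evaluation between monkeypatched configurations:

```python
from functools import lru_cache

calls = 0


@lru_cache(maxsize=1)
def cached_probe() -> bool:
    # The body runs once; later calls return the cached result.
    global calls
    calls += 1
    return True


cached_probe()               # body executes, calls == 1
cached_probe()               # served from the cache, calls stays at 1
cached_probe.cache_clear()   # what the tests do between configurations
cached_probe()               # body executes again, calls == 2
assert calls == 2
```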

Impact on workloads:
Since this function checks GPU architecture support, it's likely called during initialization or capability detection phases. The 34% improvement reduces latency in GPU setup paths, particularly beneficial in scenarios where multiple architecture checks occur or in environments with frequent re-initialization.
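For context, a hedged sketch of how such a check might be consumed in a start-up path; `select_kernel_backend` and the returned labels are illustrative only, not actual sglang APIs:

```python
from sglang.srt.utils.common import is_gfx95_supported


def select_kernel_backend() -> str:
    # Hypothetical capability-detection step during engine initialization:
    # pick gfx95-specific kernels only when the architecture check passes.
    if is_gfx95_supported():
        return "gfx95-optimized"
    return "generic"
```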

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 2033 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import builtins
import sys
import types
# function to test
from functools import lru_cache

# imports
import pytest
from sglang.srt.utils.common import is_gfx95_supported

# We'll define a minimal mock torch module for testing, since we can't guarantee the presence of HIP or CUDA on the test machine.
class DummyDeviceProps:
    def __init__(self, gcnArchName):
        self.gcnArchName = gcnArchName

class DummyCuda:
    def __init__(self, gcnArchName=None, raise_on_call=False):
        self._gcnArchName = gcnArchName
        self._raise_on_call = raise_on_call

    def get_device_properties(self, idx):
        if self._raise_on_call:
            raise RuntimeError("No CUDA device")
        return DummyDeviceProps(self._gcnArchName)

class DummyTorch:
    def __init__(self, hip=False, gcnArchName=None, raise_on_call=False):
        self.version = type("ver", (), {"hip": hip})()
        self.cuda = DummyCuda(gcnArchName, raise_on_call)

# Helper to monkeypatch sys.modules['torch'] for the duration of a test
class TorchPatcher:
    def __init__(self, dummy_torch):
        self.dummy_torch = dummy_torch
        self.orig_torch = sys.modules.get("torch", None)

    def __enter__(self):
        sys.modules["torch"] = self.dummy_torch

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.orig_torch is not None:
            sys.modules["torch"] = self.orig_torch
        else:
            del sys.modules["torch"]

# Helper to clear lru_cache between tests
def clear_is_gfx95_supported_cache():
    is_gfx95_supported.cache_clear()

# =========================
# BASIC TEST CASES
# =========================

def test_returns_false_when_not_hip():
    """Should return False if torch.version.hip is False, regardless of gcnArchName."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=False, gcnArchName="gfx950")
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 1.15μs -> 1.17μs (1.63% slower)

def test_returns_true_when_hip_and_gfx95_in_arch():
    """Should return True if torch.version.hip is True and gcnArchName contains 'gfx95'."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName="gfx950")
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 912ns -> 1.07μs (14.4% slower)

def test_returns_false_when_hip_and_gfx95_not_in_arch():
    """Should return False if torch.version.hip is True and gcnArchName does not contain 'gfx95'."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName="gfx90a")
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 887ns -> 932ns (4.83% slower)

def test_returns_true_when_hip_and_gfx95_is_substring():
    """Should return True if 'gfx95' is a substring of gcnArchName (e.g., 'gfx950x')."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName="gfx950x")
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 836ns -> 943ns (11.3% slower)

def test_returns_false_when_hip_and_gcnArchName_is_empty():
    """Should return False if torch.version.hip is True and gcnArchName is empty string."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName="")
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 850ns -> 935ns (9.09% slower)

# =========================
# EDGE TEST CASES
# =========================

def test_gcnArchName_case_sensitivity():
    """Should be case sensitive: 'GFX95' does not match 'gfx95'."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName="GFX950")
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 799ns -> 890ns (10.2% slower)

def test_gcnArchName_is_none():
    """Should not raise error if gcnArchName is None, should return False."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName=None)
    with TorchPatcher(dummy):
        try:
            codeflash_output = is_gfx95_supported(); result = codeflash_output
        except Exception as e:
            pass

def test_gcnArchName_is_non_string():
    """Should not raise error if gcnArchName is int, should return False."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName=9595)
    with TorchPatcher(dummy):
        try:
            codeflash_output = is_gfx95_supported(); result = codeflash_output
        except Exception as e:
            pass

def test_multiple_gfx95_matches():
    """Should return True if gcnArchName contains multiple 'gfx95' substrings."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName="gfx95gfx95")
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 949ns -> 1.13μs (16.0% slower)

# =========================
# LARGE SCALE TEST CASES
# =========================

def test_large_gcnArchName_with_gfx95_at_start():
    """Should return True if large gcnArchName string starts with 'gfx95'."""
    clear_is_gfx95_supported_cache()
    arch = "gfx95" + "x" * 995  # 1000 chars
    dummy = DummyTorch(hip=True, gcnArchName=arch)
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 904ns -> 1.01μs (10.8% slower)

def test_large_gcnArchName_with_gfx95_at_end():
    """Should return True if large gcnArchName string ends with 'gfx95'."""
    clear_is_gfx95_supported_cache()
    arch = "x" * 995 + "gfx95"
    dummy = DummyTorch(hip=True, gcnArchName=arch)
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 887ns -> 877ns (1.14% faster)

def test_large_gcnArchName_without_gfx95():
    """Should return False if large gcnArchName string does not contain 'gfx95'."""
    clear_is_gfx95_supported_cache()
    arch = "x" * 1000
    dummy = DummyTorch(hip=True, gcnArchName=arch)
    with TorchPatcher(dummy):
        codeflash_output = is_gfx95_supported() # 883ns -> 971ns (9.06% slower)

def test_large_number_of_calls_cache():
    """Should always return the cached value after first call, even if torch is monkeypatched."""
    clear_is_gfx95_supported_cache()
    dummy1 = DummyTorch(hip=True, gcnArchName="gfx950")
    dummy2 = DummyTorch(hip=False, gcnArchName="gfx950")
    with TorchPatcher(dummy1):
        codeflash_output = is_gfx95_supported() # 868ns -> 889ns (2.36% slower)
    # Now patch to dummy2, but due to lru_cache, result should still be True
    with TorchPatcher(dummy2):
        codeflash_output = is_gfx95_supported() # 241ns -> 237ns (1.69% faster)
    # Clear cache, now should return False
    clear_is_gfx95_supported_cache()
    with TorchPatcher(dummy2):
        codeflash_output = is_gfx95_supported() # 365ns -> 414ns (11.8% slower)

# =========================
# ADDITIONAL EDGE CASES
# =========================

def test_gcnArchName_is_list():
    """Should not raise error if gcnArchName is a list, should return False."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName=["gfx950"])
    with TorchPatcher(dummy):
        try:
            codeflash_output = is_gfx95_supported(); result = codeflash_output
        except Exception as e:
            pass

def test_gcnArchName_is_dict():
    """Should not raise error if gcnArchName is a dict, should return False."""
    clear_is_gfx95_supported_cache()
    dummy = DummyTorch(hip=True, gcnArchName={"gfx": 95})
    with TorchPatcher(dummy):
        try:
            codeflash_output = is_gfx95_supported(); result = codeflash_output
        except Exception as e:
            pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from functools import lru_cache

# imports
import pytest  # used for our unit tests
# We'll need to patch torch and its submodules for controlled testing.
import torch
from sglang.srt.utils.common import is_gfx95_supported

# unit tests
@pytest.mark.basic
def test_no_hip_returns_false(monkeypatch):
    # Basic test: torch.version.hip is False, should return False
    monkeypatch.setattr(torch.version, "hip", False)
    # Clear lru_cache so monkeypatch works
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 964ns -> 825ns (16.8% faster)

@pytest.mark.basic
def test_hip_true_gfx95_in_gcnarch(monkeypatch):
    # Basic test: torch.version.hip is True and gcnArchName contains 'gfx95'
    monkeypatch.setattr(torch.version, "hip", True)

    class DummyProps:
        gcnArchName = "gfx950"

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 3.20μs -> 1.98μs (62.0% faster)

@pytest.mark.basic
def test_hip_true_gfx95_not_in_gcnarch(monkeypatch):
    # Basic test: torch.version.hip is True and gcnArchName does NOT contain 'gfx95'
    monkeypatch.setattr(torch.version, "hip", True)

    class DummyProps:
        gcnArchName = "gfx906"

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 2.78μs -> 1.76μs (57.6% faster)

@pytest.mark.edge
def test_hip_true_gcnarch_empty(monkeypatch):
    # Edge case: gcnArchName is empty string
    monkeypatch.setattr(torch.version, "hip", True)

    class DummyProps:
        gcnArchName = ""

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 2.62μs -> 1.66μs (58.0% faster)

@pytest.mark.edge
def test_hip_true_gcnarch_multiple_gfx95(monkeypatch):
    # Edge case: gcnArchName contains multiple 'gfx95' substrings
    monkeypatch.setattr(torch.version, "hip", True)

    class DummyProps:
        gcnArchName = "gfx950_gfx951_gfx952"

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 2.83μs -> 1.74μs (62.2% faster)

@pytest.mark.edge
def test_hip_true_gcnarch_case_sensitive(monkeypatch):
    # Edge case: gcnArchName contains 'GFX95' (uppercase), should NOT match
    monkeypatch.setattr(torch.version, "hip", True)

    class DummyProps:
        gcnArchName = "GFX950"

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 2.67μs -> 1.74μs (53.7% faster)

@pytest.mark.edge
def test_hip_true_gcnarch_partial_match(monkeypatch):
    # Edge case: gcnArchName contains 'gfx9' but not 'gfx95'
    monkeypatch.setattr(torch.version, "hip", True)

    class DummyProps:
        gcnArchName = "gfx9"

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 2.58μs -> 1.58μs (63.8% faster)

@pytest.mark.edge
def test_hip_true_gcnarch_gfx95_at_end(monkeypatch):
    # Edge case: gcnArchName ends with 'gfx95'
    monkeypatch.setattr(torch.version, "hip", True)

    class DummyProps:
        gcnArchName = "arch_gfx95"

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 2.96μs -> 1.76μs (68.0% faster)

@pytest.mark.edge
def test_hip_true_gcnarch_gfx95_at_start(monkeypatch):
    # Edge case: gcnArchName starts with 'gfx95'
    monkeypatch.setattr(torch.version, "hip", True)

    class DummyProps:
        gcnArchName = "gfx95_arch"

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 2.99μs -> 1.67μs (78.7% faster)

@pytest.mark.large_scale
def test_large_scale_many_calls(monkeypatch):
    # Large scale: call the function many times, ensure lru_cache works and result is stable
    monkeypatch.setattr(torch.version, "hip", True)

    class DummyProps:
        gcnArchName = "gfx950"

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported(); result = codeflash_output # 3.55μs -> 2.31μs (53.6% faster)
    for _ in range(1000):  # Should not exceed 1000 steps
        codeflash_output = is_gfx95_supported() # 117μs -> 115μs (1.49% faster)

@pytest.mark.large_scale
def test_large_scale_varied_gcnarch(monkeypatch):
    # Large scale: test with a list of 1000 different gcnArchName strings
    monkeypatch.setattr(torch.version, "hip", True)
    class DummyProps:
        def __init__(self, name):
            self.gcnArchName = name

    # Prepare 1000 names, only one contains 'gfx95'
    names = ["gfx900"] * 999 + ["gfx95"]
    is_gfx95_supported.cache_clear()
    for i, name in enumerate(names):
        monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx, name=name: DummyProps(name))
        # Clear cache each time to force re-evaluation
        is_gfx95_supported.cache_clear()
        if "gfx95" in name:
            codeflash_output = is_gfx95_supported()
        else:
            codeflash_output = is_gfx95_supported()

@pytest.mark.large_scale
def test_large_scale_long_gcnarch_string(monkeypatch):
    # Large scale: gcnArchName is a very long string (but <100MB)
    monkeypatch.setattr(torch.version, "hip", True)

    long_str = "gfx900_" * 1000 + "gfx95" + "_gfx900" * 1000
    class DummyProps:
        gcnArchName = long_str

    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    codeflash_output = is_gfx95_supported() # 5.44μs -> 4.26μs (27.9% faster)

@pytest.mark.edge
def test_torch_version_hip_missing(monkeypatch):
    # Edge case: torch.version.hip attribute does not exist
    monkeypatch.delattr(torch.version, "hip", raising=False)
    is_gfx95_supported.cache_clear()
    # Should raise AttributeError
    try:
        is_gfx95_supported()
    except AttributeError:
        pass

# Additional edge: torch.cuda.get_device_properties returns object without gcnArchName
@pytest.mark.edge
def test_gcnarchname_missing(monkeypatch):
    # Edge case: get_device_properties returns object without gcnArchName
    monkeypatch.setattr(torch.version, "hip", True)
    class DummyProps:
        pass
    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: DummyProps())
    is_gfx95_supported.cache_clear()
    try:
        is_gfx95_supported()
    except AttributeError:
        pass

# Additional edge: torch.cuda.get_device_properties returns None
@pytest.mark.edge
def test_get_device_properties_returns_none(monkeypatch):
    # Edge case: get_device_properties returns None
    monkeypatch.setattr(torch.version, "hip", True)
    monkeypatch.setattr(torch.cuda, "get_device_properties", lambda idx: None)
    is_gfx95_supported.cache_clear()
    try:
        is_gfx95_supported()
    except AttributeError:
        pass

# Additional edge: torch.version.hip is True but torch.cuda.get_device_properties not callable
@pytest.mark.edge
def test_get_device_properties_not_callable(monkeypatch):
    # Edge case: torch.cuda.get_device_properties is not callable
    monkeypatch.setattr(torch.version, "hip", True)
    monkeypatch.setattr(torch.cuda, "get_device_properties", None)
    is_gfx95_supported.cache_clear()
    try:
        is_gfx95_supported()
    except TypeError:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-is_gfx95_supported-mijuthfe` and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 29, 2025 05:32
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 29, 2025