@codeflash-ai codeflash-ai bot commented Nov 7, 2025

📄 15% (0.15x) speedup for is_layer_skipped_quant in python/sglang/srt/layers/quantization/moe_wna16.py

⏱️ Runtime : 407 microseconds → 354 microseconds (best of 250 runs)

📝 Explanation and details

The optimization replaces Python's built-in any() over a generator expression with a simple explicit for-loop that returns True immediately when a match is found, achieving a 14% speedup.

Key Changes:

  • Eliminated generator overhead: The original code creates a generator object and passes it to any(), which adds function call overhead and object allocation costs
  • Direct early termination: The optimized version returns True as soon as the first matching module is found, without creating intermediate objects
  • Reduced call stack depth: Removes the any() function call layer, making each iteration more direct

Why This Works:
In Python, generator expressions with any() involve creating a generator object and making function calls for each iteration. The explicit for-loop eliminates these overheads while maintaining identical logic. For substring matching operations like module_name in prefix, the direct approach is more efficient.
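For reference, here is a minimal sketch of the two forms being compared; the signature is inferred from the generated tests below, the suffixed names are illustrative, and the exact body in moe_wna16.py may differ slightly:

from typing import List

def is_layer_skipped_quant_original(prefix: str, modules_to_not_convert: List[str]) -> bool:
    # Original form: any() consumes a generator expression
    return any(module_name in prefix for module_name in modules_to_not_convert)

def is_layer_skipped_quant_optimized(prefix: str, modules_to_not_convert: List[str]) -> bool:
    # Optimized form: explicit loop with an early return, no generator object
    for module_name in modules_to_not_convert:
        if module_name in prefix:
            return True
    return False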

Performance Impact:
Based on the function reference, is_layer_skipped_quant is called from get_quant_method() during model quantization setup. While not in a tight loop, this function likely gets called for each layer during model initialization, so the 14% improvement can accumulate meaningfully during model loading.
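One hedged way to reproduce the comparison locally is a small timeit harness like the sketch below; the two helper functions are hypothetical stand-ins written for this benchmark, not code from the PR:

import timeit

modules = [f"module_{i}" for i in range(1000)]
prefix = "target_module"  # no match, so both variants scan the full list

def skipped_any(prefix, modules_to_not_convert):
    # original style: any() over a generator expression
    return any(m in prefix for m in modules_to_not_convert)

def skipped_loop(prefix, modules_to_not_convert):
    # optimized style: explicit loop with early return
    for m in modules_to_not_convert:
        if m in prefix:
            return True
    return False

for fn in (skipped_any, skipped_loop):
    elapsed = timeit.timeit(lambda: fn(prefix, modules), number=10_000)
    print(f"{fn.__name__}: {elapsed:.4f}s")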

Test Case Performance:
The optimization shows consistent improvements across all test scenarios:

  • Small lists: 90-170% faster for basic cases
  • Large lists with early matches: 100-140% faster when match is found quickly
  • Large lists with no matches: 18-30% faster even when checking all items
  • Edge cases: 80-170% faster for empty lists, special characters, etc.

The optimization is particularly effective for cases with early matches (where modules appear at the start of the list) but still provides benefits even when scanning the entire list.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 65 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

from typing import List

# imports
import pytest  # used for our unit tests
from sglang.srt.layers.quantization.moe_wna16 import is_layer_skipped_quant

# unit tests

# ------------------------
# 1. Basic Test Cases
# ------------------------

def test_basic_single_exact_match():
    # Should skip quant if prefix matches exactly one module name
    codeflash_output = is_layer_skipped_quant("encoder.layer1", ["encoder.layer1"]) # 1.15μs -> 581ns (97.4% faster)

def test_basic_single_no_match():
    # Should not skip quant if prefix does not match any module name
    codeflash_output = is_layer_skipped_quant("encoder.layer2", ["encoder.layer1"]) # 1.04μs -> 472ns (120% faster)

def test_basic_multiple_modules_one_match():
    # Should skip quant if prefix matches at least one module name in the list
    codeflash_output = is_layer_skipped_quant("decoder.layer3", ["encoder.layer1", "decoder.layer3", "embedding"]) # 1.29μs -> 611ns (111% faster)

def test_basic_multiple_modules_no_match():
    # Should not skip quant if prefix matches none of the module names
    codeflash_output = is_layer_skipped_quant("attention", ["encoder", "decoder", "embedding"]) # 1.18μs -> 617ns (91.1% faster)

def test_basic_partial_match():
    # Should skip quant if prefix contains a module name as a substring
    codeflash_output = is_layer_skipped_quant("encoder.layer1.submodule", ["layer1"]) # 1.12μs -> 505ns (122% faster)

def test_basic_empty_modules_to_not_convert():
    # Should not skip quant if the list of modules to not convert is empty
    codeflash_output = is_layer_skipped_quant("encoder.layer1", []) # 829ns -> 330ns (151% faster)

# ------------------------
# 2. Edge Test Cases
# ------------------------

def test_edge_empty_prefix():
    # Should not skip quant if prefix is empty and modules_to_not_convert is non-empty (unless one is empty string)
    codeflash_output = is_layer_skipped_quant("", ["encoder", "layer1"]) # 1.02μs -> 484ns (111% faster)

def test_edge_empty_prefix_and_empty_module_name():
    # Should skip quant if prefix is empty and modules_to_not_convert contains empty string (since '' in '' is True)
    codeflash_output = is_layer_skipped_quant("", [""]) # 1.07μs -> 472ns (128% faster)

def test_edge_modules_to_not_convert_contains_empty_string_and_prefix_nonempty():
    # Should skip quant if modules_to_not_convert contains empty string (since '' in any string is True)
    codeflash_output = is_layer_skipped_quant("anything", [""]) # 1.12μs -> 417ns (168% faster)

def test_edge_prefix_is_whitespace():
    # Should not skip quant if prefix is whitespace and modules_to_not_convert does not match
    codeflash_output = is_layer_skipped_quant("   ", ["encoder", "layer1"]) # 987ns -> 480ns (106% faster)

def test_edge_prefix_and_module_name_are_whitespace():
    # Should skip quant if both prefix and module name are the same whitespace string
    codeflash_output = is_layer_skipped_quant("   ", ["   "]) # 1.18μs -> 524ns (125% faster)

def test_edge_prefix_is_substring_of_module_name():
    # Should not skip quant if prefix is a substring of a module name (since check is module_name in prefix)
    codeflash_output = is_layer_skipped_quant("layer", ["layer1"]) # 907ns -> 426ns (113% faster)

def test_edge_module_name_is_substring_of_prefix():
    # Should skip quant if module name is a substring of prefix
    codeflash_output = is_layer_skipped_quant("layer1", ["layer"]) # 1.16μs -> 505ns (130% faster)

def test_edge_special_characters():
    # Should skip quant if prefix contains module name with special characters
    codeflash_output = is_layer_skipped_quant("encoder.layer$1", ["layer$1"]) # 1.23μs -> 505ns (143% faster)

def test_edge_case_sensitive():
    # Should not skip quant if case does not match (function is case-sensitive)
    codeflash_output = is_layer_skipped_quant("Encoder.Layer1", ["encoder.layer1"]) # 1.04μs -> 489ns (112% faster)

def test_edge_module_name_longer_than_prefix():
    # Should not skip quant if module name is longer than prefix
    codeflash_output = is_layer_skipped_quant("layer", ["layer123"]) # 890ns -> 449ns (98.2% faster)

def test_edge_module_name_equals_prefix():
    # Should skip quant if module name equals prefix
    codeflash_output = is_layer_skipped_quant("layerX", ["layerX"]) # 1.15μs -> 502ns (129% faster)

def test_edge_multiple_empty_strings():
    # Should skip quant if modules_to_not_convert contains multiple empty strings
    codeflash_output = is_layer_skipped_quant("foo", ["", ""]) # 1.09μs -> 420ns (160% faster)

# ------------------------
# 3. Large Scale Test Cases
# ------------------------

def test_large_scale_no_match():
    # Large list of modules_to_not_convert, none match the prefix
    modules = [f"module_{i}" for i in range(1000)]
    codeflash_output = is_layer_skipped_quant("target_module", modules) # 28.5μs -> 24.2μs (18.0% faster)

def test_large_scale_match_at_start():
    # Large list, match at the start
    modules = ["target_module"] + [f"module_{i}" for i in range(999)]
    codeflash_output = is_layer_skipped_quant("target_module", modules) # 1.14μs -> 502ns (128% faster)

def test_large_scale_match_at_end():
    # Large list, match at the end
    modules = [f"module_{i}" for i in range(999)] + ["target_module"]
    codeflash_output = is_layer_skipped_quant("target_module", modules) # 28.6μs -> 24.1μs (18.5% faster)

def test_large_scale_match_in_middle():
    # Large list, match in the middle
    modules = [f"module_{i}" for i in range(500)] + ["target_module"] + [f"module_{i}" for i in range(499)]
    codeflash_output = is_layer_skipped_quant("target_module", modules) # 14.9μs -> 12.4μs (20.1% faster)

def test_large_scale_long_prefixes_and_module_names():
    # Test with long strings to check performance and correctness
    long_prefix = "a" * 500 + "b" * 500
    long_module = "a" * 500
    modules = [long_module, "x" * 1000]
    # long_module is a substring of long_prefix
    codeflash_output = is_layer_skipped_quant(long_prefix, modules) # 1.51μs -> 985ns (53.6% faster)

def test_large_scale_all_empty_module_names():
    # All module names are empty string, should always return True
    modules = [""] * 1000
    codeflash_output = is_layer_skipped_quant("any_prefix", modules) # 1.01μs -> 457ns (121% faster)

def test_large_scale_no_modules_to_not_convert():
    # Large prefix, empty modules_to_not_convert
    prefix = "a" * 1000
    modules = []
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 801ns -> 326ns (146% faster)

def test_large_scale_many_matches():
    # Multiple module names are substrings of prefix
    prefix = "abc_def_ghi_jkl"
    modules = ["abc", "def", "ghi", "jkl"]
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 1.09μs -> 493ns (121% faster)

def test_large_scale_multiple_prefixes_parametrized():
    # Parametrized test for multiple large prefixes and modules
    prefixes = [f"module_{i}" for i in range(100, 110)]
    modules = [f"module_{i}" for i in range(1000)]
    for prefix in prefixes:
        codeflash_output = is_layer_skipped_quant(prefix, modules) # 5.03μs -> 2.41μs (109% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
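# ------------------------
# Sketch (not generated by codeflash): the same checks expressed as plain
# assertions, based on the documented substring semantics above, in case this
# file is run outside the codeflash harness.
# ------------------------

def test_plain_assertion_exact_match():
    # module name equals prefix -> layer is skipped
    assert is_layer_skipped_quant("encoder.layer1", ["encoder.layer1"])

def test_plain_assertion_no_match():
    # no module name is a substring of prefix -> layer is not skipped
    assert not is_layer_skipped_quant("encoder.layer2", ["encoder.layer1"])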
#------------------------------------------------
from __future__ import annotations

from typing import List

# imports
import pytest  # used for our unit tests
from sglang.srt.layers.quantization.moe_wna16 import is_layer_skipped_quant

# unit tests

# ---------------- Basic Test Cases ----------------

def test_basic_match_at_start():
    # Should return True when prefix starts with a module name
    codeflash_output = is_layer_skipped_quant("layer1.sub", ["layer1"]) # 1.03μs -> 437ns (137% faster)

def test_basic_match_in_middle():
    # Should return True when module name is in the middle of prefix
    codeflash_output = is_layer_skipped_quant("foo.layer2.bar", ["layer2"]) # 1.09μs -> 477ns (129% faster)

def test_basic_no_match():
    # Should return False when no module name is found in prefix
    codeflash_output = is_layer_skipped_quant("layer3", ["layer1", "layer2"]) # 1.01μs -> 501ns (101% faster)

def test_basic_multiple_modules_one_matches():
    # Should return True if any module name is found in prefix
    codeflash_output = is_layer_skipped_quant("foo.layer4.bar", ["layer1", "layer4", "layerX"]) # 1.24μs -> 578ns (115% faster)

def test_basic_empty_modules_list():
    # Should return False if modules_to_not_convert is empty
    codeflash_output = is_layer_skipped_quant("layer5", []) # 810ns -> 310ns (161% faster)

def test_basic_empty_prefix():
    # Should return False if prefix is empty and modules_to_not_convert is non-empty
    codeflash_output = is_layer_skipped_quant("", ["layer1", "layer2"]) # 1.02μs -> 463ns (121% faster)

def test_basic_empty_prefix_and_modules():
    # Should return False if both prefix and modules_to_not_convert are empty
    codeflash_output = is_layer_skipped_quant("", []) # 785ns -> 314ns (150% faster)

def test_basic_exact_match():
    # Should return True if prefix exactly matches a module name
    codeflash_output = is_layer_skipped_quant("layer6", ["layer6"]) # 1.17μs -> 473ns (148% faster)

def test_basic_case_sensitive():
    # Should be case sensitive
    codeflash_output = is_layer_skipped_quant("Layer7", ["layer7"]) # 1.02μs -> 453ns (126% faster)

# ---------------- Edge Test Cases ----------------

def test_edge_module_name_is_empty_string():
    # Should return True if modules_to_not_convert contains an empty string, since '' in prefix is always True
    codeflash_output = is_layer_skipped_quant("anything", [""]) # 1.17μs -> 427ns (173% faster)

def test_edge_prefix_is_empty_string_and_module_is_empty_string():
    # Should return True if both prefix and module name are empty string
    codeflash_output = is_layer_skipped_quant("", [""]) # 1.07μs -> 413ns (159% faster)

def test_edge_module_name_longer_than_prefix():
    # Should return False if module name is longer than prefix
    codeflash_output = is_layer_skipped_quant("abc", ["abcdef"]) # 946ns -> 402ns (135% faster)

def test_edge_prefix_is_substring_of_module_name():
    # Should return False if prefix is a substring of module name but not vice versa
    codeflash_output = is_layer_skipped_quant("foo", ["foobar"]) # 931ns -> 397ns (135% faster)

def test_edge_module_name_is_special_characters():
    # Should handle special characters in module name
    codeflash_output = is_layer_skipped_quant("foo$bar#baz", ["$bar#"]) # 1.16μs -> 499ns (131% faster)

def test_edge_prefix_and_module_name_with_spaces():
    # Should handle spaces correctly
    codeflash_output = is_layer_skipped_quant("foo bar", [" bar"]) # 1.10μs -> 506ns (118% faster)

def test_edge_module_name_at_end_of_prefix():
    # Should return True if module name is at the end of prefix
    codeflash_output = is_layer_skipped_quant("abc.def.ghi", ["ghi"]) # 1.15μs -> 507ns (127% faster)

def test_edge_multiple_empty_strings_in_modules():
    # Should return True if any module name is empty string
    codeflash_output = is_layer_skipped_quant("nonempty", ["", "foo"]) # 1.04μs -> 427ns (145% faster)

def test_edge_module_name_is_whitespace():
    # Should return True if whitespace is present in prefix
    codeflash_output = is_layer_skipped_quant("abc def", [" "]) # 1.01μs -> 437ns (130% faster)

def test_edge_module_name_not_in_prefix_but_similar():
    # Should return False if module name is similar but not present
    codeflash_output = is_layer_skipped_quant("layer_10", ["layer10"]) # 1.02μs -> 475ns (115% faster)

# ---------------- Large Scale Test Cases ----------------

def test_large_scale_many_modules_some_match():
    # Should return True if one of many modules matches
    modules = [f"mod_{i}" for i in range(500)]
    prefix = "foo.mod_123.bar"
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 1.27μs -> 584ns (118% faster)

def test_large_scale_many_modules_none_match():
    # Should return False if none of many modules match
    modules = [f"mod_{i}" for i in range(500)]
    prefix = "foo.nomatch.bar"
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 14.6μs -> 11.2μs (30.7% faster)

def test_large_scale_long_prefix_and_modules():
    # Should handle long prefix and long module names
    modules = ["x" * 100 for _ in range(100)]
    prefix = "a" * 1000 + "x" * 100
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 1.53μs -> 844ns (81.9% faster)

def test_large_scale_all_empty_modules():
    # Should return True if all module names are empty strings ('' in prefix is always True)
    modules = [""] * 1000
    prefix = "anyprefix"
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 1.04μs -> 431ns (141% faster)

def test_large_scale_large_prefix_no_match():
    # Should return False if large prefix and no module matches
    modules = [f"mod_{i}" for i in range(1000)]
    prefix = "x" * 999
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 260μs -> 255μs (1.98% faster)

def test_large_scale_prefix_with_repeated_patterns():
    # Should return True if module name is repeated in prefix
    modules = ["repeat"]
    prefix = "repeat" * 150
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 1.17μs -> 530ns (120% faster)

def test_large_scale_multiple_matches():
    # Should return True if multiple module names are present in prefix
    modules = [f"mod_{i}" for i in range(1000)]
    prefix = "mod_123_mod_456"
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 1.26μs -> 531ns (137% faster)

def test_large_scale_prefix_and_modules_with_unicode():
    # Should handle unicode characters
    modules = ["模块", "層", "レイヤー"]
    prefix = "foo.模块.層.bar"
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 1.25μs -> 527ns (137% faster)

def test_large_scale_prefix_and_modules_with_numbers():
    # Should match numeric module names
    modules = [str(i) for i in range(1000)]
    prefix = "foo.789.bar"
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 1.36μs -> 674ns (103% faster)

def test_large_scale_prefix_and_modules_with_overlapping_names():
    # Should match correct module even if names overlap
    modules = ["abc", "abcd", "abcde"]
    prefix = "foo.abcde.bar"
    codeflash_output = is_layer_skipped_quant(prefix, modules) # 1.12μs -> 519ns (116% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-is_layer_skipped_quant-mhoz3keu and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 7, 2025 14:50
@codeflash-ai codeflash-ai bot added labels ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: Medium (Optimization Quality according to Codeflash) on Nov 7, 2025