@codeflash-ai codeflash-ai bot commented Nov 29, 2025

📄 77% (1.77x) speedup for parse_connector_type in python/sglang/srt/utils/common.py

⏱️ Runtime: 2.97 milliseconds → 1.68 milliseconds (best of 169 runs)

📝 Explanation and details

The optimization replaces the per-call re.match() on a pattern string with a regex compiled once at module level, achieving a 77% speedup by removing per-call pattern-handling overhead.

Key optimization: The regex pattern r"(.+)://(.*)" is compiled once at module load time as _connector_pattern and reused across all function calls, instead of being passed to re.match() as a string on every invocation.
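The change described above can be sketched as follows. This is a minimal illustration, not the actual diff: the function bodies (returning group 1, or an empty string when nothing matches) and the `_original` / `_optimized` suffixes are assumptions made for the example. Only the pattern and the name `_connector_pattern` come from the description.

```python
import re


# Assumed shape of the original: the pattern string goes through
# re.match() on every call.
def parse_connector_type_original(url: str) -> str:
    m = re.match(r"(.+)://(.*)", url)
    if m is None:
        return ""  # assumption: no match yields an empty string
    return m.group(1)


# Pattern compiled once when the module is imported.
_connector_pattern = re.compile(r"(.+)://(.*)")


# Assumed shape of the optimized version: reuse the compiled pattern.
def parse_connector_type_optimized(url: str) -> str:
    m = _connector_pattern.match(url)
    if m is None:
        return ""
    return m.group(1)


print(parse_connector_type_optimized("http://example.com"))
```

Both versions apply the same pattern with the same anchoring, so they should return identical results for any input; only the per-call overhead differs.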

Why this works: Python's re.compile() returns a pattern object whose match() method can be called directly. The original code passed the pattern string to re.match() on every call (74.2% of total time was spent in re.match()). The re module caches compiled patterns, so the pattern was not re-parsed each time, but every call still paid for the cache lookup and an extra layer of Python function calls before matching began. The optimized version skips that per-call overhead.
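To separate the costs involved, here is a rough microbenchmark sketch using only the standard library. It compares three paths: a true recompile (forced by clearing the cache with `re.purge()`), the module-level `re.match()` call, and a precompiled pattern. The helper names, the sample URL, and the iteration count are arbitrary choices for illustration, and absolute numbers will vary by machine and Python version.

```python
import re
import timeit

PATTERN = r"(.+)://(.*)"
URL = "http://example.com"
COMPILED = re.compile(PATTERN)
CALLS = 2000


def match_uncached():
    re.purge()  # empty re's cache so the pattern really is recompiled
    return re.match(PATTERN, URL)


def match_module_level():
    return re.match(PATTERN, URL)  # pattern string handled by re on each call


def match_precompiled():
    return COMPILED.match(URL)  # matching only


timings = {
    "uncached": timeit.timeit(match_uncached, number=CALLS),
    "module_level": timeit.timeit(match_module_level, number=CALLS),
    "precompiled": timeit.timeit(match_precompiled, number=CALLS),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds / CALLS * 1e6:.2f} us per call")
```

If the module-level path lands much closer to the precompiled path than to the uncached one, that suggests the saving comes from per-call lookup and dispatch rather than from repeated compilation.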

Performance impact by workload:

  • Call sites: Based on function references, this function is called in create_remote_connector() for URL parsing and in _handle_model_specific_adjustments() for model path validation. Both are likely called during server startup and model loading rather than per request, so the absolute time saved in practice is probably small.
  • Best gains: Short URLs typically run 65-95% faster, and very short non-matching inputs such as "" or "://" exceed 140%. Inputs with long paths (500-1,000 characters) show smaller gains of about 10-25%, because matching time then dominates the fixed per-call overhead.
  • Batch processing: Loops over 500-1,000 URLs show 68-88% improvements, consistent with the per-call gains.
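The dependence on input length can be checked with a small sketch like the one below. The `percent_faster` helper and the sample inputs are hypothetical; the percentage is computed as old time divided by new time, minus one, which is consistent with the headline figure (2.97 ms / 1.68 ms - 1 ≈ 77%).

```python
import re
import timeit

PATTERN = r"(.+)://(.*)"
COMPILED = re.compile(PATTERN)


def percent_faster(url: str, number: int = 20000) -> float:
    """Relative gain of a precompiled pattern over module-level re.match()."""
    inline = timeit.timeit(lambda: re.match(PATTERN, url), number=number)
    precompiled = timeit.timeit(lambda: COMPILED.match(url), number=number)
    return (inline / precompiled - 1) * 100


# Hypothetical inputs mirroring the short and long cases in the tests.
inputs = {
    "short": "http://example.com",
    "long_connector": "a" * 999 + "://foo",
    "long_path": "abc://" + "x" * 999,
}

results = {label: percent_faster(url) for label, url in inputs.items()}

for label, gain in results.items():
    print(f"{label}: {gain:.1f}% faster")
```

The per-call overhead removed by precompiling is roughly constant, so its share of the total shrinks as the matching work grows with input length.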

The optimization suits this function because every call uses the same fixed pattern and the inputs are usually short, so the per-call overhead of re.match() accounts for most of the runtime.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 4088 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import re

# imports
import pytest  # used for our unit tests
import torch.distributed
from sglang.srt.utils.common import parse_connector_type

# unit tests

# -------------------- Basic Test Cases --------------------

def test_basic_http_url():
    # Standard HTTP URL
    codeflash_output = parse_connector_type("http://example.com") # 2.53μs -> 1.37μs (84.7% faster)

def test_basic_https_url():
    # Standard HTTPS URL
    codeflash_output = parse_connector_type("https://example.com") # 2.51μs -> 1.39μs (80.4% faster)

def test_basic_file_url():
    # File URL
    codeflash_output = parse_connector_type("file:///some/path") # 2.43μs -> 1.39μs (75.4% faster)

def test_basic_custom_connector():
    # Custom connector type
    codeflash_output = parse_connector_type("myconnector://resource") # 2.46μs -> 1.35μs (82.8% faster)

def test_basic_connector_with_numbers():
    # Connector type with numbers
    codeflash_output = parse_connector_type("abc123://path/to/resource") # 2.52μs -> 1.45μs (74.0% faster)

def test_basic_connector_with_underscore():
    # Connector type with underscore
    codeflash_output = parse_connector_type("my_connector://foo/bar") # 2.41μs -> 1.38μs (75.3% faster)

# -------------------- Edge Test Cases --------------------

def test_no_connector_type():
    # No connector type present
    codeflash_output = parse_connector_type("justastringwithoutdelimiter") # 2.22μs -> 1.20μs (85.2% faster)

def test_empty_string():
    # Empty input string
    codeflash_output = parse_connector_type("") # 1.82μs -> 739ns (146% faster)

def test_connector_type_only():
    # Only connector type, no path
    codeflash_output = parse_connector_type("ftp://") # 2.52μs -> 1.49μs (69.9% faster)

def test_connector_type_with_empty_path():
    # Connector type with empty path (should still match)
    codeflash_output = parse_connector_type("local://") # 2.27μs -> 1.39μs (63.2% faster)

def test_multiple_colons():
    # Multiple colons in connector type (should only use the first '://')
    codeflash_output = parse_connector_type("foo:bar://baz") # 2.46μs -> 1.45μs (70.1% faster)

def test_path_with_colons():
    # Path contains colons, should not affect connector type extraction
    codeflash_output = parse_connector_type("s3://bucket:folder:object") # 2.64μs -> 1.56μs (68.7% faster)

def test_connector_type_with_symbols():
    # Connector type contains symbols
    codeflash_output = parse_connector_type("a-b+c.1_2://path") # 2.42μs -> 1.37μs (76.8% faster)

def test_connector_type_with_unicode():
    # Unicode in connector type
    codeflash_output = parse_connector_type("üñîçødë://resource") # 2.63μs -> 1.65μs (59.4% faster)

def test_connector_type_with_spaces():
    # Spaces in connector type (should be included literally)
    codeflash_output = parse_connector_type("type with space://path") # 2.41μs -> 1.38μs (73.9% faster)

def test_connector_type_with_leading_trailing_spaces():
    # Leading/trailing spaces in connector type
    codeflash_output = parse_connector_type("  type://path") # 2.44μs -> 1.38μs (76.3% faster)

def test_connector_type_with_newline():
    # Newline in connector type
    codeflash_output = parse_connector_type("foo\nbar://baz") # 1.93μs -> 946ns (104% faster)

def test_connector_type_with_tab():
    # Tab in connector type
    codeflash_output = parse_connector_type("foo\tbar://baz") # 2.42μs -> 1.39μs (74.0% faster)

def test_url_with_no_path_but_colon():
    # URL ends with :// but no path
    codeflash_output = parse_connector_type("test://") # 2.30μs -> 1.29μs (78.1% faster)

def test_url_with_connector_type_only_colon():
    # Only connector type with colon, no slashes
    codeflash_output = parse_connector_type("foo:") # 1.98μs -> 1.02μs (94.2% faster)

def test_url_with_connector_type_and_one_slash():
    # Only one slash after colon
    codeflash_output = parse_connector_type("foo:/bar") # 1.93μs -> 1.05μs (84.6% faster)

def test_url_with_connector_type_and_extra_slashes():
    # More than two slashes after colon
    codeflash_output = parse_connector_type("foo:////bar") # 2.46μs -> 1.48μs (65.8% faster)

def test_url_with_connector_type_and_question_mark():
    # Question mark in connector type
    codeflash_output = parse_connector_type("foo?bar://baz") # 2.38μs -> 1.39μs (72.0% faster)

# -------------------- Large Scale Test Cases --------------------

def test_large_number_of_urls():
    # Test with a large number of valid URLs
    urls = [f"proto{i}://path{i}" for i in range(1000)]
    for i, url in enumerate(urls):
        # Each should return the correct connector type
        codeflash_output = parse_connector_type(url) # 728μs -> 422μs (72.5% faster)

def test_large_number_of_invalid_urls():
    # Test with a large number of invalid URLs (no connector type)
    urls = [f"pathonly{i}" for i in range(1000)]
    for url in urls:
        codeflash_output = parse_connector_type(url) # 634μs -> 337μs (88.3% faster)

def test_large_connector_type():
    # Connector type is a very long string
    long_type = "a" * 500
    url = f"{long_type}://some/path"
    codeflash_output = parse_connector_type(url) # 3.53μs -> 2.17μs (62.6% faster)

def test_large_path():
    # Path is a very long string
    long_path = "b" * 900
    url = f"foo://{long_path}"
    codeflash_output = parse_connector_type(url) # 8.30μs -> 7.26μs (14.4% faster)

def test_large_connector_type_and_path():
    # Both connector type and path are long
    long_type = "x" * 400
    long_path = "y" * 500
    url = f"{long_type}://{long_path}"
    codeflash_output = parse_connector_type(url) # 6.14μs -> 4.95μs (24.1% faster)

# -------------------- Additional Robustness Tests --------------------

@pytest.mark.parametrize("url,expected", [
    # Various special characters in connector type
    ("foo-bar://baz", "foo-bar"),
    ("foo.bar://baz", "foo.bar"),
    ("foo+bar://baz", "foo+bar"),
    ("foo_bar://baz", "foo_bar"),
    ("foo@bar://baz", "foo@bar"),
    ("foo#bar://baz", "foo#bar"),
    ("foo$bar://baz", "foo$bar"),
    ("foo%bar://baz", "foo%bar"),
    ("foo&bar://baz", "foo&bar"),
    ("foo*bar://baz", "foo*bar"),
])
def test_connector_type_with_special_characters(url, expected):
    # Connector type with various special characters
    codeflash_output = parse_connector_type(url) # 25.3μs -> 14.2μs (77.9% faster)

def test_connector_type_with_slash_in_type():
    # Slash in connector type (should be included if before ://)
    codeflash_output = parse_connector_type("foo/bar://baz") # 2.51μs -> 1.41μs (78.1% faster)

def test_connector_type_with_multiple_delimiters():
    # Multiple '://' in the URL (the greedy (.+) group extends up to the last one)
    url = "foo://bar://baz"
    codeflash_output = parse_connector_type(url) # 2.43μs -> 1.39μs (75.4% faster)

def test_connector_type_with_trailing_delimiter():
    # URL ends with '://'
    codeflash_output = parse_connector_type("foo://") # 2.50μs -> 1.28μs (94.7% faster)

def test_connector_type_with_leading_delimiter():
    # URL starts with '://'
    codeflash_output = parse_connector_type("://foo/bar") # 2.13μs -> 1.07μs (98.1% faster)

def test_connector_type_with_only_delimiter():
    # URL is just '://'
    codeflash_output = parse_connector_type("://") # 1.77μs -> 732ns (141% faster)

def test_connector_type_with_empty_connector_and_path():
    # Empty connector type and path
    codeflash_output = parse_connector_type("://") # 1.74μs -> 725ns (140% faster)

def test_connector_type_with_connector_type_and_no_path():
    # Only connector type and delimiter
    codeflash_output = parse_connector_type("abc://") # 2.50μs -> 1.42μs (75.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

import re

# imports
import pytest  # used for our unit tests
import torch.distributed
from sglang.srt.utils.common import parse_connector_type

# unit tests

# --------------------------
# 1. Basic Test Cases
# --------------------------

def test_basic_http():
    # Test standard HTTP URL
    codeflash_output = parse_connector_type("http://localhost:8080") # 2.79μs -> 1.67μs (66.8% faster)

def test_basic_https():
    # Test standard HTTPS URL
    codeflash_output = parse_connector_type("https://example.com") # 2.55μs -> 1.50μs (70.4% faster)

def test_basic_custom_connector():
    # Test custom connector type
    codeflash_output = parse_connector_type("myconnector://some/path") # 2.50μs -> 1.39μs (80.1% faster)

def test_basic_connector_with_numbers():
    # Connector type with numbers
    codeflash_output = parse_connector_type("ftp123://fileserver") # 2.57μs -> 1.46μs (76.6% faster)

def test_basic_connector_with_dash():
    # Connector type with dash
    codeflash_output = parse_connector_type("my-conn://foo") # 2.44μs -> 1.41μs (72.3% faster)

def test_basic_connector_with_plus():
    # Connector type with plus
    codeflash_output = parse_connector_type("grpc+ssl://host") # 2.42μs -> 1.34μs (80.8% faster)

def test_basic_connector_with_underscore():
    # Connector type with underscore
    codeflash_output = parse_connector_type("abc_def://bar") # 2.38μs -> 1.33μs (79.4% faster)

def test_basic_connector_with_colon_in_path():
    # Path contains colon, but only the first '://' matters
    codeflash_output = parse_connector_type("abc://foo:bar:baz") # 2.47μs -> 1.46μs (69.8% faster)

# --------------------------
# 2. Edge Test Cases
# --------------------------

def test_edge_missing_scheme():
    # No connector type present
    codeflash_output = parse_connector_type("localhost:8080") # 2.11μs -> 1.10μs (91.0% faster)

def test_edge_empty_string():
    # Empty string input
    codeflash_output = parse_connector_type("") # 1.77μs -> 730ns (143% faster)

def test_edge_only_scheme_separator():
    # Only the separator, no connector type
    codeflash_output = parse_connector_type("://foo") # 2.01μs -> 1.01μs (99.4% faster)

def test_edge_only_connector_type():
    # Only connector type, no separator
    codeflash_output = parse_connector_type("abc") # 1.75μs -> 686ns (155% faster)

def test_edge_connector_type_with_spaces():
    # Connector type contains spaces (should be parsed literally)
    codeflash_output = parse_connector_type("my conn://foo") # 2.63μs -> 1.58μs (65.9% faster)

def test_edge_connector_type_with_special_chars():
    # Connector type contains special chars
    codeflash_output = parse_connector_type("!@#://foo") # 2.48μs -> 1.40μs (77.7% faster)

def test_edge_connector_type_with_unicode():
    # Unicode connector type
    codeflash_output = parse_connector_type("连接器://路径") # 2.97μs -> 2.13μs (39.4% faster)

def test_edge_multiple_scheme_separators():
    # Multiple '://' in the string; the greedy (.+) group extends up to the last one
    codeflash_output = parse_connector_type("abc://def://ghi") # 2.46μs -> 1.43μs (72.3% faster)

def test_edge_connector_type_empty_and_path_empty():
    # Only '://'
    codeflash_output = parse_connector_type("://") # 1.67μs -> 688ns (142% faster)

def test_edge_connector_type_empty_and_path_nonempty():
    # '://' at the beginning, path is non-empty
    codeflash_output = parse_connector_type("://foo") # 2.11μs -> 1.02μs (107% faster)

def test_edge_connector_type_with_leading_trailing_spaces():
    # Connector type has leading/trailing spaces
    codeflash_output = parse_connector_type("  abc  ://foo") # 2.55μs -> 1.48μs (72.3% faster)

def test_edge_path_is_empty():
    # Path is empty
    codeflash_output = parse_connector_type("abc://") # 2.46μs -> 1.38μs (78.3% faster)

def test_edge_connector_type_with_slash():
    # Connector type with slash (rare, but possible)
    codeflash_output = parse_connector_type("abc/def://foo") # 2.37μs -> 1.42μs (67.1% faster)

def test_edge_connector_type_with_colon():
    # Connector type with colon (should be parsed literally)
    codeflash_output = parse_connector_type("abc:def://foo") # 2.42μs -> 1.36μs (78.2% faster)

def test_edge_connector_type_with_newline():
    # Connector type with newline character
    codeflash_output = parse_connector_type("abc\n://foo") # 1.98μs -> 925ns (114% faster)

def test_edge_path_with_newline():
    # Path with newline, should not affect connector type
    codeflash_output = parse_connector_type("abc://foo\nbar") # 2.40μs -> 1.35μs (77.6% faster)

def test_edge_connector_type_is_digit():
    # Connector type is a digit
    codeflash_output = parse_connector_type("123://foo") # 2.44μs -> 1.39μs (75.8% faster)

def test_edge_path_is_only_separator():
    # Path is only separator
    codeflash_output = parse_connector_type("abc://:") # 2.34μs -> 1.33μs (76.3% faster)

def test_edge_connector_type_is_empty_string():
    # Empty connector type (should not match)
    codeflash_output = parse_connector_type("://foo") # 2.05μs -> 923ns (122% faster)

# --------------------------
# 3. Large Scale Test Cases
# --------------------------

def test_large_number_of_unique_connector_types():
    # Test with many unique connector types
    for i in range(1000):
        connector = f"conn{i}"
        url = f"{connector}://host/path"
        codeflash_output = parse_connector_type(url) # 743μs -> 442μs (68.0% faster)

def test_large_connector_type_string():
    # Connector type is very long (999 chars)
    connector = "a" * 999
    url = f"{connector}://foo"
    codeflash_output = parse_connector_type(url) # 3.71μs -> 2.54μs (46.3% faster)

def test_large_path_string():
    # Path is very long (999 chars)
    connector = "abc"
    path = "x" * 999
    url = f"{connector}://{path}"
    codeflash_output = parse_connector_type(url) # 8.76μs -> 7.90μs (10.9% faster)

def test_large_connector_type_and_path():
    # Both connector type and path are large
    connector = "c" * 500
    path = "p" * 499
    url = f"{connector}://{path}"
    codeflash_output = parse_connector_type(url) # 6.14μs -> 4.94μs (24.4% faster)

def test_large_scale_mixed_types():
    # Mix of valid and invalid URLs in a large batch
    for i in range(500):
        # Valid
        connector = f"c{i}"
        url = f"{connector}://foo"
        codeflash_output = parse_connector_type(url) # 350μs -> 200μs (75.1% faster)
        # Invalid (no scheme)
        url2 = f"foo{i}"
        codeflash_output = parse_connector_type(url2)

def test_large_scale_connector_type_with_various_chars():
    # Test a variety of special characters in connector type in a large batch
    specials = "!@#$%^&*()_+-=~"
    for i, c in enumerate(specials):
        connector = f"prefix{c}suffix"
        url = f"{connector}://foo"
        codeflash_output = parse_connector_type(url) # 13.4μs -> 7.73μs (73.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
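
The generated tests above record codeflash_output without asserting on a specific value. As a hedged sketch of what explicit expectations could look like, the block below uses a stand-in implementation built from the pattern quoted in the description. The stand-in body (group 1, or an empty string when nothing matches) is an assumption, so the expected values describe the regex behavior rather than the confirmed behavior of the real function.

```python
import re

_connector_pattern = re.compile(r"(.+)://(.*)")


def parse_connector_type(url: str) -> str:
    # Assumed stand-in for the real function in sglang.srt.utils.common.
    m = _connector_pattern.match(url)
    return m.group(1) if m else ""


# Input -> expected connector type under the stand-in above.
CASES = {
    "http://example.com": "http",
    "s3://bucket:folder:object": "s3",
    "justastringwithoutdelimiter": "",
    "": "",
    "://foo": "",                    # (.+) needs at least one character
    "foo:////bar": "foo",
    "foo://bar://baz": "foo://bar",  # greedy (.+) runs to the last '://'
    "foo\nbar://baz": "",            # '.' does not match a newline
    "abc://foo\nbar": "abc",         # newline in the path is harmless
    "  abc  ://foo": "  abc  ",      # whitespace is kept literally
}


def failing_cases() -> list:
    """Return the inputs whose result differs from the expectation."""
    return [url for url, want in CASES.items() if parse_connector_type(url) != want]


print(failing_cases())
```

Pinning down cases such as the greedy match and the newline behavior makes any later change to the pattern (for example a non-greedy group or the DOTALL flag) show up as a test failure rather than a silent behavior change.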

To edit these changes, run git checkout codeflash/optimize-parse_connector_type-mijnd2fa and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 29, 2025 02:03
@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) Nov 29, 2025