@codeflash-ai codeflash-ai bot commented Nov 11, 2025

📄 25% (0.25x) speedup for _re_compile_path in mlflow/server/auth/__init__.py

⏱️ Runtime : 1.27 milliseconds → 1.01 milliseconds (best of 5 runs)

📝 Explanation and details

The optimization achieves a 25% speedup by pre-compiling the regex pattern used for string substitution, moving it from function-local to module-level scope.

Key Change:

  • Pre-compiled regex pattern: _ANGLE_BRACKET_PATTERN = re.compile(r"<([^>]+)>") is now compiled once at module import time instead of being compiled on every function call.

Why This Works:
In the original code, re.sub(r"<([^>]+)>", ...) receives the pattern as a string on every function call. The re module caches compiled patterns, so the regex is not actually recompiled each time, but every call still pays for a cache lookup (hashing the pattern string and flags) before the substitution runs. Calling _ANGLE_BRACKET_PATTERN.sub() on a pre-compiled pattern skips that lookup.
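The change described above can be sketched as a before/after pair. This is a minimal illustration, not the exact MLflow source: the function names `re_compile_path_before` and `re_compile_path_after` are hypothetical, and the replacement string `([^/]+)` is taken from the test comments further down.

```python
import re

# Module-level pattern, compiled once at import time (name from the PR description).
_ANGLE_BRACKET_PATTERN = re.compile(r"<([^>]+)>")


def re_compile_path_before(path: str) -> re.Pattern:
    # Hypothetical stand-in for the original: re.sub() receives the pattern as
    # a string, so each call goes through the re module's internal cache.
    return re.compile(re.sub(r"<([^>]+)>", r"([^/]+)", path))


def re_compile_path_after(path: str) -> re.Pattern:
    # Hypothetical stand-in for the optimized version: .sub() is called on the
    # pre-compiled pattern, so the substitution needs no cache lookup.
    return re.compile(_ANGLE_BRACKET_PATTERN.sub(r"([^/]+)", path))


template = "/api/2.0/experiments/<experiment_id>"
before = re_compile_path_before(template)
after = re_compile_path_after(template)
print(before.pattern == after.pattern)
```

Both versions still call `re.compile()` on the substituted path, so only the substitution step gets cheaper.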

Performance Analysis:
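The line profiler shows the per-hit time dropping from 176,860ns to 166,608ns (about 6% per call) across 1,484 hits. The 25% headline figure comes from the measured runtime (1.27 milliseconds → 1.01 milliseconds), not from compounding the per-hit number; the two measurements differ, probably because the line profiler adds overhead of its own. The optimization is most effective for: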

  • Paths without parameters (43.2% faster on empty paths)
  • Simple paths (25-40% improvements on basic cases)
  • High-frequency calls (32.5% faster when called 1000 times in a loop)
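A rough way to reproduce this kind of comparison locally is a `timeit` micro-benchmark of the substitution step alone. This is a sketch under assumptions: the helper names, the sample path, and the iteration count are illustrative, and absolute numbers will vary by machine and Python version.

```python
import re
import timeit

PATTERN_SRC = r"<([^>]+)>"
COMPILED = re.compile(PATTERN_SRC)
PATH = "/api/2.0/experiments/<experiment_id>"


def sub_via_module(path: str = PATH) -> str:
    # Passes the pattern as a string; re looks it up in its cache on each call.
    return re.sub(PATTERN_SRC, r"([^/]+)", path)


def sub_via_compiled(path: str = PATH) -> str:
    # Uses the pre-compiled pattern object directly.
    return COMPILED.sub(r"([^/]+)", path)


# Time only the substitution, not the outer re.compile() of the result.
t_module = timeit.timeit(sub_via_module, number=20_000)
t_compiled = timeit.timeit(sub_via_compiled, number=20_000)
print(f"re.sub: {t_module:.4f}s  compiled.sub: {t_compiled:.4f}s")
```

Because the outer `re.compile()` call is excluded, the relative difference here is not directly comparable to the percentages reported for `_re_compile_path` as a whole.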

Impact on Workloads:
This function appears to be part of MLflow's authentication/routing system, where it converts URL templates to regex patterns. Authentication middleware or request routing that calls it frequently should see the gain, especially in high-throughput scenarios where the same path patterns are compiled repeatedly. In the generated tests below, the saving is on the order of a microsecond per call.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 1043 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime

import re

# imports
import pytest
from mlflow.server.auth.__init__ import _re_compile_path

# unit tests

# -----------------------
# Basic Test Cases
# -----------------------

def test_no_parameters():
    # Path without angle brackets should remain unchanged
    codeflash_output = _re_compile_path("/api/2.0/experiments/list"); pattern = codeflash_output # 4.77μs -> 3.44μs (38.5% faster)

def test_single_parameter():
    # Single parameter should be replaced with ([^/]+)
    codeflash_output = _re_compile_path("/api/2.0/experiments/<experiment_id>"); pattern = codeflash_output # 4.49μs -> 4.13μs (8.71% faster)

def test_multiple_parameters():
    # Multiple parameters should all be replaced
    codeflash_output = _re_compile_path("/api//experiments/<experiment_id>"); pattern = codeflash_output # 5.05μs -> 4.41μs (14.4% faster)

def test_adjacent_parameters():
    # Adjacent parameters (unusual but possible)
    codeflash_output = _re_compile_path("/api/"); pattern = codeflash_output # 4.90μs -> 4.24μs (15.6% faster)

def test_parameter_in_middle():
    # Parameter in the middle of the path
    codeflash_output = _re_compile_path("/api//foo"); pattern = codeflash_output # 5.26μs -> 4.59μs (14.6% faster)

def test_parameter_with_underscore():
    # Parameter name with underscores or digits
    codeflash_output = _re_compile_path("/api/<experiment_id_2>"); pattern = codeflash_output # 4.58μs -> 4.13μs (10.8% faster)

# -----------------------
# Edge Test Cases
# -----------------------

def test_empty_path():
    # Empty path should return an empty regex
    codeflash_output = _re_compile_path(""); pattern = codeflash_output # 2.78μs -> 2.64μs (5.07% faster)

def test_only_parameter():
    # Path is just a parameter
    codeflash_output = _re_compile_path(""); pattern = codeflash_output # 4.12μs -> 3.75μs (10.0% faster)

def test_parameter_at_start_and_end():
    # Parameter at start and end
    codeflash_output = _re_compile_path("/foo/"); pattern = codeflash_output # 5.05μs -> 4.22μs (19.8% faster)

def test_parameter_with_special_chars_in_name():
    # Parameter name contains dashes, dots, or other special chars (should match the name, but not affect the regex)
    codeflash_output = _re_compile_path("/api/<exp-id.v2>"); pattern = codeflash_output # 4.51μs -> 3.85μs (17.1% faster)

def test_multiple_adjacent_angle_brackets():
    # Path with multiple angle brackets, some empty (should not match empty names)
    codeflash_output = _re_compile_path("/api/<>/foo/<>"); pattern = codeflash_output # 5.13μs -> 3.87μs (32.8% faster)

def test_parameter_with_nested_brackets():
    # Parameter with nested brackets in name (should treat first closing '>')
    codeflash_output = _re_compile_path("/api/<foo>/list"); pattern = codeflash_output # 4.50μs -> 4.32μs (4.38% faster)

def test_path_with_regex_metacharacters():
    # Path with regex metacharacters should not break the regex
    codeflash_output = _re_compile_path("/api/2.0/experiments/.*"); pattern = codeflash_output # 5.05μs -> 4.43μs (13.9% faster)

def test_parameter_with_unicode():
    # Path with unicode in parameter name
    codeflash_output = _re_compile_path("/api/<параметр>"); pattern = codeflash_output # 5.33μs -> 4.87μs (9.55% faster)

def test_path_with_trailing_slash():
    # Path with trailing slash and parameter
    codeflash_output = _re_compile_path("/api//"); pattern = codeflash_output # 4.78μs -> 4.36μs (9.78% faster)

# -----------------------
# Large Scale Test Cases
# -----------------------

def test_long_path_many_parameters():
    # Path with many parameters (scalability)
    num_params = 100
    path = "/" + "/".join([f"<p{i}>" for i in range(num_params)])
    codeflash_output = _re_compile_path(path); pattern = codeflash_output # 14.5μs -> 13.0μs (11.5% faster)
    expected_pattern = "/" + "/".join(["([^/]+)"] * num_params)
    # Build a matching string
    matching_str = "/" + "/".join([f"val{i}" for i in range(num_params)])

def test_long_static_path():
    # Very long static path (no parameters)
    path = "/" + "a" * 999
    codeflash_output = _re_compile_path(path); pattern = codeflash_output # 3.95μs -> 3.47μs (13.7% faster)

def test_large_path_with_mixed_parameters():
    # Large path with alternating static and parameter segments
    segments = []
    for i in range(500):
        segments.append(f"static{i}")
        segments.append(f"<param{i}>")
    path = "/" + "/".join(segments)
    codeflash_output = _re_compile_path(path); pattern = codeflash_output # 61.8μs -> 58.9μs (4.89% faster)
    expected_pattern = "/" + "/".join([f"static{i}/([^/]+)" for i in range(500)])
    # Build a matching string
    match_str = "/" + "/".join([f"static{i}/value{i}" for i in range(500)])
    # Should not match if one value is missing
    broken_str = "/" + "/".join([f"static{i}/value{i}" for i in range(499)])

def test_performance_large_number_of_paths():
    # Test compilation performance with many different paths
    # (This is not a strict performance test, but ensures function handles many calls)
    for i in range(1000):
        path = f"/api/<param{i}>/foo"
        codeflash_output = _re_compile_path(path); pattern = codeflash_output # 913μs -> 689μs (32.5% faster)

def test_parameter_names_with_similar_prefixes():
    # Parameters with similar prefixes should not interfere with each other
    codeflash_output = _re_compile_path("/api//foo/"); pattern = codeflash_output # 5.02μs -> 4.40μs (14.2% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
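The generated tests record outputs without explicit assertions. A hand-written equivalence check in the same spirit might look like the sketch below. The `original` and `optimized` functions are hypothetical stand-ins for the two versions of `_re_compile_path`, and the sample paths are borrowed from the tests in this comment.

```python
import re

_ANGLE_BRACKET_PATTERN = re.compile(r"<([^>]+)>")


def original(path: str) -> re.Pattern:
    # Hypothetical stand-in for the pre-optimization implementation.
    return re.compile(re.sub(r"<([^>]+)>", r"([^/]+)", path))


def optimized(path: str) -> re.Pattern:
    # Hypothetical stand-in for the optimized implementation.
    return re.compile(_ANGLE_BRACKET_PATTERN.sub(r"([^/]+)", path))


PATHS = [
    "",
    "/api/2.0/experiments/list",
    "/api/2.0/experiments/<experiment_id>",
    "/api/<exp-id.v2>",
    "/api/<>",   # empty brackets are left untouched by the pattern
    "/api/<id",  # unclosed bracket is left untouched as well
]


def find_mismatches(paths):
    # Collect every path whose compiled regex source differs between versions.
    return [p for p in paths if original(p).pattern != optimized(p).pattern]


mismatches = find_mismatches(PATHS)
print(mismatches)
```

Comparing the `.pattern` strings is enough here because both versions pass the substituted string to the same `re.compile()` call.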

#------------------------------------------------
import re

# imports
import pytest # used for our unit tests
from mlflow.server.auth.__init__ import _re_compile_path

# unit tests

# -------------------------------
# Basic Test Cases
# -------------------------------

def test_basic_single_parameter():
    # Test single parameter replacement
    codeflash_output = _re_compile_path("/api/2.0/experiments/<experiment_id>"); pattern = codeflash_output # 4.79μs -> 4.39μs (9.23% faster)
    # Should match and extract experiment_id
    match = pattern.match("/api/2.0/experiments/123")

def test_basic_multiple_parameters():
    # Test multiple parameter replacement
    codeflash_output = _re_compile_path("/api//experiments/<experiment_id>"); pattern = codeflash_output # 4.88μs -> 4.12μs (18.4% faster)
    match = pattern.match("/api/2.0/experiments/456")

def test_basic_no_parameters():
    # Test path with no parameters
    codeflash_output = _re_compile_path("/api/experiments/list"); pattern = codeflash_output # 3.17μs -> 2.53μs (25.5% faster)
    match = pattern.match("/api/experiments/list")

def test_basic_parameter_at_start():
    # Parameter at the start of the path
    codeflash_output = _re_compile_path("/api"); pattern = codeflash_output # 4.63μs -> 3.98μs (16.3% faster)
    match = pattern.match("main/api")

def test_basic_parameter_at_end():
    # Parameter at the end of the path
    codeflash_output = _re_compile_path("/api/"); pattern = codeflash_output # 4.36μs -> 3.66μs (19.1% faster)
    match = pattern.match("/api/xyz")

# -------------------------------
# Edge Test Cases
# -------------------------------

def test_edge_empty_path():
    # Empty string path
    codeflash_output = _re_compile_path(""); pattern = codeflash_output # 3.09μs -> 2.16μs (43.2% faster)
    match = pattern.match("")

def test_edge_only_parameter():
    # Path is only parameter
    codeflash_output = _re_compile_path(""); pattern = codeflash_output # 4.19μs -> 3.69μs (13.6% faster)
    match = pattern.match("foo")

def test_edge_adjacent_parameters():
    # Adjacent parameters without separator
    codeflash_output = _re_compile_path("/api/"); pattern = codeflash_output # 4.52μs -> 4.16μs (8.64% faster)
    match = pattern.match("/api/123abc")

def test_edge_parameter_with_special_chars():
    # Parameter name contains special characters (should be replaced)
    codeflash_output = _re_compile_path("/api/<id_1$>/foo"); pattern = codeflash_output # 4.79μs -> 3.99μs (20.2% faster)
    match = pattern.match("/api/val/foo")

def test_edge_parameter_with_slash_in_value():
    # Should not match if parameter value contains slash
    codeflash_output = _re_compile_path("/api/"); pattern = codeflash_output # 4.52μs -> 3.68μs (22.8% faster)

def test_edge_multiple_same_parameter_names():
    # Multiple parameters with the same name (should not matter for regex)
    codeflash_output = _re_compile_path("//"); pattern = codeflash_output # 5.11μs -> 4.31μs (18.6% faster)
    match = pattern.match("/123/456")

def test_edge_nested_angle_brackets():
    # Angle brackets inside parameter name (should treat as literal)
    codeflash_output = _re_compile_path("/api/<id>"); pattern = codeflash_output # 4.96μs -> 4.08μs (21.7% faster)
    match = pattern.match("/api/bar")

def test_edge_parameter_with_empty_brackets():
    # Empty angle brackets are not replaced: <([^>]+)> needs at least one character between the brackets
    codeflash_output = _re_compile_path("/api/<>"); pattern = codeflash_output # 4.05μs -> 3.31μs (22.4% faster)
    match = pattern.match("/api/val")

def test_edge_parameter_with_spaces():
    # Parameter name with spaces
    codeflash_output = _re_compile_path("/api/"); pattern = codeflash_output # 4.58μs -> 3.63μs (26.2% faster)
    match = pattern.match("/api/hello")

def test_edge_parameter_with_unicode():
    # Parameter name with unicode characters
    codeflash_output = _re_compile_path("/api/<имя>"); pattern = codeflash_output # 5.24μs -> 4.73μs (10.9% faster)
    match = pattern.match("/api/значение")

def test_edge_parameter_with_dot():
    # Parameter name with dot
    codeflash_output = _re_compile_path("/api/<id.value>"); pattern = codeflash_output # 4.39μs -> 3.82μs (15.0% faster)
    match = pattern.match("/api/abc.def")

def test_edge_parameter_with_dash():
    # Parameter name with dash
    codeflash_output = _re_compile_path("/api/"); pattern = codeflash_output # 4.27μs -> 3.72μs (14.7% faster)
    match = pattern.match("/api/abc-def")

def test_edge_malformed_angle_brackets():
    # Unclosed angle bracket
    codeflash_output = _re_compile_path("/api/<id"); pattern = codeflash_output # 4.33μs -> 3.59μs (20.4% faster)
    match = pattern.match("/api/<id")

    # Extra closing bracket
    codeflash_output = _re_compile_path("/api/id>"); pattern2 = codeflash_output # 1.49μs -> 1.07μs (40.0% faster)
    match2 = pattern2.match("/api/id>")

def test_edge_parameter_with_regex_like_name():
    # Parameter name that looks like a regex
    codeflash_output = _re_compile_path("/api/<id.*>"); pattern = codeflash_output # 4.45μs -> 3.70μs (20.1% faster)
    match = pattern.match("/api/abc123")

# -------------------------------
# Large Scale Test Cases
# -------------------------------

def test_large_many_parameters():
    # Path with many parameters (up to 1000)
    param_count = 1000
    path = "/" + "/".join([f"<p{i}>" for i in range(param_count)])
    codeflash_output = _re_compile_path(path); pattern = codeflash_output # 86.5μs -> 84.8μs (2.06% faster)
    expected_pattern = "/" + "/".join(["([^/]+)" for _ in range(param_count)])
    # Build a matching string
    match_str = "/" + "/".join([str(i) for i in range(param_count)])
    match = pattern.match(match_str)

def test_large_long_path_no_parameters():
    # Very long path with no parameters
    path = "/" + "/".join(["segment"] * 1000)
    codeflash_output = _re_compile_path(path); pattern = codeflash_output # 9.34μs -> 9.07μs (2.95% faster)
    match = pattern.match(path)

def test_large_long_parameter_names():
    # Path with very long parameter names
    path = "/" + "/".join([f"<{'a'*100}>" for _ in range(10)])
    codeflash_output = _re_compile_path(path); pattern = codeflash_output # 6.40μs -> 5.81μs (10.2% faster)
    expected_pattern = "/" + "/".join(["([^/]+)" for _ in range(10)])
    match_str = "/" + "/".join([str(i) for i in range(10)])
    match = pattern.match(match_str)

def test_large_parameters_with_special_chars():
    # Many parameters with special characters in names
    path = "/" + "/".join([f"<p{i}_$#@!>" for i in range(50)])
    codeflash_output = _re_compile_path(path); pattern = codeflash_output # 10.0μs -> 9.84μs (1.92% faster)
    expected_pattern = "/" + "/".join(["([^/]+)" for _ in range(50)])
    match_str = "/" + "/".join([str(i) for i in range(50)])
    match = pattern.match(match_str)

To edit these changes, run `git checkout codeflash/optimize-_re_compile_path-mhup2nm5` and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 11, 2025 14:56
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 11, 2025