⚡️ Speed up method Qwen25Detector.build_ebnf by 85%
#340
📄 85% (0.85x) speedup for `Qwen25Detector.build_ebnf` in `python/sglang/srt/function_call/qwen25_detector.py`
⏱️ Runtime: 6.42 milliseconds → 3.46 milliseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves an 85% speedup by eliminating the quadratic complexity in handling optional function parameters. The original implementation generated all possible ordering permutations for optional parameters using nested loops that created O(n²) string concatenations, while the optimized version uses a simpler linear approach.
Key Optimizations Applied:
- Eliminated Quadratic Optional Parameter Processing: The original code used nested loops (`for i in range(len(optional))` and `for j in range(i, len(optional))`) to generate all permutation alternatives for optional parameters, creating 25,918 inner loop iterations in the profiled case. The optimized version replaces this with a simple list comprehension that wraps each optional parameter individually as `( {pair} )?`, reducing complexity from O(n²) to O(n).
- Reduced String Operations: Pre-computed formatted key-value pairs are stored in a list of tuples `[(prop_name, formatted_pair)]` instead of a dictionary, eliminating repeated dictionary lookups and format operations in nested loops. The expensive `.format()` calls are no longer repeated inside the quadratic loop structure.
- Streamlined Data Structures: The optimization separates required and optional properties using a single pass through the pre-computed pairs, avoiding the original approach of building intermediate lists through list comprehensions with membership tests.
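The three points above can be sketched as follows. This is a minimal illustration, not the actual sglang code: the function names `format_pairs`, `split_pairs`, `quadratic_optional_rule`, and `linear_optional_rule` are hypothetical, the key-value format string is an assumption, and the sketch does not try to reproduce how the real grammar handles separators between arguments.

```python
from typing import Dict, List, Tuple


def format_pairs(properties: Dict[str, str]) -> List[Tuple[str, str]]:
    """Format each key-value pair once (assumed JSON-style key quoting)."""
    return [(name, f'"\\"{name}\\"" ":" {rule}') for name, rule in properties.items()]


def split_pairs(
    pairs: List[Tuple[str, str]], required: List[str]
) -> Tuple[List[str], List[str]]:
    """Partition pre-formatted pairs into required and optional in one pass."""
    required_set = set(required)  # O(1) membership tests
    req: List[str] = []
    opt: List[str] = []
    for name, pair in pairs:
        (req if name in required_set else opt).append(pair)
    return req, opt


def quadratic_optional_rule(pairs: List[str]) -> str:
    """Nested-loop shape: one alternative per starting index, O(n^2) parts."""
    alternatives = []
    for i in range(len(pairs)):
        parts = []
        for j in range(i, len(pairs)):
            if j == i:
                parts.append(pairs[j])
            else:
                parts.append(f' ( "," {pairs[j]} )?')
        alternatives.append("".join(parts))
    return " | ".join(alternatives)


def linear_optional_rule(pairs: List[str]) -> str:
    """Linear shape: wrap each optional pair individually, O(n) parts."""
    return " ".join(f"( {pair} )?" for pair in pairs)
```

For n optional parameters the nested loops perform n(n+1)/2 inner iterations, while the linear form performs n, which is consistent with the largest gains appearing in the tests with many optional parameters.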
Performance Analysis from Line Profiler:
`get_value_rule` calls) but eliminates the quadratic string building entirely
Test Case Impact:
The optimization particularly excels with functions having many optional parameters - showing 1045% speedup for 200 optional parameters and 226% speedup for 50 optional parameters. Functions with only required parameters see modest 6-18% improvements due to the cleaner data flow, while edge cases with no parameters remain largely unchanged.
This optimization is especially valuable for LLM function calling scenarios where tools commonly have multiple optional parameters, significantly improving EBNF grammar generation latency without changing the generated grammar's functionality.
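The percentages quoted in this report can be sanity-checked from the listed runtimes. The sketch below assumes the speedup is computed as `(old - new) / new`; that formula is inferred from the reported numbers rather than stated in the report, and `speedup_percent` is a hypothetical helper name.

```python
def speedup_percent(old_runtime: float, new_runtime: float) -> float:
    """Return the speedup as a percentage, assuming (old - new) / new."""
    if new_runtime <= 0:
        raise ValueError("new_runtime must be positive")
    return (old_runtime / new_runtime - 1.0) * 100.0


# Runtimes taken from the report, expressed in microseconds.
overall = speedup_percent(6420.0, 3460.0)       # 6.42 ms -> 3.46 ms
optional_200 = speedup_percent(2270.0, 198.0)   # 200 optional parameters
optional_50 = speedup_percent(191.0, 58.7)      # 50 optional parameters

print(f"{overall:.1f}% {optional_200:.1f}% {optional_50:.1f}%")
```

The results land close to the reported 85%, 1045%, and 226%. The small differences are presumably because the report works from unrounded timings.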
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
from typing import List

# imports
import pytest
from sglang.srt.function_call.qwen25_detector import Qwen25Detector


# Minimal Tool class for testing (mimics sglang.srt.entrypoints.openai.protocol.Tool)
class Function:
    def __init__(self, name, parameters=None):
        self.name = name
        self.parameters = parameters or {}


class Tool:
    def __init__(self, function):
        self.function = function


# ------------------- UNIT TESTS -------------------

# 1. Basic Test Cases
def test_single_function_no_params():
    # Test a single tool with no parameters
    tool = Tool(Function("hello"))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 8.22μs -> 7.56μs (8.76% faster)


def test_single_function_one_required_param():
    # Tool with one required string param
    tool = Tool(Function("greet", {
        "properties": {
            "name": {"type": "string"}
        },
        "required": ["name"]
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 12.2μs -> 10.7μs (13.9% faster)


def test_single_function_required_and_optional_param():
    # Tool with required and optional param
    tool = Tool(Function("weather", {
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["location"]
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 17.2μs -> 15.3μs (12.4% faster)


def test_multiple_tools():
    # Two tools
    tools = [
        Tool(Function("foo")),
        Tool(Function("bar", {
            "properties": {
                "x": {"type": "number"}
            },
            "required": ["x"]
        }))
    ]
    codeflash_output = Qwen25Detector().build_ebnf(tools); ebnf = codeflash_output # 15.1μs -> 13.5μs (11.6% faster)


def test_enum_param():
    # Enum param
    tool = Tool(Function("choose", {
        "properties": {
            "color": {"type": "string", "enum": ["red", "green", "blue"]}
        },
        "required": ["color"]
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 12.7μs -> 11.3μs (12.3% faster)
# 2. Edge Test Cases
def test_no_tools():
    # No tools: should still produce a root rule but function_call ::= (empty)
    codeflash_output = Qwen25Detector().build_ebnf([]); ebnf = codeflash_output # 3.62μs -> 3.79μs (4.72% slower)


def test_all_optional_params():
    # All optional params
    tool = Tool(Function("optfunc", {
        "properties": {
            "a": {"type": "string"},
            "b": {"type": "number"}
        }
        # no required
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 16.3μs -> 13.7μs (18.3% faster)


def test_large_enum_param():
    # Enum with many values
    enum_values = [str(i) for i in range(20)]
    tool = Tool(Function("pick", {
        "properties": {
            "num": {"type": "string", "enum": enum_values}
        },
        "required": ["num"]
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 15.3μs -> 14.0μs (9.51% faster)
    # All enum values present
    for v in enum_values:
        pass


def test_param_types():
    # All supported types
    tool = Tool(Function("types", {
        "properties": {
            "s": {"type": "string"},
            "n": {"type": "number"},
            "i": {"type": "integer"},
            "b": {"type": "boolean"},
            "nil": {"type": "null"},
            "arr": {"type": "array"},
            "obj": {"type": "object"}
        },
        "required": ["s", "n", "i", "b", "nil", "arr", "obj"]
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 20.0μs -> 18.6μs (7.56% faster)


def test_param_order_preserved():
    # Params order should be preserved in grammar
    tool = Tool(Function("order", {
        "properties": {
            "first": {"type": "string"},
            "second": {"type": "string"},
            "third": {"type": "string"}
        },
        "required": ["first", "second", "third"]
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 14.7μs -> 13.9μs (6.44% faster)
    # The order in arguments_order ::= should match: first, second, third
    # (backslashes doubled so the literal matches the escaped key in the grammar text)
    idx_first = ebnf.index('"\\"first\\""')
    idx_second = ebnf.index('"\\"second\\""')
    idx_third = ebnf.index('"\\"third\\""')


def test_separator_and_wrapping():
    # Should use the correct separator and wrapping tokens
    tool = Tool(Function("wrap", {
        "properties": {
            "a": {"type": "string"}
        },
        "required": ["a"]
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 11.6μs -> 10.4μs (11.6% faster)


def test_optional_param_grouping():
    # Multiple optional params: grouping and alternation
    tool = Tool(Function("optgroup", {
        "properties": {
            "x": {"type": "string"},
            "y": {"type": "number"},
            "z": {"type": "boolean"}
        }
        # all optional
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 17.0μs -> 14.1μs (20.5% faster)


def test_required_and_multiple_optional():
    # Required param with multiple optional params
    tool = Tool(Function("mix", {
        "properties": {
            "req": {"type": "string"},
            "opt1": {"type": "number"},
            "opt2": {"type": "boolean"}
        },
        "required": ["req"]
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 16.9μs -> 14.7μs (14.8% faster)
    # Should have required param first, then optional group
    idx_req = ebnf.index('"\\"req\\""')
    idx_opt1 = ebnf.index('"\\"opt1\\""')
    idx_opt2 = ebnf.index('"\\"opt2\\""')
    # Should allow missing opt1 and/or opt2
# 3. Large Scale Test Cases
def test_many_tools():
    # 100 tools, each with one required param
    tools = [
        Tool(Function(f"func{i}", {
            "properties": {f"param{i}": {"type": "string"}},
            "required": [f"param{i}"]
        }))
        for i in range(100)
    ]
    codeflash_output = Qwen25Detector().build_ebnf(tools); ebnf = codeflash_output # 287μs -> 243μs (18.0% faster)
    # Should contain all call_funcX and arguments_funcX rules
    for i in range(100):
        pass


def test_large_number_of_params():
    # One tool with 50 required params
    properties = {f"p{i}": {"type": "string"} for i in range(50)}
    required = [f"p{i}" for i in range(50)]
    tool = Tool(Function("bigfunc", {
        "properties": properties,
        "required": required
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 63.5μs -> 59.4μs (6.78% faster)
    # Should contain all params in arguments_bigfunc
    for i in range(50):
        pass


def test_large_number_of_optional_params():
    # One tool with 50 optional params
    properties = {f"opt{i}": {"type": "number"} for i in range(50)}
    tool = Tool(Function("optfunc", {
        "properties": properties
        # no required
    }))
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 191μs -> 58.7μs (226% faster)
    # Should contain alternation for all 50 optional params
    for i in range(50):
        pass


def test_scalability_limit():
    # 200 tools, each with 5 required params
    tools = []
    for i in range(200):
        props = {f"p{j}": {"type": "string"} for j in range(5)}
        req = [f"p{j}" for j in range(5)]
        tools.append(Tool(Function(f"tool{i}", {
            "properties": props,
            "required": req
        })))
    codeflash_output = Qwen25Detector().build_ebnf(tools); ebnf = codeflash_output # 1.38ms -> 1.27ms (8.03% faster)
    # Should contain call_toolX and arguments_toolX for all tools
    for i in range(200):
        for j in range(5):
            pass


def test_performance_large_grammar(monkeypatch):
    # Stress test: 500 params, all required
    properties = {f"x{i}": {"type": "number"} for i in range(500)}
    required = [f"x{i}" for i in range(500)]
    tool = Tool(Function("huge", {
        "properties": properties,
        "required": required
    }))
    # Patch out EBNFComposer to measure performance
    import time
    start = time.time()
    codeflash_output = Qwen25Detector().build_ebnf([tool]); ebnf = codeflash_output # 496μs -> 478μs (3.66% faster)
    elapsed = time.time() - start
    # Should contain all param rules
    for i in range(500):
        pass

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Any, Dict, List, Literal, Optional

# imports
import pytest
from sglang.srt.function_call.qwen25_detector import Qwen25Detector


# Minimal Tool/function mock to simulate OpenAI Tool objects
class Function:
    def __init__(self, name, parameters=None):
        self.name = name
        self.parameters = parameters or {}


class Tool:
    def __init__(self, function):
        self.function = function


class BaseFormatDetector:
    def __init__(self):
        self._buffer = ""
        self.prev_tool_call_arr = []
        self.current_tool_id = -1
        self.current_tool_name_sent = False
        self.streamed_args_for_tool = []
        self.bot_token = ""
        self.eot_token = ""
        self.tool_call_separator = ", "

# --- End of Qwen25Detector and dependencies ---

# -----------------
# UNIT TESTS
# -----------------

@pytest.fixture
def detector():
    return Qwen25Detector()


# 1. BASIC TEST CASES
def test_single_function_no_params(detector):
    # Function with no parameters
    tool = Tool(Function("hello", parameters={"properties": {}}))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 9.24μs -> 8.25μs (12.0% faster)


def test_single_function_one_required_param(detector):
    # Function with one required parameter
    tool = Tool(Function("weather", parameters={
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 13.0μs -> 11.7μs (10.9% faster)


def test_single_function_one_optional_param(detector):
    # Function with one optional parameter
    tool = Tool(Function("greet", parameters={
        "properties": {
            "name": {"type": "string"}
        }
        # no "required"
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 13.3μs -> 11.6μs (14.6% faster)


def test_single_function_required_and_optional(detector):
    # Function with required and optional parameters
    tool = Tool(Function("sum", parameters={
        "properties": {
            "a": {"type": "number"},
            "b": {"type": "number"},
            "verbose": {"type": "boolean"}
        },
        "required": ["a", "b"]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 16.9μs -> 15.4μs (9.70% faster)
    # Requireds should be before optionals
    # (backslashes doubled so the literal matches the escaped key in the grammar text)
    req_idx = ebnf.index('\\"a\\"')
    opt_idx = ebnf.index('\\"verbose\\"')


def test_multiple_functions(detector):
    # Multiple functions
    tool1 = Tool(Function("foo", parameters={"properties": {"x": {"type": "number"}}, "required": ["x"]}))
    tool2 = Tool(Function("bar", parameters={"properties": {"y": {"type": "string"}}, "required": ["y"]}))
    codeflash_output = detector.build_ebnf([tool1, tool2]); ebnf = codeflash_output # 17.1μs -> 15.2μs (12.3% faster)


def test_json_format_is_used(detector):
    # Should use json format
    tool = Tool(Function("baz", parameters={"properties": {"z": {"type": "boolean"}}, "required": ["z"]}))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 11.6μs -> 10.3μs (12.7% faster)
# 2. EDGE TEST CASES
def test_no_tools(detector):
    # No tools: should still produce a root rule and function_call rule
    codeflash_output = detector.build_ebnf([]); ebnf = codeflash_output # 3.53μs -> 3.72μs (4.98% slower)


def test_function_with_enum_param(detector):
    # Enum parameter
    tool = Tool(Function("choose", parameters={
        "properties": {
            "color": {"type": "string", "enum": ["red", "green", "blue"]}
        },
        "required": ["color"]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 13.3μs -> 12.4μs (7.26% faster)


def test_function_with_all_optional_params(detector):
    # All optional parameters
    tool = Tool(Function("opt", parameters={
        "properties": {
            "foo": {"type": "number"},
            "bar": {"type": "string"}
        }
        # no required
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 16.1μs -> 13.9μs (16.0% faster)


def test_function_with_no_properties(detector):
    # Function parameters dict exists but empty
    tool = Tool(Function("empty", parameters={"properties": {}}))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 8.27μs -> 7.31μs (13.1% faster)


def test_function_with_null_type(detector):
    # Function with a null type parameter
    tool = Tool(Function("maybe", parameters={
        "properties": {
            "val": {"type": "null"}
        },
        "required": ["val"]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 12.3μs -> 10.9μs (13.3% faster)


def test_function_with_array_and_object(detector):
    # Function with array and object types
    tool = Tool(Function("complex", parameters={
        "properties": {
            "arr": {"type": "array"},
            "obj": {"type": "object"}
        },
        "required": ["arr", "obj"]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 13.6μs -> 12.4μs (9.57% faster)


def test_function_with_integer_type(detector):
    # Integer type should map to basic_number
    tool = Tool(Function("intfunc", parameters={
        "properties": {
            "n": {"type": "integer"}
        },
        "required": ["n"]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 11.9μs -> 10.8μs (10.2% faster)


def test_function_with_mixed_types(detector):
    # Function with all supported types
    tool = Tool(Function("alltypes", parameters={
        "properties": {
            "s": {"type": "string"},
            "i": {"type": "integer"},
            "n": {"type": "number"},
            "b": {"type": "boolean"},
            "nul": {"type": "null"},
            "a": {"type": "array"},
            "o": {"type": "object"}
        },
        "required": ["s", "i", "n", "b", "nul", "a", "o"]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 19.9μs -> 18.1μs (9.65% faster)
    for t in ["basic_string", "basic_number", "basic_boolean", "basic_null", "basic_array", "basic_object"]:
        pass


def test_function_with_no_parameters_field(detector):
    # Function with no parameters field at all
    tool = Tool(Function("noparams"))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 8.23μs -> 7.54μs (9.12% faster)
# 3. LARGE SCALE TEST CASES
def test_many_functions(detector):
    # 50 functions, each with a single required param
    tools = [
        Tool(Function(f"func{i}", parameters={
            "properties": {f"p{i}": {"type": "string"}},
            "required": [f"p{i}"]
        }))
        for i in range(50)
    ]
    codeflash_output = detector.build_ebnf(tools); ebnf = codeflash_output # 151μs -> 128μs (17.7% faster)
    # All function names should be present
    for i in range(50):
        pass


def test_large_number_of_params(detector):
    # Function with 100 params, half required, half optional
    properties = {f"key{i}": {"type": "string"} for i in range(100)}
    required = [f"key{i}" for i in range(50)]
    tool = Tool(Function("bigfunc", parameters={
        "properties": properties,
        "required": required
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 246μs -> 109μs (125% faster)
    # All keys should be present
    for i in range(100):
        pass
    # Should not crash or be missing any required/optional


def test_long_property_names(detector):
    # Function with very long property names
    long_name = "a" * 200
    tool = Tool(Function("long", parameters={
        "properties": {
            long_name: {"type": "string"}
        },
        "required": [long_name]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 13.0μs -> 11.0μs (18.2% faster)


def test_large_enum(detector):
    # Function with an enum of 100 values
    enum_vals = [f"val{i}" for i in range(100)]
    tool = Tool(Function("enumfunc", parameters={
        "properties": {
            "choice": {"type": "string", "enum": enum_vals}
        },
        "required": ["choice"]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 24.1μs -> 22.9μs (5.24% faster)
    for v in enum_vals:
        pass


def test_many_optional_params(detector):
    # Function with 200 optional parameters
    properties = {f"opt{i}": {"type": "number"} for i in range(200)}
    tool = Tool(Function("manyopts", parameters={
        "properties": properties
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 2.27ms -> 198μs (1045% faster)
    for i in range(200):
        pass


def test_long_function_name(detector):
    # Function with a very long name
    long_func = "f" * 256
    tool = Tool(Function(long_func, parameters={
        "properties": {"x": {"type": "string"}},
        "required": ["x"]
    }))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 15.6μs -> 11.9μs (31.2% faster)


def test_large_scale_combined(detector):
    # 10 functions, each with 50 parameters (25 required, 25 optional)
    tools = []
    for i in range(10):
        props = {f"p{i}{j}": {"type": "string"} for j in range(50)}
        required = [f"p{i}{j}" for j in range(25)]
        tools.append(Tool(Function(f"func{i}", parameters={
            "properties": props,
            "required": required
        })))
    codeflash_output = detector.build_ebnf(tools); ebnf = codeflash_output # 889μs -> 510μs (74.3% faster)
    for i in range(10):
        for j in range(50):
            pass


# Edge: Ensure separator is correct for multiple calls
def test_tool_call_separator(detector):
    tool = Tool(Function("foo", parameters={"properties": {"x": {"type": "number"}}, "required": ["x"]}))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 13.3μs -> 11.1μs (19.4% faster)


# Edge: Ensure individual call tokens are correct
def test_individual_call_tokens(detector):
    tool = Tool(Function("foo", parameters={"properties": {"x": {"type": "number"}}, "required": ["x"]}))
    codeflash_output = detector.build_ebnf([tool]); ebnf = codeflash_output # 12.1μs -> 10.9μs (11.0% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from sglang.srt.function_call.qwen25_detector import Qwen25Detector


def test_Qwen25Detector_build_ebnf():
    Qwen25Detector.build_ebnf(Qwen25Detector(), [])
🔎 Concolic Coverage Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-Qwen25Detector.build_ebnf-mhv4s83e` and push.