⚡️ Speed up function `_decode` by 18% #124

codeflash-ai · 2025-11-11T10:55:56Z

📄 18% (0.18x) speedup for `_decode` in `mlflow/utils/uri.py`

⏱️ Runtime : 1.93 milliseconds → 1.64 milliseconds (best of 97 runs)

📝 Explanation and details

The optimization adds an early return check after URL decoding to avoid unnecessary URL parsing and reconstruction operations.

Key Change:

- Added `if decoded == url: return url` before the expensive `urlparse`/`urlunparse` operations
- This creates a fast path when `urllib.parse.unquote()` doesn't change the URL (meaning it's already fully decoded)
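
As a rough illustration of the change described above, the sketch below shows the shape of a decode loop with the early return. The name `decode_sketch` and the cap `MAX_ITERATIONS` are assumptions made for this example; the actual body of `_decode` in `mlflow/utils/uri.py` is not reproduced here.

```python
import urllib.parse

MAX_ITERATIONS = 10  # assumed cap for this sketch, not taken from the PR


def decode_sketch(url: str) -> str:
    """Unquote `url` repeatedly until it stops changing (hypothetical stand-in for `_decode`)."""
    for _ in range(MAX_ITERATIONS):
        decoded = urllib.parse.unquote(url)
        # Fast path: unquote() changed nothing, so skip urlparse/urlunparse entirely.
        if decoded == url:
            return url
        # Slow path: run the decoded URL through a parse/unparse round trip.
        parsed = urllib.parse.urlunparse(urllib.parse.urlparse(decoded))
        if parsed == url:
            return url
        url = parsed
    raise ValueError("Failed to decode url")


# Already decoded: returns on the first iteration via the fast path.
print(decode_sketch("http://example.com/path/to/resource"))
# Double encoded: needs two decoding passes before the fast path triggers.
print(decode_sketch("http://example.com/path%2520with%2520spaces"))
```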

Why It's Faster:
The original code always performed URL parsing/unparsing even when no decoding occurred. The urlparse/urlunparse combination is expensive - from the profiler, it consumed 33.2% of total runtime in the original version but only 22.3% in the optimized version. By checking if decoding actually changed the URL first, we can skip this expensive operation entirely for already-decoded URLs.
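
To get a feel for the cost difference described above, one could time the two code paths on an already-decoded URL. This is only a sketch: the helper names `unquote_only` and `unquote_and_round_trip` are hypothetical, and absolute numbers will vary by machine and Python version.

```python
import timeit
import urllib.parse

URL = "http://example.com/path/to/resource"  # already decoded, so unquote() is a no-op


def unquote_only(url: str) -> str:
    # Work done on the fast path: a single unquote() call.
    return urllib.parse.unquote(url)


def unquote_and_round_trip(url: str) -> str:
    # Work done without the early return: unquote() plus a parse/unparse round trip.
    decoded = urllib.parse.unquote(url)
    return urllib.parse.urlunparse(urllib.parse.urlparse(decoded))


fast = timeit.timeit(lambda: unquote_only(URL), number=10_000)
slow = timeit.timeit(lambda: unquote_and_round_trip(URL), number=10_000)
print(f"unquote only:         {fast:.4f}s for 10,000 calls")
print(f"unquote + round trip: {slow:.4f}s for 10,000 calls")
```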

Performance Impact by Test Case:

- Massive gains (400-570% faster) on already-decoded URLs like simple URLs, empty strings, and URLs with plus signs
- Moderate gains (15-30% faster) on URLs requiring actual decoding, since we still avoid one unnecessary parse/unparse cycle
- Smaller gains (3-15% faster) on complex cases like large double-encoded URLs where multiple iterations are needed

The optimization is particularly effective because many real-world URLs are already properly decoded, making the early return path the common case. Even for URLs that do need decoding, the final iteration, in which `unquote()` no longer changes anything, skips its parse/unparse cycle.
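
Whether an input takes the early return on the first iteration depends only on whether `urllib.parse.unquote()` leaves it unchanged. The snippet below is a small sketch of that check; the helper `takes_fast_path` is a hypothetical name, and the sample URLs are borrowed from the generated tests.

```python
import urllib.parse

SAMPLES = [
    "http://example.com/path/to/resource",      # no percent signs at all
    "",                                         # empty string
    "http://example.com/a+b",                   # unquote() leaves plus signs alone
    "http://example.com/%zz",                   # invalid escape is left as-is
    "http://example.com/path%20with%20spaces",  # valid escape, gets decoded
]


def takes_fast_path(url: str) -> bool:
    # True when unquote() is a no-op, i.e. the early return would trigger immediately.
    return urllib.parse.unquote(url) == url


for sample in SAMPLES:
    print(f"{sample!r}: fast path = {takes_fast_path(sample)}")
```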

✅ Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 36 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 85.7% |

🌀 Generated Regression Tests and Runtime

import urllib.parse

# imports
import pytest  # used for our unit tests
from mlflow.utils.uri import _decode

# unit tests

# ------------------- Basic Test Cases -------------------

def test_decode_simple_url():
    # Basic: URL with no encoding
    url = "http://example.com/path/to/resource"
    codeflash_output = _decode(url) # 17.3μs -> 2.59μs (569% faster)

def test_decode_percent_encoded_path():
    # Basic: URL with percent-encoded path
    url = "http://example.com/path%20with%20spaces"
    expected = "http://example.com/path with spaces"
    codeflash_output = _decode(url) # 24.4μs -> 21.1μs (15.7% faster)

def test_decode_percent_encoded_query():
    # Basic: URL with percent-encoded query parameters
    url = "http://example.com/?q=hello%20world"
    expected = "http://example.com/?q=hello world"
    codeflash_output = _decode(url) # 30.0μs -> 25.6μs (17.1% faster)

def test_decode_multiple_encodings():
    # Basic: URL with double encoding
    url = "http://example.com/path%2520with%2520spaces"
    expected = "http://example.com/path with spaces"
    codeflash_output = _decode(url) # 31.8μs -> 28.5μs (11.6% faster)

def test_decode_full_url_with_all_components():
    # Basic: URL with scheme, netloc, path, params, query, fragment
    url = "https://user:[email protected]:8080/path%20here;params?query%20val=1#frag%20ment"
    expected = "https://user:[email protected]:8080/path here;params?query val=1#frag ment"
    codeflash_output = _decode(url) # 34.1μs -> 28.6μs (19.0% faster)

# ------------------- Edge Test Cases -------------------

def test_decode_empty_string():
    # Edge: Empty string input
    url = ""
    codeflash_output = _decode(url) # 13.0μs -> 2.55μs (409% faster)

def test_decode_only_percent_sign():
    # Edge: String with only percent sign, which is not a valid encoding
    url = "%"
    codeflash_output = _decode(url) # 19.1μs -> 8.94μs (114% faster)

def test_decode_invalid_percent_encoding():
    # Edge: Invalid percent encoding (not followed by two hex digits)
    url = "http://example.com/%zz"
    codeflash_output = _decode(url) # 24.3μs -> 9.25μs (163% faster)

def test_decode_url_with_reserved_characters():
    # Edge: URL with reserved characters encoded
    url = "http://example.com/%3F%23%2F%3A"
    expected = "http://example.com/?#/:"
    codeflash_output = _decode(url) # 33.8μs -> 26.8μs (26.0% faster)

def test_decode_url_with_unicode_characters():
    # Edge: URL with percent-encoded Unicode characters
    # "你好" in UTF-8 is %E4%BD%A0%E5%A5%BD
    url = "http://example.com/%E4%BD%A0%E5%A5%BD"
    expected = "http://example.com/你好"
    codeflash_output = _decode(url) # 34.2μs -> 29.4μs (16.5% faster)

def test_decode_url_with_no_scheme():
    # Edge: URL missing scheme (just a path)
    url = "/foo%20bar"
    expected = "/foo bar"
    codeflash_output = _decode(url) # 23.8μs -> 20.5μs (15.9% faster)

def test_decode_url_with_fragment_only():
    # Edge: URL with only fragment, percent-encoded
    url = "#frag%20ment"
    expected = "#frag ment"
    codeflash_output = _decode(url) # 24.8μs -> 20.9μs (18.8% faster)

def test_decode_url_with_params_only():
    # Edge: URL with only params, percent-encoded
    url = ";params%20here"
    expected = ";params here"
    codeflash_output = _decode(url) # 26.1μs -> 22.0μs (18.6% faster)

# ------------------- Large Scale Test Cases -------------------

def test_decode_large_url():
    # Large: Decoding a long URL with many percent-encoded spaces
    base = "http://example.com/"
    path = "%20".join(["segment"] * 100) # 100 segments separated by encoded spaces
    url = base + path
    expected = base + "segment" + (" segment" * 99)
    codeflash_output = _decode(url) # 51.1μs -> 46.0μs (11.1% faster)

def test_decode_large_double_encoded_url():
    # Large: Double-encoded URL with 500 segments
    base = "http://example.com/"
    path = "%2520".join(["segment"] * 500) # 500 segments, double encoded spaces
    url = base + path
    # After two decodings, spaces separate the segments
    expected = base + "segment" + (" segment" * 499)
    codeflash_output = _decode(url) # 218μs -> 210μs (3.51% faster)

def test_decode_long_query_string():
    # Large: Long query string with percent-encoded values
    base = "http://example.com/?"
    query = "&".join([f"key{i}={urllib.parse.quote('value with spaces')}" for i in range(200)])
    url = base + query
    expected_query = "&".join([f"key{i}=value with spaces" for i in range(200)])
    expected = base + expected_query
    codeflash_output = _decode(url) # 117μs -> 105μs (12.0% faster)

def test_decode_large_url_with_unicode():
    # Large: URL with many percent-encoded Unicode characters
    base = "http://example.com/"
    # "你好" in UTF-8 is %E4%BD%A0%E5%A5%BD
    path = "%E4%BD%A0%E5%A5%BD" * 100 # 100 times "你好"
    url = base + path
    expected = base + ("你好" * 100)
    codeflash_output = _decode(url) # 117μs -> 115μs (2.06% faster)

def test_decode_large_url_with_mixed_encoding():
    # Large: URL with mixed single and double encoding
    base = "http://example.com/"
    # Alternate between "%20" and "%2520"
    path = ""
    for i in range(500):
        if i % 2 == 0:
            path += "segment%20"
        else:
            path += "segment%2520"
    # Drop the trailing separator; removesuffix() removes an exact suffix,
    # whereas rstrip() would strip any trailing run of the given characters
    url = base + path.removesuffix("%20").removesuffix("%2520")
    # After decoding, all "%20" become spaces, all "%2520" become "%20" then spaces
    # So after full decoding, all are separated by spaces
    expected = base + "segment" + (" segment" * 499)
    codeflash_output = _decode(url) # 184μs -> 173μs (6.81% faster)

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

#------------------------------------------------
import urllib.parse

# imports
import pytest  # used for our unit tests
from mlflow.utils.uri import _decode

# unit tests

# 1. Basic Test Cases

def test_decode_already_decoded_url():
    # Basic: URL is already decoded, should return as is
    url = "http://example.com/path/to/resource"
    codeflash_output = _decode(url); result = codeflash_output # 17.4μs -> 2.59μs (571% faster)

def test_decode_single_encoded_component():
    # Basic: URL with a single encoded space
    url = "http://example.com/path%20with%20spaces"
    expected = "http://example.com/path with spaces"
    codeflash_output = _decode(url); result = codeflash_output # 29.4μs -> 20.1μs (46.2% faster)

def test_decode_multiple_encoded_components():
    # Basic: URL with multiple encoded characters
    url = "https://site.com/a%20b%2Fc%3Fd"
    expected = "https://site.com/a b/c?d"
    codeflash_output = _decode(url); result = codeflash_output # 30.8μs -> 26.3μs (16.8% faster)

def test_decode_query_parameters():
    # Basic: Encoded query parameters
    url = "https://site.com/?q=hello%20world&lang=en"
    expected = "https://site.com/?q=hello world&lang=en"
    codeflash_output = _decode(url); result = codeflash_output # 29.9μs -> 25.6μs (17.1% faster)

def test_decode_fragment():
    # Basic: Encoded fragment
    url = "https://site.com/page#section%201"
    expected = "https://site.com/page#section 1"
    codeflash_output = _decode(url); result = codeflash_output # 29.9μs -> 25.5μs (17.0% faster)

# 2. Edge Test Cases

def test_decode_empty_string():
    # Edge: Empty string
    url = ""
    codeflash_output = _decode(url); result = codeflash_output # 12.6μs -> 2.56μs (393% faster)

def test_decode_only_encoded():
    # Edge: URL is only encoded characters
    url = "%2F%3F%23%20"
    expected = "/?# "
    codeflash_output = _decode(url); result = codeflash_output # 26.9μs -> 22.7μs (18.8% faster)

def test_decode_double_encoded():
    # Edge: Double-encoded URL, e.g. "%2520" -> "%20" -> " "
    url = "http://example.com/path%2520with%2520spaces"
    expected = "http://example.com/path with spaces"
    codeflash_output = _decode(url); result = codeflash_output # 36.7μs -> 27.8μs (31.9% faster)

def test_decode_malformed_url():
    # Edge: Malformed URL, should still decode what it can
    url = "ht!tp://exa%mple.com/%20bad%20url"
    expected = "ht!tp://exa%mple.com/ bad url"
    codeflash_output = _decode(url); result = codeflash_output # 28.9μs -> 25.2μs (14.7% faster)

def test_decode_url_with_non_ascii():
    # Edge: URL with non-ascii characters encoded
    url = "http://example.com/%E2%9C%93"
    expected = "http://example.com/✓"
    codeflash_output = _decode(url); result = codeflash_output # 32.2μs -> 28.3μs (13.7% faster)

def test_decode_url_with_reserved_characters():
    # Edge: Encoded reserved URL characters
    url = "http://example.com/%3F%23%26"
    expected = "http://example.com/?#&"
    codeflash_output = _decode(url); result = codeflash_output # 33.2μs -> 27.2μs (22.3% faster)

def test_decode_url_with_plus_sign():
    # Edge: Plus sign is not decoded by urllib.parse.unquote, should remain as is
    url = "http://example.com/a+b"
    expected = "http://example.com/a+b"
    codeflash_output = _decode(url); result = codeflash_output # 17.1μs -> 2.58μs (564% faster)

def test_decode_url_with_percent_at_end():
    # Edge: URL ending with percent sign, not a valid encoding
    url = "http://example.com/path%"
    expected = "http://example.com/path%"
    codeflash_output = _decode(url); result = codeflash_output # 24.2μs -> 9.00μs (169% faster)

def test_decode_url_with_incomplete_encoding():
    # Edge: Incomplete percent encoding, should remain as is
    url = "http://example.com/path%2"
    expected = "http://example.com/path%2"
    codeflash_output = _decode(url); result = codeflash_output # 23.8μs -> 9.17μs (160% faster)

# 3. Large Scale Test Cases

def test_decode_large_url():
    # Large: Very long URL with many encoded spaces
    base = "http://example.com/"
    encoded_path = "%20".join(["segment"] * 500) # 500 segments separated by encoded spaces
    url = base + encoded_path
    # Each "%20" separator decodes to a single space
    expected = base + " ".join(["segment"] * 500)
    codeflash_output = _decode(url); result = codeflash_output # 124μs -> 108μs (14.6% faster)

def test_decode_large_double_encoded_url():
    # Large: Very long double-encoded URL
    base = "http://example.com/"
    segment = "segment"
    encoded = urllib.parse.quote(segment) # "segment" -> "segment" (no encoding)
    double_encoded = urllib.parse.quote(encoded) # "segment" -> "segment" (no encoding)
    # But let's use a string with spaces for actual encoding
    segment = "seg ment"
    encoded = urllib.parse.quote(segment) # "seg ment" -> "seg%20ment"
    double_encoded = urllib.parse.quote(encoded) # "seg%20ment" -> "seg%2520ment"
    url = base + double_encoded * 500 # 500 double-encoded segments
    # Now decode
    decoded_once = urllib.parse.unquote(url)
    decoded_twice = urllib.parse.unquote(decoded_once)
    expected = decoded_twice
    codeflash_output = _decode(url); result = codeflash_output # 201μs -> 192μs (4.86% faster)

def test_decode_large_url_with_varied_encoding():
    # Large: Long URL with mixed encoded and unencoded segments
    base = "https://site.com/"
    segments = []
    for i in range(500):
        if i % 2 == 0:
            segments.append("plainsegment")
        else:
            segments.append(urllib.parse.quote("encoded segment"))
    url = base + "/".join(segments)
    # Build expected result
    expected_segments = []
    for i in range(500):
        if i % 2 == 0:
            expected_segments.append("plainsegment")
        else:
            expected_segments.append("encoded segment")
    expected = base + "/".join(expected_segments)
    codeflash_output = _decode(url); result = codeflash_output # 103μs -> 90.6μs (14.7% faster)

def test_decode_large_url_with_query_and_fragment():
    # Large: Long URL with encoded query and fragment
    base = "https://site.com/resource"
    query = "?q=" + "%20".join(["word"] * 200)
    fragment = "#frag" + "%20".join(["ment"] * 200)
    url = base + query + fragment
    # Each "%20" separator decodes to a single space
    expected = base + "?q=" + " ".join(["word"] * 200) + "#frag" + " ".join(["ment"] * 200)
    codeflash_output = _decode(url); result = codeflash_output # 101μs -> 94.4μs (7.61% faster)

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
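
The kind of output comparison mentioned above can be approximated by hand. The sketch below runs two hypothetical variants of the decode loop, one without and one with the early return, over a few URLs taken from the generated tests. Both function bodies and the iteration cap of 10 are assumptions for illustration, not the code from `mlflow/utils/uri.py`.

```python
import urllib.parse


def decode_without_early_return(url: str, max_iterations: int = 10) -> str:
    # Assumed shape of the loop before the change: always parse/unparse.
    for _ in range(max_iterations):
        decoded = urllib.parse.unquote(url)
        parsed = urllib.parse.urlunparse(urllib.parse.urlparse(decoded))
        if parsed == url:
            return url
        url = parsed
    raise ValueError("Failed to decode url")


def decode_with_early_return(url: str, max_iterations: int = 10) -> str:
    # Assumed shape of the loop after the change: return as soon as unquote() is a no-op.
    for _ in range(max_iterations):
        decoded = urllib.parse.unquote(url)
        if decoded == url:
            return url
        parsed = urllib.parse.urlunparse(urllib.parse.urlparse(decoded))
        if parsed == url:
            return url
        url = parsed
    raise ValueError("Failed to decode url")


SAMPLE_URLS = [
    "",
    "http://example.com/path/to/resource",
    "http://example.com/path%20with%20spaces",
    "http://example.com/path%2520with%2520spaces",
    "http://example.com/%E4%BD%A0%E5%A5%BD",
    "http://example.com/%zz",
    "/foo%20bar",
    "#frag%20ment",
]

for sample in SAMPLE_URLS:
    # Fail loudly, naming the input, if the two variants ever disagree.
    assert decode_without_early_return(sample) == decode_with_early_return(sample), sample
print(f"both variants agree on {len(SAMPLE_URLS)} sample URLs")
```

These samples only cover inputs borrowed from the generated tests. Inputs that a parse/unparse round trip alone would rewrite, such as a URL with an upper-case scheme, take the early return in the second variant and may be worth comparing separately.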

To edit these changes, `git checkout codeflash/optimize-_decode-mhuggreu` and push.


codeflash-ai bot requested a review from mashraf-222 November 11, 2025 10:55

codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 11, 2025
