
Conversation


codeflash-ai bot commented on Nov 7, 2025

📄 72% (0.72x) speedup for SchedulerOutputProcessorMixin.add_input_logprob_return_values in python/sglang/srt/managers/scheduler_output_processor_mixin.py

⏱️ Runtime: 183 microseconds → 106 microseconds (best of 50 runs)

📝 Explanation and details

The optimized code achieves a **72% speedup** through several key performance improvements that reduce repeated computations and attribute lookups.

**Key Optimizations:**

1. **Reduced attribute access overhead**: The code caches frequently accessed values such as `req.origin_input_ids[req.logprob_start_len:]` into `slice_ids` and `self.server_args.multi_item_scoring_delimiter` into `multi_item_delim`, avoiding repeated property lookups that are expensive in Python.

2. **Optimized list operations** (illustrated in the sketch after this list):
   - In `_process_input_token_logprobs`, the original `[None] + input_token_logprobs[:-1]` creates two temporary lists and concatenates them. The optimized version uses conditional extension (`req.input_token_logprobs_val += input_token_logprobs[:-1]`) only when needed.
   - In `_calculate_relevant_tokens_len`, the generator expression is replaced with `slice_ids.count(multi_item_delim)`, a native C implementation that is much faster for counting.

3. **Minimized repeated object creation**: Local variables (`input_top_logprobs_val`, `input_token_ids_logprobs_val`) are used instead of repeatedly accessing `req` attributes, with a single assignment at the end. This reduces both attribute lookup overhead and potential list reallocation.

4. **Smarter conditional checks**: Existence checks (`if temp_val and temp_idx:`, `if input_top_logprobs_val:`) avoid unnecessary operations on empty lists.

5. **Cached computation in the main function**: `add_input_logprob_return_values` caches `req.input_token_logprobs` in a local variable to avoid repeated attribute access during the extend operation.
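The following minimal, self-contained sketch illustrates the caching and list-operation patterns from items 1 and 2. The class and function names (`Req`, `count_delimiters_*`, `append_logprobs_*`) are illustrative stand-ins, not the actual sglang code:

```python
class Req:
    def __init__(self, origin_input_ids, logprob_start_len):
        self.origin_input_ids = origin_input_ids
        self.logprob_start_len = logprob_start_len
        self.input_token_logprobs_val = []


def count_delimiters_original(req, delim):
    # Original pattern: Python-level generator expression that re-reads the
    # attributes and re-slices on every call.
    return sum(
        1 for t in req.origin_input_ids[req.logprob_start_len:] if t == delim
    )


def count_delimiters_optimized(req, delim):
    # Optimized pattern: cache the slice once, then use list.count(),
    # which runs in C.
    slice_ids = req.origin_input_ids[req.logprob_start_len:]
    return slice_ids.count(delim)


def append_logprobs_original(req, input_token_logprobs):
    # Original pattern: [None] + list(...) builds temporaries and concatenates
    # before extending the request's list.
    req.input_token_logprobs_val += [None] + list(input_token_logprobs[:-1])


def append_logprobs_optimized(req, input_token_logprobs):
    # Optimized pattern: append the sentinel directly, then extend in place
    # only when there is something to add.
    vals = req.input_token_logprobs_val
    vals.append(None)
    rest = input_token_logprobs[:-1]
    if rest:
        vals.extend(rest)


req = Req(origin_input_ids=[1, 999, 2, 999, 3], logprob_start_len=1)
assert count_delimiters_original(req, 999) == count_delimiters_optimized(req, 999) == 2
```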

**Performance Impact by Test Case:**

- **Large-scale operations see the biggest gains**: `test_large_scale_regular_request` shows an 89% speedup and `test_large_scale_multi_item_scoring` shows a 154% speedup, indicating these optimizations are particularly effective for high-throughput scenarios.
- **Multi-item scoring benefits significantly**: Tests like `test_basic_multi_item_scoring` (18.9% faster) benefit from the optimized counting and list comprehension operations.
- **Some small overhead on simple cases**: A few basic tests show minor slowdowns (4-6%), likely due to the additional local variable assignments, but this is vastly outweighed by the gains on realistic workloads.

The optimizations are especially valuable for logprob processing in language model inference pipelines, where these functions are called frequently during token generation and the input sizes can be substantial.
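As a rough illustration of why the counting change matters at scale, a hypothetical `timeit` comparison (not part of this PR) on a 1000-token input with a delimiter every 10 tokens might look like the snippet below; on typical CPython builds, `list.count()` comes out several times faster than the Python-level generator:

```python
# Hypothetical micro-benchmark (not from this PR): compare a Python-level
# generator sum against list.count() on a 1000-token input.
import timeit

slice_ids = [888 if i % 10 == 0 else i for i in range(1000)]
delim = 888

gen_time = timeit.timeit(
    lambda: sum(1 for t in slice_ids if t == delim), number=10_000
)
count_time = timeit.timeit(lambda: slice_ids.count(delim), number=10_000)

print(f"generator sum: {gen_time:.3f}s  list.count(): {count_time:.3f}s")
```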

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 9 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 95.6% |
🌀 Generated Regression Tests and Runtime
import pytest
import torch
from sglang.srt.managers.scheduler_output_processor_mixin import \
    SchedulerOutputProcessorMixin

# --- Minimal stubs for dependencies ---

class DummyServerArgs:
    def __init__(self, multi_item_scoring_delimiter=None):
        self.multi_item_scoring_delimiter = multi_item_scoring_delimiter

class DummyModelConfig:
    def __init__(self, vocab_size=32000):
        self.vocab_size = vocab_size

class DummyReq:
    def __init__(
        self,
        origin_input_ids,
        logprob_start_len,
        is_prefill_only=False,
        return_logprob=True,
        top_logprobs_num=0,
        token_ids_logprob=None,
    ):
        self.origin_input_ids = origin_input_ids
        self.logprob_start_len = logprob_start_len
        self.is_prefill_only = is_prefill_only
        self.return_logprob = return_logprob
        self.top_logprobs_num = top_logprobs_num
        self.token_ids_logprob = token_ids_logprob

        # Output fields
        self.input_token_logprobs = None
        self.input_token_logprobs_val = None
        self.input_token_logprobs_idx = None

        self.temp_input_top_logprobs_val = None
        self.temp_input_top_logprobs_idx = None
        self.input_top_logprobs_val = None
        self.input_top_logprobs_idx = None

        self.temp_input_token_ids_logprobs_val = None
        self.temp_input_token_ids_logprobs_idx = None
        self.input_token_ids_logprobs_val = None
        self.input_token_ids_logprobs_idx = None

class DummyLogitsProcessorOutput:
    def __init__(
        self,
        input_token_logprobs,
        input_top_logprobs_val=None,
        input_top_logprobs_idx=None,
        input_token_ids_logprobs_val=None,
        input_token_ids_logprobs_idx=None,
    ):
        self.input_token_logprobs = input_token_logprobs
        self.input_top_logprobs_val = input_top_logprobs_val
        self.input_top_logprobs_idx = input_top_logprobs_idx
        self.input_token_ids_logprobs_val = input_token_ids_logprobs_val
        self.input_token_ids_logprobs_idx = input_token_ids_logprobs_idx

class DummyScheduler(SchedulerOutputProcessorMixin):
    def __init__(self, server_args, model_config):
        self.server_args = server_args
        self.model_config = model_config

# --- Unit tests for add_input_logprob_return_values ---

# 1. Basic Test Cases

def test_basic_regular_request():
    # Regular request, no multi-item scoring, single chunk
    server_args = DummyServerArgs(multi_item_scoring_delimiter=None)
    model_config = DummyModelConfig(vocab_size=100)
    scheduler = DummyScheduler(server_args, model_config)
    req = DummyReq(
        origin_input_ids=[10, 20, 30, 40],
        logprob_start_len=1,
        is_prefill_only=False,
        return_logprob=True,
        top_logprobs_num=0,
        token_ids_logprob=None,
    )
    # input_token_logprobs should be a tuple
    output = DummyLogitsProcessorOutput(input_token_logprobs=(0.1, 0.2, 0.3))
    scheduler.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=3,
        last_prefill_chunk=True,
    ) # 7.05μs -> 7.37μs (4.37% slower)

def test_basic_multi_item_scoring():
    # Multi-item scoring, delimiter present
    server_args = DummyServerArgs(multi_item_scoring_delimiter=999)
    model_config = DummyModelConfig(vocab_size=1000)
    scheduler = DummyScheduler(server_args, model_config)
    req = DummyReq(
        origin_input_ids=[1, 999, 2, 999, 3],
        logprob_start_len=1,
        is_prefill_only=True,
        return_logprob=True,
        top_logprobs_num=0,
        token_ids_logprob=None,
    )
    output = DummyLogitsProcessorOutput(input_token_logprobs=(0.5, 0.6))
    scheduler.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=2,
        last_prefill_chunk=True,
    ) # 7.75μs -> 6.51μs (18.9% faster)




def test_multi_item_scoring_no_delimiter():
    # Multi-item scoring enabled but no delimiter tokens present
    server_args = DummyServerArgs(multi_item_scoring_delimiter=999)
    model_config = DummyModelConfig(vocab_size=1000)
    scheduler = DummyScheduler(server_args, model_config)
    req = DummyReq(
        origin_input_ids=[1, 2, 3, 4, 5],
        logprob_start_len=1,
        is_prefill_only=True,
        return_logprob=True,
        top_logprobs_num=0,
        token_ids_logprob=None,
    )
    output = DummyLogitsProcessorOutput(input_token_logprobs=())
    scheduler.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=0,
        last_prefill_chunk=True,
    ) # 7.92μs -> 7.48μs (5.79% faster)


def test_vocab_size_clipping():
    # Token ids exceeding vocab_size - 1 should be clipped to 0
    server_args = DummyServerArgs(multi_item_scoring_delimiter=None)
    model_config = DummyModelConfig(vocab_size=5)
    scheduler = DummyScheduler(server_args, model_config)
    req = DummyReq(
        origin_input_ids=[1, 2, 5, 6],  # 5 and 6 exceed vocab_size-1 (4)
        logprob_start_len=1,
        is_prefill_only=False,
        return_logprob=True,
        top_logprobs_num=0,
        token_ids_logprob=None,
    )
    output = DummyLogitsProcessorOutput(input_token_logprobs=(0.1, 0.2, 0.3))
    scheduler.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=3,
        last_prefill_chunk=True,
    ) # 7.04μs -> 7.53μs (6.44% slower)

def test_retract_behavior():
    # If input_token_logprobs_val is already set, function should return early
    server_args = DummyServerArgs(multi_item_scoring_delimiter=None)
    model_config = DummyModelConfig(vocab_size=10)
    scheduler = DummyScheduler(server_args, model_config)
    req = DummyReq(
        origin_input_ids=[1, 2, 3],
        logprob_start_len=1,
        is_prefill_only=False,
        return_logprob=True,
        top_logprobs_num=1,
        token_ids_logprob=None,
    )
    req.input_token_logprobs_val = [None, 0.1, 0.2]
    output = DummyLogitsProcessorOutput(
        input_token_logprobs=(0.1, 0.2),
        input_top_logprobs_val=[[0.1], [0.2]],
        input_top_logprobs_idx=[[1], [2]],
    )
    scheduler.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=2,
        last_prefill_chunk=True,
    ) # 1.67μs -> 1.62μs (2.84% faster)

def test_token_ids_logprob_none():
    # token_ids_logprob is None, should not set input_token_ids_logprobs_val
    server_args = DummyServerArgs(multi_item_scoring_delimiter=None)
    model_config = DummyModelConfig(vocab_size=10)
    scheduler = DummyScheduler(server_args, model_config)
    req = DummyReq(
        origin_input_ids=[1, 2, 3],
        logprob_start_len=1,
        is_prefill_only=False,
        return_logprob=True,
        top_logprobs_num=0,
        token_ids_logprob=None,
    )
    output = DummyLogitsProcessorOutput(
        input_token_logprobs=(0.1, 0.2)
    )
    scheduler.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=2,
        last_prefill_chunk=True,
    ) # 6.33μs -> 6.39μs (0.954% slower)

def test_top_logprobs_num_zero():
    # top_logprobs_num == 0, should not set input_top_logprobs_val
    server_args = DummyServerArgs(multi_item_scoring_delimiter=None)
    model_config = DummyModelConfig(vocab_size=10)
    scheduler = DummyScheduler(server_args, model_config)
    req = DummyReq(
        origin_input_ids=[1, 2, 3],
        logprob_start_len=1,
        is_prefill_only=False,
        return_logprob=True,
        top_logprobs_num=0,
        token_ids_logprob=None,
    )
    output = DummyLogitsProcessorOutput(
        input_token_logprobs=(0.1, 0.2)
    )
    scheduler.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=2,
        last_prefill_chunk=True,
    ) # 5.93μs -> 5.93μs (0.034% faster)

# 3. Large Scale Test Cases

def test_large_scale_regular_request():
    # Large regular request, 1000 tokens
    server_args = DummyServerArgs(multi_item_scoring_delimiter=None)
    model_config = DummyModelConfig(vocab_size=10000)
    scheduler = DummyScheduler(server_args, model_config)
    origin_input_ids = list(range(1000))
    req = DummyReq(
        origin_input_ids=origin_input_ids,
        logprob_start_len=0,
        is_prefill_only=False,
        return_logprob=True,
        top_logprobs_num=0,
        token_ids_logprob=None,
    )
    # input_token_logprobs is tuple of 1000 floats
    input_token_logprobs = tuple(float(i) for i in range(1000))
    output = DummyLogitsProcessorOutput(input_token_logprobs=input_token_logprobs)
    scheduler.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=1000,
        last_prefill_chunk=True,
    ) # 63.4μs -> 33.5μs (89.1% faster)

def test_large_scale_multi_item_scoring():
    # Large multi-item scoring, 1000 tokens, delimiter every 10 tokens
    server_args = DummyServerArgs(multi_item_scoring_delimiter=888)
    model_config = DummyModelConfig(vocab_size=2000)
    scheduler = DummyScheduler(server_args, model_config)
    origin_input_ids = [i if i % 10 != 0 else 888 for i in range(1000)]
    req = DummyReq(
        origin_input_ids=origin_input_ids,
        logprob_start_len=0,
        is_prefill_only=True,
        return_logprob=True,
        top_logprobs_num=0,
        token_ids_logprob=None,
    )
    # Only delimiter positions after logprob_start_len get logprobs
    delimiter_count = origin_input_ids.count(888)
    input_token_logprobs = tuple(float(i) for i in range(delimiter_count))
    output = DummyLogitsProcessorOutput(input_token_logprobs=input_token_logprobs)
    scheduler.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=delimiter_count,
        last_prefill_chunk=True,
    ) # 76.2μs -> 30.0μs (154% faster)



#------------------------------------------------
from typing import List, Tuple

# imports
import pytest
import torch
from sglang.srt.managers.scheduler_output_processor_mixin import \
    SchedulerOutputProcessorMixin

# --- Minimal stubs for dependencies ---

class DummyServerArgs:
    def __init__(self, multi_item_scoring_delimiter=None):
        self.multi_item_scoring_delimiter = multi_item_scoring_delimiter

class DummyModelConfig:
    def __init__(self, vocab_size=32000):
        self.vocab_size = vocab_size

class DummyReq:
    def __init__(
        self,
        origin_input_ids,
        logprob_start_len,
        is_prefill_only=False,
        top_logprobs_num=0,
        token_ids_logprob=None,
        return_logprob=True,
    ):
        self.origin_input_ids = origin_input_ids
        self.logprob_start_len = logprob_start_len
        self.is_prefill_only = is_prefill_only
        self.top_logprobs_num = top_logprobs_num
        self.token_ids_logprob = token_ids_logprob
        self.return_logprob = return_logprob

        # These are set/used by the function
        self.input_token_logprobs = None
        self.input_token_logprobs_val = None
        self.input_token_logprobs_idx = None
        self.input_top_logprobs_val = None
        self.input_top_logprobs_idx = None
        self.input_token_ids_logprobs_val = None
        self.input_token_ids_logprobs_idx = None

        self.temp_input_top_logprobs_val = None
        self.temp_input_top_logprobs_idx = None
        self.temp_input_token_ids_logprobs_val = None
        self.temp_input_token_ids_logprobs_idx = None

class DummyLogitsProcessorOutput:
    def __init__(
        self,
        input_token_logprobs,
        input_top_logprobs_val=None,
        input_top_logprobs_idx=None,
        input_token_ids_logprobs_val=None,
        input_token_ids_logprobs_idx=None,
    ):
        self.input_token_logprobs = input_token_logprobs
        self.input_top_logprobs_val = input_top_logprobs_val
        self.input_top_logprobs_idx = input_top_logprobs_idx
        self.input_token_ids_logprobs_val = input_token_ids_logprobs_val
        self.input_token_ids_logprobs_idx = input_token_ids_logprobs_idx

class DummyScheduler(SchedulerOutputProcessorMixin):
    # Concrete holder so the mixin's methods can run against the stubbed
    # server args and model config.
    def __init__(self, server_args, model_config):
        self.server_args = server_args
        self.model_config = model_config

# --- Unit tests ---

@pytest.fixture
def mixin_basic():
    # Basic config: vocab_size=100, delimiter=99
    return DummyScheduler(DummyServerArgs(multi_item_scoring_delimiter=99), DummyModelConfig(vocab_size=100))

@pytest.fixture
def mixin_no_delimiter():
    # No multi-item scoring delimiter
    return DummyScheduler(DummyServerArgs(multi_item_scoring_delimiter=None), DummyModelConfig(vocab_size=100))

# --- Basic Test Cases ---

def test_basic_regular_request(mixin_no_delimiter):
    # Regular request, 3 tokens, start at 0, no top logprobs, no token_ids_logprob
    req = DummyReq([10, 20, 30], 0)
    output = DummyLogitsProcessorOutput(input_token_logprobs=(0.1, 0.2, 0.3, 0.4))
    mixin_no_delimiter.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=3,
        last_prefill_chunk=True,
    )

def test_basic_multi_item_scoring(mixin_basic):
    # Multi-item scoring, delimiter=99, only delimiter tokens get logprobs
    req = DummyReq([10, 99, 20, 99, 30], 0, is_prefill_only=True)
    output = DummyLogitsProcessorOutput(input_token_logprobs=(0.5, 0.6))
    mixin_basic.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=2,
        last_prefill_chunk=True,
    )




def test_logprob_start_len_beyond_input(mixin_no_delimiter):
    # logprob_start_len > input length (should produce empty idx)
    req = DummyReq([1,2,3], 5)
    output = DummyLogitsProcessorOutput(input_token_logprobs=(0.1, 0.2, 0.3))
    mixin_no_delimiter.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=2,
        last_prefill_chunk=True,
    )

def test_padded_token_ids_are_zeroed(mixin_no_delimiter):
    # Input IDs 100 and 101 exceed vocab_size - 1 (99) and should be clipped to 0
    req = DummyReq([98,99,100,101], 0)
    output = DummyLogitsProcessorOutput(input_token_logprobs=(0.1, 0.2, 0.3, 0.4))
    mixin_no_delimiter.model_config.vocab_size = 100
    mixin_no_delimiter.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=4,
        last_prefill_chunk=True,
    )

def test_multi_item_scoring_no_delimiters(mixin_basic):
    # Multi-item scoring, but no delimiter tokens present
    req = DummyReq([10,20,30], 0, is_prefill_only=True)
    output = DummyLogitsProcessorOutput(input_token_logprobs=())
    mixin_basic.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=0,
        last_prefill_chunk=True,
    )

def test_return_logprob_false(mixin_no_delimiter):
    # Should skip length assertions if return_logprob is False
    req = DummyReq([1,2,3], 0, return_logprob=False)
    output = DummyLogitsProcessorOutput(input_token_logprobs=(0.1, 0.2, 0.3, 0.4))
    mixin_no_delimiter.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=3,
        last_prefill_chunk=True,
    )

def test_already_computed_logprobs(mixin_no_delimiter):
    # Should early return if input_token_logprobs_val is not None
    req = DummyReq([1,2,3], 0)
    req.input_token_logprobs_val = [None, 0.1, 0.2]
    output = DummyLogitsProcessorOutput(input_token_logprobs=(0.1, 0.2, 0.3, 0.4))
    mixin_no_delimiter.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=3,
        last_prefill_chunk=True,
    )

# --- Large Scale Test Cases ---

def test_large_regular_request(mixin_no_delimiter):
    # Large input, up to 1000 tokens
    ids = list(range(1000))
    req = DummyReq(ids, 0)
    # Provide one logprob per input token
    logprobs = tuple(float(i)/1000 for i in range(1000))
    output = DummyLogitsProcessorOutput(input_token_logprobs=logprobs)
    mixin_no_delimiter.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=1000,
        last_prefill_chunk=True,
    )

def test_large_multi_item_scoring(mixin_basic):
    # Large multi-item scoring, delimiter every 10 tokens
    ids = [99 if i%10==0 else i for i in range(1000)]
    req = DummyReq(ids, 0, is_prefill_only=True)
    # Only positions with delimiter get logprobs
    num_delimiters = ids.count(99)
    logprobs = tuple(float(i)/num_delimiters for i in range(num_delimiters))
    output = DummyLogitsProcessorOutput(input_token_logprobs=logprobs)
    mixin_basic.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=num_delimiters,
        last_prefill_chunk=True,
    )

def test_large_top_logprobs(mixin_no_delimiter):
    # Large input with top_logprobs_num
    ids = list(range(500))
    req = DummyReq(ids, 0, top_logprobs_num=1)
    # Provide logprobs for all tokens + 1
    logprobs = tuple(float(i)/500 for i in range(501))
    top_vals = [[float(i)/500] for i in range(501)]
    top_idxs = [[i] for i in range(501)]
    output = DummyLogitsProcessorOutput(
        input_token_logprobs=logprobs,
        input_top_logprobs_val=top_vals,
        input_top_logprobs_idx=top_idxs,
    )
    mixin_no_delimiter.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=500,
        last_prefill_chunk=True,
    )

def test_large_token_ids_logprob(mixin_no_delimiter):
    # Large input with token_ids_logprob
    ids = list(range(500))
    req = DummyReq(ids, 0, token_ids_logprob=True)
    # Provide logprobs for all tokens + 1
    logprobs = tuple(float(i)/500 for i in range(501))
    token_ids_vals = [torch.tensor([float(i)/500 for i in range(501)])]
    token_ids_idxs = [list(range(501))]
    output = DummyLogitsProcessorOutput(
        input_token_logprobs=logprobs,
        input_token_ids_logprobs_val=token_ids_vals,
        input_token_ids_logprobs_idx=token_ids_idxs,
    )
    mixin_no_delimiter.add_input_logprob_return_values(
        i=0,
        req=req,
        output=output,
        logprob_pt=0,
        num_input_logprobs=500,
        last_prefill_chunk=True,
    )
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-SchedulerOutputProcessorMixin.add_input_logprob_return_values-mhotyez3` and push.


codeflash-ai bot requested a review from mashraf-222 on November 7, 2025 at 12:27
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Nov 7, 2025