

@AlexsanderHamir (Collaborator) commented Oct 3, 2025

Title

[Fix] Cache - Avoiding expensive operations when cache isn't available

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🧹 Refactoring

Changes

  • Check whether caching is available before doing expensive cache-related work, and skip that work entirely when caching is disabled.

Performance Gains

With DB

Before

| Type | Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POST | /chat/completions | 102040 | 2 | 500 | 830 | 1200 | 532.05 | 67 | 2527 | 398 | 778.4 | 0 |
| Custom | LiteLLM Overhead Duration (ms) | 102038 | 0 | 47 | 71 | 97 | 50.42 | 4 | 1438 | 0 | 778.4 | 0 |
| | Aggregated | 204078 | 2 | 290 | 700 | 1100 | 291.24 | 4 | 2527 | 199 | 1556.8 | 0 |

After

| Type | Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POST | /chat/completions | 101891 | 2 | 280 | 520 | 1000 | 310.07 | 105 | 53491 | 398 | 939.1 | 0.1 |
| Custom | LiteLLM Overhead Duration (ms) | 101889 | 0 | 25 | 45 | 60 | 27.27 | 1 | 851 | 0 | 939 | 0 |
| | Aggregated | 203780 | 2 | 130 | 420 | 870 | 168.67 | 1 | 53491 | 199 | 1878.1 | 0.1 |

With DB + Redis

Before

| Type | Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POST | /chat/completions | 35569 | 0 | 1900 | 3500 | 4400 | 2052.36 | 254 | 8315 | 398 | 330.2 | 0 |
| Custom | LiteLLM Overhead Duration (ms) | 35569 | 0 | 300 | 770 | 1200 | 356.51 | 27 | 2156 | 0 | 330.2 | 0 |
| | Aggregated | 71138 | 0 | 920 | 3100 | 4000 | 1204.44 | 27 | 8315 | 199 | 660.4 | 0 |

After

| Type | Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POST | /chat/completions | 21000 | 0 | 1700 | 2800 | 3400 | 1744.71 | 84 | 5383 | 398 | 424.1 | 0 |
| Custom | LiteLLM Overhead Duration (ms) | 21000 | 0 | 270 | 650 | 1100 | 311.15 | 7 | 1413 | 0 | 424.1 | 0 |
| | Aggregated | 42000 | 0 | 820 | 2600 | 3100 | 1027.93 | 7 | 5383 | 199 | 848.2 | 0 |
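The relative before/after changes can be worked out directly from the numbers in these benchmark tables. The short script below is only an arithmetic aid: `pct_change` and `RESULTS` are illustrative names, and the figures are copied from the tables in this PR. Each scenario is a single before/after run, and the DB + Redis "After" run processed fewer requests (21000 vs 35569), so treat the percentages as rough indicators rather than a controlled comparison.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = lower)."""
    return (after - before) / before * 100.0


# (metric, before, after) copied from the benchmark tables in this PR
RESULTS = {
    "With DB": [
        ("LiteLLM overhead median (ms)", 47, 25),
        ("LiteLLM overhead average (ms)", 50.42, 27.27),
        ("/chat/completions median (ms)", 500, 280),
        ("/chat/completions current RPS", 778.4, 939.1),
    ],
    "With DB + Redis": [
        ("LiteLLM overhead median (ms)", 300, 270),
        ("LiteLLM overhead average (ms)", 356.51, 311.15),
        ("/chat/completions median (ms)", 1900, 1700),
        ("/chat/completions current RPS", 330.2, 424.1),
    ],
}

for scenario, rows in RESULTS.items():
    print(scenario)
    for metric, before, after in rows:
        # Signed percentage with one decimal, e.g. "+20.6%" or "-46.8%"
        print(f"  {metric}: {before} -> {after} ({pct_change(before, after):+.1f}%)")
```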

…ing is disabled

- Moved cache availability checks before expensive operations to improve performance for non-cached requests
- Updated client code to handle None responses from caching handler
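A minimal sketch of the pattern these two bullets describe, assuming a handler that returns `None` when caching is disabled and a caller that treats `None` as "no cached result". `CachingHandlerSketch`, `get_cached_result`, `set_cached_result`, and `completion` are hypothetical names, not LiteLLM's actual API, and hashing the serialized request is only a stand-in for whatever expensive work the real handler skips.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CachingHandlerSketch:
    """Hypothetical caching handler (illustration only, not LiteLLM's class)."""

    cache: Optional[Dict[str, Any]] = None  # None means caching is disabled
    key_computations: int = 0  # counts how often the expensive step ran

    def _compute_cache_key(self, request: Dict[str, Any]) -> str:
        # Stand-in for expensive work: serialize and hash the whole request.
        self.key_computations += 1
        payload = json.dumps(request, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def get_cached_result(self, request: Dict[str, Any]) -> Optional[Any]:
        # Cheap availability check first: no cache means no key computation.
        if self.cache is None:
            return None
        return self.cache.get(self._compute_cache_key(request))

    def set_cached_result(self, request: Dict[str, Any], result: Any) -> None:
        if self.cache is None:
            return
        self.cache[self._compute_cache_key(request)] = result


def completion(handler: CachingHandlerSketch, request: Dict[str, Any]) -> Any:
    # Client code handles a None response from the handler by falling
    # through to the (stand-in) model call.
    cached = handler.get_cached_result(request)
    if cached is not None:
        return cached
    result = {"echo": request["messages"][-1]["content"]}  # fake model call
    handler.set_cached_result(request, result)
    return result
```

With caching disabled (`CachingHandlerSketch()`), `completion` never computes a key; with `CachingHandlerSketch(cache={})`, the first call computes the key twice (lookup and store) and a repeated call computes it once more and returns the cached result.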

vercel bot commented Oct 3, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| litellm | Error | Error | | Oct 4, 2025 3:19pm |

@ishaan-jaff (Contributor) left a comment


lgtm

Fixed `TypeError: typing.Any cannot be used with isinstance()` that was
occurring in the caching handler when checking cached streaming responses.

The issue was caused by `CustomStreamWrapper` being aliased to `typing.Any`
at runtime through the `TYPE_CHECKING` conditional import pattern. When the
code called `isinstance(cached_result, CustomStreamWrapper)` at lines 222
and 338, it raised this error because `isinstance()` cannot be used with
`typing.Any`.

Solution: Import `CustomStreamWrapper` at runtime, separately from the
`TYPE_CHECKING` block, while keeping a type alias for static type checking.
This lets the `isinstance()` checks work at runtime while preserving the type hints.
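The failure mode and the fix can be reproduced in isolation. In this minimal sketch, the local `CustomStreamWrapper` class is a hypothetical stand-in for the real streaming wrapper, and `StreamWrapperAlias`, `is_stream_broken`, and `is_stream_fixed` are illustrative names, not code from this PR. The key point is that `TYPE_CHECKING` is `False` at runtime, so a name bound only inside that block falls back to `typing.Any`, which `isinstance()` rejects.

```python
from typing import TYPE_CHECKING, Any


class CustomStreamWrapper:
    """Hypothetical stand-in for the real streaming wrapper class."""


# Broken pattern: static type checkers take the TYPE_CHECKING branch, but at
# runtime TYPE_CHECKING is False, so the alias is bound to typing.Any.
if TYPE_CHECKING:
    StreamWrapperAlias = CustomStreamWrapper
else:
    StreamWrapperAlias = Any


def is_stream_broken(cached_result: object) -> bool:
    # Raises TypeError at runtime: typing.Any cannot be used with isinstance().
    return isinstance(cached_result, StreamWrapperAlias)


def is_stream_fixed(cached_result: object) -> bool:
    # Fixed pattern: check against the class object that exists at runtime.
    return isinstance(cached_result, CustomStreamWrapper)


if __name__ == "__main__":
    try:
        is_stream_broken(CustomStreamWrapper())
    except TypeError as exc:
        print(f"broken check raised: {exc}")
    print("fixed check:", is_stream_fixed(CustomStreamWrapper()))
```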
@ishaan-jaff merged commit 5d22229 into main on Oct 4, 2025
38 of 50 checks passed
