
Conversation


AlexsanderHamir commented Nov 20, 2025

This PR is not meant to be merged; I will cherry-pick from here and merge into main gradually.

Title

Reduce memory cost of importing the completion function

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🧹 Refactoring

Context

Our current import strategy pulls in large portions of the codebase, even when only a single function is needed. Many modules perform heavy work at import time or bring in sizable dependencies, so importing the completion function triggers unnecessary initialization and memory allocation.

While this PR reduces the overhead for the completion function, it doesn’t fully resolve the underlying issue. A broader cleanup of our import structure is required for a complete fix.

Changes

  • Lazy-loaded the heaviest libraries identified in the memory profile during completion import.

Memory Differences

Before

Screenshot 2025-11-19 at 5 37 23 PM Screenshot 2025-11-19 at 5 38 03 PM

After

Screenshot 2025-11-19 at 5 37 41 PM Screenshot 2025-11-19 at 5 37 52 PM

This change removes 67 MB of memory consumption at import time, reducing memory usage when importing the LiteLLM completion function from 200 MB to 140 MB. Further work brings this down to 20 MB, but something is still being triggered that causes memory to spike.
Lazy-load most functions and response types from utils.py to avoid loading
tiktoken and other heavy dependencies at import time. This significantly
reduces memory usage when importing completion from litellm.

Changes:
- Made utils functions (exception_type, get_litellm_params, ModelResponse, etc.)
  lazy-loaded via __getattr__
- Made ALL_LITELLM_RESPONSE_TYPES lazy-loaded
- Fixed circular imports by updating files to import directly from litellm.utils
  or litellm.types.utils instead of from litellm
- Kept client decorator as immediate import since it's used at function
  definition time

Only client is now imported immediately from utils.py; all other utils
functions and response types are loaded on-demand when accessed.
Lazy-load tiktoken and default_encoding from litellm_core_utils to avoid
loading these heavy dependencies at import time. This further reduces memory
usage when importing completion from litellm.

Changes:
- Made tiktoken imports lazy-loaded in utils.py, main.py, and token_counter.py
- Made default_encoding lazy-loaded in token_counter.py and utils.py
- Made get_modified_max_tokens lazy-loaded in utils.py (only used internally)
- Made encoding attribute lazy-loaded via __getattr__ in __init__.py
- Removed top-level tiktoken and Encoding imports that were loading at module level

tiktoken and default_encoding are now only loaded when token counting or
encoding functions are actually called, not when importing completion.
Refactor repetitive lazy import and caching code into reusable helper
functions to improve code maintainability and readability.

Changes:
- Added _lazy_import_and_cache() generic helper for lazy importing with caching
- Added _lazy_import_from() convenience wrapper for common import pattern
- Replaced 4 repetitive code blocks with simple function calls
- Maintains same performance: imports cached after first access, zero
  overhead on subsequent calls

The helper functions eliminate code duplication while preserving the
performance benefits of cached lazy loading.
- Remove eager import of AsyncHTTPHandler and HTTPHandler from __init__.py
- Make module_level_aclient and module_level_client lazy-loaded via __getattr__
- HTTP handler clients are now instantiated on first access, not at import time
- Reduces memory footprint when importing completion from litellm
Lazy-load Cache, DualCache, RedisCache, and InMemoryCache from caching.caching
to avoid loading these dependencies at import time. This further reduces memory
usage when importing completion from litellm.

Changes:
- Made Cache, DualCache, RedisCache, and InMemoryCache lazy-loaded via __getattr__ in __init__.py
- Removed top-level caching class imports that were loading at module level
- Updated cache type annotation to use forward reference string to avoid runtime import
- Caching classes are now only loaded when actually accessed, not when importing completion

Performance:
- First access: 0.001-0.008ms (negligible latency)
- Cached access: 0.000ms (no latency penalty)
- Classes are cached in globals() after first access to avoid repeated import overhead

This follows the same pattern as HTTP handlers lazy loading and avoids latency
issues by caching imported classes after first access.

vercel bot commented Nov 20, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Error | Preview: Error | Comments: - | Updated (UTC): Nov 25, 2025 0:33am

1. Grouped lazy imports into the same functions.
2. Stopped importing more than one library when only a single name was actually used.
…e_index_from_tool_calls to reduce import-time memory cost
- Convert most types.utils imports to lazy loading via __getattr__
- Add _lazy_import_types_utils function for on-demand imports
- Keep LlmProviders and PriorityReservationSettings as direct imports (needed for module-level initialization)
- Add TYPE_CHECKING imports for type annotations (CredentialItem, BudgetConfig, etc.)
- Significantly reduces import cascade and memory usage at import time
- Make provider_list and priority_reservation_settings lazy-loaded via __getattr__
- Lazy load types.proxy.management_endpoints.ui_sso imports (DefaultTeamSSOParams, LiteLLM_UpperboundKeyGenerateParams)
- Keep LlmProviders and PriorityReservationSettings as direct imports (needed by other modules)
- Remove non-essential comments
- Significantly reduces import-time memory usage
- Make KeyManagementSystem fully lazy-loaded via __getattr__
- Make KeyManagementSettings lazy-loadable via __getattr__
- Keep KeyManagementSettings as direct import (needed for _key_management_settings initialization during import)
- Add TYPE_CHECKING imports for type annotations
- Significantly reduces import-time memory usage
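The TYPE_CHECKING technique used throughout these commits keeps classes available to type checkers without paying their import cost at runtime. A minimal sketch, with `OrderedDict` standing in for a class like KeyManagementSettings:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; never imported at runtime.
    from collections import OrderedDict  # stand-in for a heavy LiteLLM type

def use_settings(settings: "OrderedDict") -> int:
    """The string annotation is a forward reference, so the class need not
    exist at runtime for this function to be defined or called."""
    return len(settings)
```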
- Move client import from line 1053 to right before main.py import (line 1328)
- This delays loading utils.py (which imports tiktoken) until after most other imports
- client cannot be fully lazy-loaded because main.py needs it at import time for @client decorator
- Reduces memory footprint during early import phase
AlexsanderHamir force-pushed the litellm_memory_import_issue branch from afc07ed to b03746b on November 22, 2025 at 20:15.
- Remove direct import of BytezChatConfig from early in __init__.py
- Add lazy loading via __getattr__ pattern
- Delays loading bytez transformation module until BytezChatConfig is accessed
- main.py still works (imports directly), utils.py works (accesses via litellm.BytezChatConfig)
- Remove direct import of CustomLLM from early in __init__.py
- Add lazy loading via __getattr__ pattern
- Delays loading custom_llm module until CustomLLM is accessed
- images/main.py still works (imports directly from source)
- Proxy examples still work (access via litellm.CustomLLM)
- Remove direct import of AmazonConverseConfig from early in __init__.py
- Add lazy loading via __getattr__ pattern
- Delays loading converse_transformation module until AmazonConverseConfig is accessed
- common_utils.py still works (accesses via litellm.AmazonConverseConfig())
- invoke_handler.py still works (imports directly from source)
Add azure_chat_completions to the _lazy_vars dictionary in __getattr__
to fix ImportError when other modules (e.g., images/main.py) try to
import it from litellm.main. This ensures backward compatibility with
modules that import these handlers directly.
… memory

Make openai and its submodules (_parsing, _pydantic, ResponseFormat, OpenAIError)
lazy-loaded in the @client decorator to avoid expensive import when importing
the decorator. This defers the openai import until the decorator actually runs,
significantly reducing import-time memory cost.

Changes:
- Remove top-level 'import openai' from utils.py
- Add lazy import helpers for openai module and submodules
- Replace openai.* references in @client decorator with lazy-loaded versions
- Update exception handling to use lazy-loaded openai.APIError, Timeout, etc.
Remove the unused import of litellm._service_logger from utils.py to reduce
import-time memory cost. The module is not used in utils.py and can be
imported directly where needed.
Make litellm.litellm_core_utils.audio_utils.utils lazy-loaded using a cached
helper function to avoid expensive import when importing the @client decorator.
The module is only loaded when actually needed (during transcription calls)
and cached for subsequent use to maintain performance.
…mory

Remove unused top-level imports of litellm.llms and litellm.llms.gemini from
utils.py. These are not used directly and submodule imports (from litellm.llms.*)
will automatically import the parent package when needed, avoiding expensive
imports at module load time.
Make CachingHandlerResponse and LLMCachingHandler lazy-loaded using cached
helper functions to avoid expensive import when importing the @client decorator.
These classes are only needed when the decorator actually runs, not at import time.
Make CustomGuardrail lazy-loaded using a cached helper function to avoid
expensive import when importing the @client decorator. The class is only
needed when get_applied_guardrails is called, not at import time.
Make CustomLogger lazy-loaded using a cached helper function and TYPE_CHECKING
for type hints to avoid expensive import when importing the @client decorator.
All type hints use string literals to support forward references. The class is
only loaded when actually needed (isinstance checks), not at import time.
Fix NameError by replacing direct LLMCachingHandler usage with lazy loader
function call in the async wrapper. This ensures the class is properly loaded
when needed rather than at import time.
Remove the unused import of BaseVectorStore from utils.py to reduce
import-time memory cost. The class is not used in utils.py and can be
imported directly where needed.
…-time memory

Make get_litellm_metadata_from_kwargs lazy-loaded using a cached helper
function to avoid expensive import when importing the @client decorator.
The function is only needed when get_end_user_id_for_cost_tracking is
called, not at import time.
Make CredentialAccessor lazy-loaded using a cached helper function to avoid
expensive import when importing the @client decorator. The class is only
needed when load_credentials_from_list is called, not at import time.
…t-time memory

Make _get_response_headers, exception_type, and get_error_message
lazy-loaded using cached helper functions to avoid expensive import
when importing the @client decorator. These functions are only needed
when exception handling occurs, not at import time.
Update main.py to use lazy-loaded exception_type from utils.py instead
of direct import. This fixes the ImportError when importing completion
from litellm, since exception_type is now lazy-loaded in utils.py.
Make get_llm_provider lazy-loaded using a cached helper function to avoid
expensive import when importing the @client decorator. The function is only
needed when provider logic is accessed, not at import time.
Update main.py to use lazy-loaded get_llm_provider from utils.py instead
of direct import. This fixes the ImportError when importing completion
from litellm, since get_llm_provider is now lazy-loaded in utils.py.
… memory

Make get_supported_openai_params lazy-loaded using a cached helper function
to avoid expensive import when importing the @client decorator. The function
is only needed when optional params are processed, not at import time.
…rt-time memory

Make LiteLLMResponseObjectHandler, _handle_invalid_parallel_tool_calls,
convert_to_model_response_object, convert_to_streaming_response, and
convert_to_streaming_response_async lazy-loaded using cached helper functions
and __getattr__ to avoid expensive import when importing the @client decorator.
These functions are only needed when response conversion occurs, not at import time.
Make get_api_base lazy-loaded using a cached helper function and __getattr__
to avoid expensive import when importing the @client decorator. The function
is only needed when API base resolution occurs, not at import time.
…to reduce import-time memory

Make get_formatted_prompt, get_response_headers, ResponseMetadata,
_parse_content_for_reasoning, LiteLLMLoggingObject, and
redact_message_input_output_from_logging lazy-loaded using cached helper
functions and __getattr__ to avoid expensive imports when importing the
@client decorator. These are only needed when response processing occurs,
not at import time.
Move the TYPE_CHECKING block for LiteLLMLoggingObject to after the typing
imports to fix the NameError: name 'TYPE_CHECKING' is not defined error.
Make CustomStreamWrapper lazy-loaded using a cached helper function and
__getattr__ to avoid expensive import when importing the @client decorator.
The class is only needed when streaming responses are processed, not at
import time. This is required since it's imported by litellm/llms/openai_like/chat/handler.py.
…port-time memory

Move BaseGoogleGenAIGenerateContentConfig to TYPE_CHECKING block since it's
only used in type annotations. Update the type hint to use a string literal
to avoid runtime import when importing the @client decorator.
Move BaseOCRConfig to TYPE_CHECKING block since it's only used in type
annotations. The type hint already uses a string literal, so no runtime
import is needed when importing the @client decorator.
Move BaseSearchConfig to TYPE_CHECKING block since it's only used in type
annotations. The type hint already uses a string literal, so no runtime
import is needed when importing the @client decorator.
Move Base*Config classes and related imports to TYPE_CHECKING block or
lazy load them to reduce import-time memory cost. This follows the same
pattern used in __init__.py.

Changes:
- Move all Base*Config classes used only in type hints to TYPE_CHECKING block
- Create lazy loader functions for runtime-used Base*Config classes
- Lazy load BedrockModelInfo, CohereModelInfo, MistralOCRConfig
- Lazy load HTTPHandler, AsyncHTTPHandler
- Lazy load get_num_retries_from_retry_policy, reset_retry_policy, get_secret
- Lazy load ANTHROPIC_API_ONLY_HEADERS and AnthropicThinkingParam
- Update all type hints to use string literals for forward references
- Update all runtime usages to call lazy loader functions
- Expose lazy-loaded items via __getattr__ for backward compatibility

This significantly reduces import-time memory footprint while maintaining
full backward compatibility.
