Reduce memory cost of importing the completion function #16860
Open
AlexsanderHamir wants to merge 181 commits into main from litellm_memory_import_issue
Conversation
This change removes 67 MB of memory consumption at import time.
This reduced memory usage when importing the LiteLLM completion function from 200 MB to 140 MB.
This brings us down to 20 MB, but something is still being triggered that causes memory to spike.
Lazy-load most functions and response types from utils.py to avoid loading tiktoken and other heavy dependencies at import time. This significantly reduces memory usage when importing completion from litellm. Changes: - Made utils functions (exception_type, get_litellm_params, ModelResponse, etc.) lazy-loaded via __getattr__ - Made ALL_LITELLM_RESPONSE_TYPES lazy-loaded - Fixed circular imports by updating files to import directly from litellm.utils or litellm.types.utils instead of from litellm - Kept client decorator as immediate import since it's used at function definition time Only client is now imported immediately from utils.py; all other utils functions and response types are loaded on-demand when accessed.
Lazy-load tiktoken and default_encoding from litellm_core_utils to avoid loading these heavy dependencies at import time. This further reduces memory usage when importing completion from litellm. Changes: - Made tiktoken imports lazy-loaded in utils.py, main.py, and token_counter.py - Made default_encoding lazy-loaded in token_counter.py and utils.py - Made get_modified_max_tokens lazy-loaded in utils.py (only used internally) - Made encoding attribute lazy-loaded via __getattr__ in __init__.py - Removed top-level tiktoken and Encoding imports that were loading at module level tiktoken and default_encoding are now only loaded when token counting or encoding functions are actually called, not when importing completion.
Refactor repetitive lazy import and caching code into reusable helper functions to improve code maintainability and readability. Changes: - Added _lazy_import_and_cache() generic helper for lazy importing with caching - Added _lazy_import_from() convenience wrapper for common import pattern - Replaced 4 repetitive code blocks with simple function calls - Maintains same performance: imports cached after first access, zero overhead on subsequent calls The helper functions eliminate code duplication while preserving the performance benefits of cached lazy loading.
- Remove eager import of AsyncHTTPHandler and HTTPHandler from __init__.py - Make module_level_aclient and module_level_client lazy-loaded via __getattr__ - HTTP handler clients are now instantiated on first access, not at import time - Reduces memory footprint when importing completion from litellm
Lazy-load Cache, DualCache, RedisCache, and InMemoryCache from caching.caching to avoid loading these dependencies at import time. This further reduces memory usage when importing completion from litellm. Changes: - Made Cache, DualCache, RedisCache, and InMemoryCache lazy-loaded via __getattr__ in __init__.py - Removed top-level caching class imports that were loading at module level - Updated cache type annotation to use forward reference string to avoid runtime import - Caching classes are now only loaded when actually accessed, not when importing completion Performance: - First access: 0.001-0.008ms (negligible latency) - Cached access: 0.000ms (no latency penalty) - Classes are cached in globals() after first access to avoid repeated import overhead This follows the same pattern as HTTP handlers lazy loading and avoids latency issues by caching imported classes after first access.
1. Grouped lazy imports into the same functions. 2. Stopped importing more than one library when only one name was actually used.
…e_index_from_tool_calls to reduce import-time memory cost
- Convert most types.utils imports to lazy loading via __getattr__ - Add _lazy_import_types_utils function for on-demand imports - Keep LlmProviders and PriorityReservationSettings as direct imports (needed for module-level initialization) - Add TYPE_CHECKING imports for type annotations (CredentialItem, BudgetConfig, etc.) - Significantly reduces import cascade and memory usage at import time
- Make provider_list and priority_reservation_settings lazy-loaded via __getattr__ - Lazy load types.proxy.management_endpoints.ui_sso imports (DefaultTeamSSOParams, LiteLLM_UpperboundKeyGenerateParams) - Keep LlmProviders and PriorityReservationSettings as direct imports (needed by other modules) - Remove non-essential comments - Significantly reduces import-time memory usage
- Make KeyManagementSystem fully lazy-loaded via __getattr__ - Make KeyManagementSettings lazy-loadable via __getattr__ - Keep KeyManagementSettings as direct import (needed for _key_management_settings initialization during import) - Add TYPE_CHECKING imports for type annotations - Significantly reduces import-time memory usage
- Move client import from line 1053 to right before main.py import (line 1328) - This delays loading utils.py (which imports tiktoken) until after most other imports - client cannot be fully lazy-loaded because main.py needs it at import time for @client decorator - Reduces memory footprint during early import phase
- Remove direct import of BytezChatConfig from early in __init__.py - Add lazy loading via __getattr__ pattern - Delays loading bytez transformation module until BytezChatConfig is accessed - main.py still works (imports directly), utils.py works (accesses via litellm.BytezChatConfig)
- Remove direct import of CustomLLM from early in __init__.py - Add lazy loading via __getattr__ pattern - Delays loading custom_llm module until CustomLLM is accessed - images/main.py still works (imports directly from source) - Proxy examples still work (access via litellm.CustomLLM)
- Remove direct import of AmazonConverseConfig from early in __init__.py - Add lazy loading via __getattr__ pattern - Delays loading converse_transformation module until AmazonConverseConfig is accessed - common_utils.py still works (accesses via litellm.AmazonConverseConfig()) - invoke_handler.py still works (imports directly from source)
Add azure_chat_completions to the _lazy_vars dictionary in __getattr__ to fix ImportError when other modules (e.g., images/main.py) try to import it from litellm.main. This ensures backward compatibility with modules that import these handlers directly.
… memory Make openai and its submodules (_parsing, _pydantic, ResponseFormat, OpenAIError) lazy-loaded in the @client decorator to avoid expensive import when importing the decorator. This defers the openai import until the decorator actually runs, significantly reducing import-time memory cost. Changes: - Remove top-level 'import openai' from utils.py - Add lazy import helpers for openai module and submodules - Replace openai.* references in @client decorator with lazy-loaded versions - Update exception handling to use lazy-loaded openai.APIError, Timeout, etc.
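Deferring a heavy dependency into the decorator's wrapper body can be sketched like this; `json` is a lightweight stand-in for the heavy package (openai in the PR's case):

```python
import functools

def client(fn):
    """Illustrative decorator: importing `client` itself stays cheap
    because the heavy dependency is imported only when the wrapped
    function actually runs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Deferred import: json stands in for a heavy SDK that should
        # not load at decorator-definition time.
        import json
        result = fn(*args, **kwargs)
        return json.dumps(result)
    return wrapper

@client
def build_payload():
    return {"ok": True}
```

Since Python caches modules in `sys.modules`, the in-function `import` is essentially free on every call after the first.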
Remove the unused import of litellm._service_logger from utils.py to reduce import-time memory cost. The module is not used in utils.py and can be imported directly where needed.
Make litellm.litellm_core_utils.audio_utils.utils lazy-loaded using a cached helper function to avoid expensive import when importing the @client decorator. The module is only loaded when actually needed (during transcription calls) and cached for subsequent use to maintain performance.
…mory Remove unused top-level imports of litellm.llms and litellm.llms.gemini from utils.py. These are not used directly and submodule imports (from litellm.llms.*) will automatically import the parent package when needed, avoiding expensive imports at module load time.
Make CachingHandlerResponse and LLMCachingHandler lazy-loaded using cached helper functions to avoid expensive import when importing the @client decorator. These classes are only needed when the decorator actually runs, not at import time.
Make CustomGuardrail lazy-loaded using a cached helper function to avoid expensive import when importing the @client decorator. The class is only needed when get_applied_guardrails is called, not at import time.
Make CustomLogger lazy-loaded using a cached helper function and TYPE_CHECKING for type hints to avoid expensive import when importing the @client decorator. All type hints use string literals to support forward references. The class is only loaded when actually needed (isinstance checks), not at import time.
Fix NameError by replacing direct LLMCachingHandler usage with lazy loader function call in the async wrapper. This ensures the class is properly loaded when needed rather than at import time.
Remove the unused import of BaseVectorStore from utils.py to reduce import-time memory cost. The class is not used in utils.py and can be imported directly where needed.
…-time memory Make get_litellm_metadata_from_kwargs lazy-loaded using a cached helper function to avoid expensive import when importing the @client decorator. The function is only needed when get_end_user_id_for_cost_tracking is called, not at import time.
Make CredentialAccessor lazy-loaded using a cached helper function to avoid expensive import when importing the @client decorator. The class is only needed when load_credentials_from_list is called, not at import time.
…t-time memory Make _get_response_headers, exception_type, and get_error_message lazy-loaded using cached helper functions to avoid expensive import when importing the @client decorator. These functions are only needed when exception handling occurs, not at import time.
Update main.py to use lazy-loaded exception_type from utils.py instead of direct import. This fixes the ImportError when importing completion from litellm, since exception_type is now lazy-loaded in utils.py.
Make get_llm_provider lazy-loaded using a cached helper function to avoid expensive import when importing the @client decorator. The function is only needed when provider logic is accessed, not at import time.
Update main.py to use lazy-loaded get_llm_provider from utils.py instead of direct import. This fixes the ImportError when importing completion from litellm, since get_llm_provider is now lazy-loaded in utils.py.
… memory Make get_supported_openai_params lazy-loaded using a cached helper function to avoid expensive import when importing the @client decorator. The function is only needed when optional params are processed, not at import time.
…rt-time memory Make LiteLLMResponseObjectHandler, _handle_invalid_parallel_tool_calls, convert_to_model_response_object, convert_to_streaming_response, and convert_to_streaming_response_async lazy-loaded using cached helper functions and __getattr__ to avoid expensive import when importing the @client decorator. These functions are only needed when response conversion occurs, not at import time.
Make get_api_base lazy-loaded using a cached helper function and __getattr__ to avoid expensive import when importing the @client decorator. The function is only needed when API base resolution occurs, not at import time.
…to reduce import-time memory Make get_formatted_prompt, get_response_headers, ResponseMetadata, _parse_content_for_reasoning, LiteLLMLoggingObject, and redact_message_input_output_from_logging lazy-loaded using cached helper functions and __getattr__ to avoid expensive imports when importing the @client decorator. These are only needed when response processing occurs, not at import time.
Move the TYPE_CHECKING block for LiteLLMLoggingObject to after the typing imports to fix the NameError: name 'TYPE_CHECKING' is not defined error.
Make CustomStreamWrapper lazy-loaded using a cached helper function and __getattr__ to avoid expensive import when importing the @client decorator. The class is only needed when streaming responses are processed, not at import time. This is required since it's imported by litellm/llms/openai_like/chat/handler.py.
…port-time memory Move BaseGoogleGenAIGenerateContentConfig to TYPE_CHECKING block since it's only used in type annotations. Update the type hint to use a string literal to avoid runtime import when importing the @client decorator.
Move BaseOCRConfig to TYPE_CHECKING block since it's only used in type annotations. The type hint already uses a string literal, so no runtime import is needed when importing the @client decorator.
Move BaseSearchConfig to TYPE_CHECKING block since it's only used in type annotations. The type hint already uses a string literal, so no runtime import is needed when importing the @client decorator.
Move Base*Config classes and related imports to TYPE_CHECKING block or lazy load them to reduce import-time memory cost. This follows the same pattern used in __init__.py. Changes: - Move all Base*Config classes used only in type hints to TYPE_CHECKING block - Create lazy loader functions for runtime-used Base*Config classes - Lazy load BedrockModelInfo, CohereModelInfo, MistralOCRConfig - Lazy load HTTPHandler, AsyncHTTPHandler - Lazy load get_num_retries_from_retry_policy, reset_retry_policy, get_secret - Lazy load ANTHROPIC_API_ONLY_HEADERS and AnthropicThinkingParam - Update all type hints to use string literals for forward references - Update all runtime usages to call lazy loader functions - Expose lazy-loaded items via __getattr__ for backward compatibility This significantly reduces import-time memory footprint while maintaining full backward compatibility.
Title
Reduce memory cost of importing the completion function
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
Add at least 1 test in the tests/litellm/ directory (a hard requirement - see details)
Run make test-unit
Type
🧹 Refactoring
Context
Our current import strategy pulls in large portions of the codebase—even when only a single function is needed. Many modules perform heavy work at import time or bring in sizable dependencies, so importing the completion function triggers unnecessary initialization and memory allocation.
While this PR reduces the overhead for the completion function, it doesn’t fully resolve the underlying issue. A broader cleanup of our import structure is required for a complete fix.
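For anyone reproducing the numbers above, one way to approximate an import's cost is Python's `tracemalloc`. Note this tracks Python-level allocations only, not process RSS, so it will not match the MB figures exactly; `json` here stands in for `from litellm import completion`:

```python
import tracemalloc

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

import json  # substitute the import under test, e.g. litellm's completion

after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"allocated ~{(after - before) / 1024:.1f} KiB, peak {peak / 1024:.1f} KiB")
```

For RSS-level comparisons matching the Before/After screenshots, a process-level tool (e.g. inspecting the interpreter's resident set before and after the import) is needed instead.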
Changes
Memory Differences
Before
After