Create a light version of the huggingface partner libs with only hub, to avoid the unnecessarily large PyTorch dependency #26027
Replies: 5 comments
-
Similar issue in this discussion: Refactor Import of HuggingFaceEndpointEmbeddings to Avoid Unnecessary large pytorch Dependency #24482, but with a different proposed solution.
-
Completely agree with this suggestion. The core idea is to have a Hugging Face partner library oriented only toward remote inference, meaning that this library would rely mostly on HTTP clients that interact with inference servers (the Hugging Face Hub or any custom deployment). This would remove all heavy dependencies. As for the naming, it should reflect the "remote inference" aspect and clearly indicate that this module is designed for cases where inference occurs on dedicated servers, which is the case for most production projects. This new module could be introduced without making any breaking change, as you suggest.
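To illustrate the point, a remote-only embeddings integration needs nothing beyond an HTTP client. Below is a minimal sketch using only the Python standard library; the class name is hypothetical, and the `{"inputs": [...]}` payload shape is an assumption modeled on text-embeddings-inference-style servers, so adjust it for your actual deployment:

```python
import json
import urllib.request


class RemoteEmbeddingsClient:
    """Hypothetical remote-only embeddings client: plain HTTP, no torch."""

    def __init__(self, endpoint_url, api_token=None):
        self.endpoint_url = endpoint_url
        self.api_token = api_token

    def _build_request(self, texts):
        # Assumed payload convention ({"inputs": [...]}); servers differ.
        headers = {"Content-Type": "application/json"}
        if self.api_token:
            headers["Authorization"] = f"Bearer {self.api_token}"
        return urllib.request.Request(
            self.endpoint_url,
            data=json.dumps({"inputs": texts}).encode("utf-8"),
            headers=headers,
            method="POST",
        )

    def embed_documents(self, texts):
        # Sends the request and decodes the JSON list of vectors.
        with urllib.request.urlopen(self._build_request(texts)) as resp:
            return json.loads(resp.read())


# Inspect the request without hitting the network:
req = RemoteEmbeddingsClient("https://example.com/embed", "tok")._build_request(["hi"])
print(req.get_method())                  # POST
print(req.get_header("Authorization"))   # Bearer tok
print(req.data.decode())                 # {"inputs": ["hi"]}
```

The whole container footprint then reduces to the standard library (or a small HTTP client), which is exactly what the remote-inference-only package would provide.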
-
Hi! We are a group of 4 students from the University of Toronto, and we are looking into implementing this shortly!
-
Hello! Here is our current proposal. We will implement this soon, and any feedback is appreciated in the meantime :)

Outline of Changes
- New Lightweight Package Creation
- Code Duplication and Refactoring
- Dependency Minimization
- Deprecation Notices
- Testing Suite
- CI/CD and Packaging Adjustments

Existing Architecture
The langchain-huggingface package in the LangChain repository provides integration with Hugging Face models and APIs, allowing users to leverage large language models (LLMs) and embedding functionalities within LangChain workflows. The architecture is structured to support both local and remote inference by interacting with Hugging Face's Transformers library and API endpoints. Below is an overview of the existing structure and its main components:
- Core Components
- Dual Support for Local and Remote Inference
- Dependencies

Files to Modify
libs/partners/huggingface/langchain_huggingface/endpoints.py

Pseudocode
- Outline of changes made to config.py
- Outline of changes made to utils.py
- Outline of changes made to embeddings.py
- Outline of changes made to generation.py
- And modify the test file as wanted.
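For the "Deprecation Notices" step, one common pattern is a PEP 562 module-level `__getattr__` in the old package that warns and forwards lookups to the new package. The sketch below shows the mechanism only; the light package's name (`langchain_huggingface_light`) is hypothetical, and a stub module is registered so the example runs standalone:

```python
import importlib
import sys
import types
import warnings

# Stand-in for the proposed light package so this sketch is self-contained;
# in the real shim the target would be the actual light distribution.
_stub = types.ModuleType("langchain_huggingface_light")
_stub.HuggingFaceEndpoint = type("HuggingFaceEndpoint", (), {})
sys.modules["langchain_huggingface_light"] = _stub

# Names that moved out of langchain_huggingface, mapped to their new home.
_MOVED = {"HuggingFaceEndpoint": "langchain_huggingface_light"}


def module_getattr(name):
    """Body of a PEP 562 __getattr__ for langchain_huggingface/__init__.py:
    emit a DeprecationWarning, then forward the lookup to the new package."""
    if name in _MOVED:
        warnings.warn(
            f"Importing {name} from langchain_huggingface is deprecated; "
            f"import it from {_MOVED[name]} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(_MOVED[name]), name)
    raise AttributeError(f"module has no attribute {name!r}")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = module_getattr("HuggingFaceEndpoint")

print(cls.__name__)                  # HuggingFaceEndpoint
print(caught[0].category.__name__)   # DeprecationWarning
```

This keeps existing imports working during a transition period while steering users toward the lighter package, so the split need not be a breaking change.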
-
Hello, I'm having the same issue when deploying huggingface_endpoint with Docker. Is there any workaround for this issue?
-
I checked:
Feature Request
Description
When importing HuggingFaceEndpointEmbeddings or HuggingFaceEndpoint from langchain_huggingface.embeddings and langchain_huggingface.llms, it is currently necessary to install the entire langchain-huggingface package. This package includes the PyTorch library as a dependency, which significantly increases the size of container images by up to 6 GB. This is problematic for use cases that only require remote embedding or LLM API access and do not need the PyTorch library. A lighter setup could be achieved by:
- A separate lightweight package with the HuggingFaceEndpoint and HuggingFaceEndpointEmbeddings classes, containing only the necessary components for remote embedding API access.
- Deprecation notices when importing HuggingFaceEndpoint and HuggingFaceEndpointEmbeddings from langchain_huggingface.embeddings and langchain_huggingface.llms.
Additional Context
This change is particularly important for users who operate in environments with strict resource limitations or those who prioritize lightweight and efficient deployments.
Impact
This change will make the langchain-huggingface package more modular and user-friendly, especially for those who rely solely on remote services for embedding tasks.

Thank you for considering this proposal. I believe it will greatly enhance the usability and efficiency of the langchain-huggingface package.
Motivation
Proposal (If applicable)
Create a new lightweight library, HuggingFaceLight, containing only the HuggingFaceEndpoint and HuggingFaceEndpointEmbeddings classes for communication with the server. This new library should inherit from the LangChain base model so that it can be imported and used without requiring PyTorch or other heavy dependencies.
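Concretely, the win comes from the dependency list. A sketch of what the light package's metadata might look like follows; the distribution name, version, and version pins here are all hypothetical, not a statement of what the maintainers would ship:

```toml
# Hypothetical pyproject.toml for a remote-inference-only package:
# no torch, transformers, or sentence-transformers in the closure.
[project]
name = "langchain-huggingface-light"
version = "0.0.1"
dependencies = [
    "langchain-core>=0.3",
    "huggingface-hub>=0.23",
]
```

The full langchain-huggingface package would keep its current dependencies and could depend on this light package for the endpoint classes, so local-inference users see no change.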