[POC] Image generation multi-concurrency idea #2113


Open · wants to merge 10 commits into master
Conversation

dkalinowski (Collaborator) commented Apr 25, 2025

Increased the number of infer requests for each model
Added a way to select an infer request by request_idx
Added a GenerationRequest class to manage assignment of request_idx
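The idea above can be sketched as a pool of infer-request slots plus an RAII holder. This is an illustrative sketch only; `RequestIdxPool` and the member names are placeholders, not the actual GenAI API, and the real `GenerationRequest` in the PR may differ.

```cpp
#include <cstddef>
#include <mutex>
#include <queue>

// Hypothetical pool of N infer-request slots: each concurrent
// generation acquires a free request_idx and releases it when done.
class RequestIdxPool {
public:
    explicit RequestIdxPool(std::size_t num_requests) {
        for (std::size_t i = 0; i < num_requests; ++i)
            m_free.push(i);
    }

    // Acquire a free slot; returns false if all requests are busy.
    bool acquire(std::size_t& request_idx) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_free.empty())
            return false;
        request_idx = m_free.front();
        m_free.pop();
        return true;
    }

    void release(std::size_t request_idx) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_free.push(request_idx);
    }

private:
    std::mutex m_mutex;
    std::queue<std::size_t> m_free;
};

// RAII wrapper mirroring the GenerationRequest role: holds one
// request_idx for the lifetime of a single generation.
class GenerationRequest {
public:
    explicit GenerationRequest(RequestIdxPool& pool) : m_pool(pool) {
        m_valid = pool.acquire(m_idx);
    }
    ~GenerationRequest() {
        if (m_valid)
            m_pool.release(m_idx);
    }
    bool valid() const { return m_valid; }
    std::size_t request_idx() const { return m_idx; }

private:
    RequestIdxPool& m_pool;
    std::size_t m_idx = 0;
    bool m_valid = false;
};
```

Two concurrent pipeline calls would each hold a distinct `request_idx`, so they never share an infer request.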

likholat (Contributor) commented May 7, 2025

@dkalinowski, we've discussed this POC with @Wovchena and @ilya-lavrenov. It looks like the simplest and most transparent solution is to create new inference_requests using a clone() method.

The clone() method should be implemented at the following levels:

  1. Tasks: Text2ImagePipeline, Image2ImagePipeline, InpaintingPipeline
  2. Pipelines: StableDiffusionPipeline, StableDiffusionXLPipeline, ...
  3. Models: SD3Transformer2DModel, UNet2DConditionModel, AutoencoderKL, ...

Example: Text2ImagePipeline:

When calling Text2ImagePipeline::clone(), internally it will:

  1. Call clone() on the corresponding diffusion pipeline (e.g. StableDiffusionPipeline::clone(...)), which:
    1. Reconstructs a new Scheduler from the config of the original
    2. Calls clone() for each model in the pipeline (e.g. CLIPTextModel::clone(...)), where a new inference_request is created internally
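The cascade described above can be sketched with stub classes. All class bodies, member names, and the `Stub` suffixes below are placeholders for illustration, not the real GenAI implementation; the key point is that each clone() delegates one level down and the model-level clone() creates a fresh infer request while sharing everything else.

```cpp
// Stands in for ov::InferRequest: only an id, to show that clones
// get distinct requests.
struct InferRequestStub {
    int id;
};

// Model level (e.g. CLIPTextModel): clone() shares the compiled
// model state but creates a new infer request internally.
class CLIPTextModelStub {
public:
    CLIPTextModelStub() : m_request{next_id()++} {}

    CLIPTextModelStub clone() const {
        CLIPTextModelStub copy(*this);           // share config/weights
        copy.m_request = InferRequestStub{next_id()++};  // fresh request
        return copy;
    }
    int request_id() const { return m_request.id; }

private:
    static int& next_id() { static int id = 0; return id; }
    InferRequestStub m_request;
};

// Diffusion-pipeline level (e.g. StableDiffusionPipeline):
// rebuilds the Scheduler from the original's config (elided here)
// and clones every model it owns.
class StableDiffusionPipelineStub {
public:
    StableDiffusionPipelineStub clone() const {
        StableDiffusionPipelineStub copy;
        copy.m_text_encoder = m_text_encoder.clone();
        return copy;
    }
    CLIPTextModelStub m_text_encoder;
};

// Task level (e.g. Text2ImagePipeline): delegates to the
// underlying diffusion pipeline.
class Text2ImagePipelineStub {
public:
    Text2ImagePipelineStub clone() const {
        Text2ImagePipelineStub copy;
        copy.m_impl = m_impl.clone();
        return copy;
    }
    StableDiffusionPipelineStub m_impl;
};
```

After cloning, the original and the clone can run inference concurrently because no infer request is shared between them.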

dkalinowski (Collaborator, Author) commented
Thank you for the review. Here is the corrected version with the clone idea: #2190

Could you review before I proceed with remaining pipeline types? @likholat

Labels

  category: continuous batching (Continuous batching)
  category: CPP API (Changes in GenAI C++ public headers)
  category: Image generation samples (GenAI Image generation samples)
  category: image generation (Image generation pipelines)
  category: Python API (Python API for GenAI)
  category: tokenizers (Tokenizer class or submodule update)
  no-match-files
  WIP
3 participants