
Conversation

@stevhliu (Member) commented on Sep 4, 2025:

Reduces bloat by removing the `## Model sharding` section because it isn't really about distributed inference (it's more about selectively loading and deleting models) and doesn't show how to process multiple prompts in parallel.

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu requested a review from @sayakpaul on September 4, 2025 at 18:47.
@sayakpaul (Member) left a comment:


Left some comments. We will have to prepare it better once #11941 is in ;)

> [!TIP]
> You can use `device_map` within a [`DiffusionPipeline`] to distribute its model-level components on multiple devices. Refer to the [Device placement](../tutorials/inference_with_big_models#device-placement) guide to learn more.

## Model sharding
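
As a quick illustration of the tip quoted above, pipeline-level `device_map` looks roughly like the following; this is a minimal sketch, and the checkpoint and `balanced` strategy are illustrative assumptions, not part of the diff:

```python
import torch
from diffusers import DiffusionPipeline

# Sketch only: the checkpoint and "balanced" strategy are illustrative choices.
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    device_map="balanced",  # distribute model-level components across available GPUs
)
image = pipeline("a photo of an astronaut riding a horse").images[0]
```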
@sayakpaul (Member) commented on this line:

What happened here?

@stevhliu (Member, Author) replied:
I removed it because it reads more like a "recipe" for progressively and strategically fitting models on a GPU by loading and removing them. I don't think a user really learns anything new or useful about `device_map` here compared to the device placement docs, which are already linked at the bottom.

I would suggest removing it or at least moving it to Resources > Task Recipes.
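
For context, the removed recipe followed roughly this load-encode-delete pattern; this is a reconstructed sketch under assumptions (the Flux checkpoint, `encode_prompt` arguments, and memory handling are illustrative, not the original snippet):

```python
import gc
import torch
from diffusers import FluxPipeline

# Stage 1: load only the text encoders (skip the transformer and VAE),
# compute prompt embeddings, then free the encoders to make room on the GPU.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative checkpoint
    transformer=None,
    vae=None,
    device_map="balanced",
    torch_dtype=torch.bfloat16,
)
with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds, text_ids = pipeline.encode_prompt(
        prompt="a photo of a dog", prompt_2=None, max_sequence_length=512
    )

del pipeline
gc.collect()
torch.cuda.empty_cache()

# Stage 2 (not shown): load the transformer on its own, denoise with the
# precomputed embeddings, free it, then load the VAE to decode the latents.
```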
