Skip to content

Optimize model device placement by moving to load_model function#3

Merged
GilbertKrantz merged 2 commits into
mainfrom
copilot/optimize-model-loading-time
Oct 24, 2025
Merged

Optimize model device placement by moving to load_model function#3
GilbertKrantz merged 2 commits into
mainfrom
copilot/optimize-model-loading-time

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Oct 23, 2025

Summary

Fixed inefficient model device placement in the Gradio inference application. The model is now moved to the device once during loading instead of repeatedly on every prediction call.

Problem

Previously, the model was being moved to the device on every prediction:

def predict_image(image: np.ndarray, model_type: str) -> tuple[dict, np.ndarray]:
    model = load_model(model_type)
    model.to(device)  # ❌ Inefficient: device transfer on every prediction
    # ... rest of prediction logic

This meant that every time a user requested a prediction, the model would be transferred to the device (CPU or GPU), even though it was already on the correct device from the previous prediction. This added unnecessary overhead to each inference call.

Solution

Moved the model.to(device) call from predict_image() to load_model():

def load_model(model_type: str = "efficientvit") -> nn.Module:
    # ... model creation and weight loading
    model.to(device)  # ✅ Efficient: device transfer happens once
    model.eval()
    return model

def predict_image(image: np.ndarray, model_type: str) -> tuple[dict, np.ndarray]:
    model = load_model(model_type)
    # model.to(device) removed - no longer needed
    # ... rest of prediction logic

Impact

  • Performance: Eliminates redundant device transfers on every prediction call
  • Code Quality: Device placement now happens in the logical place (during model loading)
  • Minimal Changes: Only 2 lines modified, ensuring surgical precision

Fixes issue raised in PR #1

Original prompt

This section details on the original issue you should resolve

<issue_title>Moving the model to device on every prediction call is inefficient. The model should be moved to device once during loading in the load_model function, not repeatedly in predict_image.</issue_title>
<issue_description>Moving the model to device on every prediction call is inefficient. The model should be moved to device once during loading in the load_model function, not repeatedly in predict_image.


Originally posted by @copilot in #1 (comment)
</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #2


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Co-authored-by: GilbertKrantz <90319182+GilbertKrantz@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor model loading to optimize prediction efficiency Optimize model device placement by moving to load_model function Oct 23, 2025
Copilot AI requested a review from GilbertKrantz October 23, 2025 15:40
@GilbertKrantz GilbertKrantz marked this pull request as ready for review October 24, 2025 14:54
@GilbertKrantz GilbertKrantz merged commit 39e847c into main Oct 24, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants