
Conversation

@jade-cho (Contributor)

Description

Optimize the image preprocessing of the Phi-3.5 Vision model on GPU.

  • Create an ov::Model for image preprocessing.
  • Combine it with the vision encoding model as its input.

CVS-176393
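For context, a minimal sketch of what such a preprocessing ov::Model could look like (the function name, shapes, and normalization constants below are illustrative assumptions, not the PR's actual code):

```cpp
#include <openvino/openvino.hpp>
#include <openvino/op/ops.hpp>

using namespace ov;

// Hypothetical builder: normalize an NHWC u8 image and emit NCHW f32,
// i.e. the layout a vision encoder typically consumes.
std::shared_ptr<Model> make_image_preprocess_model() {
    // Batch fixed to 1, dynamic H/W, 3 channels (assumption for illustration).
    auto image = std::make_shared<op::v0::Parameter>(element::u8, PartialShape{1, -1, -1, 3});
    auto as_f32 = std::make_shared<op::v0::Convert>(image, element::f32);
    // NHWC -> NCHW
    auto perm = op::v0::Constant::create(element::i64, Shape{4}, {0, 3, 1, 2});
    auto nchw = std::make_shared<op::v1::Transpose>(as_f32, perm);
    // (x / 255 - mean) / std; the CLIP-style constants are illustrative.
    auto scale = op::v0::Constant::create(element::f32, Shape{1}, {1.0f / 255.0f});
    auto scaled = std::make_shared<op::v1::Multiply>(nchw, scale);
    auto mean = op::v0::Constant::create(element::f32, Shape{1, 3, 1, 1},
                                         {0.48145466f, 0.4578275f, 0.40821073f});
    auto stdev = op::v0::Constant::create(element::f32, Shape{1, 3, 1, 1},
                                          {0.26862954f, 0.26130258f, 0.27577711f});
    auto sub = std::make_shared<op::v1::Subtract>(scaled, mean);
    auto normalized = std::make_shared<op::v1::Divide>(sub, stdev);
    return std::make_shared<Model>(OutputVector{normalized}, ParameterVector{image});
}
```

The output of such a model can then be wired to the vision encoder's input so the whole chain compiles to one device.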

Copilot AI review requested due to automatic review settings November 24, 2025 14:03
github-actions bot added the category: visual language Visual language pipeline label Nov 24, 2025
Copilot AI (Contributor) left a comment

Pull request overview

This PR optimizes image preprocessing for the Phi-3.5 Vision model on GPU by creating an OpenVINO preprocessing model that can be combined with the vision encoding model, moving preprocessing operations from CPU to GPU.

Key changes:

  • Added GPU preprocessing support controlled by device type and an environment variable (a sketch of such gating follows this list)
  • Created create_combined_preprocessing_vision_model() to build preprocessing operations as an OpenVINO model
  • Modified the constructor to conditionally compile the combined preprocessing + vision model for GPU devices
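A minimal sketch of such device/environment gating (the function and environment-variable names here are hypothetical; the thread does not show the PR's actual names):

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: decide whether image preprocessing should run through
// an ov::Model. An environment variable, when set, overrides the device default.
bool should_use_ov_image_preprocess(const std::string& device) {
    if (const char* env = std::getenv("OV_VLM_IMAGE_PREPROCESS")) {  // name is an assumption
        return std::string(env) != "0";
    }
    return device.find("GPU") == 0;  // default: enable for GPU-like devices
}
```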

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| src/cpp/src/visual_language/phi3_vision/classes.hpp | Added private members for GPU preprocessing flag and model creation function |
| src/cpp/src/visual_language/phi3_vision/classes.cpp | Implemented GPU preprocessing logic with device detection, model creation, and conditional preprocessing paths |


// global_processed: (1, C, H, W)
// hd_sliced: (num_slices, C, H, W)
// Output: (1 + num_slices, C, H, W)
return std::make_shared<v0::Concat>(NodeVector{global_processed, hd_sliced}, 0);
Collaborator:

It seems the lambda isn't needed. Execute that make_shared directly at the place where create_concatenate_batch is called.
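Concretely, the suggestion amounts to something like this at the call site (global_processed and hd_sliced are the nodes from the snippet above; a sketch, not the merged code):

```cpp
// Build the Concat node directly where create_concatenate_batch used to be invoked:
auto batched = std::make_shared<v0::Concat>(NodeVector{global_processed, hd_sliced}, 0);
```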

Contributor Author:

Fixed

auto create_pad_to_max_crops = [&](std::shared_ptr<Node> input_nchw, std::shared_ptr<Node> max_crops_param) {
    auto create_constant_i64 = [](const std::vector<int64_t>& val) {
        return v0::Constant::create(element::i64, Shape{val.size()}, val);
    };
Collaborator:

Convert to a standalone function.
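Sketched, that refactor could look like this (an illustration, not the merged code):

```cpp
namespace {

// File-local helper: build a 1-D i64 Constant from a vector of values.
std::shared_ptr<ov::op::v0::Constant> create_constant_i64(const std::vector<int64_t>& val) {
    return ov::op::v0::Constant::create(ov::element::i64, ov::Shape{val.size()}, val);
}

}  // anonymous namespace
```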

Contributor Author:

Fixed

bool is_gpu_device = (device == "GPU" || device.find("GPU") == 0);

if (!is_gpu_device) {
    // Always use CPU preprocessing for non-GPU devices
Collaborator:

Why is the new model block used only for GPU?

Contributor Author:

Modified it following qwen2vl.

Contributor Author:

Modified to also support CPU.

bool use_ov_image_preprocess = false; // Default to false, will be set based on device and environment

// GPU preprocessing model creation function
std::shared_ptr<ov::Model> create_combined_preprocessing_vision_model(
Collaborator:

Does create_combined_preprocessing_vision_model() need to be a member function? Convert it to a standalone function in an anonymous namespace.
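Sketched, the suggested shape is roughly the following (the parameter list is an assumption; the PR's actual signature is not shown in this thread):

```cpp
namespace {

// File-local builder for the combined preprocessing + vision model.
std::shared_ptr<ov::Model> create_combined_preprocessing_vision_model(
    const std::shared_ptr<ov::Model>& vision_model) {
    // ... build the preprocessing subgraph and stitch its output onto
    // vision_model's input, returning the combined ov::Model ...
    return vision_model;  // placeholder body for this sketch
}

}  // anonymous namespace
```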

Contributor Author:

Fixed

Collaborator:

Share perf results

Contributor Author:

First-token latency (ms) on a 12th Gen Intel(R) Core(TM) i9-12900K:

GPU
(1) Optimized CPU image processing + GPU
(2) OV image processing (GPU) + GPU

| iter | 2025.4rc2 | (1) | (2) |
|---|---|---|---|
| warm-up | 383.75 | 345.73 | 355.34 |
| 1 | 256.18 | 219.62 | 207.13 |
| 2 | 152.82 | 108.02 | 98.35 |
| 3 | 144.49 | 109.55 | 103.41 |
| 4 | 156.77 | 106.60 | 100.19 |
| avg. | 177.57 | 135.95 | 127.27 |
| avg. w/o #1 | 151.36 | 108.06 | 100.65 |

CPU
(1) Optimized CPU image processing + CPU
(2) OV image processing (CPU) + CPU

| iter | 2025.4rc2 | (1) | (2) |
|---|---|---|---|
| warm-up | 5851.68 | 5840.50 | 5772.77 |
| 1 | 1191.25 | 1155.78 | 1144.94 |
| 2 | 1194.99 | 1152.73 | 1145.11 |
| 3 | 1186.30 | 1151.35 | 1139.59 |
| 4 | 1184.42 | 1153.32 | 1141.33 |
| avg. | 1189.24 | 1153.30 | 1142.74 |

Collaborator:

Can this be implemented in optimum-intel instead? In that case optimum-intel would also get faster, and it would export the model with the new block, so GenAI wouldn't need to carry it.

Contributor Author:

To work on optimum-intel, manager approval is required.

Collaborator:

I think there's a lack of transparency; I created CVS-177192 in an attempt to mitigate that. The problem you are solving here is broader and affects all VLMs, and there seems to be a better and simpler solution involving optimum-intel. It's simpler because Python tends to be simpler than C++. The reason qwen2vl is implemented that way is that it was specifically optimized for GPU and we didn't think that deeply about this problem back then.

Since you've already done that partially for Phi-3.5, I suggest finalizing it on the optimum-intel side. Let me know if my action is required for the manager approval you mentioned.

You posted perf numbers for CPU and GPU and both improved, so we are good here. For WWB you still need to compare this 0.817479 with the baseline and run the same test for GPU. This can be done using GenAI instead of optimum-intel since you already have the implementation.

Before proceeding, we need @rkazants's approval as optimum-intel maintainer. @rkazants, are you OK with the proposal?

Contributor Author:

@Wovchena This 0.817479 is from the GPU test, while using CPU image preprocessing on GPU gives 0.837313. The cause is the mean scaling: when the CPU image preprocessing was optimized, that function was likewise excluded from the optimization. Therefore, I have some questions:

  1. Should this similarity score never decrease? Is a trade-off not acceptable?
  2. What does “baseline” mean? Is there a fixed version of OV and GenAI for the baseline?
  3. Is the WWB similarity value for CPU also required?

Collaborator:

  1. A minor decrease is possible. I hope it becomes irrelevant after the resize is aligned and the model absorbs the difference introduced by mean scaling.
  2. The baseline is the similarity score for the same device on a GenAI version without this PR.
  3. Yes.

Collaborator:

  1. CPU with your patch, so you should remove the limitation that it's applied to GPU only, like you did for performance.

Collaborator:

@jade-cho, @Wovchena, our team will take this AR to extend optimum-intel. We need more context about it; please describe what exactly is needed in optimum-intel. I'll also sync with @Wovchena today to ask.

Contributor:

@rkazants, am I understanding correctly that you will take over this PR and merge it into optimum-intel?

Collaborator:

@jade-cho, we agreed with @Wovchena not to do this in optimum-intel; we agreed to do such optimization on the OpenVINO GenAI side only. Please proceed with this PR.

Collaborator:

Share WWB results

Contributor Author (@jade-cho, Nov 25, 2025):

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2
INFO:whowhatbench.model_loaders:Using OpenVINO GenAI API
INFO:whowhatbench.model_loaders:Using OpenVINO GenAI VLMPipeline API
The repository /home/jade/WW44_llm-optimum_2025.4.0-20329/phi-3.5-vision-instruct/pytorch/ov/OV_FP16-4BIT_DEFAULT/ contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/jade/WW44_llm-optimum_2025.4.0-20329/phi-3.5-vision-instruct/pytorch/ov/OV_FP16-4BIT_DEFAULT .
 You can inspect the repository content at https://hf.co//home/jade/WW44_llm-optimum_2025.4.0-20329/phi-3.5-vision-instruct/pytorch/ov/OV_FP16-4BIT_DEFAULT/.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
Evaluate pipeline: 24it [00:14,  1.62it/s]
...
Similarity evaluation: 24it [00:00, 29.36it/s]
INFO:whowhatbench.wwb:Metrics for model: /home/jade/WW44_llm-optimum_2025.4.0-20329/phi-3.5-vision-instruct/pytorch/ov/OV_FP16-4BIT_DEFAULT/
INFO:whowhatbench.wwb:   similarity
0    0.817479

Contributor Author:

WWB similarity

| Device | 2025.4rc2 | PR |
|---|---|---|
| CPU | 0.821846 | 0.821519 |
| GPU | 0.837313 | 0.817479 |

jade-cho force-pushed the phi3_gpu_preprocessing branch from 95fcbcf to af2e7c4 on November 25, 2025 05:41
Copilot AI review requested due to automatic review settings November 27, 2025 00:41
Copilot AI (Contributor) reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Copilot AI review requested due to automatic review settings November 27, 2025 00:45
Copilot AI (Contributor) reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Copilot AI review requested due to automatic review settings November 27, 2025 00:47
Copilot AI (Contributor) reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings November 27, 2025 00:49
Copilot AI (Contributor) reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/cpp/src/visual_language/phi3_vision/classes.cpp:1

  • [nitpick] The comment states 'batch=1' but the actual shape uses '{1, -1, -1, 3}' where batch size is fixed at 1. The comment is accurate for this specific usage, but consider clarifying that the batch dimension is fixed to 1 for this preprocessing pipeline.


Copilot AI review requested due to automatic review settings November 27, 2025 00:53
Copilot AI (Contributor) reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

jade-cho force-pushed the phi3_gpu_preprocessing branch from 87f9bc8 to 5804ddb on November 27, 2025 08:11
Copilot AI review requested due to automatic review settings December 1, 2025 02:07
jade-cho force-pushed the phi3_gpu_preprocessing branch from 5804ddb to be153a3 on December 1, 2025 02:07
Copilot AI (Contributor) reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

jade-cho requested a review from Wovchena December 1, 2025 05:53
Wovchena requested a review from yatarkan December 3, 2025 06:00