Add Phi-4-multimodal-instruct #2221

Wovchena · 2025-05-16T09:38:44Z

Ticket 162874

Copilot

Pull Request Overview

This PR adds support for a new multimodal instruct model ("phi4mm") to the visual language module.

* Adds a new enum value (PHI4MM) and corresponding mapping in configuration files.
* Implements new vision encoder and inputs embedder classes for the PHI4MM model.
* Updates friend declarations and sample usage to integrate the new model.
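
As a rough illustration of the first point, the enum value and its string mapping could look like the sketch below. Only the `VLMModelType` enum, the `PHI4MM` value and the "phi4mm" string come from this PR; the `PHI3_V` placeholder entry, its "phi3_v" string and the `to_vlm_model_type` helper name are assumptions made for the example, not the actual code in vlm_config.cpp.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Illustrative subset only; PHI3_V is a placeholder for the pre-existing values.
enum class VLMModelType { PHI3_V, PHI4MM };

// Hypothetical helper: maps the model type string from the model
// configuration to the enum value, rejecting unknown strings.
VLMModelType to_vlm_model_type(const std::string& value) {
    static const std::unordered_map<std::string, VLMModelType> model_types = {
        {"phi3_v", VLMModelType::PHI3_V},  // assumed spelling, for illustration
        {"phi4mm", VLMModelType::PHI4MM},  // string named in this PR
    };
    const auto it = model_types.find(value);
    if (it == model_types.end()) {
        throw std::invalid_argument("Unsupported '" + value + "' VLM model type");
    }
    return it->second;
}
```

With a lookup of this shape, supporting a new model string is a one-line change to the map, and unknown strings fail early with a clear message.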

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


File	Description
src/cpp/src/visual_language/vlm_config.hpp	Added PHI4MM to the VLMModelType enum.
src/cpp/src/visual_language/vlm_config.cpp	Added string mapping and model-specific assertions for PHI4MM.
src/cpp/src/visual_language/vision_encoder.cpp	Integrated PHI4MM by adding branches that create VisionEncoderPhi4MM.
src/cpp/src/visual_language/phi4mm/classes.hpp	Introduced new classes for the PHI4MM model’s vision encoding and inputs embedding.
src/cpp/src/visual_language/inputs_embedder.hpp	Added friend declaration for InputsEmbedderPhi4MM.
src/cpp/src/visual_language/inputs_embedder.cpp	Added conditions to construct InputsEmbedderPhi4MM for both initialization paths.
samples/cpp/visual_language_chat/visual_language_chat.cpp	Updated sample application, including changes to exception handling and commented out code.
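
The "branches that create VisionEncoderPhi4MM" mentioned in the table can be pictured as a factory that dispatches on the model type. The sketch below is a simplified assumption of that shape: `VisionEncoderPhi4MM` and `VLMModelType::PHI4MM` are named in the PR, while the `name()` method, the `VisionEncoderPhi3V` placeholder and the `create` signature are hypothetical.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

enum class VLMModelType { PHI3_V, PHI4MM };  // illustrative subset

// Simplified base class; the real encoder interface is richer.
class VisionEncoder {
public:
    virtual ~VisionEncoder() = default;
    virtual std::string name() const = 0;  // hypothetical, for demonstration
    static std::unique_ptr<VisionEncoder> create(VLMModelType model_type);
};

class VisionEncoderPhi3V : public VisionEncoder {  // placeholder for an existing model
public:
    std::string name() const override { return "VisionEncoderPhi3V"; }
};

class VisionEncoderPhi4MM : public VisionEncoder {
public:
    std::string name() const override { return "VisionEncoderPhi4MM"; }
};

// One branch per supported model; adding PHI4MM means adding one case.
std::unique_ptr<VisionEncoder> VisionEncoder::create(VLMModelType model_type) {
    switch (model_type) {
    case VLMModelType::PHI3_V:
        return std::make_unique<VisionEncoderPhi3V>();
    case VLMModelType::PHI4MM:
        return std::make_unique<VisionEncoderPhi4MM>();
    }
    throw std::invalid_argument("Unsupported model type in VisionEncoder::create");
}
```

The same dispatch pattern would apply to the inputs embedder, where the table notes that both initialization paths needed a PHI4MM branch.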

Comments suppressed due to low confidence (1)

samples/cpp/visual_language_chat/visual_language_chat.cpp:50

The previous try-catch block for error handling was removed from main. Please verify that proper exception management is handled elsewhere or reintroduce error handling to avoid abrupt termination.

 }
-} catch (const std::exception& error) {
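
For reference, the kind of top-level error handling this comment asks about can be written as a function-try-block, so that any exception escaping the body becomes a non-zero exit code instead of an abrupt termination. This is a minimal sketch, not the sample's actual code: `run_chat` and `guarded_entry` are hypothetical names, and in a real program the same `try`/`catch` shape would sit directly on `main`.

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the body of the sample.
void run_chat(const std::string& model_dir) {
    if (model_dir.empty()) {
        throw std::runtime_error("model directory is required");
    }
}

// Function-try-block: the handlers cover the entire function body.
int guarded_entry(const std::string& model_dir) try {
    run_chat(model_dir);
    return EXIT_SUCCESS;
} catch (const std::exception& error) {
    // Reporting must not throw either, hence the nested guard.
    try {
        std::cerr << error.what() << '\n';
    } catch (const std::ios_base::failure&) {}
    return EXIT_FAILURE;
} catch (...) {
    try {
        std::cerr << "Non-exception object thrown\n";
    } catch (const std::ios_base::failure&) {}
    return EXIT_FAILURE;
}
```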


Copilot

Pull Request Overview

This PR adds support for the new Phi4 multimodal instruct model across both Python and C++ components of the visual language pipeline, updating documentation strings, configuration enums, and model-specific encoder and embedder implementations.

* Updated docstrings and configuration enums to include Phi-4-multimodal-instruct.
* Added model type cases and implementations for PHI4MM in C++ and Python bindings.
* Updated documentation for supported models to reflect the new Phi4MM entry.

Reviewed Changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.


File	Description
src/python/py_vlm_pipeline.cpp	Added Phi-4 multimodal instruct model description in docstrings
src/python/openvino_genai/py_openvino_genai.pyi	Updated docstrings to include Phi-4 multimodal instruct
src/cpp/src/visual_language/vlm_config.hpp	Extended enum and comments to cover PHI4MM
src/cpp/src/visual_language/vlm_config.cpp	Registered "phi4mm" in model type mapping and applied size assertion changes
src/cpp/src/visual_language/vision_encoder.{hpp,cpp}	Added cases for PHI4MM vision encoder creation
src/cpp/src/visual_language/phi4mm/classes.hpp	Introduced new class definitions for Phi4MM implementations
src/cpp/src/visual_language/phi3_vision/{classes.hpp,classes.cpp}	Updated usage of prompt normalization and token embedding functions with additional parameter support
src/cpp/src/visual_language/inputs_embedder.{hpp,cpp}	Integrated InputsEmbedderPhi4MM into the embedder selection logic
src/cpp/src/debug_utils.hpp	Added type mapping for "<i8" in npy conversion
src/cpp/include/openvino/genai/visual_language/pipeline.hpp	Updated pipeline documentation for the new model
site/docs/supported-models/_components/vlm-models-table/models.ts	Added a new entry for the Phi4MM model

Files not reviewed (1)

site/docs/use-cases/image-processing/_sections/_usage_options/index.mdx: Language not supported

Comments suppressed due to low confidence (1)

src/cpp/src/debug_utils.hpp:147

Mapping '<i8' to ov::element::i64 is unexpected if '<i8' is intended to represent an 8-bit signed integer; please verify that this mapping is correct.

    } else if ("<i8" == type) {
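
As background for this suppressed comment: in NumPy type strings such as the `descr` field of an .npy header, the trailing number is the item size in bytes, not bits. So '<i8' denotes a little-endian 8-byte signed integer, which is consistent with a 64-bit target type, while an 8-bit signed integer would be written '|i1'. The sketch below illustrates that reading; `NpyDescr`, `parse_npy_descr` and `bit_width` are hypothetical names and are not part of debug_utils.hpp.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decomposed NumPy type string, e.g. "<i8".
struct NpyDescr {
    char byte_order;    // '<' little-endian, '>' big-endian, '|' not applicable
    char kind;          // 'i' signed int, 'u' unsigned int, 'f' float, ...
    std::size_t bytes;  // item size in BYTES
};

// Hypothetical parser for simple one-letter-kind descriptors.
NpyDescr parse_npy_descr(const std::string& descr) {
    if (descr.size() < 3) {
        throw std::invalid_argument("descr too short: " + descr);
    }
    std::size_t parsed = 0;
    const unsigned long bytes = std::stoul(descr.substr(2), &parsed);
    if (parsed != descr.size() - 2 || bytes == 0) {
        throw std::invalid_argument("bad item size in descr: " + descr);
    }
    return NpyDescr{descr[0], descr[1], static_cast<std::size_t>(bytes)};
}

// Item width in bits: 8 bytes -> 64 bits.
std::size_t bit_width(const NpyDescr& descr) {
    return descr.bytes * 8;
}
```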


yatarkan and others added 2 commits May 8, 2025 20:22

Add phi4mm classes with mocked vision encoder

24e50cd

Merge branch 'master' into add-Phi-4-multimodal-instruct

8c72e18

Wovchena requested a review from Copilot May 16, 2025 09:38

Wovchena mentioned this pull request May 16, 2025

Add phi4mm classes with mocked vision encoder #2186

Closed

github-actions bot added the category: visual language (Visual language pipeline) and category: VLM samples (GenAI VLM samples) labels May 16, 2025

Copilot AI reviewed May 16, 2025


src/cpp/src/visual_language/vlm_config.cpp (outdated, resolved)

samples/cpp/visual_language_chat/visual_language_chat.cpp (outdated, resolved)

Wovchena and others added 13 commits May 16, 2025 13:39

Fix compilation

0ace953

Add docstring

1acef38

Factor common out

2aceefd

rename

3ace40e

add traced

4ace2b1

Merge branch 'master' into add-Phi-4-multimodal-instruct

0783320

Fix compilation

0acebc3

Infer projector

1ace08a

update preprocessor

2ace31f

Merge branch 'master' into add-Phi-4-multimodal-instruct

6133281

Remove line

0acea1a

cast arg

1ace24a

Add Phi4MM docs (#114)

f0c8ca4

* Add Phi4MM to supported models
* Add Phi4MM image tag examples

github-actions bot added the category: Python API (Python API for GenAI), category: CPP API (Changes in GenAI C++ public headers), and category: GH Pages Docs (Github Pages documentation) labels May 21, 2025

Wovchena added 3 commits May 21, 2025 20:13

Fix Windows

0d5b785

Remove model inputs

0acebcd

Retrace preprocessor

1ace3b0

Wovchena changed the base branch from master to releases/2025/2 May 22, 2025 13:30

Mock patch position ids (#116)

56bd09e

github-actions bot added the no-match-files label May 23, 2025

Trace patch_position_ids

0ace7fd

github-actions bot removed the category: VLM samples (GenAI VLM samples) label May 23, 2025

Wovchena and others added 3 commits May 23, 2025 15:33

try catch

1ace9e6

Remove newline

3867525

Merge remote-tracking branch 'wovchena/add-Phi-4-multimodal-instruct' into add-Phi-4-multimodal-instruct

be74a85

Wovchena marked this pull request as ready for review May 23, 2025 13:30

Wovchena requested a review from Copilot May 23, 2025 13:33

Wovchena changed the title ~~Add phi 4 multimodal instruct~~ Add Phi-4-multimodal-instruct May 23, 2025

Copilot AI reviewed May 23, 2025


src/cpp/src/visual_language/vlm_config.cpp (resolved)

andrei-kochin and others added 2 commits May 23, 2025 17:43

Merge branch 'releases/2025/2' into add-Phi-4-multimodal-instruct

0ffd31b

Add phi4mm to python tests

c767561

Wovchena requested review from yatarkan and as-suvorov May 23, 2025 14:25

yatarkan and others added 2 commits May 26, 2025 15:02

Add phi4mm requirements for python tests

f89f240

Add large images preprocessing

61c212a

as-suvorov self-assigned this May 27, 2025

as-suvorov added 2 commits May 27, 2025 17:40

Refactor code

23876ed

Merge remote-tracking branch 'wovchena/add-Phi-4-multimodal-instruct' into add-Phi-4-multimodal-instruct

67cf47f

github-actions bot added the category: VLM samples (GenAI VLM samples) label May 27, 2025

Revert sample

be5553a

github-actions bot removed the category: VLM samples (GenAI VLM samples) label May 27, 2025

as-suvorov requested a review from rkazants May 27, 2025 16:25
