
Add Phi-4-multimodal-instruct #2221

Status: Open. 30 commits to merge into base: releases/2025/2

@Wovchena (Collaborator) commented May 16, 2025

Ticket 162874

@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for a new multimodal instruct model ("phi4mm") to the visual language module.

  • Adds a new enum value (PHI4MM) and corresponding mapping in configuration files.
  • Implements new vision encoder and inputs embedder classes for the PHI4MM model.
  • Updates friend declarations and sample usage to integrate the new model.
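
The enum value and its string mapping described above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: the function name `to_vlm_model_type`, the reduced enum, and the `"phi3_v"` key are assumptions made for illustration; only the `PHI4MM` value and the `"phi4mm"` string come from the review summary.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical, reduced version of the model-type enum. The real
// VLMModelType in vlm_config.hpp has its own full list of entries.
enum class VLMModelType {
    PHI3_V,
    PHI4MM,  // value added by this PR
};

// Map a "model_type" string from a model config to the enum.
// The function name and the "phi3_v" key are assumptions.
inline VLMModelType to_vlm_model_type(const std::string& name) {
    static const std::unordered_map<std::string, VLMModelType> mapping{
        {"phi3_v", VLMModelType::PHI3_V},
        {"phi4mm", VLMModelType::PHI4MM},
    };
    const auto it = mapping.find(name);
    if (it == mapping.end()) {
        // Fail loudly on unknown names instead of picking a default.
        throw std::invalid_argument("Unsupported VLM model type: " + name);
    }
    return it->second;
}
```

Keeping the lookup in one table means a new model needs one enum entry and one map entry, and an unrecognized string is reported at configuration time.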

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/cpp/src/visual_language/vlm_config.hpp | Added PHI4MM to the VLMModelType enum. |
| src/cpp/src/visual_language/vlm_config.cpp | Added string mapping and model-specific assertions for PHI4MM. |
| src/cpp/src/visual_language/vision_encoder.cpp | Integrated PHI4MM by adding branches that create VisionEncoderPhi4MM. |
| src/cpp/src/visual_language/phi4mm/classes.hpp | Introduced new classes for the PHI4MM model’s vision encoding and inputs embedding. |
| src/cpp/src/visual_language/inputs_embedder.hpp | Added friend declaration for InputsEmbedderPhi4MM. |
| src/cpp/src/visual_language/inputs_embedder.cpp | Added conditions to construct InputsEmbedderPhi4MM for both initialization paths. |
| samples/cpp/visual_language_chat/visual_language_chat.cpp | Updated sample application, including changes to exception handling and commented out code. |

Comments suppressed due to low confidence (1)

samples/cpp/visual_language_chat/visual_language_chat.cpp:50

  • The previous try-catch block for error handling was removed from main. Please verify that proper exception management is handled elsewhere or reintroduce error handling to avoid abrupt termination.
    }
    -} catch (const std::exception& error) {
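
One way to keep top-level error handling in a sample while leaving `main` short is to route the body through a small guard. The sketch below is an illustration under assumptions, not the sample's actual code: `run_guarded` is a hypothetical helper name, and the sample body is passed in as a callable.

```cpp
#include <cstdlib>
#include <exception>
#include <functional>
#include <iostream>

// Hypothetical helper: runs the sample body and converts any exception
// into a non-zero exit code instead of letting it escape main().
inline int run_guarded(const std::function<void()>& body) {
    try {
        body();
        return EXIT_SUCCESS;
    } catch (const std::exception& error) {
        // Report the message so the user sees why the sample stopped.
        std::cerr << error.what() << '\n';
    } catch (...) {
        std::cerr << "Non-exception object thrown\n";
    }
    return EXIT_FAILURE;
}
```

A sample's `main` could then return `run_guarded([&] { /* pipeline setup and chat loop */ });`, so an uncaught exception produces a message and a failure code rather than a call to `std::terminate`.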

@github-actions bot added the category: Python API, category: CPP API, and category: GH Pages Docs labels May 21, 2025
@Wovchena changed the base branch from master to releases/2025/2 May 22, 2025 13:30
@github-actions bot removed the category: VLM samples label May 23, 2025
@Wovchena marked this pull request as ready for review May 23, 2025 13:30
@Wovchena requested a review from Copilot May 23, 2025 13:33
@Wovchena changed the title from "Add phi 4 multimodal instruct" to "Add Phi-4-multimodal-instruct" May 23, 2025
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for the new Phi4 multimodal instruct model across both Python and C++ components of the visual language pipeline, updating documentation strings, configuration enums, and model-specific encoder and embedder implementations.

  • Updated docstrings and configuration enums to include Phi-4-multimodal-instruct.
  • Added model type cases and implementations for PHI4MM in C++ and Python bindings.
  • Updated documentation for supported models to reflect the new Phi4MM entry.

Reviewed Changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/python/py_vlm_pipeline.cpp | Added Phi-4 multimodal instruct model description in docstrings |
| src/python/openvino_genai/py_openvino_genai.pyi | Updated docstrings to include Phi-4 multimodal instruct |
| src/cpp/src/visual_language/vlm_config.hpp | Extended enum and comments to cover PHI4MM |
| src/cpp/src/visual_language/vlm_config.cpp | Registered "phi4mm" in model type mapping and applied size assertion changes |
| src/cpp/src/visual_language/vision_encoder.{hpp,cpp} | Added cases for PHI4MM vision encoder creation |
| src/cpp/src/visual_language/phi4mm/classes.hpp | Introduced new class definitions for Phi4MM implementations |
| src/cpp/src/visual_language/phi3_vision/{classes.hpp,classes.cpp} | Updated usage of prompt normalization and token embedding functions with additional parameter support |
| src/cpp/src/visual_language/inputs_embedder.{hpp,cpp} | Integrated InputsEmbedderPhi4MM into the embedder selection logic |
| src/cpp/src/debug_utils.hpp | Added type mapping for "<i8" in npy conversion |
| src/cpp/include/openvino/genai/visual_language/pipeline.hpp | Updated pipeline documentation for the new model |
| site/docs/supported-models/_components/vlm-models-table/models.ts | Added a new entry for the Phi4MM model |

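
The "added cases" pattern for encoder creation listed above can be sketched as a factory that switches on the model type. This is a simplified illustration under assumptions: the class bodies, the `name()` method, and `create_vision_encoder` are hypothetical stand-ins, and only the `PHI4MM` value and the `VisionEncoderPhi4MM` class name appear in the review summaries.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, reduced enum; the real one has more entries.
enum class VLMModelType { PHI3_V, PHI4MM };

// Stand-in base class; the real VisionEncoder interface differs.
struct VisionEncoder {
    virtual ~VisionEncoder() = default;
    virtual std::string name() const = 0;
};

struct VisionEncoderPhi3V : VisionEncoder {
    std::string name() const override { return "phi3_v"; }
};

struct VisionEncoderPhi4MM : VisionEncoder {
    std::string name() const override { return "phi4mm"; }
};

// Factory with one case per model type; adding a model means
// adding one case that constructs the matching encoder.
inline std::unique_ptr<VisionEncoder> create_vision_encoder(VLMModelType type) {
    switch (type) {
        case VLMModelType::PHI3_V:
            return std::make_unique<VisionEncoderPhi3V>();
        case VLMModelType::PHI4MM:
            return std::make_unique<VisionEncoderPhi4MM>();
    }
    throw std::invalid_argument("Unsupported VLM model type");
}
```

The same shape would apply to embedder selection, with one case per model type returning the model-specific embedder.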
Files not reviewed (1)
  • site/docs/use-cases/image-processing/_sections/_usage_options/index.mdx: Language not supported
Comments suppressed due to low confidence (1)

src/cpp/src/debug_utils.hpp:147

  • Mapping '<i8' to ov::element::i64 is unexpected if '<i8' is intended to represent an 8-bit signed integer; please verify that this mapping is correct.
    } else if ("<i8" == type) {
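
For reference when checking this mapping: in NumPy type strings the trailing number is the item size in bytes, not bits, so `<i8` denotes a little-endian 8-byte (64-bit) signed integer, and an 8-bit signed integer is written `|i1`. The sketch below decodes such a descriptor; `NpyDescr`, `parse_npy_descr`, and `bit_width` are hypothetical names for illustration and are not part of `debug_utils.hpp`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Parsed form of a NumPy dtype descriptor such as "<i8".
struct NpyDescr {
    char byte_order;    // '<' little-endian, '>' big-endian, '|' not applicable
    char kind;          // 'i' signed int, 'u' unsigned int, 'f' float, ...
    std::size_t bytes;  // item size in BYTES, not bits
};

// Split a descriptor into byte order, kind, and item size.
inline NpyDescr parse_npy_descr(const std::string& descr) {
    if (descr.size() < 3) {
        throw std::invalid_argument("Descriptor too short: " + descr);
    }
    return {descr[0], descr[1],
            static_cast<std::size_t>(std::stoul(descr.substr(2)))};
}

// Convert the byte count to a bit width.
inline std::size_t bit_width(const NpyDescr& d) {
    return d.bytes * 8;
}
```

Under this reading, `bit_width(parse_npy_descr("<i8"))` is 64, which is consistent with mapping `<i8` to a 64-bit integer element type.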

@Wovchena requested review from yatarkan and as-suvorov May 23, 2025 14:25
@as-suvorov self-assigned this May 27, 2025
@github-actions bot added the category: VLM samples label May 27, 2025
@github-actions bot removed the category: VLM samples label May 27, 2025
@as-suvorov requested a review from rkazants May 27, 2025 16:25
Labels: category: CPP API, category: GH Pages Docs, category: Python API, category: visual language, no-match-files
4 participants