Add Phi-4-multimodal-instruct #2221

Merged
Changes from 4 commits
30 commits
24e50cd
Add phi4mm classes with mocked vision encoder
yatarkan May 8, 2025
8c72e18
Merge branch 'master' into add-Phi-4-multimodal-instruct
Wovchena May 16, 2025
0ace953
Fix compilation
Wovchena May 16, 2025
1acef38
Add docstring
Wovchena May 16, 2025
2aceefd
Factor common out
Wovchena May 16, 2025
3ace40e
rename
Wovchena May 16, 2025
4ace2b1
add traced
Wovchena May 20, 2025
0783320
Merge branch 'master' into add-Phi-4-multimodal-instruct
Wovchena May 21, 2025
0acebc3
Fix compilation
Wovchena May 21, 2025
1ace08a
Infer projector
Wovchena May 21, 2025
2ace31f
update preprocessor
Wovchena May 21, 2025
6133281
Merge branch 'master' into add-Phi-4-multimodal-instruct
Wovchena May 21, 2025
0acea1a
Remove line
Wovchena May 21, 2025
1ace24a
cast arg
Wovchena May 21, 2025
f0c8ca4
Add Phi4MM docs (#114)
yatarkan May 21, 2025
0d5b785
Fix Windows
Wovchena May 21, 2025
0acebcd
Remove model inputs
Wovchena May 22, 2025
1ace3b0
Retrace preprocessor
Wovchena May 22, 2025
56bd09e
Mock patch position ids (#116)
yatarkan May 23, 2025
0ace7fd
Trace patch_position_ids
Wovchena May 23, 2025
1ace9e6
try catch
Wovchena May 23, 2025
3867525
Remove newline
as-suvorov May 23, 2025
be74a85
Merge remote-tracking branch 'wovchena/add-Phi-4-multimodal-instruct'…
as-suvorov May 23, 2025
0ffd31b
Merge branch 'releases/2025/2' into add-Phi-4-multimodal-instruct
May 23, 2025
c767561
Add phi4mm to python tests
yatarkan May 23, 2025
f89f240
Add phi4mm requirements for python tests
yatarkan May 26, 2025
61c212a
Add large images preprocessing
as-suvorov May 26, 2025
23876ed
Refactor code
as-suvorov May 27, 2025
67cf47f
Merge remote-tracking branch 'wovchena/add-Phi-4-multimodal-instruct'…
as-suvorov May 27, 2025
be5553a
Revert sample
as-suvorov May 27, 2025
14 changes: 2 additions & 12 deletions samples/cpp/visual_language_chat/visual_language_chat.cpp
@@ -9,7 +9,7 @@ bool print_subword(std::string&& subword) {
     return !(std::cout << subword << std::flush);
 }
 
-int main(int argc, char* argv[]) try {
+int main(int argc, char* argv[]) {
     if (argc < 3 || argc > 4) {
         throw std::runtime_error(std::string{"Usage "} + argv[0] + " <MODEL_DIR> <IMAGE_FILE OR DIR_WITH_IMAGES> <DEVICE>");
     }
@@ -37,7 +37,7 @@ int main(int argc, char* argv[]) try {
 
         std::getline(std::cin, prompt);
         pipe.generate(prompt,
-                      ov::genai::images(rgbs),
+                      // ov::genai::images(rgbs),
                       ov::genai::generation_config(generation_config),
                       ov::genai::streamer(print_subword));
         std::cout << "\n----------\n"
@@ -50,14 +50,4 @@ int main(int argc, char* argv[]) try {
             "question:\n";
     }
     pipe.finish_chat();
-} catch (const std::exception& error) {
-    try {
-        std::cerr << error.what() << '\n';
-    } catch (const std::ios_base::failure&) {}
-    return EXIT_FAILURE;
-} catch (...) {
-    try {
-        std::cerr << "Non-exception object thrown\n";
-    } catch (const std::ios_base::failure&) {}
-    return EXIT_FAILURE;
 }
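
Note on the construct involved: per the file header (2 additions, 12 deletions), this diff drops the function-try-block that wrapped the sample's `main`. For readers unfamiliar with that construct, here is a minimal sketch of how such a block turns any escaping exception into an exit code. `run_chat` and `guarded_main` are hypothetical names introduced for illustration only; they are not part of the PR or of the OpenVINO GenAI API.

```cpp
#include <cassert>   // only needed for the assert-based check shown after the block
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the sample body: it only validates the argument count.
void run_chat(int argc) {
    if (argc < 3 || argc > 4) {
        throw std::runtime_error("Usage: <MODEL_DIR> <IMAGE_FILE OR DIR_WITH_IMAGES> <DEVICE>");
    }
}

// Function-try-block: the handlers below cover the whole function body,
// so every exception leaving run_chat() is mapped to EXIT_FAILURE.
int guarded_main(int argc) try {
    run_chat(argc);
    return EXIT_SUCCESS;
} catch (const std::exception& error) {
    try {
        std::cerr << error.what() << '\n';
    } catch (const std::ios_base::failure&) {}  // ignore failures while reporting
    return EXIT_FAILURE;
} catch (...) {
    try {
        std::cerr << "Non-exception object thrown\n";
    } catch (const std::ios_base::failure&) {}
    return EXIT_FAILURE;
}
```

Under these assumptions, `guarded_main(3)` returns `EXIT_SUCCESS` and `guarded_main(1)` returns `EXIT_FAILURE` after printing the usage message to `std::cerr`. Without the handlers, the same exception would leave `main` uncaught and end the program through `std::terminate`.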
5 changes: 5 additions & 0 deletions src/cpp/src/visual_language/inputs_embedder.cpp
@@ -11,6 +11,7 @@
 #include "visual_language/qwen2vl/classes.hpp"
 #include "visual_language/qwen2_5_vl/classes.hpp"
 #include "visual_language/phi3_vision/classes.hpp"
+#include "visual_language/phi4mm/classes.hpp"
 #include "visual_language/minicpm/classes.hpp"
 #include "visual_language/llava/classes.hpp"
 #include "visual_language/llava_next/classes.hpp"
@@ -210,6 +211,8 @@ InputsEmbedder::InputsEmbedder(const std::filesystem::path& model_dir,
         m_impl = std::make_shared<InputsEmbedderInternVLChat>(vlm_config, model_dir, device, device_config);
     } else if (vlm_config.model_type == VLMModelType::PHI3_V) {
         m_impl = std::make_shared<InputsEmbedderPhi3V>(vlm_config, model_dir, device, device_config);
+    } else if (vlm_config.model_type == VLMModelType::PHI4MM) {
+        m_impl = std::make_shared<InputsEmbedderPhi4MM>(vlm_config, model_dir, device, device_config);
     } else if (vlm_config.model_type == VLMModelType::QWEN2_VL) {
         m_impl = std::make_shared<InputsEmbedderQwen2VL>(vlm_config, model_dir, device, device_config);
     } else if (vlm_config.model_type == VLMModelType::QWEN2_5_VL) {
@@ -236,6 +239,8 @@ InputsEmbedder::InputsEmbedder(const ModelsMap& models_map,
         m_impl = std::make_shared<InputsEmbedderInternVLChat>(vlm_config, models_map, tokenizer, config_dir_path, device, device_config);
     } else if (vlm_config.model_type == VLMModelType::PHI3_V) {
         m_impl = std::make_shared<InputsEmbedderPhi3V>(vlm_config, models_map, tokenizer, config_dir_path, device, device_config);
+    } else if (vlm_config.model_type == VLMModelType::PHI4MM) {
+        m_impl = std::make_shared<InputsEmbedderPhi4MM>(vlm_config, models_map, tokenizer, config_dir_path, device, device_config);
     } else if (vlm_config.model_type == VLMModelType::QWEN2_VL) {
         m_impl = std::make_shared<InputsEmbedderQwen2VL>(vlm_config, models_map, tokenizer, config_dir_path, device, device_config);
     } else if (vlm_config.model_type == VLMModelType::QWEN2_5_VL) {
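
Note on the dispatch pattern: both `InputsEmbedder` constructors above gain a `PHI4MM` branch that picks the implementation class from `vlm_config.model_type`. A stripped-down sketch of that pattern follows. Every type and function here (`ModelType`, `EmbedderImpl`, `Phi3VImpl`, `Phi4MMImpl`, `Qwen2VLImpl`, `make_impl`) is a simplified, hypothetical stand-in, and the fallback `throw` is an assumption, since the hunks do not show how the real chain ends.

```cpp
#include <cassert>   // only needed for the assert-based check shown after the block
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, reduced model-type enum (the real VLMModelType has more entries).
enum class ModelType { PHI3_V, PHI4MM, QWEN2_VL };

// Hypothetical common interface for per-model implementations.
struct EmbedderImpl {
    virtual ~EmbedderImpl() = default;
    virtual std::string name() const = 0;
};

struct Phi3VImpl : EmbedderImpl {
    std::string name() const override { return "phi3_v"; }
};
struct Phi4MMImpl : EmbedderImpl {
    std::string name() const override { return "phi4mm"; }
};
struct Qwen2VLImpl : EmbedderImpl {
    std::string name() const override { return "qwen2_vl"; }
};

// Select the implementation from the model type, mirroring the else-if chain
// in the diff. Supporting a new model means adding one branch here.
std::shared_ptr<EmbedderImpl> make_impl(ModelType type) {
    if (type == ModelType::PHI3_V) {
        return std::make_shared<Phi3VImpl>();
    } else if (type == ModelType::PHI4MM) {
        return std::make_shared<Phi4MMImpl>();
    } else if (type == ModelType::QWEN2_VL) {
        return std::make_shared<Qwen2VLImpl>();
    }
    // Assumed fallback; the real code's handling of unknown types is not shown.
    throw std::invalid_argument("Unsupported model type");
}
```

In this sketch, `make_impl(ModelType::PHI4MM)->name()` yields `"phi4mm"`. The PR applies the same one-branch-per-model change twice because the class has two constructors, one taking a model directory and one taking a `ModelsMap`.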
1 change: 1 addition & 0 deletions src/cpp/src/visual_language/inputs_embedder.hpp
@@ -167,6 +167,7 @@ class InputsEmbedder {
     friend class InputsEmbedderLLaVANext;
     friend class InputsEmbedderInternVLChat;
     friend class InputsEmbedderPhi3V;
+    friend class InputsEmbedderPhi4MM;
     friend class InputsEmbedderQwen2VL;
     friend class InputsEmbedderQwen2_5_VL;
 };