
Support multi images for vlm benchmarking in samples and llm_bench #2197

Open

wants to merge 61 commits into master from guozhong/support_multi_files_for_vlm_test
Commits (61)
bffaa94
support multi images for vlm test
wgzintel May 12, 2025
cdd8f90
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 13, 2025
9edc628
code format
wgzintel May 13, 2025
e512c31
using ov::genai::images to convert images
wgzintel May 13, 2025
9fccf73
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 13, 2025
eed1dd7
fix none Type
wgzintel May 13, 2025
82aa6aa
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 13, 2025
503b74e
fix NoneTyPE in optimim-intel pipeline
wgzintel May 13, 2025
7c48e7e
Merge branch 'guozhong/support_multi_files_for_vlm_test' of https://g…
wgzintel May 13, 2025
fa68faa
Support read images from dir
wgzintel May 14, 2025
98e90e4
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 14, 2025
8e62754
fix cmake_list.txt
wgzintel May 15, 2025
3c8b091
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 15, 2025
5585335
Output token size in benchmark_genai.cpp
wgzintel May 15, 2025
2c13704
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 16, 2025
fd8c859
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 18, 2025
771e928
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 20, 2025
e902812
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 21, 2025
f70ab0d
print ov version
wgzintel May 21, 2025
b6240fd
using load_image() in optimum pipeline
wgzintel May 21, 2025
36e3dad
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 22, 2025
e004cea
Make it an error if prompt_file and prompt are given at the same time
wgzintel May 22, 2025
537e2d6
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 22, 2025
10c9940
revert get prompt from default args
wgzintel May 22, 2025
65f5c02
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 22, 2025
9bb9081
Remove redundant code
wgzintel May 22, 2025
e1e5326
get prompt token size from shape[1]
wgzintel May 22, 2025
8a88215
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 22, 2025
6688a09
remove if
wgzintel May 23, 2025
16faddc
Update samples/cpp/text_generation/benchmark_genai.cpp
wgzintel May 23, 2025
f48ae43
Update samples/cpp/visual_language_chat/benchmark_vlm.cpp
wgzintel May 23, 2025
9c7fa07
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 24, 2025
0936b50
Update benchmark_genai.py, benchmark_vlm.py and readme
wgzintel May 27, 2025
2dbba1b
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 27, 2025
9d8aa7f
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 30, 2025
d9000a9
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 2, 2025
41ce10c
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
peterchen-intel Jun 7, 2025
ac12a98
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 10, 2025
72e999f
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
peterchen-intel Jun 12, 2025
1348dd1
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
peterchen-intel Jun 13, 2025
9980152
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 13, 2025
8ac1fc5
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 13, 2025
f484244
Update samples/cpp/text_generation/read_prompt_from_file.cpp
wgzintel Jun 13, 2025
cb07e0b
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 13, 2025
507f48a
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 16, 2025
624e8fc
default values
wgzintel Jun 17, 2025
013fac4
Merge branch 'guozhong/support_multi_files_for_vlm_test' of https://g…
wgzintel Jun 17, 2025
689d264
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel Jun 17, 2025
5121bfb
Use the regular assignment for scheduler_config
wgzintel Jun 17, 2025
4537a2b
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel Jun 17, 2025
d4da4b6
Update samples/cpp/text_generation/read_prompt_from_file.cpp
wgzintel Jun 18, 2025
1b8ecba
Update tools/llm_bench/task/visual_language_generation.py
wgzintel Jun 18, 2025
9cc975b
Update tools/llm_bench/task/visual_language_generation.py
wgzintel Jun 18, 2025
cf910e0
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 18, 2025
3e3a321
print input image nums for vlm
wgzintel Jun 18, 2025
2c6872f
Remove the corresponding return
wgzintel Jun 18, 2025
1a13411
remove if input_data.get("media", None)
wgzintel Jun 18, 2025
9b692e3
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 19, 2025
ed896f5
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel Jun 19, 2025
61a2f22
Merge branch 'guozhong/support_multi_files_for_vlm_test' of https://g…
wgzintel Jun 19, 2025
62e627a
resolve conflict
wgzintel Jun 20, 2025
2 changes: 1 addition & 1 deletion samples/cpp/text_generation/CMakeLists.txt
@@ -46,7 +46,7 @@ FetchContent_Declare(cxxopts
URL_HASH SHA256=523175f792eb0ff04f9e653c90746c12655f10cb70f1d5e6d6d9491420298a08)
FetchContent_MakeAvailable(cxxopts)

add_executable(benchmark_genai benchmark_genai.cpp)
add_executable(benchmark_genai benchmark_genai.cpp read_prompt_from_file.cpp)
target_link_libraries(benchmark_genai PRIVATE openvino::genai cxxopts::cxxopts)
set_target_properties(benchmark_genai PROPERTIES
# Ensure out of box LC_RPATH on macOS with SIP
3 changes: 2 additions & 1 deletion samples/cpp/text_generation/README.md
@@ -161,7 +161,8 @@ For more information how performance metrics are calculated please follow [perfo
```
#### Options
- `-m, --model`: Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt to generate text.
- `-p, --prompt` (default: `''`): The prompt to generate text. If neither `-p` nor `--pf` is given, the default prompt `"The Sky is blue because"` is used.
- `--pf, --prompt_file`: Read the prompt from a file (see the usage example below).
- `--nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `--mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.
- `-n, --num_iter` (default: `3`): Number of iterations.
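A usage sketch for the new flag (the model directory and prompt file below are placeholders, not taken from this PR):

```sh
# Read the prompt from a file; -p and --pf must not be given together
./benchmark_genai -m ./TinyLlama-1.1B-Chat-v1.0 --pf prompt.txt --mt 20 -n 3

# With neither -p nor --pf, the default prompt "The Sky is blue because" is used
./benchmark_genai -m ./TinyLlama-1.1B-Chat-v1.0
```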
34 changes: 31 additions & 3 deletions samples/cpp/text_generation/benchmark_genai.cpp
@@ -3,13 +3,15 @@

#include "openvino/genai/llm_pipeline.hpp"
#include <cxxopts.hpp>
#include "read_prompt_from_file.h"

int main(int argc, char* argv[]) try {
cxxopts::Options options("benchmark_vanilla_genai", "Help command");

options.add_options()
("m,model", "Path to model and tokenizers base directory", cxxopts::value<std::string>())
("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value("The Sky is blue because"))
("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value(""))
("pf,prompt_file", "Read prompt from file", cxxopts::value<std::string>())
("nw,num_warmup", "Number of warmup iterations", cxxopts::value<size_t>()->default_value(std::to_string(1)))
("n,num_iter", "Number of iterations", cxxopts::value<size_t>()->default_value(std::to_string(3)))
("mt,max_new_tokens", "Maximal number of new tokens", cxxopts::value<size_t>()->default_value(std::to_string(20)))
@@ -30,7 +32,22 @@ int main(int argc, char* argv[]) try {
return EXIT_SUCCESS;
}

std::string prompt = result["prompt"].as<std::string>();
std::string prompt;
if (result.count("prompt") && result.count("prompt_file")) {
std::cout << "Prompt and prompt file should not exist together!" << std::endl;
return EXIT_FAILURE;
} else {
if (result.count("prompt_file")) {
prompt = utils::read_prompt(result["prompt_file"].as<std::string>());
} else {
prompt = result["prompt"].as<std::string>().empty() ? "The Sky is blue because" : result["prompt"].as<std::string>();
}
}
if (prompt.empty()) {
std::cout << "Prompt is empty!" << std::endl;
return EXIT_FAILURE;
}

const std::string models_path = result["model"].as<std::string>();
std::string device = result["device"].as<std::string>();
size_t num_warmup = result["num_warmup"].as<size_t>();
@@ -39,7 +56,17 @@ int main(int argc, char* argv[]) try {
ov::genai::GenerationConfig config;
config.max_new_tokens = result["max_new_tokens"].as<size_t>();

ov::genai::LLMPipeline pipe(models_path, device);
ov::genai::SchedulerConfig scheduler_config;
scheduler_config.enable_prefix_caching = false;
scheduler_config.max_num_batched_tokens = std::numeric_limits<std::size_t>::max();

std::cout << ov::get_openvino_version() << std::endl;

ov::genai::LLMPipeline pipe(models_path, device, ov::genai::scheduler_config(scheduler_config));

auto input_data = pipe.get_tokenizer().encode(prompt);
size_t prompt_token_size = input_data.input_ids.get_shape()[1];
std::cout << "Prompt token size:" << prompt_token_size << std::endl;

for (size_t i = 0; i < num_warmup; i++)
pipe.generate(prompt, config);
@@ -52,6 +79,7 @@ int main(int argc, char* argv[]) try {
}

std::cout << std::fixed << std::setprecision(2);
std::cout << "Output token size:" << res.perf_metrics.get_num_generated_tokens() << std::endl;
std::cout << "Load time: " << metrics.get_load_time() << " ms" << std::endl;
std::cout << "Generate time: " << metrics.get_generate_duration().mean << " ± " << metrics.get_generate_duration().std << " ms" << std::endl;
std::cout << "Tokenization time: " << metrics.get_tokenization_duration().mean << " ± " << metrics.get_tokenization_duration().std << " ms" << std::endl;
19 changes: 19 additions & 0 deletions samples/cpp/text_generation/read_prompt_from_file.cpp
@@ -0,0 +1,19 @@
// Copyright (C) 2023-2025 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include <iostream>
#include <fstream>
#include "read_prompt_from_file.h"

std::string utils::read_prompt(const std::string& file_path) {
std::ifstream file(file_path);
if (file.is_open()) {
std::stringstream buffer;
buffer << file.rdbuf();
return buffer.str();
} else {
std::stringstream error_message;
error_message << "Error opening prompt file: '" << file_path << "'";
throw std::runtime_error{error_message.str()};
}
}
11 changes: 11 additions & 0 deletions samples/cpp/text_generation/read_prompt_from_file.h
@@ -0,0 +1,11 @@

// Copyright (C) 2023-2025 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#pragma once

#include <sstream>

namespace utils {
std::string read_prompt(const std::string& file_path);
}
3 changes: 1 addition & 2 deletions samples/cpp/visual_language_chat/CMakeLists.txt
@@ -44,8 +44,7 @@ install(TARGETS encrypted_model_vlm
EXCLUDE_FROM_ALL)

# create benchmark executable

add_executable(benchmark_vlm benchmark_vlm.cpp load_image.cpp)
add_executable(benchmark_vlm benchmark_vlm.cpp load_image.cpp ../text_generation/read_prompt_from_file.cpp)
target_include_directories(benchmark_vlm PRIVATE "${CMAKE_BINARY_DIR}")
target_link_libraries(benchmark_vlm PRIVATE openvino::genai cxxopts::cxxopts)
set_target_properties(benchmark_vlm PROPERTIES
3 changes: 2 additions & 1 deletion samples/cpp/visual_language_chat/README.md
@@ -40,7 +40,8 @@ benchmark_vlm [OPTIONS]
### Options

- `-m, --model`(default: `.`): Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `What is on the image?`): The prompt to generate text.
- `-p, --prompt` (default: `''`): The prompt to generate text. If neither `-p` nor `--pf` is given, the default prompt `"What is on the image?"` is used.
- `--pf, --prompt_file`: Read the prompt from a file (see the usage example below).
- `-i, --image` (default: `image.jpg`): Path to the image.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.
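A usage sketch (model and image paths are placeholders; passing a directory to `-i` is assumed to load every file in it, mirroring the Python sample):

```sh
# Single image
./benchmark_vlm -m ./MiniCPM-V-2_6 -i image.jpg --pf prompt.txt

# Directory of images: every file in it is loaded and passed to one generate call
./benchmark_vlm -m ./MiniCPM-V-2_6 -i ./images/ --pf prompt.txt
```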
48 changes: 38 additions & 10 deletions samples/cpp/visual_language_chat/benchmark_vlm.cpp
@@ -6,14 +6,15 @@

#include "load_image.hpp"
#include <openvino/genai/visual_language/pipeline.hpp>

#include "../text_generation/read_prompt_from_file.h"

int main(int argc, char* argv[]) try {
cxxopts::Options options("benchmark_vlm", "Help command");

options.add_options()
("m,model", "Path to model and tokenizers base directory", cxxopts::value<std::string>()->default_value("."))
("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value("What is on the image?"))
("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value(""))
("pf,prompt_file", "Read prompt from file", cxxopts::value<std::string>())
("i,image", "Image", cxxopts::value<std::string>()->default_value("image.jpg"))
("nw,num_warmup", "Number of warmup iterations", cxxopts::value<size_t>()->default_value(std::to_string(1)))
("n,num_iter", "Number of iterations", cxxopts::value<size_t>()->default_value(std::to_string(3)))
@@ -35,30 +36,57 @@ int main(int argc, char* argv[]) try {
return EXIT_SUCCESS;
}

std::string prompt = result["prompt"].as<std::string>();
std::string prompt;
if (result.count("prompt") && result.count("prompt_file")) {
std::cout << "Prompt and prompt file should not exist together!" << std::endl;
return EXIT_FAILURE;
} else {
if (result.count("prompt_file")) {
prompt = utils::read_prompt(result["prompt_file"].as<std::string>());
} else {
prompt = result["prompt"].as<std::string>().empty() ? "What is on the image?" : result["prompt"].as<std::string>();
}
}
if (prompt.empty()) {
std::cout << "Prompt is empty!" << std::endl;
return EXIT_FAILURE;
}

const std::string models_path = result["model"].as<std::string>();
const std::string image_path = result["image"].as<std::string>();
std::string device = result["device"].as<std::string>();
size_t num_warmup = result["num_warmup"].as<size_t>();
size_t num_iter = result["num_iter"].as<size_t>();
ov::Tensor image = utils::load_image(image_path);
std::vector<ov::Tensor> images = utils::load_images(image_path);

ov::genai::GenerationConfig config;
config.max_new_tokens = result["max_new_tokens"].as<size_t>();
config.ignore_eos = true;

ov::genai::SchedulerConfig scheduler_config;
scheduler_config.enable_prefix_caching = false;
scheduler_config.max_num_batched_tokens = std::numeric_limits<std::size_t>::max();

std::cout << ov::get_openvino_version() << std::endl;

ov::genai::VLMPipeline pipe(models_path, device, ov::genai::scheduler_config(scheduler_config));

auto input_data = pipe.get_tokenizer().encode(prompt);
size_t prompt_token_size = input_data.input_ids.get_shape()[1];
std::cout << "Number of images:" << images.size() << ", prompt token size:" << prompt_token_size << std::endl;

ov::genai::VLMPipeline pipe(models_path, device);

for (size_t i = 0; i < num_warmup; i++)
pipe.generate(prompt, ov::genai::image(image), ov::genai::generation_config(config));
pipe.generate(prompt, ov::genai::images(images), ov::genai::generation_config(config));

auto res = pipe.generate(prompt, ov::genai::image(image), ov::genai::generation_config(config));
auto res = pipe.generate(prompt, ov::genai::images(images), ov::genai::generation_config(config));
auto metrics = res.perf_metrics;
for (size_t i = 0; i < num_iter - 1; i++) {
res = pipe.generate(prompt, ov::genai::image(image), ov::genai::generation_config(config));
res = pipe.generate(prompt, ov::genai::images(images), ov::genai::generation_config(config));
metrics = metrics + res.perf_metrics;
}

std::cout << std::fixed << std::setprecision(2);
std::cout << "Output token size:" << res.perf_metrics.get_num_generated_tokens() << std::endl;
std::cout << "Load time: " << metrics.get_load_time() << " ms" << std::endl;
std::cout << "Generate time: " << metrics.get_generate_duration().mean << " ± " << metrics.get_generate_duration().std << " ms" << std::endl;
std::cout << "Tokenization time: " << metrics.get_tokenization_duration().mean << " ± " << metrics.get_tokenization_duration().std << " ms" << std::endl;
3 changes: 2 additions & 1 deletion samples/python/text_generation/README.md
@@ -153,7 +153,8 @@ For more information how performance metrics are calculated please follow [perfo
```
#### Options
- `-m, --model`: Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt to generate text.
- `-p, --prompt` (default: `None`): The prompt to generate text. If neither `-p` nor `-pf` is given, the default prompt `"The Sky is blue because"` is used.
- `-pf, --prompt_file`: Read the prompt from a file (see the usage example below).
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.
- `-n, --num_iter` (default: `3`): Number of iterations.
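A usage sketch (model directory and prompt file are placeholders):

```sh
# Prompt read from a file; -p and -pf must not be given together
python benchmark_genai.py -m ./TinyLlama-1.1B-Chat-v1.0 -pf prompt.txt -mt 20 -n 3
```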
30 changes: 27 additions & 3 deletions samples/python/text_generation/benchmark_genai.py
@@ -1,23 +1,38 @@
# Copyright (C) 2023-2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import sys
import argparse
import openvino_genai as ov_genai
from openvino import get_version

def main():
parser = argparse.ArgumentParser(description="Help command")
parser.add_argument("-m", "--model", type=str, required=True, help="Path to model and tokenizers base directory")
parser.add_argument("-p", "--prompt", type=str, default="The Sky is blue because", help="Prompt")
parser.add_argument("-p", "--prompt", type=str, default=None, help="Prompt")
parser.add_argument("-pf", "--prompt_file", type=str, help="Read prompt from file")
parser.add_argument("-nw", "--num_warmup", type=int, default=1, help="Number of warmup iterations")
parser.add_argument("-n", "--num_iter", type=int, default=2, help="Number of iterations")
parser.add_argument("-mt", "--max_new_tokens", type=int, default=20, help="Maximal number of new tokens")
parser.add_argument("-d", "--device", type=str, default="CPU", help="Device")

args = parser.parse_args()

if args.prompt is not None and args.prompt_file is not None:
raise RuntimeError(f'Prompt and prompt file should not exist together!')
else:
if args.prompt_file is not None:
with open(args.prompt_file, 'r', encoding='utf-8') as f:
prompt = [f.read()]
else:
prompt = ['The Sky is blue because'] if args.prompt is None else [args.prompt]
if len(prompt) == 0:
raise RuntimeError(f'Prompt is empty!')

print(f'openvino runtime version: {get_version()}')

# Perf metrics is stored in DecodedResults.
# In order to get DecodedResults instead of a string input should be a list.
prompt = [args.prompt]
models_path = args.model
device = args.device
num_warmup = args.num_warmup
@@ -26,8 +41,16 @@ def main():
config = ov_genai.GenerationConfig()
config.max_new_tokens = args.max_new_tokens

pipe = ov_genai.LLMPipeline(models_path, device)
scheduler_config = ov_genai.SchedulerConfig()
scheduler_config.enable_prefix_caching = False
scheduler_config.max_num_batched_tokens = sys.maxsize

pipe = ov_genai.LLMPipeline(models_path, device, scheduler_config=scheduler_config)

input_data = pipe.get_tokenizer().encode(prompt)
prompt_token_size = input_data.input_ids.get_shape()[1]
print(f"Prompt token size: {prompt_token_size}")

for _ in range(num_warmup):
pipe.generate(prompt, config)

@@ -37,6 +60,7 @@ def main():
res = pipe.generate(prompt, config)
perf_metrics += res.perf_metrics

print(f"Output token size: {res.perf_metrics.get_num_generated_tokens()}")
print(f"Load time: {perf_metrics.get_load_time():.2f} ms")
print(f"Generate time: {perf_metrics.get_generate_duration().mean:.2f} ± {perf_metrics.get_generate_duration().std:.2f} ms")
print(f"Tokenization time: {perf_metrics.get_tokenization_duration().mean:.2f} ± {perf_metrics.get_tokenization_duration().std:.2f} ms")
3 changes: 2 additions & 1 deletion samples/python/visual_language_chat/README.md
@@ -40,7 +40,8 @@ python benchmark_vlm.py [OPTIONS]
### Options

- `-m, --model`(default: `.`): Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `What is on the image?`): The prompt to generate text.
- `-p, --prompt` (default: `None`): The prompt to generate text. If neither `-p` nor `-pf` is given, the default prompt `"What is on the image?"` is used.
- `-pf, --prompt_file`: Read the prompt from a file (see the usage example below).
- `-i, --image` (default: `image.jpg`): Path to the image.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.
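A usage sketch (model and image paths are placeholders); pointing `-i` at a directory feeds every image in it, in sorted order, to a single generate call:

```sh
# Single image
python benchmark_vlm.py -m ./MiniCPM-V-2_6 -i image.jpg -pf prompt.txt

# Directory of images
python benchmark_vlm.py -m ./MiniCPM-V-2_6 -i ./images/ -pf prompt.txt
```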
45 changes: 38 additions & 7 deletions samples/python/visual_language_chat/benchmark_vlm.py
@@ -2,11 +2,14 @@
# Copyright (C) 2023-2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import sys
import argparse
import openvino_genai as ov_genai
from PIL import Image
from openvino import Tensor
from pathlib import Path
import numpy as np
from openvino import get_version


def read_image(path: str) -> Tensor:
@@ -22,11 +25,18 @@ def read_image(path: str) -> Tensor:
image_data = np.array(pic)
return Tensor(image_data)

def read_images(path: str) -> list[Tensor]:
entry = Path(path)
if entry.is_dir():
return [read_image(str(file)) for file in sorted(entry.iterdir())]
return [read_image(path)]


def main():
parser = argparse.ArgumentParser(description="Help command")
parser.add_argument("-m", "--model", type=str, help="Path to model and tokenizers base directory")
parser.add_argument("-p", "--prompt", type=str, default="The Sky is blue because", help="Prompt")
parser.add_argument("-p", "--prompt", type=str, default=None, help="Prompt")
parser.add_argument("-pf", "--prompt_file", type=str, help="Read prompt from file")
parser.add_argument("-i", "--image", type=str, default="image.jpg", help="Image")
parser.add_argument("-nw", "--num_warmup", type=int, default=1, help="Number of warmup iterations")
parser.add_argument("-n", "--num_iter", type=int, default=2, help="Number of iterations")
@@ -35,29 +45,50 @@ def main():

args = parser.parse_args()

if args.prompt is not None and args.prompt_file is not None:
raise RuntimeError(f'Prompt and prompt file should not exist together!')
else:
if args.prompt_file is not None:
with open(args.prompt_file, 'r', encoding='utf-8') as f:
prompt = f.read()
else:
prompt = 'What is on the image?' if args.prompt is None else args.prompt
if len(prompt) == 0:
raise RuntimeError(f'Prompt is empty!')

print(f'openvino runtime version: {get_version()}')

# Perf metrics is stored in VLMDecodedResults.
# In order to get VLMDecodedResults instead of a string input should be a list.
prompt = args.prompt
models_path = args.model
image = read_image(args.image)
images = read_images(args.image)
device = args.device
num_warmup = args.num_warmup
num_iter = args.num_iter

config = ov_genai.GenerationConfig()
config.max_new_tokens = args.max_new_tokens

pipe = ov_genai.VLMPipeline(models_path, device)
scheduler_config = ov_genai.SchedulerConfig()
scheduler_config.enable_prefix_caching = False
scheduler_config.max_num_batched_tokens = sys.maxsize

pipe = ov_genai.VLMPipeline(models_path, device, scheduler_config=scheduler_config)

input_data = pipe.get_tokenizer().encode(prompt)
prompt_token_size = input_data.input_ids.get_shape()[1]
print(f"Number of images:{len(images)}, Prompt token size: {prompt_token_size}")

for _ in range(num_warmup):
pipe.generate(prompt, images=image, generation_config=config)
pipe.generate(prompt, images=images, generation_config=config)

res = pipe.generate(prompt, images=image, generation_config=config)
res = pipe.generate(prompt, images=images, generation_config=config)
perf_metrics = res.perf_metrics
for _ in range(num_iter - 1):
res = pipe.generate(prompt, images=image, generation_config=config)
res = pipe.generate(prompt, images=images, generation_config=config)
perf_metrics += res.perf_metrics

print(f"Output token size: {res.perf_metrics.get_num_generated_tokens()}")
print(f"Load time: {perf_metrics.get_load_time():.2f} ms")
print(
f"Generate time: {perf_metrics.get_generate_duration().mean:.2f} ± {perf_metrics.get_generate_duration().std:.2f} ms")
29 changes: 21 additions & 8 deletions tools/llm_bench/task/visual_language_generation.py
@@ -12,13 +12,13 @@
import openvino as ov
import hashlib
import llm_bench_utils.metrics_print as metrics_print
import llm_bench_utils.output_csv
from transformers import set_seed
from transformers.image_utils import load_image
import llm_bench_utils.output_json
import llm_bench_utils.output_file
import llm_bench_utils.gen_output_data as gen_output_data
import llm_bench_utils.parse_json_data as parse_json_data
from pathlib import Path


FW_UTILS = {'pt': llm_bench_utils.pt_utils, 'ov': llm_bench_utils.ov_utils}

@@ -37,10 +37,16 @@ def run_visual_language_generation_optimum(
prompts = []
inputs = [inputs] if not isinstance(inputs, (list, tuple)) else inputs
for input_data in inputs:
if "media" in input_data:
images.append(load_image(input_data["media"]))
if input_data.get("media", None):
entry = Path(input_data["media"])
if entry.is_dir():
for file in sorted(entry.iterdir()):
images.append(load_image(str(file)))
else:
images.append(load_image(input_data["media"]))
prompts.append(input_data["prompt"])

prefix = '[warm-up]' if num == 0 else '[{}]'.format(num)
log.info(f'{prefix}[P{prompt_index}] Input image nums:{len(images)}')
if args["output_dir"] is not None and num == 0:
for bs_index, in_text in enumerate(prompts):
llm_bench_utils.output_file.output_input_text(in_text, args, model_precision, prompt_index, bs_index, proc_id)
@@ -192,8 +198,13 @@ def run_visual_language_generation_genai(
prompts = []
inputs = [inputs] if not isinstance(inputs, (list, tuple)) else inputs
for input_data in inputs:
if "media" in input_data:
images.append(load_image_genai(input_data["media"]))
if input_data.get("media", None):
entry = Path(input_data["media"])
if entry.is_dir():
for file in sorted(entry.iterdir()):
images.append(load_image_genai(str(file)))
else:
images.append(load_image_genai(input_data["media"]))
prompts.append(input_data["prompt"])
if args["output_dir"] is not None and num == 0:
for bs_index, in_text in enumerate(prompts):
@@ -212,7 +223,9 @@ def run_visual_language_generation_genai(
gen_config.ignore_eos = True
kwargs = {}
if len(images) >= 1:
kwargs["images"] = images[0]
kwargs["images"] = images
prefix = '[warm-up]' if num == 0 else '[{}]'.format(num)
log.info(f'{prefix}[P{prompt_index}] Input image nums:{len(images)}')
start = time.perf_counter()
generation_result = model.generate(prompts[0], generation_config=gen_config, **kwargs)
end = time.perf_counter()
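With this change, a prompt file's `media` entry may point at a directory as well as a single image; every file in the directory is loaded and passed to one generate call. A hedged sketch of how that could be exercised (the flag names follow existing llm_bench usage, and the paths and JSONL contents are placeholders):

```sh
# prompts/vlm_multi_image.jsonl contains one JSON object per line, e.g.:
# {"media": "./images/", "prompt": "What do these images have in common?"}

python benchmark.py -m ./MiniCPM-V-2_6 -pf prompts/vlm_multi_image.jsonl -n 2
```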