
Commit c642a7c

eagle3 python readme
Signed-off-by: fishbell <[email protected]>
1 parent efe0cb6 commit c642a7c

1 file changed: +30 −0 lines
tools/llm_bench/README_EAGLE3.md

# SPECULATIVE DECODING with EAGLE3
### 1. Prepare Python Virtual Environment for LLM Benchmarking
```bash
python3 -m venv ov-llm-bench-env
source ov-llm-bench-env/bin/activate
pip install --upgrade pip

git clone https://github.com/openvinotoolkit/openvino.genai.git
cd openvino.genai/tools/llm_bench
pip install -r requirements.txt
```
### 2. Get the Main and Draft Models in OpenVINO IR Format

The main and draft models downloaded from Hugging Face need to be converted to OpenVINO IR format.
For now, please get the Llama 3.1 8B EAGLE3 main and draft models from the server below (password: openvino):

```bash
scp -r [email protected]:~/bell/speculative_decoding/eagle3/llama-3.1-8b-instruct-ov-int4/ your_path_to_main/
scp -r [email protected]:~/bell/speculative_decoding/eagle3/EAGLE3-LLaMA3.1-instruct-8B-ov-int4/ your_path_to_draft/
```
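If you prefer to convert the main model yourself, below is a minimal sketch using the optimum-intel export CLI. The Hugging Face model ID and output directory are assumptions for illustration, and the EAGLE3 draft model may require a separate conversion flow that this sketch does not cover.

```bash
# Sketch only: assumes optimum-intel with OpenVINO support is installed,
# e.g. pip install "optimum[openvino]". The model ID and output directory
# below are illustrative, not taken from this README.
optimum-cli export openvino \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --weight-format int4 \
  your_path_to_main/llama-3.1-8b-instruct-ov-int4
```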
### 3. Benchmark the LLM Using EAGLE3 Speculative Decoding
To benchmark the performance of the LLM, use the following command:
```bash
python benchmark.py \
  -m /home/openvino-ci-97/bell/speculative_decoding/eagle3/llama-3.1-8b-instruct-ov-int4 \
  -d GPU \
  -pf /home/openvino-ci-97/bell/openvino.genai/tools/llm_bench/test.jsonl \
  -ic 129 \
  --draft_model /home/openvino-ci-97/bell/speculative_decoding/eagle3/EAGLE3-LLaMA3.1-instruct-8B-ov-int4 \
  --draft_device GPU \
  --eagle_config ./eagle.config \
  --disable_prompt_permutation \
  --apply_chat_template
```
The content of eagle.config is as follows:

```json
{"eagle_mode": "EAGLE3", "branching_factor": 1, "tree_depth": 4, "total_tokens": 6}
```
To tune for better performance, keep branching_factor fixed at 1 and adjust tree_depth; for now, total_tokens should be set to tree_depth + 2 (for example, starting from the config above, adjust tree_depth to 5 and set total_tokens to 7, as in the sample below).
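Applying that rule with tree_depth = 5 (so total_tokens = 5 + 2 = 7), the tuned eagle.config would read:

```json
{"eagle_mode": "EAGLE3", "branching_factor": 1, "tree_depth": 5, "total_tokens": 7}
```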
