eagle3 cb impl with top-1 proposal #3055

songbell · 2025-11-21T08:53:17Z

EAGLE3 continuous batching (CB) implementation
Tickets: CVS-173358
Reference code: https://github.com/SafeAILab/EAGLE

Copilot

Pull Request Overview

This PR implements EAGLE3 speculative decoding with continuous batching support for improved inference performance. The changes add a new speculative decoding variant that uses hidden state passing between main and draft models for more efficient token generation.

Key Changes

Introduced Eagle3DecodingImpl for EAGLE3-specific speculative decoding logic
Extended model runner to support hidden state import/export for EAGLE3
Added test coverage for EAGLE3 speculative decoding scenarios
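
For readers new to the "top-1 proposal" in the title, the general shape of greedy speculative verification can be sketched as follows. This is a minimal illustration, not the PR's code: `accept_top1` and its argument layout are hypothetical, and it assumes the target model's greedy choice is available at every proposed position plus one extra position.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the PR: given the draft model's top-1
// proposals and the target model's greedy choice at each of those positions
// (plus one extra position), return the tokens that are actually emitted.
std::vector<int> accept_top1(const std::vector<int>& draft_tokens,
                             const std::vector<int>& target_tokens) {
    std::vector<int> accepted;
    // target_tokens must hold one more entry than draft_tokens: the target
    // model's prediction after the last proposed token.
    if (target_tokens.size() != draft_tokens.size() + 1) {
        return accepted;  // malformed input: emit nothing
    }
    std::size_t i = 0;
    // Keep draft tokens while they match what the target would have produced.
    while (i < draft_tokens.size() && draft_tokens[i] == target_tokens[i]) {
        accepted.push_back(draft_tokens[i]);
        ++i;
    }
    // Always emit one token from the target: either the correction at the
    // first mismatch or the bonus token after a fully accepted draft.
    accepted.push_back(target_tokens[i]);
    return accepted;
}
```

With this scheme every verification step yields at least one token, which is why a rejected draft costs no more output than a normal decoding step.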

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.


File	Description
tests/python_tests/utils/hugging_face.py	Adds eagle3 model detection and handles tokenizer conditionally
tests/python_tests/test_continuous_batching.py	Adds EAGLE3 test cases and refactors test helper functions
tests/python_tests/samples/test_speculative_decoding_lm.py	Extracts common test logic and adds EAGLE3 sample tests
tests/python_tests/samples/conftest.py	Adds model configurations for EAGLE3 models
src/cpp/src/speculative_decoding/update_request_structs.hpp	Extends GeneratedSequence to store hidden states
src/cpp/src/speculative_decoding/speculative_decoding_impl.hpp	Refactors generate logic into template helper and exposes internal state
src/cpp/src/speculative_decoding/speculative_decoding_impl.cpp	Extracts scheduler initialization and refactors generate using strategy pattern
src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.hpp	Defines EAGLE3 implementation with model transformations
src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp	Implements EAGLE3 decoding with hidden state management
src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.hpp	Adds ContinuousBatchingForEagle3DecodingImpl class
src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp	Implements hidden state handling in update_requests
src/cpp/src/sequence_group.hpp	Adds hidden state storage and accessor methods to Sequence
src/cpp/src/sampling/sampler.hpp	Adds draft-to-target mapping for EAGLE decoding
src/cpp/src/sampling/sampler.cpp	Implements token index adjustment using draft2target mapping
src/cpp/src/llm/pipeline.cpp	Adds apply_eagle_rt_info helper and draft model configuration
src/cpp/src/continuous_batching/pipeline.cpp	Integrates EAGLE3 mode detection and instantiation
src/cpp/src/continuous_batching/model_runner.hpp	Adds hidden state flag system and sequence mapping structures
src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp	Declares EAGLE3 implementation classes as friends
.github/workflows/windows.yml	Excludes eagle3 tests from main suite and adds dedicated test job
.github/workflows/manylinux_2_28.yml	Excludes eagle3 tests from main suite and adds dedicated test job
.github/workflows/linux.yml	Excludes eagle3 tests from main suite and adds dedicated test job
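
The sampler rows above mention a draft-to-target mapping used to adjust token indices. How the PR stores that mapping is not shown in this conversation; the sketch below assumes, purely for illustration, that it is a table of per-token offsets added to the draft token id. `draft_to_target_id` and `d2t` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumption: d2t[i] is the offset that turns draft token id i into the
// corresponding id in the (larger) target vocabulary.
std::int64_t draft_to_target_id(std::int64_t draft_id,
                                const std::vector<std::int64_t>& d2t) {
    if (draft_id < 0 || static_cast<std::size_t>(draft_id) >= d2t.size()) {
        throw std::out_of_range("draft token id outside the draft vocabulary");
    }
    return draft_id + d2t[static_cast<std::size_t>(draft_id)];
}
```

A mapping of this kind is only needed when the draft model uses a reduced vocabulary; with identical vocabularies every offset would be zero.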


Wovchena · 2025-11-26T14:17:09Z

src/cpp/src/continuous_batching/pipeline.cpp

+    if (config.find("eagle3_mode") != config.end()) {
+        eagle_rt_info.eagle3_mode = config.at("eagle3_mode").as<bool>();
+        config.erase("eagle3_mode");
+        if (config.find("hidden_layers_list") != config.end()) {


Need a test for that feature. I think you usually rely on configs

@Wovchena do you mean a test that checks the generated draft model has these rt_info entries?
@rkazants what do you think of the "eagle3_mode" entry in the draft model? Or should we use some other way to identify an EAGLE3 draft model?

Agreed to delete else branch
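
The pattern under discussion, reading optional keys from the config and consuming them so they are not forwarded further, can be sketched in isolation. This is an assumption-laden stand-in: the PR operates on an OpenVINO property map, while `Config`, `EagleInfo`, and `extract_eagle_info` here are hypothetical, and what the PR does with `hidden_layers_list` inside the nested branch is not visible in the diff excerpt.

```cpp
#include <map>
#include <string>
#include <variant>
#include <vector>

// Stand-in types; the PR works on an OpenVINO property map instead.
using ConfigValue = std::variant<bool, std::vector<int>>;
using Config = std::map<std::string, ConfigValue>;

struct EagleInfo {
    bool eagle3_mode = false;
    std::vector<int> hidden_layers_list;
};

// Reads the EAGLE3 keys if present and erases them from the config.
EagleInfo extract_eagle_info(Config& config) {
    EagleInfo info;
    auto mode_it = config.find("eagle3_mode");
    if (mode_it == config.end()) {
        return info;  // not an EAGLE3 configuration: leave config untouched
    }
    info.eagle3_mode = std::get<bool>(mode_it->second);
    config.erase(mode_it);
    // Assumption: the layer list is only meaningful alongside eagle3_mode
    // and is consumed in the same way.
    auto layers_it = config.find("hidden_layers_list");
    if (layers_it != config.end()) {
        info.hidden_layers_list = std::get<std::vector<int>>(layers_it->second);
        config.erase(layers_it);
    }
    return info;
}
```

A test along the lines requested in the thread could then check both the extracted values and that unrelated keys survive in the config.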


Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1

[nitpick] The variable name fname is ambiguous. Consider renaming it to friendly_name or node_friendly_name to better indicate that it represents a node's friendly name.

// Copyright (C) 2023-2025 Intel Corporation

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp:1

The error message should be capitalized and more descriptive. Consider: "Missing hidden state from target model to EAGLE draft model. Ensure hidden state export is properly configured."

// Copyright (C) 2023-2025 Intel Corporation


Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (3)

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1

The variables main_model_hidden_size and draft_model_hidden_size represent sizes, so they should use an integer type (size_t) rather than float. The result of the computation should be cast accordingly.

// Copyright (C) 2023-2025 Intel Corporation

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp:1

Error message could be more descriptive. Consider: 'Hidden state from main model is required but missing for EAGLE draft model' to clarify which models are involved and why this is an error.

// Copyright (C) 2023-2025 Intel Corporation

tests/python_tests/test_continuous_batching.py:1

The str() wrapper was removed from models_path on line 236 but kept on line 232. For consistency, both should use the same approach to handle the path type.

# Copyright (C) 2018-2025 Intel Corporation


Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 7 comments.


Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.


Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1

The error message states 'Target state hidden size' but the assertion checks stored_seq_len against total_num_tokens (sequence length). The message should mention 'sequence length' instead of 'hidden size'.

// Copyright (C) 2023-2025 Intel Corporation


sbalandi

LGTM

Signed-off-by: fishbell <[email protected]>

Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.


Copilot · 2025-12-07T11:53:33Z

tests/python_tests/test_continuous_batching.py

+    extended_perf_metrics = None
+    if draft_model_id is None:


The variable extended_perf_metrics is initialized to None and assigned in only some branches. If no assigning branch executes (draft_model_id is not None but pipeline_type is not SPECULATIVE_DECODING), the assertions on lines 600-601 will fail unexpectedly. Consider restructuring the conditional logic so the variable is always assigned before use.

Copilot · 2025-12-07T11:53:33Z

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp

+            if (!param_node) continue;
+            if (input_node->get_friendly_name().find("hidden_states") == std::string::npos) continue;


[nitpick] The check for parameter node could be combined with the friendly name check on the next line to reduce nesting and improve readability.

Suggested change

-            if (!param_node) continue;
-            if (input_node->get_friendly_name().find("hidden_states") == std::string::npos) continue;
+            if (!param_node || input_node->get_friendly_name().find("hidden_states") == std::string::npos) continue;

Copilot · 2025-12-07T11:53:33Z

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp

+            if (eagle_mode_enabled && !m_is_validation_mode_enabled)
+                m_model_runner->set_initial_hidden_state(request_id,
+                                                     candidates.begin()->second.hidden_states);


Multi-line statement should use braces for clarity and maintainability, especially when the condition is complex.

Suggested change

-            if (eagle_mode_enabled && !m_is_validation_mode_enabled)
-                m_model_runner->set_initial_hidden_state(request_id,
-                                                     candidates.begin()->second.hidden_states);
+            if (eagle_mode_enabled && !m_is_validation_mode_enabled) {
+                m_model_runner->set_initial_hidden_state(request_id,
+                                                     candidates.begin()->second.hidden_states);
+            }

Copilot · 2025-12-07T11:53:34Z

src/cpp/src/continuous_batching/model_runner.hpp

+                    size_t stored_hidden_size = stored_shape[stored_shape.size() - 1];
+
+                    OPENVINO_ASSERT(stored_hidden_size == hidden_size, "Target state hidden size does not match the expected size for Eagle3 draft model inference.");
+                    OPENVINO_ASSERT(stored_seq_len == total_num_tokens, "Target state hidden size does not match the expected size for Eagle3 draft model inference.");


Error message is misleading - it says 'hidden size' but is checking 'seq_len'. The message should say 'Target state sequence length does not match the expected length for Eagle3 draft model inference.'

Suggested change

-                    OPENVINO_ASSERT(stored_seq_len == total_num_tokens, "Target state hidden size does not match the expected size for Eagle3 draft model inference.");
+                    OPENVINO_ASSERT(stored_seq_len == total_num_tokens, "Target state sequence length does not match the expected length for Eagle3 draft model inference.");
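
The point of the comment above, one distinct message per checked dimension, can be illustrated with a stand-alone sketch. It is not the PR's code: `check_hidden_state_shape` is hypothetical, it throws standard exceptions instead of using OPENVINO_ASSERT, and it assumes the sequence length is the second-to-last dimension (the diff only shows that the hidden size is the last one).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-alone check with a separate message per dimension.
void check_hidden_state_shape(const std::vector<std::size_t>& stored_shape,
                              std::size_t total_num_tokens,
                              std::size_t hidden_size) {
    if (stored_shape.size() < 2) {
        throw std::invalid_argument("Target state must have at least two dimensions.");
    }
    // Assumption: layout is [..., seq_len, hidden_size].
    const std::size_t stored_seq_len = stored_shape[stored_shape.size() - 2];
    const std::size_t stored_hidden_size = stored_shape[stored_shape.size() - 1];
    if (stored_hidden_size != hidden_size) {
        throw std::invalid_argument(
            "Target state hidden size mismatch: got " + std::to_string(stored_hidden_size) +
            ", expected " + std::to_string(hidden_size) + ".");
    }
    if (stored_seq_len != total_num_tokens) {
        throw std::invalid_argument(
            "Target state sequence length mismatch: got " + std::to_string(stored_seq_len) +
            ", expected " + std::to_string(total_num_tokens) + ".");
    }
}
```

Including the actual and expected values in each message makes a failing assertion much easier to diagnose than two identical strings.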

Copilot AI review requested due to automatic review settings November 21, 2025 08:53

songbell changed the title ~~Bell/eagle cb top1 impl~~ eagle3 cb impl with top-1 proposal Nov 21, 2025

songbell mentioned this pull request Nov 21, 2025

eagle3 cb impl with top-1 proposal #2740

Closed

Copilot AI reviewed Nov 21, 2025


peterchen-intel requested review from rkazants, wangleis and xipingyan November 21, 2025 09:09

peterchen-intel assigned Wovchena and rkazants Nov 21, 2025

peterchen-intel requested review from peterchen-intel and xufang-lisa November 21, 2025 09:10

songbell added 4 commits November 22, 2025 00:25

eagle3 top1 impl

786b2e5

Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…

cc20fca

…genai into HEAD

update test

0d67b9e

Signed-off-by: fishbell <[email protected]>

Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…

bf603eb

…genai into HEAD

Wovchena requested changes Nov 26, 2025

tests/python_tests/utils/hugging_face.py (outdated, resolved)

tests/python_tests/samples/test_speculative_decoding_lm.py (outdated, resolved)

Wovchena requested changes Nov 26, 2025

sbalandi reviewed Nov 26, 2025

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp (outdated, resolved)

sbalandi reviewed Nov 26, 2025

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp (resolved)

sbalandi reviewed Nov 26, 2025

src/cpp/src/continuous_batching/pipeline.cpp (resolved)

Copilot AI review requested due to automatic review settings November 27, 2025 02:24

Copilot AI review requested due to automatic review settings November 27, 2025 02:25

Copilot AI reviewed Nov 27, 2025

Copilot AI review requested due to automatic review settings November 27, 2025 02:37

songbell force-pushed the bell/eagle_cb_top1_impl branch from 5753e27 to f7d8233 November 27, 2025 02:37

Copilot AI reviewed Nov 27, 2025

songbell force-pushed the bell/eagle_cb_top1_impl branch from f7d8233 to aab75bd November 27, 2025 02:48

songbell commented Nov 27, 2025

tests/python_tests/test_continuous_batching.py (outdated, resolved)

Copilot AI review requested due to automatic review settings November 27, 2025 03:15

Copilot AI reviewed Nov 27, 2025

songbell force-pushed the bell/eagle_cb_top1_impl branch from 4650f71 to 948819a November 27, 2025 04:35

songbell added 4 commits November 27, 2025 18:23

apply some review comments, fix merge left over

c16b50b

Signed-off-by: fishbell <[email protected]>

apply copilot

aab75bd

Signed-off-by: fishbell <[email protected]>

tune test

948819a

Signed-off-by: fishbell <[email protected]>

Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…

3af494c

…genai into HEAD

Wovchena requested changes Nov 28, 2025

src/cpp/src/continuous_batching/pipeline.cpp (outdated, resolved)

Copilot AI review requested due to automatic review settings December 2, 2025 02:54

Copilot AI reviewed Dec 2, 2025

songbell added 4 commits December 2, 2025 18:22

apply review comments

066aa64

Signed-off-by: fishbell <[email protected]>

Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…

026ed71

…genai into HEAD

fix merge failure

4906e2b

Signed-off-by: fishbell <[email protected]>

apply copilot

7c73208

Signed-off-by: fishbell <[email protected]>

Copilot AI review requested due to automatic review settings December 3, 2025 13:56

Copilot AI reviewed Dec 3, 2025

tests/python_tests/test_continuous_batching.py (resolved)

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp (outdated, resolved)

src/cpp/src/continuous_batching/model_runner.hpp (resolved)

sbalandi reviewed Dec 3, 2025

songbell added 2 commits December 4, 2025 05:56

remove hidden layer member variable, clean up some logic

f88d67a

Signed-off-by: fishbell <[email protected]>

illustrate token rewind with more friendly variable name

c5085d8

Signed-off-by: fishbell <[email protected]>

GuoliangShiIntel added a commit to GuoliangShiIntel/openvino.genai that referenced this pull request Dec 5, 2025

GPU eagle3 cb impl with top-1 proposal openvinotoolkit#3055

f4ba015

Merge branch 'master' into bell/eagle_cb_top1_impl

f01f057

Copilot AI review requested due to automatic review settings December 7, 2025 11:52

Copilot AI reviewed Dec 7, 2025


