@songbell

EAGLE3 continuous batching (CB) implementation.
Tickets: CVS-173358
Reference code: https://github.com/SafeAILab/EAGLE

Copilot AI review requested due to automatic review settings November 21, 2025 08:53
@github-actions bot added the labels category: continuous batching, category: LLM, category: sampling, category: speculative decoding, category: GHA, category: LLM samples, category: CPP API, no-match-files, and category: GGUF on Nov 21, 2025
@songbell changed the title from "Bell/eagle cb top1 impl" to "eagle3 cb impl with top-1 proposal" on Nov 21, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements EAGLE3 speculative decoding with continuous batching support for improved inference performance. The changes add a new speculative decoding variant that uses hidden state passing between main and draft models for more efficient token generation.

Key Changes

  • Introduced Eagle3DecodingImpl for EAGLE3-specific speculative decoding logic
  • Extended model runner to support hidden state import/export for EAGLE3
  • Added test coverage for EAGLE3 speculative decoding scenarios
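
As a rough illustration of the top-1 proposal scheme named in the PR title, the verification step of greedy speculative decoding can be sketched as below. This is a generic sketch, not the PR's code: `VerifyResult` and `verify_top1` are hypothetical names, and it assumes greedy (argmax) decoding on both models, with the hidden-state passing left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Outcome of verifying one round of draft proposals (hypothetical type).
struct VerifyResult {
    std::size_t accepted;         // number of draft tokens the target agreed with
    std::vector<int64_t> tokens;  // accepted draft tokens plus one token from the target
};

// draft_tokens:  k tokens proposed by the draft model (top-1 at each step).
// target_tokens: k + 1 tokens chosen by the target model. target_tokens[i] is the
//                target's choice at the position where the draft proposed
//                draft_tokens[i]; target_tokens[k] is the extra token after the
//                last draft position.
VerifyResult verify_top1(const std::vector<int64_t>& draft_tokens,
                         const std::vector<int64_t>& target_tokens) {
    // Sketch-level precondition check.
    assert(target_tokens.size() == draft_tokens.size() + 1);

    VerifyResult result{0, {}};
    // Accept the longest prefix on which both models agree.
    while (result.accepted < draft_tokens.size() &&
           draft_tokens[result.accepted] == target_tokens[result.accepted]) {
        result.tokens.push_back(draft_tokens[result.accepted]);
        ++result.accepted;
    }
    // At the first mismatch (or after all drafts matched) take the target's token.
    result.tokens.push_back(target_tokens[result.accepted]);
    return result;
}
```

For draft tokens {5, 7, 9} and target tokens {5, 7, 2, 4}, the first two proposals are accepted and the target's token 2 replaces the third, so every round emits at least one token.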

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/python_tests/utils/hugging_face.py | Adds eagle3 model detection and handles tokenizer conditionally |
| tests/python_tests/test_continuous_batching.py | Adds EAGLE3 test cases and refactors test helper functions |
| tests/python_tests/samples/test_speculative_decoding_lm.py | Extracts common test logic and adds EAGLE3 sample tests |
| tests/python_tests/samples/conftest.py | Adds model configurations for EAGLE3 models |
| src/cpp/src/speculative_decoding/update_request_structs.hpp | Extends GeneratedSequence to store hidden states |
| src/cpp/src/speculative_decoding/speculative_decoding_impl.hpp | Refactors generate logic into template helper and exposes internal state |
| src/cpp/src/speculative_decoding/speculative_decoding_impl.cpp | Extracts scheduler initialization and refactors generate using strategy pattern |
| src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.hpp | Defines EAGLE3 implementation with model transformations |
| src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp | Implements EAGLE3 decoding with hidden state management |
| src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.hpp | Adds ContinuousBatchingForEagle3DecodingImpl class |
| src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp | Implements hidden state handling in update_requests |
| src/cpp/src/sequence_group.hpp | Adds hidden state storage and accessor methods to Sequence |
| src/cpp/src/sampling/sampler.hpp | Adds draft-to-target mapping for EAGLE decoding |
| src/cpp/src/sampling/sampler.cpp | Implements token index adjustment using draft2target mapping |
| src/cpp/src/llm/pipeline.cpp | Adds apply_eagle_rt_info helper and draft model configuration |
| src/cpp/src/continuous_batching/pipeline.cpp | Integrates EAGLE3 mode detection and instantiation |
| src/cpp/src/continuous_batching/model_runner.hpp | Adds hidden state flag system and sequence mapping structures |
| src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp | Declares EAGLE3 implementation classes as friends |
| .github/workflows/windows.yml | Excludes eagle3 tests from main suite and adds dedicated test job |
| .github/workflows/manylinux_2_28.yml | Excludes eagle3 tests from main suite and adds dedicated test job |
| .github/workflows/linux.yml | Excludes eagle3 tests from main suite and adds dedicated test job |
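
The sampler entries in the file summary mention adjusting token indices through a draft-to-target mapping. A minimal sketch of what such an adjustment might look like follows. It assumes the mapping stores per-token offsets, so that the target id is the draft id plus `d2t[draft id]`; whether this PR stores offsets or absolute ids is not stated in the conversation, and `draft_to_target_token` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Maps a token id from the draft model's (possibly smaller) vocabulary to the
// target model's vocabulary.
// Assumption: d2t[i] is an offset, i.e. target_id = i + d2t[i].
int64_t draft_to_target_token(int64_t draft_token_id, const std::vector<int64_t>& d2t) {
    // Sketch-level bounds check on the draft token id.
    assert(draft_token_id >= 0 &&
           static_cast<std::size_t>(draft_token_id) < d2t.size());
    return draft_token_id + d2t[static_cast<std::size_t>(draft_token_id)];
}
```

With `d2t = {0, 4, 10}`, draft ids 0, 1 and 2 map to target ids 0, 5 and 12. The sampled draft token has to be translated this way before it can be compared with, or fed to, the target model.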

if (config.find("eagle3_mode") != config.end()) {
    eagle_rt_info.eagle3_mode = config.at("eagle3_mode").as<bool>();
    config.erase("eagle3_mode");
    if (config.find("hidden_layers_list") != config.end()) {
Collaborator

Need a test for that feature. I think you usually rely on configs

Author

@Wovchena do you mean a test that checks the generated draft model has these rt_info entries?
@rkazants what do you think of the "eagle3_mode" entry in the draft model? Or should we use some other way to identify it as an EAGLE3 draft model?

Collaborator

Agreed to delete the else branch.
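
The config handling quoted at the start of this thread follows a read-then-erase pattern: the option is consumed and removed from the config so that it is not seen again later. A test for that behavior could check both the returned value and the removal. The sketch below is only an illustration under simplifying assumptions: a plain `std::map<std::string, bool>` stands in for the real config type, and `take_bool_option` is a hypothetical helper, not a function from the PR.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the pipeline's property map.
using Config = std::map<std::string, bool>;

// Returns the option's value if the key is present and erases the key from the
// config; returns default_value (and leaves the config untouched) otherwise.
bool take_bool_option(Config& config, const std::string& key, bool default_value) {
    auto it = config.find(key);
    if (it == config.end()) {
        return default_value;
    }
    const bool value = it->second;
    config.erase(it);
    return value;
}
```

A test along these lines would assert that `take_bool_option(cfg, "eagle3_mode", false)` returns the configured value and that `cfg.count("eagle3_mode")` is 0 afterwards, while unrelated keys remain in place.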

Copilot AI review requested due to automatic review settings November 27, 2025 02:24
Copilot AI review requested due to automatic review settings November 27, 2025 02:25
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1

  • [nitpick] The variable name fname is ambiguous. Consider renaming it to friendly_name or node_friendly_name to better indicate that it represents a node's friendly name.
// Copyright (C) 2023-2025 Intel Corporation

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp:1

  • The error message should be capitalized and more descriptive. Consider: "Missing hidden state from target model to EAGLE draft model. Ensure hidden state export is properly configured."
// Copyright (C) 2023-2025 Intel Corporation

Copilot AI review requested due to automatic review settings November 27, 2025 02:37
@songbell force-pushed the bell/eagle_cb_top1_impl branch from 5753e27 to f7d8233 on November 27, 2025 02:37
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (3)

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1

  • The variables main_model_hidden_size and draft_model_hidden_size represent sizes and should use an integer type (size_t) rather than float. The result of the computation should be cast accordingly.
// Copyright (C) 2023-2025 Intel Corporation

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp:1

  • Error message could be more descriptive. Consider: 'Hidden state from main model is required but missing for EAGLE draft model' to clarify which models are involved and why this is an error.
// Copyright (C) 2023-2025 Intel Corporation

tests/python_tests/test_continuous_batching.py:1

  • The str() wrapper was removed from models_path on line 236 but kept on line 232. For consistency, both should use the same approach to handle the path type.
# Copyright (C) 2018-2025 Intel Corporation

@songbell force-pushed the bell/eagle_cb_top1_impl branch from f7d8233 to aab75bd on November 27, 2025 02:48
Copilot AI review requested due to automatic review settings November 27, 2025 03:15
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 7 comments.


@songbell force-pushed the bell/eagle_cb_top1_impl branch from 4650f71 to 948819a on November 27, 2025 04:35
Copilot AI review requested due to automatic review settings December 2, 2025 02:54
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.


Copilot AI review requested due to automatic review settings December 3, 2025 13:56
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1

  • The error message states 'Target state hidden size' but the assertion checks stored_seq_len against total_num_tokens (sequence length). The message should mention 'sequence length' instead of 'hidden size'.
// Copyright (C) 2023-2025 Intel Corporation

Contributor

@sbalandi sbalandi left a comment

LGTM

GuoliangShiIntel added a commit to GuoliangShiIntel/openvino.genai that referenced this pull request Dec 5, 2025
Copilot AI review requested due to automatic review settings December 7, 2025 11:52
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.


Comment on lines +587 to +588
extended_perf_metrics = None
if draft_model_id is None:
Copilot AI Dec 7, 2025

The variable extended_perf_metrics is initialized to None but is assigned in only one branch. If that branch does not execute (draft_model_id is not None but pipeline_type is not SPECULATIVE_DECODING), the assertions on lines 600-601 will fail unexpectedly. Consider initializing this variable properly or restructuring the conditional logic so that it is always assigned before use.

Comment on lines +63 to +64
if (!param_node) continue;
if (input_node->get_friendly_name().find("hidden_states") == std::string::npos) continue;
Copilot AI Dec 7, 2025

[nitpick] The null check on the parameter node could be combined with the friendly-name check on the next line to shorten the code and improve readability.

Suggested change
- if (!param_node) continue;
- if (input_node->get_friendly_name().find("hidden_states") == std::string::npos) continue;
+ if (!param_node || input_node->get_friendly_name().find("hidden_states") == std::string::npos) continue;

Comment on lines +274 to +276
if (eagle_mode_enabled && !m_is_validation_mode_enabled)
    m_model_runner->set_initial_hidden_state(request_id,
                                             candidates.begin()->second.hidden_states);
Copilot AI Dec 7, 2025

Multi-line statement should use braces for clarity and maintainability, especially when the condition is complex.

Suggested change
- if (eagle_mode_enabled && !m_is_validation_mode_enabled)
-     m_model_runner->set_initial_hidden_state(request_id,
-                                              candidates.begin()->second.hidden_states);
+ if (eagle_mode_enabled && !m_is_validation_mode_enabled) {
+     m_model_runner->set_initial_hidden_state(request_id,
+                                              candidates.begin()->second.hidden_states);
+ }

size_t stored_hidden_size = stored_shape[stored_shape.size() - 1];

OPENVINO_ASSERT(stored_hidden_size == hidden_size, "Target state hidden size does not match the expected size for Eagle3 draft model inference.");
OPENVINO_ASSERT(stored_seq_len == total_num_tokens, "Target state hidden size does not match the expected size for Eagle3 draft model inference.");
Copilot AI Dec 7, 2025

Error message is misleading - it says 'hidden size' but is checking 'seq_len'. The message should say 'Target state sequence length does not match the expected length for Eagle3 draft model inference.'

Suggested change
- OPENVINO_ASSERT(stored_seq_len == total_num_tokens, "Target state hidden size does not match the expected size for Eagle3 draft model inference.");
+ OPENVINO_ASSERT(stored_seq_len == total_num_tokens, "Target state sequence length does not match the expected length for Eagle3 draft model inference.");
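
To illustrate why each check benefits from its own message, here is a standalone sketch of the same kind of shape validation without any OpenVINO dependency. It is not the PR's code: `check_hidden_state_shape` is a hypothetical name, the use of `std::runtime_error` in place of `OPENVINO_ASSERT` is a substitution for self-containment, and reading the sequence length from the second-to-last dimension is an assumption (the quoted code only shows the hidden size coming from the last dimension).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Validates a stored hidden-state shape of the assumed form [..., seq_len, hidden_size].
// Throws std::runtime_error with a message naming the dimension that is wrong.
void check_hidden_state_shape(const std::vector<std::size_t>& stored_shape,
                              std::size_t total_num_tokens,
                              std::size_t hidden_size) {
    if (stored_shape.size() < 2) {
        throw std::runtime_error("Target state must have at least 2 dimensions.");
    }
    // Assumption: sequence length is the second-to-last dimension.
    const std::size_t stored_seq_len = stored_shape[stored_shape.size() - 2];
    const std::size_t stored_hidden_size = stored_shape[stored_shape.size() - 1];

    if (stored_hidden_size != hidden_size) {
        throw std::runtime_error("Target state hidden size (" +
                                 std::to_string(stored_hidden_size) +
                                 ") does not match the expected size (" +
                                 std::to_string(hidden_size) + ").");
    }
    if (stored_seq_len != total_num_tokens) {
        throw std::runtime_error("Target state sequence length (" +
                                 std::to_string(stored_seq_len) +
                                 ") does not match the expected length (" +
                                 std::to_string(total_num_tokens) + ").");
    }
}
```

Including the actual and expected values in each message makes it possible to tell from a log line alone which dimension was wrong.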
