-
Notifications
You must be signed in to change notification settings - Fork 303
eagle3 cb impl with top-1 proposal #3055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
eagle3 cb impl with top-1 proposal #3055
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements EAGLE3 speculative decoding with continuous batching support for improved inference performance. The changes add a new speculative decoding variant that uses hidden state passing between main and draft models for more efficient token generation.
Key Changes
- Introduced Eagle3DecodingImpl for EAGLE3-specific speculative decoding logic
- Extended model runner to support hidden state import/export for EAGLE3
- Added test coverage for EAGLE3 speculative decoding scenarios
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/python_tests/utils/hugging_face.py | Adds eagle3 model detection and handles tokenizer conditionally |
| tests/python_tests/test_continuous_batching.py | Adds EAGLE3 test cases and refactors test helper functions |
| tests/python_tests/samples/test_speculative_decoding_lm.py | Extracts common test logic and adds EAGLE3 sample tests |
| tests/python_tests/samples/conftest.py | Adds model configurations for EAGLE3 models |
| src/cpp/src/speculative_decoding/update_request_structs.hpp | Extends GeneratedSequence to store hidden states |
| src/cpp/src/speculative_decoding/speculative_decoding_impl.hpp | Refactors generate logic into template helper and exposes internal state |
| src/cpp/src/speculative_decoding/speculative_decoding_impl.cpp | Extracts scheduler initialization and refactors generate using strategy pattern |
| src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.hpp | Defines EAGLE3 implementation with model transformations |
| src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp | Implements EAGLE3 decoding with hidden state management |
| src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.hpp | Adds ContinuousBatchingForEagle3DecodingImpl class |
| src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp | Implements hidden state handling in update_requests |
| src/cpp/src/sequence_group.hpp | Adds hidden state storage and accessor methods to Sequence |
| src/cpp/src/sampling/sampler.hpp | Adds draft-to-target mapping for EAGLE decoding |
| src/cpp/src/sampling/sampler.cpp | Implements token index adjustment using draft2target mapping |
| src/cpp/src/llm/pipeline.cpp | Adds apply_eagle_rt_info helper and draft model configuration |
| src/cpp/src/continuous_batching/pipeline.cpp | Integrates EAGLE3 mode detection and instantiation |
| src/cpp/src/continuous_batching/model_runner.hpp | Adds hidden state flag system and sequence mapping structures |
| src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp | Declares EAGLE3 implementation classes as friends |
| .github/workflows/windows.yml | Excludes eagle3 tests from main suite and adds dedicated test job |
| .github/workflows/manylinux_2_28.yml | Excludes eagle3 tests from main suite and adds dedicated test job |
| .github/workflows/linux.yml | Excludes eagle3 tests from main suite and adds dedicated test job |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp
Outdated
Show resolved
Hide resolved
Signed-off-by: fishbell <[email protected]>
| if (config.find("eagle3_mode") != config.end()) { | ||
| eagle_rt_info.eagle3_mode = config.at("eagle3_mode").as<bool>(); | ||
| config.erase("eagle3_mode"); | ||
| if (config.find("hidden_layers_list") != config.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a test for that feature. I think you usually rely on configs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed to delete else branch
src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp
Outdated
Show resolved
Hide resolved
src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (2)
src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1
- [nitpick] The variable name
fnameis ambiguous. Consider renaming it tofriendly_nameornode_friendly_nameto better indicate that it represents a node's friendly name.
// Copyright (C) 2023-2025 Intel Corporation
src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp:1
- The error message should be capitalized and more descriptive. Consider: "Missing hidden state from target model to EAGLE draft model. Ensure hidden state export is properly configured."
// Copyright (C) 2023-2025 Intel Corporation
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp
Outdated
Show resolved
Hide resolved
5753e27 to
f7d8233
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (3)
src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1
- Variable names
main_model_hidden_sizeanddraft_model_hidden_sizeshould use integer types (size_t) rather thanfloatfor representing sizes. The computation result should also be cast appropriately.
// Copyright (C) 2023-2025 Intel Corporation
src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp:1
- Error message could be more descriptive. Consider: 'Hidden state from main model is required but missing for EAGLE draft model' to clarify which models are involved and why this is an error.
// Copyright (C) 2023-2025 Intel Corporation
tests/python_tests/test_continuous_batching.py:1
- The
str()wrapper was removed frommodels_pathon line 236 but kept on line 232. For consistency, both should use the same approach to handle the path type.
# Copyright (C) 2018-2025 Intel Corporation
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f7d8233 to
aab75bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp
Show resolved
Hide resolved
4650f71 to
948819a
Compare
Signed-off-by: fishbell <[email protected]>
Signed-off-by: fishbell <[email protected]>
Signed-off-by: fishbell <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp
Outdated
Show resolved
Hide resolved
Signed-off-by: fishbell <[email protected]>
Signed-off-by: fishbell <[email protected]>
Signed-off-by: fishbell <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1
- The error message states 'Target state hidden size' but the assertion checks stored_seq_len against total_num_tokens (sequence length). The message should mention 'sequence length' instead of 'hidden size'.
// Copyright (C) 2023-2025 Intel Corporation
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp
Outdated
Show resolved
Hide resolved
sbalandi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: fishbell <[email protected]>
Signed-off-by: fishbell <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| extended_perf_metrics = None | ||
| if draft_model_id is None: |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable extended_perf_metrics is initialized as None but is only conditionally assigned in one branch. If neither branch executes (when draft_model_id is not None but pipeline_type is not SPECULATIVE_DECODING), assertions on line 600-601 will fail unexpectedly. Consider initializing this variable properly or restructuring the conditional logic to ensure it's always assigned before use.
| if (!param_node) continue; | ||
| if (input_node->get_friendly_name().find("hidden_states") == std::string::npos) continue; |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The check for parameter node could be combined with the friendly name check on the next line to reduce nesting and improve readability.
| if (!param_node) continue; | |
| if (input_node->get_friendly_name().find("hidden_states") == std::string::npos) continue; | |
| if (!param_node || input_node->get_friendly_name().find("hidden_states") == std::string::npos) continue; |
| if (eagle_mode_enabled && !m_is_validation_mode_enabled) | ||
| m_model_runner->set_initial_hidden_state(request_id, | ||
| candidates.begin()->second.hidden_states); |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multi-line statement should use braces for clarity and maintainability, especially when the condition is complex.
| if (eagle_mode_enabled && !m_is_validation_mode_enabled) | |
| m_model_runner->set_initial_hidden_state(request_id, | |
| candidates.begin()->second.hidden_states); | |
| if (eagle_mode_enabled && !m_is_validation_mode_enabled) { | |
| m_model_runner->set_initial_hidden_state(request_id, | |
| candidates.begin()->second.hidden_states); | |
| } |
| size_t stored_hidden_size = stored_shape[stored_shape.size() - 1]; | ||
|
|
||
| OPENVINO_ASSERT(stored_hidden_size == hidden_size, "Target state hidden size does not match the expected size for Eagle3 draft model inference."); | ||
| OPENVINO_ASSERT(stored_seq_len == total_num_tokens, "Target state hidden size does not match the expected size for Eagle3 draft model inference."); |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error message is misleading - it says 'hidden size' but is checking 'seq_len'. The message should say 'Target state sequence length does not match the expected length for Eagle3 draft model inference.'
| OPENVINO_ASSERT(stored_seq_len == total_num_tokens, "Target state hidden size does not match the expected size for Eagle3 draft model inference."); | |
| OPENVINO_ASSERT(stored_seq_len == total_num_tokens, "Target state sequence length does not match the expected length for Eagle3 draft model inference."); |
eagle3 CB impl
Tickets: CVS-173358
ref code: https://github.com/SafeAILab/EAGLE