feat(flowcontrol): Add Foundational Types and Architecture #997
base: main
✅ Deploy Preview for gateway-api-inference-extension ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: LukeAVanDrie. The full list of commands accepted by this bot can be found here.
Hi @LukeAVanDrie. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command; once the patch is verified, the new status will be reflected by the corresponding label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I have follow-up PRs queued up for the rest of the type system that put this one in context, but I am currently on a 🚂 and my connection is spotty. I will send those out soon. The next PR will detail the
/ok-to-test
Branch updated from c86d3d8 to 0645012.
Introduces the foundational packages for the new Flow Controller component. This change includes:
- The top-level README outlining the motivation, high-level architecture, and component pillars.
- The `types` package, which defines the core data contracts, request lifecycle interfaces, error-handling vocabulary, and final outcome enums for the entire module.

This foundational PR establishes the core concepts and data models upon which the rest of the Flow Controller implementation will be built.
Branch updated from 2c6952e to 25a1e08.
Do you have a follow-up PR that uses these types? It is easier to reason about interfaces and types when they are actually used.
The Flow Controller is a sophisticated library designed to solve these problems. It acts as a crucial gatekeeper that decides *if* and *when* a request should proceed to be scheduled. Its primary mission is to enable predictable, fair, and efficient utilization of shared backend resources by enforcing prioritization, applying fairness policies, managing request queuing under saturation, and orchestrating displacement (the eviction of lower-priority queued items to make
I would use the word "shedding" instead of "eviction".
I named the specific extension point "displacement". I documented the justification in: https://github.com/LukeAVanDrie/gateway-api-inference-extension/blob/flow-control/pkg/epp/flowcontrol/framework/README.md#terminology-dispatch-vs-displacement.
Eviction, preemption, and shedding were also considered.
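To make the displacement concept concrete, here is a minimal Go sketch under stated assumptions. The `item` type, the `displaceFor` function, and the victim choice (the newest item in the lowest-priority non-empty band that is strictly below the incoming request's priority) are all hypothetical and are not the policy defined in the PR. The sketch only follows the documented convention that lower numerical values mean higher priority.

```go
package main

import "fmt"

// item is a hypothetical queued request. Following the types package convention,
// a lower Priority value means higher priority.
type item struct {
	ID       string
	Priority uint
}

// displaceFor is a hypothetical displacement policy: to make room for incoming,
// it removes the newest item from the lowest-priority non-empty band whose
// priority is strictly lower than incoming's (i.e. numerically larger).
// It reports false when no such item exists.
func displaceFor(bands map[uint][]item, incoming item) (item, bool) {
	var victimBand uint
	found := false
	for p, q := range bands {
		if len(q) == 0 || p <= incoming.Priority {
			continue // bands of equal or higher priority are never displaced
		}
		if !found || p > victimBand {
			victimBand, found = p, true
		}
	}
	if !found {
		return item{}, false
	}
	q := bands[victimBand]
	victim := q[len(q)-1] // newest item: an illustrative choice, not a requirement
	bands[victimBand] = q[:len(q)-1]
	return victim, true
}

func main() {
	bands := map[uint][]item{
		0: {{ID: "critical-1", Priority: 0}},
		5: {{ID: "batch-1", Priority: 5}, {ID: "batch-2", Priority: 5}},
	}
	if v, ok := displaceFor(bands, item{ID: "critical-2", Priority: 0}); ok {
		fmt.Println("displaced", v.ID) // displaced batch-2
	}
	_, ok := displaceFor(bands, item{ID: "batch-3", Priority: 5})
	fmt.Println(ok) // false: no band below priority 5 holds items
}
```

Because only strictly lower-priority bands are eligible, a request can never displace a peer from its own band in this sketch; whether ties should ever be displaceable is a policy decision left open here.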
It is designed for extensibility, allowing custom logic for policies and queuing mechanisms to be plugged into a robust, high-performance orchestration engine.

### Role in the Gateway API Inference Extension
I recommend defining goals and non-goals. Here is my rough list:
Goals:
- Enable priority and fairness across workloads.
- Optimize request scheduling by shifting a portion of request queueing from the model server to a centralized queue.
- O(seconds) average and O(single-digit minutes) tail queueing time.
Non-goals:
- Persistence: Queueing is handled in the endpoint picker's memory.
- Scale: Scale is limited by available memory and by the number of ext-proc connections of the endpoint picker and L7LB. While these components can scale horizontally, maintaining connections for many minutes to hours is not a goal.
- A substitute for a message queue: The queue manages requests with open connections via the L7LB and is not intended for asynchronous request handling.
```
class A2 downstream_err;
class D,E,F,P,SD default;
```
|
We should clearly distinguish between request flow (the top one?) and configuration flow (the bottom one?).
Ack, I will add some controller (e.g., a k8s operator) that configures the Flow Registry.
We don't want to completely detach from EPP, so I don't think we want to describe a generic k8s operator that does this.
I am just asking to distinguish between the arrows that describe the request flow (traversed for each request) and the ones that describe configuration (registering plugins and whatnot). I would actually split this into two diagrams if necessary; I would like to see the data flow separately from the configuration flow.
concurrent-safe request storage. It uses a `QueueCapability` system that allows for diverse and extensible queue implementations (e.g., FIFO, Priority Heap) while maintaining a stable interface.

4. **The `FlowRegistry` (`./registry`, `./ports`)**: This is the stateful control plane of the system. It manages the
What is a port in this context?
I am using a "Ports and Adapters" (or "Hexagonal") architecture pattern here. The ports in this flow control system are documented here: https://github.com/LukeAVanDrie/gateway-api-inference-extension/blob/flow-control/pkg/epp/flowcontrol/ports/README.md.
@@ -0,0 +1,33 @@
# Flow Control Core Types
I recommend having this documentation in the code as comments. I worry that as the code evolves, such detailed documentation will quickly go out of sync.
// Priority returns the numerical priority level currently associated with this flow within the Flow Registry.
//
// Convention: Lower numerical values indicate higher priority.
Priority() uint
This is great, so we are not limited to discrete values, correct?
Yep; however, the priority bands (which define the possible values of `Priority()`) are configured at startup in the Flow Registry. Most of the system supports dynamic updates, just not this.
Flows can migrate between priorities at runtime; you just cannot change what the priority options are.
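A hypothetical sketch of the constraint described here: the set of priority bands is fixed when the registry is constructed, while a flow's assignment can change at runtime as long as it targets an existing band. The `registry` type and its methods are illustrative assumptions, not the Flow Registry's actual API; the ordering follows the documented convention that lower numerical values indicate higher priority.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// registry is a hypothetical stand-in for the Flow Registry's priority bookkeeping.
type registry struct {
	bands map[uint]bool // set once at construction and never mutated, so reads need no lock

	mu    sync.RWMutex
	flows map[string]uint // flow ID -> current priority band; may change at runtime
}

func newRegistry(bandPriorities ...uint) *registry {
	bands := make(map[uint]bool, len(bandPriorities))
	for _, p := range bandPriorities {
		bands[p] = true
	}
	return &registry{bands: bands, flows: make(map[string]uint)}
}

// SetFlowPriority registers a flow or migrates it to another existing band.
func (r *registry) SetFlowPriority(flowID string, priority uint) error {
	if !r.bands[priority] {
		return fmt.Errorf("priority %d is not a configured band", priority)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.flows[flowID] = priority
	return nil
}

// Priority returns the flow's current band, if the flow is registered.
func (r *registry) Priority(flowID string) (uint, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.flows[flowID]
	return p, ok
}

// BandsInDispatchOrder lists bands from highest to lowest priority,
// i.e. ascending numerical value.
func (r *registry) BandsInDispatchOrder() []uint {
	out := make([]uint, 0, len(r.bands))
	for p := range r.bands {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	r := newRegistry(20, 0, 10)
	fmt.Println(r.BandsInDispatchOrder()) // [0 10 20]
	_ = r.SetFlowPriority("tenant-a", 10)
	_ = r.SetFlowPriority("tenant-a", 0) // runtime migration between existing bands
	p, _ := r.Priority("tenant-a")
	fmt.Println(p)                                 // 0
	fmt.Println(r.SetFlowPriority("tenant-a", 5)) // priority 5 is not a configured band
}
```

Keeping the band set immutable lets readers iterate bands without locking, which is one plausible reason to restrict dynamic updates to flow assignments only.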
// An object implementing this interface is the primary input to `FlowController.EnqueueAndWait()`. The controller then
// wraps this object with its own internal structures (which implement `QueueItemAccessor`) to manage the request's
// lifecycle without modifying the original.
type FlowControlRequest interface {
Do we expect to have more than one implementation of this interface? If not, is it really necessary to have this as an interface rather than a struct?
I left it this way so that it can be imported outside EPP; all of this is functional as a portable library. Also, I have an escape hatch with `OriginalRequest` that allows casting back to the concrete type to get richer request metadata that is not commonly shared between policies. This is useful for prototyping advanced use cases. I think there is some value in allowing any type that implements this interface to be enqueued.
This also makes testing a bit easier.
Description
This PR introduces the foundational building blocks for the new Flow Control system, designed to manage priority, fairness, and queuing for inference workloads.
The goal is to provide a sophisticated mechanism for managing diverse SLOs and preventing issues such as head-of-line blocking and system overload, which simple FCFS temporal scheduling does not address. This is a relatively complex system that will need to be split across several PRs. This initial submission lays the groundwork for the entire module by establishing its core concepts and data contracts (essentially all types with no cross-package dependencies).
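As an illustration of the head-of-line blocking problem (not of the PR's actual fairness policies), the sketch below compares strict FCFS dispatch with a simple round-robin across per-flow queues. The `request` type and the `fcfs` and `roundRobin` functions are hypothetical; round-robin is just one possible fairness policy.

```go
package main

import "fmt"

// request is a hypothetical queued request tagged with its flow.
type request struct{ flow, id string }

// fcfs dispatches up to n requests strictly in arrival order.
func fcfs(arrivals []request, n int) []string {
	out := []string{}
	for i := 0; i < n && i < len(arrivals); i++ {
		out = append(out, arrivals[i].id)
	}
	return out
}

// roundRobin keeps one FIFO queue per flow and takes one request from each
// flow in turn, so a burst from one flow cannot starve the others.
func roundRobin(arrivals []request, n int) []string {
	queues := map[string][]request{}
	var order []string // flows in first-seen order, for deterministic iteration
	for _, r := range arrivals {
		if _, seen := queues[r.flow]; !seen {
			order = append(order, r.flow)
		}
		queues[r.flow] = append(queues[r.flow], r)
	}
	out := []string{}
	for len(out) < n {
		progressed := false
		for _, f := range order {
			if len(out) == n {
				break
			}
			if q := queues[f]; len(q) > 0 {
				out = append(out, q[0].id)
				queues[f] = q[1:]
				progressed = true
			}
		}
		if !progressed {
			break // every per-flow queue is drained
		}
	}
	return out
}

func main() {
	// A burst from "batch" arrives just before one interactive request.
	arrivals := []request{
		{"batch", "b1"}, {"batch", "b2"}, {"batch", "b3"}, {"batch", "b4"}, {"chat", "c1"},
	}
	fmt.Println(fcfs(arrivals, 2))       // [b1 b2]
	fmt.Println(roundRobin(arrivals, 2)) // [b1 c1]
}
```

With two dispatch slots, FCFS leaves `c1` stuck behind the batch burst, while the per-flow round-robin serves it in the first round.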
This work tracks issue #674.
For historical context, the initial KEP-like proposal can be found here. Please note that the design has evolved significantly, and this PR represents the most current architecture (or at least a small slice of it). I will do some documentation cleanup once more of these PRs with the canonical architectural decisions are out for review.
Contribution
This PR includes two main pieces:
1. Top-Level `README.md`: Provides a high-level overview of the Flow Controller, including:
   - … (`Controller`, `Framework`, `Registry`, etc.).
2. The `types` Package: This new package establishes the core "vocabulary" for the entire Flow Controller module. It includes:
   - … (`FlowControlRequest`, `QueueItemAccessor`, `QueueItemHandle`).
   - The `QueueOutcome` enum and a structured set of error types (`ErrRejected`, `ErrEvicted`).
   - The `FlowSpecification` interface for defining workload identity and priority.
   - A `README.md` detailing the concepts within the `types` package.

Review Focus
As this is a foundational PR, I'm particularly looking for feedback on:
- Is the `types` package clear, logical, and sufficient for the tasks ahead?
- Are the … (`QueueOutcome`, the error hierarchy) well-defined and intuitive?

Future PRs will build upon these types to implement the `framework`, `registry`, and `controller` packages.