|
| 1 | +# Flow Control Module |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +In a multi-tenant, heterogeneous inference serving environment, managing diverse SLOs and fairness requirements is |
| 6 | +critical. Today, the serving stack often handles requests on a simple "best-effort" or FIFO (First-In, First-Out) |
| 7 | +basis. This is insufficient and leads to significant problems: |
| 8 | + |
| 9 | +* **Head-of-Line Blocking**: A long-running, low-priority request can block short, high-priority requests, violating |
| 10 | + SLOs. |
| 11 | +* **Lack of Predictability**: Without proper queuing and prioritization, it's impossible to provide predictable latency |
| 12 | + guarantees to different tenants. |
| 13 | +* **Inability to Handle Saturation**: Under heavy load, the system has no graceful way to manage overload, leading to |
| 14 | + cascading failures instead of controlled degradation. |
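To make head-of-line blocking concrete, here is a minimal sketch in Go. It is an illustration only: the `request` type, its fields, and the "lower value means more urgent" priority convention are assumptions made for this example, not types from this module. It compares how long a short, urgent request waits under plain FIFO ordering versus priority ordering when requests are served one at a time.

```go
package main

import (
	"fmt"
	"sort"
)

// request is a hypothetical, simplified request used only for this illustration.
type request struct {
	ID       string
	Priority int // assumed convention: 0 is the most urgent
	CostMs   int // simulated service time in milliseconds
}

// waitBefore returns how long the request with the given ID waits before service
// starts, when requests are served one at a time in the given order.
// It returns -1 if the ID is not present.
func waitBefore(order []request, id string) int {
	wait := 0
	for _, r := range order {
		if r.ID == id {
			return wait
		}
		wait += r.CostMs
	}
	return -1
}

// byPriority returns a copy of reqs ordered by priority, keeping arrival order
// among requests that share the same priority.
func byPriority(reqs []request) []request {
	out := append([]request(nil), reqs...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Priority < out[j].Priority })
	return out
}

func main() {
	arrivals := []request{
		{ID: "batch-job", Priority: 5, CostMs: 4000},
		{ID: "chat", Priority: 0, CostMs: 50},
	}
	fmt.Println("FIFO wait for chat (ms):", waitBefore(arrivals, "chat"))
	fmt.Println("priority wait for chat (ms):", waitBefore(byPriority(arrivals), "chat"))
}
```

Under FIFO the short request waits for the full 4000 ms batch job; with priority ordering it is served first. The sketch ignores requests already in flight, which a real serving stack would also have to account for.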
| 15 | + |
| 16 | +The Flow Controller is a library designed to solve these problems. It acts as a gatekeeper that decides *if* and |
| 17 | +*when* a request should proceed to be scheduled. Its goal is predictable, fair, and efficient utilization of shared |
| 18 | +backend resources, which it pursues by enforcing prioritization, applying fairness policies, managing request queuing |
| 19 | +under saturation, and orchestrating displacement (the eviction of lower-priority queued items to make space for |
| 20 | +higher-priority ones). |
| 21 | + |
| 22 | +It is designed for extensibility, allowing custom logic for policies and queuing mechanisms to be plugged into a robust, |
| 23 | +high-performance orchestration engine. |
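Displacement, as described above, can be sketched with a small bounded queue. This is a simplified illustration under stated assumptions, not the module's implementation: `item`, `boundedQueue`, and `admit` are hypothetical names, and the "higher value means more important" priority convention is chosen only for this example.

```go
package main

import "fmt"

// item is a hypothetical queued request.
type item struct {
	ID       string
	Priority int // assumed convention: higher value is more important
}

// boundedQueue is a hypothetical fixed-capacity queue used to illustrate displacement.
type boundedQueue struct {
	capacity int
	items    []item
}

// admit tries to enqueue in. If the queue is full, it evicts the lowest-priority
// queued item, but only when the incoming item has strictly higher priority.
// It reports whether the item was admitted and which item, if any, was displaced.
func (q *boundedQueue) admit(in item) (admitted bool, displaced *item) {
	if len(q.items) < q.capacity {
		q.items = append(q.items, in)
		return true, nil
	}
	if len(q.items) == 0 {
		return false, nil // zero-capacity queue: nothing to displace
	}
	victim := 0
	for i, it := range q.items {
		if it.Priority < q.items[victim].Priority {
			victim = i
		}
	}
	if q.items[victim].Priority >= in.Priority {
		return false, nil // incoming item is not important enough to displace anything
	}
	evicted := q.items[victim]
	q.items = append(q.items[:victim], q.items[victim+1:]...)
	q.items = append(q.items, in)
	return true, &evicted
}

func main() {
	q := &boundedQueue{capacity: 2}
	q.admit(item{ID: "a", Priority: 1})
	q.admit(item{ID: "b", Priority: 2})

	ok, out := q.admit(item{ID: "c", Priority: 3})
	fmt.Println("c admitted:", ok, "displaced:", out.ID)

	ok, _ = q.admit(item{ID: "d", Priority: 1})
	fmt.Println("d admitted:", ok)
}
```

A real policy would likely weigh more than raw priority (for example, which flow the victim belongs to), so treat the victim selection here as a placeholder.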
| 24 | + |
| 25 | +### Role in the Gateway API Inference Extension |
| 26 | + |
| 27 | +Within the Gateway API Inference Extension's Endpoint Picker (EPP), the Flow Controller acts as a crucial gatekeeper |
| 28 | +between the Routing and Scheduling layers. It decides *if* and *when* a request, already assigned to a logical flow |
| 29 | +(e.g., a specific workload or tenant), should proceed to be scheduled onto a backend resource. It is the primary |
| 30 | +mechanism for managing diverse SLOs, ensuring fairness among competing workloads, and maintaining system stability under |
| 31 | +high load. |
| 32 | + |
| 33 | +### High-Level Architecture |
| 34 | + |
| 35 | +The following diagram illustrates the high-level dependency model and request flow for the system. It shows how |
| 36 | +concurrent client requests are managed by the central `FlowController`, which in turn relies on a set of decoupled |
| 37 | +components to make its decisions. Each component package in this module will contain its own more detailed architectural |
| 38 | +diagrams. |
| 39 | + |
| 40 | +```mermaid |
| 41 | +graph LR |
| 42 | + %% Style Definitions |
| 43 | + classDef default fill:#fff,stroke:#333,stroke-width:1.5px,color:#000; |
| 44 | + classDef client fill:#dcfce7,stroke:#333; |
| 45 | + classDef system_entry fill:#fef9c3,stroke:#333; |
| 46 | + classDef downstream_ok fill:#dbeafe,stroke:#333; |
| 47 | + classDef downstream_err fill:#fee2e2,stroke:#333; |
| 48 | +
|
| 49 | + %% Client Goroutines (Fan-In) |
| 50 | + subgraph Client Goroutines |
| 51 | + direction TB |
| 52 | + R1(Goroutine 1); |
| 53 | + R2(Goroutine N); |
| 54 | + end |
| 55 | +
|
| 56 | + %% Flow Control System |
| 57 | + subgraph Flow Control System |
| 58 | + C{Flow Controller Engine}; |
| 59 | +
|
| 60 | + subgraph Internal Interactions |
| 61 | + direction LR |
| 62 | + D(Ports) -- "abstracts state" --> E(Flow Registry); |
| 63 | + D -- "abstracts load" --> SD(Saturation Detector); |
| 64 | + E -- "configures" --> F(Framework); |
| 65 | + F -- "defines" --> P(Plugins: Queues & Policies); |
| 66 | + end |
| 67 | +
|
| 68 | + C -- "Orchestrates via<br>abstractions" --> D; |
| 69 | + end |
| 70 | +
|
| 71 | + %% Downstream Actions (Fan-Out) |
| 72 | + subgraph Downstream Actions |
| 73 | + direction TB |
| 74 | + A1(Outcome: Dispatched<br>Proceed to Scheduler); |
| 75 | + A2(Outcome: Rejected<br>Return Error); |
| 76 | + end |
| 77 | +
|
| 78 | + %% Connections |
| 79 | + R1 -- "calls & blocks" --> C; |
| 80 | + R2 -- "calls & blocks" --> C; |
| 81 | + C -- "unblocks 'goroutine 1'" --> A1; |
| 82 | + C -- "unblocks 'goroutine N'" --> A2; |
| 83 | +
|
| 84 | + %% Apply Classes |
| 85 | + class R1,R2 client; |
| 86 | + class C system_entry; |
| 87 | + class A1 downstream_ok; |
| 88 | + class A2 downstream_err; |
| 89 | + class D,E,F,P,SD default; |
| 90 | +``` |
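The "calls & blocks" and "unblocks" edges in the diagram can be sketched with goroutines and channels. This is a minimal illustration, not the engine's actual API: `engine`, `enqueueAndWait`, `pending`, and the fixed admission `budget` are hypothetical stand-ins for the real orchestration, which would consult policies, queues, and the saturation detector.

```go
package main

import (
	"fmt"
	"sync"
)

// outcome mirrors the two terminal states shown in the diagram.
type outcome string

const (
	dispatched outcome = "Dispatched"
	rejected   outcome = "Rejected"
)

// pending is a hypothetical in-flight request with a channel used to unblock its caller.
type pending struct {
	id   string
	done chan outcome
}

// engine is a hypothetical stand-in for the Flow Controller Engine.
type engine struct {
	in     chan pending
	budget int // assumed: number of requests this toy engine will dispatch
}

// enqueueAndWait submits a request and blocks the calling goroutine until the
// engine reports an outcome.
func (e *engine) enqueueAndWait(id string) outcome {
	p := pending{id: id, done: make(chan outcome, 1)}
	e.in <- p
	return <-p.done
}

// run is the engine loop: it dispatches requests until the budget is used up
// and rejects everything after that.
func (e *engine) run() {
	admitted := 0
	for p := range e.in {
		if admitted < e.budget {
			admitted++
			p.done <- dispatched
		} else {
			p.done <- rejected
		}
	}
}

// simulate starts the engine, issues concurrent calls, and counts the outcomes.
func simulate(callers, budget int) map[outcome]int {
	e := &engine{in: make(chan pending), budget: budget}
	go e.run()

	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		counts = map[outcome]int{}
	)
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			o := e.enqueueAndWait(fmt.Sprintf("req-%d", n))
			mu.Lock()
			counts[o]++
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	close(e.in)
	return counts
}

func main() {
	counts := simulate(3, 2)
	fmt.Println("dispatched:", counts[dispatched], "rejected:", counts[rejected])
}
```

Which caller ends up rejected depends on goroutine scheduling; only the totals are deterministic in this sketch.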
| 91 | + |
| 92 | +## Architectural Pillars |
| 93 | + |
| 94 | +The Flow Controller framework is built on several key components that work in concert. This architecture is designed to |
| 95 | +be highly modular and scalable, with clear separation of concerns. For a deep dive into the specific design choices and |
| 96 | +their justifications, please refer to the detailed documentation within the relevant sub-packages. |
| 97 | + |
| 98 | +1. **The `FlowController` Engine (`./controller`)**: The central, sharded orchestrator responsible for the main request |
| 99 | + processing loop. It manages a pool of workers that distribute incoming requests, apply policies, and dispatch |
| 100 | + requests to the backends. Its design focuses on high throughput and backpressure. |
| 101 | + |
| 102 | +2. **Pluggable `Policy` Framework (`./framework`)**: This defines the core interfaces for all pluggable logic. It |
| 103 | + features a two-tier policy system for `InterFlow` (decisions *between* different flows) and `IntraFlow` |
| 104 | + (decisions *within* a single flow) logic, covering both request dispatch and displacement. |
| 105 | + |
| 106 | +3. **Extensible `SafeQueue` System (`./framework`)**: This defines the `framework.SafeQueue` interface for |
| 107 | + concurrent-safe request storage. It uses a `QueueCapability` system that allows for diverse and extensible queue |
| 108 | + implementations (e.g., FIFO, Priority Heap) while maintaining a stable interface. |
| 109 | + |
| 110 | +4. **The `FlowRegistry` (`./registry`, `./ports`)**: This is the stateful control plane of the system. It manages the |
| 111 | + configuration and lifecycle of all flows, policies, and queues. It presents a sharded view of its state to the |
| 112 | + `FlowController` workers to enable parallel operation with minimal lock contention. |
| 113 | + |
| 114 | +5. **Core Types and Service Ports (`./types`, `./ports`)**: These packages define the foundational data structures |
| 115 | + (e.g., `FlowControlRequest`), errors, and service interfaces that decouple the engine from its dependencies, |
| 116 | + following a "Ports and Adapters" architectural style. |
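The two-tier policy idea and the capability check from the list above can be sketched together. Everything here is hypothetical and deliberately small: `capability`, `fifoQueue`, `pickLongestFlow`, `pickHead`, and `dispatchOne` are illustrative names, not the interfaces defined in `./framework`. The inter-flow step chooses *which flow* to serve, and the intra-flow step chooses *which request* within that flow, after confirming the queue advertises the capability the policy needs.

```go
package main

import "fmt"

// capability is a hypothetical tag a queue advertises so policies can check compatibility.
type capability string

const capFIFO capability = "FIFO"

// fifoQueue is a toy, non-concurrent queue; a real implementation would need to be concurrency-safe.
type fifoQueue struct{ items []string }

func (q *fifoQueue) Capabilities() []capability { return []capability{capFIFO} }
func (q *fifoQueue) Add(id string)              { q.items = append(q.items, id) }
func (q *fifoQueue) Len() int                   { return len(q.items) }

// PopHead removes and returns the oldest item, or false if the queue is empty.
func (q *fifoQueue) PopHead() (string, bool) {
	if len(q.items) == 0 {
		return "", false
	}
	head := q.items[0]
	q.items = q.items[1:]
	return head, true
}

// flow pairs a logical flow name with its queue.
type flow struct {
	name  string
	queue *fifoQueue
}

// supports reports whether the queue advertises the wanted capability.
func supports(q *fifoQueue, want capability) bool {
	for _, c := range q.Capabilities() {
		if c == want {
			return true
		}
	}
	return false
}

// pickLongestFlow is a toy inter-flow policy: serve the flow with the most queued
// items; the earliest flow in the slice wins ties. It returns nil if all are empty.
func pickLongestFlow(flows []*flow) *flow {
	var best *flow
	for _, f := range flows {
		if f.queue.Len() == 0 {
			continue
		}
		if best == nil || f.queue.Len() > best.queue.Len() {
			best = f
		}
	}
	return best
}

// pickHead is a toy intra-flow policy: take the oldest item, which requires a FIFO-capable queue.
func pickHead(f *flow) (string, bool) {
	if !supports(f.queue, capFIFO) {
		return "", false
	}
	return f.queue.PopHead()
}

// dispatchOne applies the inter-flow policy, then the intra-flow policy.
func dispatchOne(flows []*flow) (flowName, reqID string, ok bool) {
	f := pickLongestFlow(flows)
	if f == nil {
		return "", "", false
	}
	id, found := pickHead(f)
	if !found {
		return "", "", false
	}
	return f.name, id, true
}

func main() {
	a := &flow{name: "tenant-a", queue: &fifoQueue{}}
	b := &flow{name: "tenant-b", queue: &fifoQueue{}}
	a.queue.Add("a1")
	b.queue.Add("b1")
	b.queue.Add("b2")

	flows := []*flow{a, b}
	for {
		name, id, ok := dispatchOne(flows)
		if !ok {
			break
		}
		fmt.Println(name, id)
	}
}
```

Keeping the two decisions separate means a fairness rule across tenants can change without touching the ordering rule inside each tenant's queue, which is the separation the policy framework describes.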