Problem Statement
X-SMG-Target-Worker is useful, but it does not work when the number of Prefill workers differs from the number of Decode workers and the target worker index has to be specified separately for each.
Proposed Solution
Support separate headers such as "X-SMG-Target-Prefill-Worker" and "X-SMG-Target-Decode-Worker". It would be even better if this also worked with DP-aware routing.
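To make the request concrete, here is a rough sketch of how the two proposed headers could be resolved into independent worker indices. It is illustrative only and does not reflect how model-gateway is implemented: the header names are the ones proposed above, while `resolve_targets`, `_parse_index`, and the pool-size parameters are hypothetical. DP-aware targeting is not covered.

```python
from typing import Mapping, Optional, Tuple

# Proposed header names from this issue (not existing headers).
PREFILL_HEADER = "x-smg-target-prefill-worker"
DECODE_HEADER = "x-smg-target-decode-worker"


def _parse_index(headers: Mapping[str, str], name: str, pool_size: int) -> Optional[int]:
    """Return the worker index carried by `name`, or None if the header is absent."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get(name)
    if raw is None:
        return None
    index = int(raw.strip())  # raises ValueError on non-integer input
    if not 0 <= index < pool_size:
        raise ValueError(f"{name}: index {index} out of range for pool of {pool_size}")
    return index


def resolve_targets(
    headers: Mapping[str, str], num_prefill: int, num_decode: int
) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: validate each index against its own pool size."""
    return (
        _parse_index(headers, PREFILL_HEADER, num_prefill),
        _parse_index(headers, DECODE_HEADER, num_decode),
    )
```

Because each index is checked against its own pool, a deployment with 2 Prefill workers and 6 Decode workers can accept a decode index of 4 while rejecting a prefill index of 4. A missing header yields `None`, which the router could treat as "use the normal policy for that pool".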
Alternatives Considered
No response
Feature Area
Routing & Load Balancing
Affected Component(s)
model-gateway
Use Case
In agentic RL, there may be a fixed number of clients, and the RL system may want to balance load across the workers manually.
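As an example of what manual load balancing could look like on the client side, the sketch below pins each client to one Prefill worker and one Decode worker. The modulo assignment is just one possible policy and an assumption of this example; `headers_for_client` is a hypothetical helper, and the header names are the proposed ones, not existing headers.

```python
from typing import Dict


def headers_for_client(client_id: int, num_prefill: int, num_decode: int) -> Dict[str, str]:
    """Hypothetical helper: build the proposed routing headers for one RL client."""
    if num_prefill <= 0 or num_decode <= 0:
        raise ValueError("both worker pools must be non-empty")
    return {
        # Each pool is indexed independently, so the pool sizes may differ.
        "X-SMG-Target-Prefill-Worker": str(client_id % num_prefill),
        "X-SMG-Target-Decode-Worker": str(client_id % num_decode),
    }


# Example: 6 clients spread over 2 Prefill workers and 3 Decode workers.
assignments = [headers_for_client(i, num_prefill=2, num_decode=3) for i in range(6)]
```

With a single shared index, client 5 could not address Prefill worker 1 and Decode worker 2 at the same time, which is the case this request is about.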
Priority to You
Nice to have
Contribution
Additional Context
No response