@googs1025
Collaborator

Related Issues

Resolves: #1728

@gemini-code-assist
Contributor

Summary of Changes

Hello @googs1025, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the gateway plugin by adding a new session affinity routing algorithm. This feature allows the system to maintain sticky sessions with specific backend pods, improving consistency for client requests by attempting to route them to the same pod using a session ID. The implementation includes a robust fallback mechanism for scenarios where session affinity cannot be maintained and updates the gateway's response header processing to correctly propagate session-related headers back to the client. This change aims to provide a more stable and predictable routing experience for users.

Highlights

  • New Session Affinity Routing Algorithm: Introduced a new routing algorithm named 'session-affinity' for the gateway plugin. This algorithm aims to route subsequent requests from the same client to the same backend pod.
  • Session ID Based Routing: The session affinity is achieved by using a session ID, x-session-id, stored in the request header. This ID encodes the target pod's IP:Port using base64 encoding.
  • Fallback Mechanism: If a session ID is missing, invalid, or the target pod specified by the session ID is not ready, the router gracefully falls back to selecting a random ready pod.
  • Response Header Propagation: The gateway's response header handling has been updated to propagate router-generated response headers (such as the x-session-id for sticky sessions) back to the client, while correctly skipping HTTP/2 pseudo-headers.
  • Comprehensive Testing: New unit tests have been added for both the sessionAffinityRouter and the updated HandleResponseHeaders logic to ensure correctness and robustness.
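
As a rough illustration of the response header propagation highlight, the following Go sketch filters out HTTP/2 pseudo-headers (names starting with a colon, such as :status) before router-generated headers are handed back to the client. The Header type and the propagatableHeaders function are hypothetical names for this example only; the real HandleResponseHeaders works on the gateway's own header types, and the exact place where pseudo-headers are skipped is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Header is a hypothetical name/value pair used only for this sketch.
type Header struct{ Key, Value string }

// isPseudoHeader reports whether name is an HTTP/2 pseudo-header.
// Pseudo-header names always begin with a colon (for example ":status").
func isPseudoHeader(name string) bool {
	return strings.HasPrefix(name, ":")
}

// propagatableHeaders returns the router-generated headers that may be sent
// back to the client, skipping pseudo-headers. Keys are sorted so the result
// is deterministic.
func propagatableHeaders(routerHeaders map[string]string) []Header {
	keys := make([]string, 0, len(routerHeaders))
	for k := range routerHeaders {
		if !isPseudoHeader(k) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	out := make([]Header, 0, len(keys))
	for _, k := range keys {
		out = append(out, Header{Key: k, Value: routerHeaders[k]})
	}
	return out
}

func main() {
	hs := propagatableHeaders(map[string]string{
		":status":      "200",
		"x-session-id": "MTAuMjQ0LjEuNTo4MDAw",
	})
	fmt.Println(hs) // [{x-session-id MTAuMjQ0LjEuNTo4MDAw}]
}
```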

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a session affinity routing plugin, which is a great feature. The implementation is mostly solid, with good test coverage for the new logic. I've identified a couple of areas for improvement. Firstly, there's a potential bug in RoutingContext where the new RespHeaders field isn't reset when the context is reused from a pool, which could lead to stale data. Secondly, the fallback mechanism in the session affinity router could be made more robust to handle cases where a randomly selected pod is misconfigured. My detailed comments and suggestions are below.
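
The pooled-context concern can be sketched as follows. This is a minimal illustration, not the project's code: only the RoutingContext and RespHeaders names come from the review; the fields, the reset method, and the acquire/release helpers are assumptions made for the example. The point is that every per-request field, including a newly added one, has to be cleared before the object is reused.

```go
package main

import (
	"fmt"
	"sync"
)

// RoutingContext is a stripped-down stand-in for the real type. Only the
// RespHeaders field name is taken from the review comment.
type RoutingContext struct {
	RequestID   string
	RespHeaders map[string]string
}

// reset clears all per-request state so nothing leaks into the next request.
func (c *RoutingContext) reset() {
	c.RequestID = ""
	c.RespHeaders = nil
}

var ctxPool = sync.Pool{
	New: func() interface{} { return &RoutingContext{} },
}

// acquire takes a context from the pool and resets it defensively.
func acquire(requestID string) *RoutingContext {
	c := ctxPool.Get().(*RoutingContext)
	c.reset()
	c.RequestID = requestID
	return c
}

// release clears the context and returns it to the pool.
func release(c *RoutingContext) {
	c.reset()
	ctxPool.Put(c)
}

func main() {
	first := acquire("req-1")
	first.RespHeaders = map[string]string{"x-session-id": "stale"}
	release(first)

	second := acquire("req-2")
	fmt.Println(second.RequestID, len(second.RespHeaders)) // req-2 0
}
```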

Comment on lines 101 to 116
pods := readyPodList.All()

selected := pods[rand.Intn(len(pods))]
port := utils.GetModelPortForPod(ctx.RequestID, selected)
if port == 0 || selected.Status.PodIP == "" {
	return "", fmt.Errorf("selected pod has no valid network address")
}
addr := net.JoinHostPort(selected.Status.PodIP, strconv.Itoa(int(port)))

ctx.SetTargetPod(selected)
r.setSessionHeader(ctx, addr)
klog.V(5).Infof("Fallback to random pod: %s (%s)", selected.Name, addr)

return ctx.TargetAddress(), nil
Contributor

medium

The current implementation of fallbackRoute randomly selects a single pod and fails the entire request if that pod happens to have an invalid network address (e.g., port configured to 0). This is not very robust, as one misconfigured pod could cause random failures even if other healthy pods are available. A better approach would be to iterate through the available pods in a random order until a valid one is found.

pods := readyPodList.All()
rand.Shuffle(len(pods), func(i, j int) { pods[i], pods[j] = pods[j], pods[i] })

for _, selected := range pods {
	port := utils.GetModelPortForPod(ctx.RequestID, selected)
	// A routable pod must have a valid IP and port.
	if port == 0 || selected.Status.PodIP == "" {
		klog.V(4).Infof("Fallback skipping pod %s with invalid network address (IP: %s, Port: %d)", selected.Name, selected.Status.PodIP, port)
		continue
	}
	addr := net.JoinHostPort(selected.Status.PodIP, strconv.Itoa(int(port)))

	ctx.SetTargetPod(selected)
	r.setSessionHeader(ctx, addr)
	klog.V(5).Infof("Fallback to random pod: %s (%s)", selected.Name, addr)

	return ctx.TargetAddress(), nil
}

return "", fmt.Errorf("no fallback pod found with a valid network address")

Collaborator Author

done

@googs1025
Collaborator Author

+---------------------+
|      Client         |
+----------+----------+
           |
           |  HTTP Request
           |  (with optional x-session-id header)
           v
+--------------------------+
|   Aibrix Gateway Plugin  |
+----------+---------------+
           |
           | Routing Decision
           v
+-----------------------------+
| Session Affinity Router     |  ←───┐
| (sessionAffinityRouter)     |      │
+-----------------------------+      │
           |                         │ Uses
           | 1. Reads x-session-id   │
           | 2. Decodes → IP:Port    │
           | 3. Matches Pod          │
           v                         │
+---------------------+             │
|   Fallback Router   | <───────────┘
+----------+----------+
           |
           | Selected Pod Address
           v
+---------------------+
|   Ready Pod List    |
| (Endpoints / Pods)  |
+----------+----------+
           |
           | Target Pod Info
           v
+---------------------+
|    Backend Pod      |
| (vLLM / LLM Server) |
+----------+----------+
           |
           | HTTP Response
           | (Set-Cookie or x-session-id)
           v
+---------------------+
|      Client         |
+---------------------+

@googs1025 force-pushed the session_affinity_plugin branch 2 times, most recently from 014d0b3 to 6fa2832 (November 13, 2025 12:29)
@varungup90
Collaborator

Can you describe what the workflow will be?

  • Does the user send "session-id": "1.1.1.1:8000" directly in the first request, or does the user make a first request, save the returned target-pod-address, and then reuse it for subsequent requests?

My thinking is that the user should provide a UID as the session-id, and the session affinity router should maintain a small structure that maps session-id to target pod address with a 1-hour TTL.

@googs1025
Collaborator Author

Thanks for the great question!

In the current design, I use a lightweight, stateless session affinity ID where the client carries an opaque session token that encodes the target pod’s IP:Port. Here’s how it works:

  • First request: The client sends a request without the x-session-id header.
    The gateway falls back to a standard routing strategy (e.g., random selection) and picks a ready pod.
    It then base64-encodes the pod’s IP:Port (e.g., "10.244.1.5:8000" → "MTAuMjQ0LjEuNTo4MDAw") and returns it in the response header (x-session-id).
    The client (or frontend SDK) stores this value—ideally in a secure, HttpOnly cookie—for future requests.

  • Subsequent requests: The client includes the same x-session-id in the request header.
    The gateway decodes it to retrieve the intended IP:Port, checks if that pod is still ready, and routes the request accordingly.
    If the pod is no longer available (e.g., scaled down, restarted), the gateway automatically falls back to a new ready pod and issues a new session ID in the response.
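
The two-step flow above can be sketched in Go as follows. This is a simplified model under stated assumptions, not the PR's implementation: ready pods are represented as plain "IP:Port" strings, the function names encodeSessionID, decodeSessionID, and route are hypothetical, and base64.StdEncoding is assumed because the comment only says "base64".

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"math/rand"
	"net"
)

// encodeSessionID turns "IP:Port" into the opaque token sent as x-session-id.
// StdEncoding is an assumption; the thread only says "base64".
func encodeSessionID(addr string) string {
	return base64.StdEncoding.EncodeToString([]byte(addr))
}

// decodeSessionID reverses encodeSessionID and rejects anything that is not
// valid base64 or does not parse as host:port.
func decodeSessionID(id string) (string, bool) {
	raw, err := base64.StdEncoding.DecodeString(id)
	if err != nil {
		return "", false
	}
	if _, _, err := net.SplitHostPort(string(raw)); err != nil {
		return "", false
	}
	return string(raw), true
}

// route picks a target among the ready pod addresses and returns the session
// ID the gateway should send back in the response.
func route(sessionID string, ready []string) (addr, respID string, err error) {
	if len(ready) == 0 {
		return "", "", errors.New("no ready pods")
	}
	if want, ok := decodeSessionID(sessionID); ok {
		for _, a := range ready {
			if a == want {
				return a, sessionID, nil // sticky hit: keep the same token
			}
		}
	}
	// Missing, invalid, or stale ID: fall back and issue a fresh token.
	picked := ready[rand.Intn(len(ready))]
	return picked, encodeSessionID(picked), nil
}

func main() {
	ready := []string{"10.244.1.4:8000", "10.244.1.5:8000"}
	id := encodeSessionID("10.244.1.5:8000")
	fmt.Println(id) // MTAuMjQ0LjEuNTo4MDAw

	addr, _, _ := route(id, ready)
	fmt.Println(addr) // 10.244.1.5:8000
}
```

Because the token itself carries the address, the gateway needs no per-session state in this design; the trade-off raised later in the thread is that the token exposes backend topology to the client.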

@googs1025
Collaborator Author

> Can you describe what the workflow will be?
>
>   • Does the user send "session-id": "1.1.1.1:8000" directly in the first request, or does the user make a first request, save the returned target-pod-address, and then reuse it for subsequent requests?
>
> My thinking is that the user should provide a UID as the session-id, and the session affinity router should maintain a small structure that maps session-id to target pod address with a 1-hour TTL.

I agree 😄. The UUID approach you suggested, where the client uses an abstract session identifier and the gateway plugin maintains a short-lived mapping from UUID to pod address (e.g., with a 1-hour TTL), is also a strong alternative. It would improve security by hiding backend topology and allow more flexible session management.

@googs1025
Collaborator Author

I will refer to this approach to update the existing implementation.

> Can you describe what the workflow will be?
>
>   • Does the user send "session-id": "1.1.1.1:8000" directly in the first request, or does the user make a first request, save the returned target-pod-address, and then reuse it for subsequent requests?
>
> My thinking is that the user should provide a UID as the session-id, and the session affinity router should maintain a small structure that maps session-id to target pod address with a 1-hour TTL.

@varungup90
Collaborator

varungup90 commented Nov 14, 2025

To summarize: the user sends the session-id header as a UUID starting from the first request, which removes the client's burden of reading the session-id header from the first response and applying it to subsequent requests.

The gateway tracks the UUID-to-pod mapping (with a TTL). If the pod fails or the gateway finds a better pod for that session, it can change the pod associated with the session transparently, without the user knowing.

@googs1025
Collaborator Author

> To summarize: the user sends the session-id header as a UUID starting from the first request, which removes the client's burden of reading the session-id header from the first response and applying it to subsequent requests.
>
> The gateway tracks the UUID-to-pod mapping (with a TTL). If the pod fails or the gateway finds a better pod for that session, it can change the pod associated with the session transparently, without the user knowing.

Will update the implementation 😄

@googs1025
Collaborator Author

googs1025 commented Nov 17, 2025

> To summarize: the user sends the session-id header as a UUID starting from the first request, which removes the client's burden of reading the session-id header from the first response and applying it to subsequent requests.
>
> The gateway tracks the UUID-to-pod mapping (with a TTL). If the pod fails or the gateway finds a better pod for that session, it can change the pod associated with the session transparently, without the user knowing.

Hi @varungup90

Thanks for the great discussion on moving to UUID-based session affinity. I have a quick question about an edge case:

What should we do if the client sends an x-session-id that is not a valid UUID?

My proposed behavior:

  • Validate the x-session-id using uuid.Parse()
  • If invalid, ignore it, fall back to selecting a ready pod randomly
  • Issue a new valid UUID in the response header
  • Log a warning/info message (with requestID) for observability

This ensures robustness and backward compatibility while avoiding request failures due to client-side errors.
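
The proposed handling can be sketched with the standard library only. Note the substitution: the comment proposes uuid.Parse (from the github.com/google/uuid package), while this example swaps in a regular expression for the canonical 36-character form plus a hand-rolled version 4 generator so that it stays dependency-free. It is therefore stricter than uuid.Parse, which also accepts a few other encodings. The resolveSessionID name and its return values are hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"regexp"
)

// canonicalUUID matches the 8-4-4-4-12 hexadecimal form only. This is a
// stand-in for uuid.Parse and is stricter than that function.
var canonicalUUID = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// newUUIDv4 generates a random (version 4, RFC 4122 variant) UUID string.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// resolveSessionID keeps a valid client-supplied ID. Otherwise it issues a
// new one and reports reissued=true so the caller can log the event and set
// the response header.
func resolveSessionID(header string) (id string, reissued bool, err error) {
	if canonicalUUID.MatchString(header) {
		return header, false, nil
	}
	id, err = newUUIDv4()
	if err != nil {
		return "", false, err
	}
	return id, true, nil
}

func main() {
	id, reissued, _ := resolveSessionID("123e4567-e89b-12d3-a456-426614174000")
	fmt.Println(id, reissued) // 123e4567-e89b-12d3-a456-426614174000 false

	_, reissued, _ = resolveSessionID("not-a-uuid")
	fmt.Println(reissued) // true
}
```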

@varungup90
Collaborator

Sounds good

@googs1025
Collaborator Author

@varungup90
Coming back to this:
Session affinity will break in multi-replica deployments because each gateway instance maintains its own in-memory UUID → pod mapping. If requests from the same client hit different replicas, the session ID won't be recognized, causing unnecessary fallbacks and loss of continuity.

The original approach—where the client carries a token encoding the target pod address (e.g., base64(IP:Port))—works reliably across replicas and avoids this issue.
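
The failure mode can be reproduced with a toy model of two gateway replicas, each holding a private session table. The replica type and its fields are hypothetical and exist only to show why a session created on one instance is unknown to the other.

```go
package main

import "fmt"

// replica models one gateway instance with its own private session table.
type replica struct {
	name     string
	sessions map[string]string // session UUID -> pod address
}

func newReplica(name string) *replica {
	return &replica{name: name, sessions: make(map[string]string)}
}

// lookup reports the pod bound to a session on this replica only.
func (r *replica) lookup(id string) (string, bool) {
	addr, ok := r.sessions[id]
	return addr, ok
}

func main() {
	gwA, gwB := newReplica("gw-a"), newReplica("gw-b")
	const id = "123e4567-e89b-12d3-a456-426614174000"

	// The first request lands on gw-a, which records the binding locally.
	gwA.sessions[id] = "10.244.1.5:8000"

	// A later request with the same session ID lands on gw-b.
	_, okA := gwA.lookup(id)
	_, okB := gwB.lookup(id)
	fmt.Println(okA, okB) // true false
}
```

With the stateless token, the lookup on gw-b would not depend on local state at all, since any replica can decode the pod address from the header.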

@Jeffwan
Collaborator

Jeffwan commented Nov 27, 2025

@varungup90 @googs1025 what's the status of this PR? ready to go?

@googs1025 force-pushed the session_affinity_plugin branch from a098f46 to b6953ed (November 28, 2025 06:52)
pods := readyPodList.All()
rand.Shuffle(len(pods), func(i, j int) { pods[i], pods[j] = pods[j], pods[i] })

for _, selected := range pods {
Collaborator

The pods passed here are in the ready state and will have a valid IP and port. Since podList is an array, the for loop will always select the same pod. Can you use rand.Intn-based selection?

Collaborator Author

@googs1025 googs1025 Dec 2, 2025

Thanks for the suggestion! The current approach uses rand.Shuffle to randomize the order of ready pods and then picks the first one with a valid IP and port. This ensures we avoid invalid pods while maintaining randomness.

@varungup90
Collaborator

Overall LGTM. One nit: randomize the fallback route. You could also add documentation with a sample.

@googs1025
Collaborator Author

> you can add documentation with a sample.

I agree that adding documentation with a sample would be helpful. I'd prefer to address this in a follow-up PR where I can update the docs consistently.

@googs1025 force-pushed the session_affinity_plugin branch from b6953ed to 0ddd4e4 (December 2, 2025 02:12)
@googs1025 requested a review from varungup90 (December 3, 2025 08:51)
@googs1025 force-pushed the session_affinity_plugin branch from 0ddd4e4 to b13e454 (December 3, 2025 08:51)
Development

Successfully merging this pull request may close these issues.

Simpler Session Affinity Router Plugin for Consistent Pod Routing Within a Session