
Conversation

@Hecate0821

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Hecate0821, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces new documentation for the 'Miles Router' component, a lightweight HTTP router/proxy designed for training-oriented capabilities in the miles framework. The documentation clarifies its function in RL rollout, detailing how it enables features like radix-tree caching for transparent token management and Rollout Routing Replay (R3) for MoE models. It also provides a comparative analysis with the SGLang Model Gateway, guiding users on when to leverage each router based on their specific needs.

Highlights

  • New Documentation for Miles Router: A comprehensive documentation page has been added for the 'Miles Router', explaining its purpose, functionality, and role within the miles framework for RL rollout and data generation.
  • Key Features Detailed: The new documentation elaborates on two critical features of Miles Router: the Radix-tree cache for transparent token management and Rollout Routing Replay (R3) for Mixture-of-Experts (MoE) models, highlighting how they address specific challenges in RL training.
  • Comparison with SGLang Model Gateway: The document provides a clear distinction between Miles Router and SGLang Model Gateway, outlining their different optimization goals and use cases, particularly emphasizing Miles Router's ability to preserve metadata crucial for RL-specific features.
  • Documentation Integration: Existing documentation files have been updated to link to the new Miles Router guide, ensuring discoverability and proper integration within the project's advanced features section.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize your Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


gemini-code-assist bot left a comment


Code Review

This pull request adds comprehensive documentation for the Miles Router, explaining its purpose, features like the radix-tree cache and Rollout Routing Replay (R3), and its differences from the SGLang Model Gateway. However, the features it describes have significant security vulnerabilities in their current implementation. Specifically, there is a high risk of Server-Side Request Forgery (SSRF) due to unauthenticated worker registration, and a risk of cross-user data leakage in the shared radix-tree cache. These issues should be addressed in the code and noted in the documentation. The documentation itself is well-written, but there are a few minor suggestions to improve clarity and readability. Additionally, please correct the typo in the pull request title from 'slime' to 'miles'.

Comment on lines +32 to +47
### 2.1 Radix-tree cache (transparent token management)

> Use this when your rollout pipeline is text-in/text-out and you cannot reliably persist token IDs; if you already control token-in/token-out (e.g. search r1, multiturn VLM examples), you likely don't need the radix-tree cache.

Text-in/text-out interfaces can cause retokenization mismatches: re-tokenizing text at training time may produce different token sequences than rollout did, breaking the per-token alignment needed for PPO/GRPO losses.

The radix-tree cache solves this transparently: it intercepts text-based requests, tokenizes them, and stores trajectories (text, token IDs, logprobs, loss masks) keyed by the text prefix. After rollout finishes, calling `/retrieve_from_text` returns the exact token sequence with aligned metadata, without requiring any changes to your rollout code.

Implementation-wise, the radix-tree cache:

- Accepts text plus tokens/metadata and stores them in a radix tree
- Uses longest-prefix matching to reuse cached token sequences (enabling token-in/token-out downstream)
- Allows insertion of new text continuations as rollout proceeds (multiple trajectories per prompt, e.g. GRPO)
- Periodically cleans up stale nodes to control memory usage

Use the radix cache when you have text-based rollout code and want token-level precision without rewriting, or when running GRPO with multiple trajectories sharing the same prompt prefix.

security-high

The radix-tree cache mechanism described here lacks user isolation, as it is shared across all users and keyed solely by text prefixes. This allows for cross-user data leakage, where one user can retrieve cached token sequences and metadata for another user's prompts. It also enables cache poisoning attacks. The documentation should be updated to reflect these security considerations, and the implementation should be revised to provide data isolation.
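
For readers unfamiliar with the mechanism, here is a minimal Python sketch of the prefix-keyed storage described in the quoted section, assuming a flat dict in place of a real radix tree; `TrajectoryCache` and its methods are illustrative names, not the miles implementation, and a real version would also need the per-user isolation this comment calls for.

```python
# Illustrative sketch only, not the miles implementation: store
# (token_ids, logprobs, loss_mask) keyed by rollout text, then retrieve
# the longest matching prefix at training time. A real radix tree shares
# prefixes between entries; a flat dict keeps the sketch short.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Trajectory:
    token_ids: list[int]
    logprobs: list[float]
    loss_mask: list[int]


@dataclass
class TrajectoryCache:  # hypothetical name
    entries: dict[str, Trajectory] = field(default_factory=dict)

    def insert(self, text: str, traj: Trajectory) -> None:
        self.entries[text] = traj

    def retrieve_from_text(self, text: str) -> Trajectory | None:
        # Longest stored key that is a prefix of the queried rollout text.
        matches = [k for k in self.entries if text.startswith(k)]
        if not matches:
            return None
        return self.entries[max(matches, key=len)]


cache = TrajectoryCache()
cache.insert(
    "Q: 2+2? A: 4",
    Trajectory(token_ids=[48, 25, 17, 9], logprobs=[-0.1, -0.2, -0.05, -0.3], loss_mask=[0, 0, 1, 1]),
)
print(cache.retrieve_from_text("Q: 2+2? A: 4"))
```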

Comment on lines +11 to +13
- Registers workers (SGLang HTTP servers) into a local pool
- Routes requests to a selected worker (simple least-inflight load balancing)
- Proxies arbitrary paths to the selected worker (e.g. `/generate`)

security-high

The MilesRouter implements a proxying mechanism that allows unauthenticated registration of arbitrary worker URLs via the /add_worker endpoint. This creates a significant Server-Side Request Forgery (SSRF) risk, as an attacker can register internal or restricted URLs and then use the router to access them. Since this documentation introduces the feature, it should include a prominent security warning, and the underlying implementation in miles/router/router.py should be updated to include authentication and URL validation.

- Runs periodic health checks and quarantines unhealthy workers
- Supports middleware plugins (via `--miles-router-middleware-paths`) to implement rollout-specific processing (e.g. caching, request/response transforms)

In miles's architecture, the router is part of the rollout system ("SGLang + router") that generates samples and pushes them into the data buffer.

medium

The possessive form miles's is a bit awkward to read. For better flow, I suggest rephrasing to "In the miles architecture".

Suggested change
In miles's architecture, the router is part of the rollout system ("SGLang + router") that generates samples and pushes them into the data buffer.
In the miles architecture, the router is part of the rollout system ("SGLang + router") that generates samples and pushes them into the data buffer.
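
As a rough illustration of the registration/routing/proxying flow described in the quoted bullets, here is a minimal sketch assuming FastAPI and httpx; the allowlist check is a hypothetical stand-in for the hardening suggested above, and none of the names below are taken from miles/router/router.py.

```python
# Illustrative sketch, not the miles router: worker registration,
# least-inflight load balancing, and passthrough proxying, plus a
# hypothetical allowlist check as a nod to the SSRF concern above.
import httpx
from fastapi import FastAPI, HTTPException, Request, Response

app = FastAPI()
workers: dict[str, int] = {}          # worker base URL -> in-flight count
ALLOWED_PREFIXES = ("http://10.0.",)  # assumption: restrict registrable URLs


@app.post("/add_worker")
async def add_worker(request: Request):
    url = (await request.json())["url"]
    if not url.startswith(ALLOWED_PREFIXES):  # hypothetical SSRF guard
        raise HTTPException(status_code=403, detail="worker URL not allowed")
    workers.setdefault(url, 0)
    return {"workers": list(workers)}


@app.api_route("/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request):
    if not workers:
        raise HTTPException(status_code=503, detail="no workers registered")
    target = min(workers, key=workers.get)  # fewest in-flight requests wins
    workers[target] += 1
    try:
        async with httpx.AsyncClient() as client:
            upstream = await client.request(
                request.method, f"{target}/{path}", content=await request.body()
            )
        # Passthrough: return the upstream body untouched so metadata such as
        # meta_info.routed_experts survives the hop.
        return Response(
            content=upstream.content,
            status_code=upstream.status_code,
            media_type=upstream.headers.get("content-type"),
        )
    finally:
        workers[target] -= 1
```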

- `--enable-return-routed-experts`: server argument to enable routing capture
- `RoutedExpertsCapturer`: captures `topk_ids` (selected expert IDs) at each MoE layer during forward pass
- `return_routed_experts`: request parameter to retrieve routing data
- Returns `routed_experts` in response `meta_info` - a `[seq_len - 1, num_layers, top_k]` tensor of expert IDs

medium

This sentence could be clearer. I suggest rephrasing to explicitly state that `routed_experts` is in the `meta_info` field and then describe the tensor.

Suggested change
- Returns `routed_experts` in response `meta_info` - a `[seq_len - 1, num_layers, top_k]` tensor of expert IDs
- Returns `routed_experts` in the response's `meta_info` field - a `[seq_len - 1, num_layers, top_k]` tensor of expert IDs
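
For context, a hedged client-side sketch of the R3 capture flow described in the quoted bullets: the `return_routed_experts` parameter and the `meta_info.routed_experts` field come from the docs, while the endpoint URL and the remaining payload fields are assumptions for illustration.

```python
# Illustrative client-side sketch of requesting routing capture and reading
# it back from the response. Only the parameter and field names quoted above
# are taken from the docs; the address and payload shape are assumed.
import numpy as np
import requests

resp = requests.post(
    "http://localhost:30000/generate",  # assumed router/worker address
    json={
        "text": "Explain mixture-of-experts routing.",
        "sampling_params": {"max_new_tokens": 64},
        "return_routed_experts": True,   # ask the server to capture routing
    },
).json()

# Per the docs: meta_info.routed_experts is a [seq_len - 1, num_layers, top_k]
# tensor of expert IDs selected at each MoE layer during the forward pass.
routed = np.asarray(resp["meta_info"]["routed_experts"])
seq_len_minus_1, num_layers, top_k = routed.shape
print(routed.shape)
```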


MilesRouter is a lightweight Python/FastAPI proxy that acts as a passthrough to SGLang workers. This passthrough design enables RL-specific features like radix-tree trajectory caching and R3 (which require preserving raw response metadata like `routed_experts`).

SGLang Model Gateway is a high-performance Rust-based router optimized for large-scale inference: async non-blocking routing, advanced fault tolerance (retries, circuit breakers), multiple load balancing policies (including cache-aware routing), and PD disaggregation support. However, it reconstructs responses with a fixed schema, so it does not preserve the metadata needed for miles's R3 flow.

medium

The possessive form miles's is a bit awkward. For better readability, I suggest changing "miles's R3 flow" to "the miles R3 flow".

Suggested change
SGLang Model Gateway is a high-performance Rust-based router optimized for large-scale inference: async non-blocking routing, advanced fault tolerance (retries, circuit breakers), multiple load balancing policies (including cache-aware routing), and PD disaggregation support. However, it reconstructs responses with a fixed schema, so it does not preserve the metadata needed for miles's R3 flow.
SGLang Model Gateway is a high-performance Rust-based router optimized for large-scale inference: async non-blocking routing, advanced fault tolerance (retries, circuit breakers), multiple load balancing policies (including cache-aware routing), and PD disaggregation support. However, it reconstructs responses with a fixed schema, so it does not preserve the metadata needed for the miles R3 flow.
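
A small hypothetical sketch of the design point being compared: reconstructing a response through a fixed schema keeps only the fields the schema knows about, whereas a passthrough forwards the worker's body untouched, so fields like `meta_info.routed_experts` survive. The field names below are illustrative, not the actual gateway schema.

```python
# Hypothetical illustration (neither gateway nor miles code) of why
# passthrough matters for R3: fixed-schema reconstruction keeps only the
# fields it knows, so RL-specific metadata is silently dropped.
FIXED_SCHEMA_FIELDS = {"text", "finish_reason"}  # assumed gateway schema


def reconstruct(upstream: dict) -> dict:
    """Fixed-schema reconstruction: only known fields survive."""
    return {k: v for k, v in upstream.items() if k in FIXED_SCHEMA_FIELDS}


def passthrough(upstream: dict) -> dict:
    """Passthrough: forward the worker's body untouched."""
    return upstream


upstream = {
    "text": "4",
    "finish_reason": "stop",
    "meta_info": {"routed_experts": [[[1, 7]]]},  # RL-specific metadata
}
print("meta_info" in reconstruct(upstream))   # False: metadata lost
print("meta_info" in passthrough(upstream))   # True: metadata preserved
```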
