
docs: add Miles server arguments #517

Merged
guapisolo merged 62 commits into radixark:main from Ratish1:docs/server-args
Feb 11, 2026

Conversation

@Ratish1
Contributor

@Ratish1 Ratish1 commented Jan 25, 2026

This PR adds complete docs for Miles server arguments.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Ratish1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the documentation for the Miles server with a dedicated guide that lists and explains all command-line arguments. The new reference covers hardware resource allocation, training backend selection, advanced reinforcement learning parameters, and debugging options, streamlining the configuration process and improving the overall user experience.

Highlights

  • New Documentation File: A new comprehensive documentation file, docs/en/advanced/miles_server_args.md, has been added to detail all available command-line arguments for configuring the Miles server.
  • Detailed Argument Listing: The new documentation meticulously lists and describes arguments across various categories, including Cluster and Resource Management, Training Backend, Rollout Management, Data Handling, Evaluation, RL Algorithms, Logging, Fault Tolerance, Miles Router, Reward Models, Buffer Management, Multi-Token Prediction (MTP), SGLang, FSDP, Debugging, and Environment Variables.
  • Minor Code Cleanup: A small formatting correction was made in miles/utils/arguments.py to remove a trailing comma from the help string of the --log-reward-category argument.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| :--- | :--- | :--- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces comprehensive documentation for the Miles server arguments in a new markdown file. The documentation is well-organized and detailed. My review focuses on improving clarity and consistency in the argument descriptions. I've pointed out a couple of minor inconsistencies that could confuse users. Overall, this is a valuable addition to the project.

Comment thread docs/en/advanced/miles_server_args.md Outdated
Comment thread docs/en/advanced/miles_server_args.md Outdated
Comment thread miles/utils/arguments.py
@zijiexia
Contributor

Hi @Ratish1, does it make sense to add the arguments for checkpointing? Also, I think there are some arguments not covered in this PR, e.g., FSDP's --deterministic-mode. It might make sense to provide a comprehensive overview of all arguments.

@Ratish1
Contributor Author

Ratish1 commented Jan 27, 2026

Hi @Ratish1, does it make sense to add the arguments for checkpointing? Also, I think there are some arguments not covered in this PR, e.g., FSDP's --deterministic-mode. It might make sense to provide a comprehensive overview of all arguments.

Hey @zijiexia, I have added even more server arguments. Could you let me know if it looks good now? Thanks.

Comment thread docs/en/advanced/miles_server_args.md Outdated
Comment thread docs/en/advanced/miles_server_args.md Outdated
Comment thread docs/en/advanced/miles_server_args.md Outdated
Comment thread docs/en/advanced/miles_server_args.md Outdated
@Ratish1 Ratish1 requested a review from zijiexia January 28, 2026 05:51
@zijiexia
Contributor

Hi @Ratish1 , I've made some changes to the docs, see this PR: Ratish1#1, please let me know what you think. Thanks!

@zhaochenyang20
Collaborator

  1. Total GPUs per node on the machine. Specify if using fewer than 8 GPUs per node in colocate mode.

Does this not affect disaggregated mode?

@zhaochenyang20
Collaborator

These expressions are clear; it just feels strange to me.

[screenshot]

What if users turn on these params when using disaggregated mode? In other words, I think we should not have these parameters; just having --collocate is enough.

@zhaochenyang20
Collaborator

Disable weights backuper to save host memory. By default, this feature is enabled.

Please explain in one or two lines what the weights backuper is and its trade-off.

@zhaochenyang20
Collaborator

[screenshot]

Explain in what cases we should keep the old actor, and the trade-off of keeping it.

@zhaochenyang20
Collaborator

Path to the Huggingface checkpoint used to initialize SGLang and provide the tokenizer. It must have the same architecture as the model being trained. It doesn't necessarily need to contain the most up-to-date parameters.

This looks weird to me. I think we should keep only this: "Path to the Huggingface checkpoint used to initialize SGLang and provide the tokenizer."

@zhaochenyang20
Collaborator

zhaochenyang20 commented Jan 30, 2026

[screenshot]

Explain the relationship between these three parameters.

@zhaochenyang20
Collaborator

Skip special tokens in the response. Useful when the response is used as a prompt for the next rollout.

Is this needed in multi-turn RL? If so, please stress this.

@zhaochenyang20
Collaborator

It may be hard to pass special tokens in command line, in that case --rollout-stop-token-ids can be used.

what does this mean?

@zhaochenyang20
Collaborator

[screenshot]

Explain the relationship.

@zhaochenyang20
Collaborator

If both --num-epoch and --num-rollout are set, --num-epoch will be ignored.

--num-epoch will overwrite --num-rollout if both are set.

@zhaochenyang20
Collaborator

[screenshot]

Put --prompt-data before --disable-rollout-global-dataset.

@zhaochenyang20
Collaborator

Disable the global dataset for rollout. If set, the rollout will use the --prompt-data as the prompt dataset, and the prompts for rollout will be sampled from the dataset. If not set, you need to manage the data by yourself.

What do you mean by "manage the data by yourself"? Please make this clearer.

@zhaochenyang20
Collaborator

[screenshot]

I think this "If you want to use a custom template, you can set --apply-chat-template to true" is redundant. Put these descriptions under --apply-chat-template and explain how to use a customized chat template.

@zhaochenyang20
Collaborator

[screenshot]

Put the first two arguments later, with their related arguments.

@zhaochenyang20
Collaborator

[screenshot]

What is gbs here? Also, please state that setting --num-steps-per-rollout to n means that each batch of rollout data should update the policy model n times.

And should the default value be 1 rather than None?

@zhaochenyang20
Collaborator

Did you check the consistency with https://github.com/radixark/miles/blob/main/docs/en/get_started/quick_start.md?

If not, please check it 😂

@zhaochenyang20
Collaborator

[screenshot]

I think "max tokens per gpu should be around max_response_len // cp_size instead of max_response_len".

Shouldn't max tokens per GPU be strongly related to the full sequence length (prompt + response), not only the response length?
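To make the reviewer's point concrete, the per-GPU token budget under context parallelism (CP) can be sketched with hypothetical numbers; the function name and values below are illustrative, not Miles code:

```python
def max_tokens_budget(total_sequence_length: int, cp_size: int) -> int:
    """Context parallelism shards each sequence across cp_size ranks,
    so the per-GPU token budget is roughly the full sequence length
    (prompt + response) divided by cp_size."""
    return total_sequence_length // cp_size

# Hypothetical run: 2048 prompt tokens + 30720 response tokens on 4 CP ranks.
print(max_tokens_budget(2048 + 30720, 4))  # 8192 tokens per GPU
```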

@zhaochenyang20
Collaborator

[screenshot]

I am quite confused by this parameter. I think that for a given response we should calculate all the tokens' log probs. The current explanation makes it sound like tokens beyond --log-probs-max-tokens-per-gpu will not have their log probs calculated.
Is this actually a batch-size parameter for Megatron, i.e., how many log probs are calculated per batch?

@zhaochenyang20
Collaborator

I don't quite understand:

[screenshot]

Why is this related to verl? Please give a clearer explanation and the trade-off.

@zhaochenyang20
Collaborator

[screenshot]
  1. Explain the first one in more detail.
  2. Are you sure it's per GPU? Micro batch size per GPU? The param name does not have "per GPU" in it.

Comment thread docs/en/advanced/miles_server_args.md Outdated
| `--tis-clip-low` | Lower bound clipping threshold C for importance sampling ratios to control variance. | `0.0` | Type: float | Miles Native |
| `--custom-tis-function-path` | Path to a custom TIS or MIS function. [Ref](../get_started/customization.md#10-custom-tisrs-function---custom-tis-function-path) | `None` | Type: str | Miles Native |
| `--custom-pg-loss-reducer-function-path` | Custom reducer function for policy gradient loss. [Ref](../get_started/customization.md#11-custom-pg-loss-reducer---custom-pg-loss-reducer-function-path) | `None` | Type: str | Miles Native |
| `--use-routing-replay` | Enable [Routing Replay](https://arxiv.org/abs/2507.18071). | `False` | bool flag (set to enable) | Miles Native |
Collaborator

Enable r2 for MoE: record expert routing decisions during the forward pass and replay them during the backward pass. This will be automatically set to True when --use-rollout-routing-replay is enabled.

| `--balance-data` | Repartition each rollout batch so each data-parallel rank gets a similar total token count via the Karmarkar-Karp method. It may be beneficial for training speed, but changes per-rank sample grouping and adds a small CPU scheduling overhead. | `False` | bool flag (set to enable) | Miles Native |
| `--data-pad-size-multiplier` | Multiplier used to calculate the sequence padding boundary. Miles rounds sequence lengths up to a multiple of `tensor_parallel_size * data_pad_size_multiplier`. This optimization ensures that matrix dimensions are aligned with NVIDIA Tensor Core requirements, maximizing throughput and reducing VRAM fragmentation. | `128` | Type: int | Miles Native |
| `--micro-batch-size` | Micro batch size per GPU. Ignored when `--use-dynamic-batch-size` is enabled. | `1` | Type: int | Megatron-LM (Reset by Miles) |

Collaborator

Could you add one more parameter, --seq-length? It's a very confusing param in Megatron but has no effect in Miles at all. Ref #574 (comment)

Comment thread docs/en/advanced/miles_server_args.md Outdated
| `--n-samples-per-prompt` | Number of responses to generate for each prompt, e.g., the group size of GRPO. | `1` | Type: int | Miles Native |
| `--global-batch-size` | Total samples per optimizer step. Automatically calculated or **overridden** if `num_steps_per_rollout` is set. | `None` | Type: int | Megatron-LM (Reset by Miles) |
| `--num-steps-per-rollout` | The number of training steps to perform using the data collected in a single rollout round. Setting this to `n` means the policy model will be updated `n` times using the same batch of rollout data. Miles ensures that `(rollout-batch-size * n-samples-per-prompt) = (global-batch-size * num-steps-per-rollout)`. If this value is not provided, you have to set `--global-batch-size` explicitly. If both are provided, `--num-steps-per-rollout` will **override** the global batch size with `global_batch_size = (rollout_batch_size * n_samples_per_prompt) // num_steps_per_rollout`. | `None` | Type: int | Miles Native |
| `--use-dynamic-batch-size` | Dynamically packs variable-length samples into micro-batches to maximize GPU utilization, ensuring the total token count per batch does not exceed `--max-tokens-per-gpu`. For example, with a 300-token limit, samples of lengths 100, 200, and 300 would be packed into two batches: `[100, 200]` and `[300]`. **Note:** Miles ensures that enabling this optimization does not affect the mathematical correctness of per-sample or per-token loss calculation. It is **strongly recommended** to enable this for maximum efficiency. | `False` | bool flag (set to enable) | Miles Native |
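The packing example in the `--use-dynamic-batch-size` row can be reproduced with a minimal greedy packer; this is an illustrative sketch, not Miles' actual scheduler:

```python
def pack_samples(lengths, max_tokens_per_gpu):
    """Greedily pack sample lengths into micro-batches whose total
    token count never exceeds max_tokens_per_gpu."""
    batches, current, current_tokens = [], [], 0
    for length in lengths:
        # Flush the current batch when adding this sample would exceed the budget.
        if current and current_tokens + length > max_tokens_per_gpu:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(length)
        current_tokens += length
    if current:
        batches.append(current)
    return batches

# The example from the table: a 300-token budget with samples of
# lengths 100, 200, and 300 yields two micro-batches.
print(pack_samples([100, 200, 300], 300))  # [[100, 200], [300]]
```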
Collaborator

This can only be enabled when --qkv-format is thd; it does not work for bshd.

Comment thread docs/en/advanced/miles_server_args.md Outdated
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--train-backend` | The backend for training. Highly suggest Megatron for numerical stability and efficiency. | `"megatron"` | `megatron`, `fsdp` | Miles Native |
| `--qkv-format` | The QKV layout. | `"thd"` | `thd`, `bshd` | Miles Native |
Collaborator

Write more about this param. New models may not support thd, only bshd.

Collaborator

@yueming-yuan yueming-yuan Feb 10, 2026


May say something like: whether or not to pack all sequences with variable lengths into the token dimension. By default, use the thd format because it is faster than bshd, saving padding overhead. However, for new models with novel attention architectures (e.g., sparse attention, attention sinks), the thd format may lack training backend support. Use bshd to train those models.

Comment thread docs/en/advanced/miles_server_args.md Outdated
| `--log-probs-max-tokens-per-gpu` | The maximum number of tokens per GPU for calculating log probs. This is used to calculate the log probs of the responses during rollout, and should be set to a larger value than `max_tokens_per_gpu` if you want better performance. | `None` | Type: int | Miles Native |
| `--balance-data` | Repartition each rollout batch so each data-parallel rank gets a similar total token count via the Karmarkar-Karp method. It may be beneficial for training speed, but changes per-rank sample grouping and adds a small CPU scheduling overhead. | `False` | bool flag (set to enable) | Miles Native |
| `--data-pad-size-multiplier` | Multiplier used to calculate the sequence padding boundary. Miles rounds sequence lengths up to a multiple of `tensor_parallel_size * data_pad_size_multiplier`. This optimization ensures that matrix dimensions are aligned with NVIDIA Tensor Core requirements, maximizing throughput and reducing VRAM fragmentation. | `128` | Type: int | Miles Native |
| `--micro-batch-size` | Micro batch size per GPU. Ignored when `--use-dynamic-batch-size` is enabled. | `1` | Type: int | Megatron-LM (Reset by Miles) |
Collaborator

This does not work for --qkv-format=thd.

Collaborator

You mean --micro-batch-size? That also works for thd; both dynamic and specified micro batch sizes work for thd, but only a specified one works for bshd.

Collaborator

Sorry, Yueming is right. Ignore my words.

Comment thread docs/en/advanced/miles_server_args.md Outdated
| `--sglang-mem-fraction-static` | Fraction of GPU memory to reserve for SGLang KV cache. | `0.9` | Type: float | SGLang |
| `--sglang-server-concurrency` | Maximum number of concurrent requests. | `512` | Type: int | SGLang |
| `--sglang-router-ip` | IP address of the SGLang router. | `None` | Type: str | SGLang Gateway |
| `--sglang-router-port` | Port of the SGLang router. | `None` | Type: int | SGLang Gateway |
Collaborator

And the Miles router.


| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--sglang-mem-fraction-static` | Fraction of GPU memory to reserve for SGLang KV cache. | `0.9` | Type: float | SGLang |
Collaborator

Why 0.9 here? It's too large. 0.7~0.8 is good.

Contributor

I think the default value for sglang mem fraction static is 0.9.

Collaborator

Okay, that's fine.

Comment thread docs/en/advanced/miles_server_args.md Outdated
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--check-weight-update-equal` | Verify that weight updates are equal across ranks. | `False` | bool flag (set to enable) | Miles Native |
| `--save-debug-rollout-data` | Path to save rollout data for offline analysis. | `None` | Type: str | Miles Native |
Collaborator

@guapisolo guapisolo Feb 10, 2026


Add --save-debug-rollout-data, --load-debug-rollout-data, --debug-rollout-only, and --debug-train-only; refer to debug.md.

Comment thread docs/en/advanced/miles_server_args.md Outdated
| `--disable-grpo-std-normalization` | Disable standard deviation normalization for GRPO. From [Dr.GRPO](https://arxiv.org/pdf/2503.20783) | `False` | bool flag (set to enable) | Miles Native |
| `--disable-rewards-normalization` | Disable the default group-wise reward normalization for GRPO, GSPO, and REINFORCE++. This effectively skips the baseline subtraction step. | `False` | bool flag (set to enable) | Miles Native |
| `--use-rollout-entropy` | Enable entropy calculation when calculating the logprobs from actor and reference model. This is useful for implementing custom entropy-based loss masking. | `False` | bool flag (set to enable) | Miles Native |
| `--use-rollout-logprobs` | Use rollout logprobs for the importance sampling ratios; use the logprobs from the actor model if not set. If `--get-mismatch-metrics` is set, the log probs will be recomputed by the training engine, which applies one more forward pass. | `False` | bool flag (set to enable) | Miles Native |
Collaborator

Please check the code logic in loss.py. Maybe it should be "use rollout logprobs as the old-policy logprobs for importance sampling ratios in GRPO/GSPO"?

| `--rollout-batch-size` | Number of prompts per rollout batch. The total data returned should be `rollout_batch_size` * `n_samples_per_prompt`. | Required | Type: int | Miles Native |
| `--n-samples-per-prompt` | Number of responses to generate for each prompt, e.g., the group size of GRPO. | `1` | Type: int | Miles Native |
| `--global-batch-size` | Total samples per optimizer step. Automatically calculated or **overridden** if `num_steps_per_rollout` is set. | `None` | Type: int | Megatron-LM (Reset by Miles) |
| `--num-steps-per-rollout` | The number of training steps to perform using the data collected in a single rollout round. Setting this to `n` means the policy model will be updated `n` times using the same batch of rollout data. Miles ensures that `(rollout-batch-size * n-samples-per-prompt) = (global-batch-size * num-steps-per-rollout)`. If this value is not provided, you have to set `--global-batch-size` explicitly. If both are provided, `--num-steps-per-rollout` will **override** the global batch size with `global_batch_size = (rollout_batch_size * n_samples_per_prompt) // num_steps_per_rollout`. | `None` | Type: int | Miles Native |
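The invariant stated in the `--num-steps-per-rollout` row reduces to a one-line derivation; the helper below is an illustrative sketch of that arithmetic, not the Miles implementation:

```python
def derive_global_batch_size(rollout_batch_size, n_samples_per_prompt, num_steps_per_rollout):
    """Miles keeps rollout_batch_size * n_samples_per_prompt equal to
    global_batch_size * num_steps_per_rollout, so when
    --num-steps-per-rollout is given the global batch size is derived."""
    total_samples = rollout_batch_size * n_samples_per_prompt
    assert total_samples % num_steps_per_rollout == 0, "rollout data must split evenly"
    return total_samples // num_steps_per_rollout

# Hypothetical run: 32 prompts x 8 samples trained over 4 steps
# gives 64 samples per optimizer step.
print(derive_global_batch_size(32, 8, 4))  # 64
```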
Collaborator

Could you check what will happen if multiple samples are returned by the generate function (maybe the refactored one)?

Comment thread miles/utils/arguments.py
choices=["thd", "bshd"],
default="thd",
- help="The qkv layout for Megatron backend.",
+ help="The qkv layout.",
Collaborator

@guapisolo guapisolo Feb 10, 2026


Why change here? More details needed

Contributor

I think I changed it here because the same parameter also applies to the FSDP backend. I wondered if this could make it less confusing.

Collaborator

Ah, yes, this was a historical issue. At the beginning I only supported Megatron, but after the refactor, as a general util, it should also work for FSDP... (though I did not test fsdp+bshd)

Comment thread miles/utils/arguments.py
"which will be used as the prompt and the label respectively. "
"If you want to use a custom template, you can set --apply-chat-template to true, in that case, "
"the input should be the same structure as an openai message, e.g. [{'role': 'user', 'content': 'blabla'}]. "
"which will be used as the prompt and the label respectively."
Collaborator

Why change here?

Comment thread docs/en/advanced/miles_server_args.md Outdated
| `--max-tokens-per-gpu` | The maximum number of tokens (Prompt + Response combined) per GPU for dynamic batch size. This parameter defines the total sequence length budget for packing samples into micro-batches during training. Note that when enabling context parallel (CP), the effective capacity is shared, so the value should be approximately `(Total_Sequence_Length) // cp_size`. | `None` | Type: int | Miles Native |
| `--log-probs-max-tokens-per-gpu` | The maximum number of tokens per GPU for calculating log probs. This is used to calculate the log probs of the responses during rollout, and should be set to a larger value than `max_tokens_per_gpu` if you want better performance. | `None` | Type: int | Miles Native |
| `--balance-data` | Repartition each rollout batch so each data-parallel rank gets a similar total token count via the Karmarkar-Karp method. It may be beneficial for training speed, but changes per-rank sample grouping and adds a small CPU scheduling overhead. | `False` | bool flag (set to enable) | Miles Native |
| `--data-pad-size-multiplier` | Multiplier used to calculate the sequence padding boundary. Miles rounds sequence lengths up to a multiple of `tensor_parallel_size * data_pad_size_multiplier`. This optimization ensures that matrix dimensions are aligned with NVIDIA Tensor Core requirements, maximizing throughput and reducing VRAM fragmentation. | `128` | Type: int | Miles Native |
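The rounding rule in the `--data-pad-size-multiplier` row amounts to ceiling a sequence length to the padding boundary; this helper is an illustrative sketch of the documented formula, not the actual Miles code:

```python
def padded_length(seq_len, tensor_parallel_size, data_pad_size_multiplier=128):
    """Round seq_len up to the next multiple of
    tensor_parallel_size * data_pad_size_multiplier, keeping matrix
    dimensions aligned with Tensor Core friendly sizes."""
    boundary = tensor_parallel_size * data_pad_size_multiplier
    return ((seq_len + boundary - 1) // boundary) * boundary

# With TP=4 the boundary is 4 * 128 = 512 tokens:
# a 700-token sequence pads up to 1024.
print(padded_length(700, 4))  # 1024
```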
Collaborator

Add a notice: better not to change this. Values below 128 may trigger accuracy loss under thd with TP >= 4.

@guapisolo
Collaborator

guapisolo commented Feb 10, 2026

This line-level comment is hard to use :( If a comment spans multiple lines, the line to change should be the last one.

Comment thread docs/en/advanced/miles_server_args.md Outdated

| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--check-weight-update-equal` | Verify that weight updates are equal across ranks. | `False` | bool flag (set to enable) | Miles Native |
Collaborator

Suggested change
- | `--check-weight-update-equal` | Verify that weight updates are equal across ranks. | `False` | bool flag (set to enable) | Miles Native |
+ | `--check-weight-update-equal` | Use SGLang's weight checker to check and ensure that the loaded weight from HF checkpoint and received from Megatron are bit-wise equal. | `False` | bool flag (set to enable) | Miles Native |

(suggest to be more specific on this)

@Ratish1
Contributor Author

Ratish1 commented Feb 10, 2026

This line-level comment is hard to use :( If a comment spans multiple lines, the line to change should be the last one.

Correct me if I'm wrong, but I think you can drag with the + sign, specifically like this. I'm not sure if this already existed since I don't review PRs often. Let me know.
[screenshot]

@Ratish1
Contributor Author

Ratish1 commented Feb 10, 2026

Hey @guapisolo @yueming-yuan, thank you so much for the reviews. I have addressed all of them; let me know if you need more changes.

Also @guapisolo, thanks for the note about "multi-sample" returns from the generate function. I clarified this under --custom-generate-function-path: in the refactored interface a custom generate can return list[Sample], but the default rollout/training pipelines expect each prompt group to be a flat list[Sample] of length --n-samples-per-prompt (they assert len(group) == n_samples_per_prompt). So if users return multiple samples per generate call, they'll need a compatible rollout pipeline that handles that structure. Let me know if this sounds good. Thanks again.
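The contract described above can be sketched as follows; `Sample` and `validate_group` here are hypothetical stand-ins for the real Miles types and pipeline check:

```python
from dataclasses import dataclass

@dataclass
class Sample:  # hypothetical stand-in for Miles' Sample type
    prompt: str
    response: str

def validate_group(group, n_samples_per_prompt):
    """Mimics the pipeline expectation: each prompt group must be a flat
    list[Sample] of exactly n_samples_per_prompt entries."""
    assert isinstance(group, list), "group must be a flat list of Sample"
    assert len(group) == n_samples_per_prompt, (
        f"expected {n_samples_per_prompt} samples, got {len(group)}"
    )

# A group of 8 samples for one prompt passes the check;
# a nested or mis-sized group would raise an AssertionError.
group = [Sample("p", f"r{i}") for i in range(8)]
validate_group(group, 8)
```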

Collaborator

@guapisolo guapisolo left a comment


LGTM. Only the description of group-rm needs change. @Ratish1 @zijiexia

@zhaochenyang20
Collaborator

zhaochenyang20 commented Feb 10, 2026

After fixing Jiajun's comment:

#517 (review)

I will merge this PR.

@Ratish1 @zijiexia then, we can further work on this. #578

Great job so far, with 130+ conversations 😂

@guapisolo guapisolo merged commit 9286b1a into radixark:main Feb 11, 2026
12 checks passed
@Ratish1 Ratish1 deleted the docs/server-args branch February 11, 2026 07:08
dougyster pushed a commit that referenced this pull request Feb 14, 2026
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
fzyzcjy pushed a commit that referenced this pull request Mar 19, 2026
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
JD-ETH pushed a commit to JensenFire/miles that referenced this pull request Apr 11, 2026
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
GuanxingLu pushed a commit to GuanxingLu/miles that referenced this pull request Apr 21, 2026
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
