docs: add Miles server arguments #517
Conversation
Summary of Changes (Gemini Code Assist): This pull request significantly enhances the documentation for the Miles server by introducing a dedicated guide that lists and explains all command-line arguments. The new guide gives users an exhaustive reference for fine-tuning Miles' behavior, covering everything from hardware resource allocation and training backend selection to advanced reinforcement learning parameters and debugging options, streamlining the configuration process and improving the overall user experience.
Code Review
This pull request introduces comprehensive documentation for the Miles server arguments in a new markdown file. The documentation is well-organized and detailed. My review focuses on improving clarity and consistency in the argument descriptions. I've pointed out a couple of minor inconsistencies that could confuse users. Overall, this is a valuable addition to the project.
Hi @Ratish1, does it make sense to add the arguments for checkpointing? Also, I think there are some arguments not covered in this PR, e.g., FSDP: `--deterministic-mode`. It might make sense to provide a comprehensive overview of all arguments.
Hey @zijiexia, I have added even more server arguments; could you let me know if it looks good now? Thanks.
Does this not affect disaggregated mode?
> Disable weights backuper to save host memory. By default, this feature is enabled.

Please explain in one or two lines what the weights backuper is and its trade-off.
> Path to the Huggingface checkpoint used to initialize SGLang and provide the tokenizer. It must have the same architecture as the model being trained. It doesn't necessarily need to contain the most up-to-date parameters.

This looks weird to me. I think we can only keep this:
> Skip special tokens in the response. Useful when the response is used as a prompt for the next rollout.

Is this needed in multi-turn RL? If so, please stress this.
What does this mean?
What do you mean by "manage the data by yourself"? Please make it clearer.
Did you check the consistency with https://github.com/radixark/miles/blob/main/docs/en/get_started/quick_start.md? If not, please check it 😂
| `--tis-clip-low` | Lower bound clipping threshold C for importance sampling ratios to control variance. | `0.0` | Type: float | Miles Native |
| `--custom-tis-function-path` | Path to a custom TIS or MIS function. [Ref](../get_started/customization.md#10-custom-tisrs-function---custom-tis-function-path) | `None` | Type: str | Miles Native |
| `--custom-pg-loss-reducer-function-path` | Custom reducer function for policy gradient loss. [Ref](../get_started/customization.md#11-custom-pg-loss-reducer---custom-pg-loss-reducer-function-path) | `None` | Type: str | Miles Native |
| `--use-routing-replay` | Enable [Routing Replay](https://arxiv.org/abs/2507.18071). | `False` | bool flag (set to enable) | Miles Native |
Enable R2 for MoE: record expert routing decisions during the forward pass and replay them during the backward pass. This is automatically set to `True` when `--use-rollout-routing-replay` is enabled.
| `--balance-data` | Repartition each rollout batch so each data-parallel rank gets a similar total token count via the Karmarkar-Karp method. It may be beneficial for training speed, but changes per-rank sample grouping and adds a small CPU scheduling overhead. | `False` | bool flag (set to enable) | Miles Native |
| `--data-pad-size-multiplier` | Multiplier used to calculate the sequence padding boundary. Miles rounds sequence lengths up to a multiple of `tensor_parallel_size * data_pad_size_multiplier`. This optimization ensures that matrix dimensions are aligned with NVIDIA Tensor Core requirements, maximizing throughput and reducing VRAM fragmentation. | `128` | Type: int | Miles Native |
| `--micro-batch-size` | Micro batch size per GPU. Ignored when `--use-dynamic-batch-size` is enabled. | `1` | Type: int | Megatron-LM (Reset by Miles) |
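As a rough illustration of what `--balance-data` aims for, here is a minimal sketch that assigns samples to data-parallel ranks with a greedy longest-first heuristic. This is a simplified stand-in, not the Karmarkar-Karp differencing method the table describes; the goal (similar total token counts per rank) is the same.

```python
import heapq

def balance(token_counts, num_ranks):
    """Assign samples (given by token count) to ranks so the per-rank
    token totals stay similar. Greedy longest-first heuristic; Miles
    uses the stronger Karmarkar-Karp method for this."""
    # Min-heap of (current total, rank id, assigned sample lengths).
    heap = [(0, rank, []) for rank in range(num_ranks)]
    heapq.heapify(heap)
    for n in sorted(token_counts, reverse=True):
        total, rank, samples = heapq.heappop(heap)  # lightest rank so far
        heapq.heappush(heap, (total + n, rank, samples + [n]))
    return {rank: (total, samples) for total, rank, samples in heap}

print(balance([512, 300, 290, 60, 50], 2))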
Could you add one more parameter, `--seq-length`? It's a very confusing param in Megatron but not effective in Miles at all. Ref #574 (comment)
| `--n-samples-per-prompt` | Number of responses to generate for each prompt, e.g., the group size of GRPO. | `1` | Type: int | Miles Native |
| `--global-batch-size` | Total samples per optimizer step. Automatically calculated or **overridden** if `num_steps_per_rollout` is set. | `None` | Type: int | Megatron-LM (Reset by Miles) |
| `--num-steps-per-rollout` | The number of training steps to perform using the data collected in a single rollout round. Setting this to `n` means the policy model will be updated `n` times using the same batch of rollout data. Miles ensures that `(rollout-batch-size * n-samples-per-prompt) = (global-batch-size * num-steps-per-rollout)`. If this value is not provided, you have to set `--global-batch-size` explicitly. If both are provided, `--num-steps-per-rollout` will **override** the global batch size with `global_batch_size = (rollout_batch_size * n_samples_per_prompt) // num_steps_per_rollout`. | `None` | Type: int | Miles Native |
| `--use-dynamic-batch-size` | Dynamically packs variable-length samples into micro-batches to maximize GPU utilization, ensuring the total token count per batch does not exceed `--max-tokens-per-gpu`. For example, with a 300-token limit, samples of lengths 100, 200, and 300 would be packed into two batches: `[100, 200]` and `[300]`. **Note:** Miles ensures that enabling this optimization does not affect the mathematical correctness of per-sample or per-token loss calculation. It is **strongly recommended** to enable this for maximum efficiency. | `False` | bool flag (set to enable) | Miles Native |
This can only be enabled when `--qkv-format` is `thd`; it does not work for `bshd`.
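The `[100, 200]` / `[300]` packing example for `--use-dynamic-batch-size` can be sketched as a simple first-fit pass over sorted sample lengths. This is an illustration of the packing constraint only, not Miles' actual scheduler (which also balances ranks and preserves loss correctness):

```python
def pack_samples(lengths, max_tokens_per_gpu):
    """Greedily pack sample lengths into micro-batches so the total
    token count per batch never exceeds the per-GPU budget."""
    batches, current, current_tokens = [], [], 0
    for n in sorted(lengths):
        # Start a new micro-batch once the budget would be exceeded.
        if current and current_tokens + n > max_tokens_per_gpu:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(n)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# The example from the table: a 300-token budget with samples 100/200/300.
print(pack_samples([100, 200, 300], 300))  # [[100, 200], [300]]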
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--train-backend` | The backend for training. Megatron is highly recommended for numerical stability and efficiency. | `"megatron"` | `megatron`, `fsdp` | Miles Native |
| `--qkv-format` | The QKV layout. | `"thd"` | `thd`, `bshd` | Miles Native |
Write more about this param. New models may not support `thd`, only `bshd`.
You might say something like: whether to pack all variable-length sequences along the token dimension. The `thd` format is the default because it is faster than `bshd`, saving padding overhead. However, for new models with novel attention architectures (e.g., sparse attention, attention sinks), the `thd` format may lack training-backend support. Use `bshd` to train those models.
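To make the layout difference concrete, here is a toy sketch (plain lists instead of tensors) of the two representations: `thd` concatenates all sequences into one token stream plus cumulative boundaries, while `bshd` pads every sequence to the longest one.

```python
def to_thd(seqs):
    """Pack variable-length sequences into one token stream plus
    cumulative sequence boundaries (cu_seqlens), as in the thd layout."""
    tokens, cu_seqlens = [], [0]
    for s in seqs:
        tokens.extend(s)
        cu_seqlens.append(len(tokens))
    return tokens, cu_seqlens

def to_bshd(seqs, pad=0):
    """Pad every sequence to the longest one, as in the bshd layout.
    The padding tokens are wasted compute, which is why thd is
    usually faster."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad] * (max_len - len(s)) for s in seqs]

seqs = [[1, 2, 3], [4, 5]]
print(to_thd(seqs))   # ([1, 2, 3, 4, 5], [0, 3, 5])
print(to_bshd(seqs))  # [[1, 2, 3], [4, 5, 0]]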
| `--log-probs-max-tokens-per-gpu` | The maximum number of tokens per GPU when calculating log probs. This is used to compute the log probs of the responses during rollout, and can be set to a larger value than `max_tokens_per_gpu` for better performance. | `None` | Type: int | Miles Native |
| `--balance-data` | Repartition each rollout batch so each data-parallel rank gets a similar total token count via the Karmarkar-Karp method. It may be beneficial for training speed, but changes per-rank sample grouping and adds a small CPU scheduling overhead. | `False` | bool flag (set to enable) | Miles Native |
| `--data-pad-size-multiplier` | Multiplier used to calculate the sequence padding boundary. Miles rounds sequence lengths up to a multiple of `tensor_parallel_size * data_pad_size_multiplier`. This optimization ensures that matrix dimensions are aligned with NVIDIA Tensor Core requirements, maximizing throughput and reducing VRAM fragmentation. | `128` | Type: int | Miles Native |
| `--micro-batch-size` | Micro batch size per GPU. Ignored when `--use-dynamic-batch-size` is enabled. | `1` | Type: int | Megatron-LM (Reset by Miles) |
This does not work for `--qkv-format=thd`.
You mean `--micro-batch-size`? That also works for `thd`; both dynamic and specified micro batch sizes work for `thd`, but only a specified one works for `bshd`.
Sorry, Yueming is right. Ignore my words.
| `--sglang-mem-fraction-static` | Fraction of GPU memory to reserve for SGLang KV cache. | `0.9` | Type: float | SGLang |
| `--sglang-server-concurrency` | Maximum number of concurrent requests. | `512` | Type: int | SGLang |
| `--sglang-router-ip` | IP address of the SGLang router. | `None` | Type: str | SGLang Gateway |
| `--sglang-router-port` | Port of the SGLang router. | `None` | Type: int | SGLang Gateway |
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--sglang-mem-fraction-static` | Fraction of GPU memory to reserve for SGLang KV cache. | `0.9` | Type: float | SGLang |
Why 0.9 here? It's too large. 0.7 to 0.8 is good.
I think the default value for `--sglang-mem-fraction-static` is 0.9.
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--check-weight-update-equal` | Verify that weight updates are equal across ranks. | `False` | bool flag (set to enable) | Miles Native |
| `--save-debug-rollout-data` | Path to save rollout data for offline analysis. | `None` | Type: str | Miles Native |
Add `--save-debug-rollout-data`, `--load-debug-rollout-data`, `--debug-rollout-only`, and `--debug-train-only`; refer to debug.md.
| `--disable-grpo-std-normalization` | Disable standard deviation normalization for GRPO. From [Dr.GRPO](https://arxiv.org/pdf/2503.20783) | `False` | bool flag (set to enable) | Miles Native |
| `--disable-rewards-normalization` | Disable the default group-wise reward normalization for GRPO, GSPO, and REINFORCE++. This effectively skips the baseline subtraction step. | `False` | bool flag (set to enable) | Miles Native |
| `--use-rollout-entropy` | Enable entropy calculation when computing the logprobs from the actor and reference models. This is useful for implementing custom entropy-based loss masking. | `False` | bool flag (set to enable) | Miles Native |
| `--use-rollout-logprobs` | Use rollout logprobs for importance sampling ratios; use the logprobs from the actor model if not set. If `--get-mismatch-metrics` is set, the log probs will be recomputed by the training engine, applying one more forward pass. | `False` | bool flag (set to enable) | Miles Native |
Please check the code logic in loss.py. Maybe it should be "use rollout logprobs as the old-policy logprobs for importance sampling ratios in GRPO/GSPO"?
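For readers unfamiliar with the distinction being discussed, here is a tiny illustration (simplified, per-token) of how the choice of old-policy logprobs changes the importance sampling ratio:

```python
import math

def is_ratio(new_logprob, old_logprob):
    """Per-token importance sampling ratio pi_new / pi_old,
    computed in log space for numerical stability."""
    return math.exp(new_logprob - old_logprob)

# With --use-rollout-logprobs, the "old" logprobs are the ones the
# inference engine reported during rollout; otherwise the actor model
# recomputes them with the training engine. The two can differ slightly,
# which shifts the ratio.
rollout_lp, recomputed_lp, new_lp = -1.05, -1.00, -0.90
print(is_ratio(new_lp, rollout_lp))     # ratio vs. rollout logprobs
print(is_ratio(new_lp, recomputed_lp))  # ratio vs. recomputed logprobs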
| `--rollout-batch-size` | Number of prompts per rollout batch. The total data returned should be `rollout_batch_size * n_samples_per_prompt`. | Required | Type: int | Miles Native |
| `--n-samples-per-prompt` | Number of responses to generate for each prompt, e.g., the group size of GRPO. | `1` | Type: int | Miles Native |
| `--global-batch-size` | Total samples per optimizer step. Automatically calculated or **overridden** if `num_steps_per_rollout` is set. | `None` | Type: int | Megatron-LM (Reset by Miles) |
| `--num-steps-per-rollout` | The number of training steps to perform using the data collected in a single rollout round. Setting this to `n` means the policy model will be updated `n` times using the same batch of rollout data. Miles ensures that `(rollout-batch-size * n-samples-per-prompt) = (global-batch-size * num-steps-per-rollout)`. If this value is not provided, you have to set `--global-batch-size` explicitly. If both are provided, `--num-steps-per-rollout` will **override** the global batch size with `global_batch_size = (rollout_batch_size * n_samples_per_prompt) // num_steps_per_rollout`. | `None` | Type: int | Miles Native |
Could you check what happens if multiple samples are returned in the generate function (maybe the refactored one)?
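The batch-size invariant described in the table above can be checked with a few lines of arithmetic. This sketch shows how the global batch size follows from the other three knobs when `--num-steps-per-rollout` is set:

```python
def resolve_global_batch_size(rollout_batch_size, n_samples_per_prompt,
                              num_steps_per_rollout):
    """Invariant from the docs:
    rollout_batch_size * n_samples_per_prompt
        == global_batch_size * num_steps_per_rollout."""
    total = rollout_batch_size * n_samples_per_prompt
    assert total % num_steps_per_rollout == 0, "rollout data must split evenly"
    return total // num_steps_per_rollout

# 32 prompts x 8 responses = 256 samples; 2 optimizer steps per rollout
# round gives a global batch size of 128.
print(resolve_global_batch_size(32, 8, 2))  # 128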
```python
choices=["thd", "bshd"],
default="thd",
help="The qkv layout for Megatron backend.",
help="The qkv layout.",
```
Why the change here? More details needed.
I think I changed this because the same parameter also applies to the FSDP backend. I wonder if this could make it less confusing.
Ah, yes, this was a historical issue: at the beginning it was only supported for Megatron, but after the refactor, as a general util, it should also work for FSDP (though I did not test FSDP + bshd).
```python
"which will be used as the prompt and the label respectively. "
"If you want to use a custom template, you can set --apply-chat-template to true, in that case, "
"the input should be the same structure as an openai message, e.g. [{'role': 'user', 'content': 'blabla'}]. "
"which will be used as the prompt and the label respectively."
```
| `--max-tokens-per-gpu` | The maximum number of tokens (prompt + response combined) per GPU for dynamic batch size. This parameter defines the total sequence length budget for packing samples into micro-batches during training. Note that when enabling context parallel (CP), the effective capacity is shared, so the value should be approximately `Total_Sequence_Length // cp_size`. | `None` | Type: int | Miles Native |
| `--log-probs-max-tokens-per-gpu` | The maximum number of tokens per GPU when calculating log probs. This is used to compute the log probs of the responses during rollout, and can be set to a larger value than `max_tokens_per_gpu` for better performance. | `None` | Type: int | Miles Native |
| `--balance-data` | Repartition each rollout batch so each data-parallel rank gets a similar total token count via the Karmarkar-Karp method. It may be beneficial for training speed, but changes per-rank sample grouping and adds a small CPU scheduling overhead. | `False` | bool flag (set to enable) | Miles Native |
| `--data-pad-size-multiplier` | Multiplier used to calculate the sequence padding boundary. Miles rounds sequence lengths up to a multiple of `tensor_parallel_size * data_pad_size_multiplier`. This optimization ensures that matrix dimensions are aligned with NVIDIA Tensor Core requirements, maximizing throughput and reducing VRAM fragmentation. | `128` | Type: int | Miles Native |
Add a notice: better not to change this. Values below 128 may trigger accuracy loss under `thd` with TP >= 4.
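The padding-boundary rule from the `--data-pad-size-multiplier` row is a simple round-up. A minimal sketch of the calculation:

```python
def padded_length(seq_len, tensor_parallel_size, data_pad_size_multiplier=128):
    """Round a sequence length up to the padding boundary
    tensor_parallel_size * data_pad_size_multiplier."""
    boundary = tensor_parallel_size * data_pad_size_multiplier
    return ((seq_len + boundary - 1) // boundary) * boundary

# With TP=4 and the default multiplier, lengths are rounded up to
# multiples of 512, so a 700-token sequence is padded to 1024.
print(padded_length(700, 4))  # 1024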
This line-level comment is hard to use :( If a comment spans multiple lines, the line to change should be the last one.
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--check-weight-update-equal` | Verify that weight updates are equal across ranks. | `False` | bool flag (set to enable) | Miles Native |
Suggested change:

```diff
-| `--check-weight-update-equal` | Verify that weight updates are equal across ranks. | `False` | bool flag (set to enable) | Miles Native |
+| `--check-weight-update-equal` | Use SGLang's weight checker to check and ensure that the weights loaded from the HF checkpoint and those received from Megatron are bit-wise equal. | `False` | bool flag (set to enable) | Miles Native |
```

(Suggest being more specific about this.)
Hey @guapisolo @yueming-yuan, thank you so much for the reviews. I have addressed all of them; let me know if you need more changes. Also @guapisolo, thanks for the note about "multi-sample" returns from the generate function. I clarified this under `--custom-generate-function-path`: in the refactored interface a custom generate can return `list[Sample]`, but the default rollout/training pipelines expect each prompt group to be a flat `list[Sample]` of length `--n-samples-per-prompt` (they assert `len(group) == n_samples_per_prompt`). So if users return multiple samples per generate call, they'll need a compatible rollout pipeline that handles that structure. Let me know if this sounds good. Thanks again!
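To illustrate the group-size contract described above, here is a toy sketch. The `Sample` dataclass and `custom_generate` signature are hypothetical stand-ins for illustration only; the real types and generate interface live in the Miles codebase.

```python
from dataclasses import dataclass

# Hypothetical stand-in for Miles' Sample type, for illustration only.
@dataclass
class Sample:
    prompt: str
    response: str

def custom_generate(prompt, n_samples_per_prompt):
    """A custom generate function should return a flat list[Sample]
    whose length equals --n-samples-per-prompt; the default rollout
    pipeline asserts this group size."""
    group = [Sample(prompt, f"response-{i}")
             for i in range(n_samples_per_prompt)]
    assert len(group) == n_samples_per_prompt
    return group

group = custom_generate("2+2=?", 4)
print(len(group))  # 4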
Co-authored-by: Zijie Xia <zijie_xia@icloud.com> Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>













This PR adds complete docs for Miles server arguments.