docs: add Miles server arguments #517
Conversation
**Summary of Changes**

Hello @Ratish1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the documentation for the Miles server by introducing a dedicated guide that lists and explains all command-line arguments. This new resource provides users with an exhaustive reference for fine-tuning Miles' behavior, covering everything from hardware resource allocation and training backend selection to advanced reinforcement learning parameters and debugging options, ultimately streamlining the configuration process and improving the overall user experience.
Code Review
This pull request introduces comprehensive documentation for the Miles server arguments in a new markdown file. The documentation is well-organized and detailed. My review focuses on improving clarity and consistency in the argument descriptions. I've pointed out a couple of minor inconsistencies that could confuse users. Overall, this is a valuable addition to the project.
```diff
 help=(
     "Log statistics of the category of reward, such as why the reward function considers it as failed. "
-    "Specify the key in the reward dict using this argument.",
+    "Specify the key in the reward dict using this argument."
```
This was needed so that the command `python3 train.py --help` would work; otherwise it would return an error.
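For context, the trailing comma turned the `help` value into a tuple of two strings instead of one implicitly concatenated string, and argparse cannot format a tuple. A minimal standalone reproduction (the flag name here is hypothetical, for illustration only):

```python
import argparse

broken = argparse.ArgumentParser()
broken.add_argument(
    "--reward-key",  # hypothetical flag, not the actual Miles argument
    help=(
        "Log statistics of the category of reward. "
        "Specify the key in the reward dict using this argument.",  # trailing comma -> help is a tuple
    ),
)
# broken.parse_args(["--help"])  # TypeError: argparse %-formats the help value, which fails on a tuple

fixed = argparse.ArgumentParser()
fixed.add_argument(
    "--reward-key",
    help=(
        "Log statistics of the category of reward. "
        "Specify the key in the reward dict using this argument."  # no comma: implicit string concatenation
    ),
)
fixed.parse_args(["--help"])  # prints the help text and exits cleanly
```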
This is merged through another PR now.
Hi @Ratish1, does it make sense to add the arguments for checkpointing? Also, I think there are some arguments not covered in this PR, e.g., FSDP's `--deterministic-mode`. It might make sense to provide a comprehensive overview of all arguments.
Hey @zijiexia, I have added even more server arguments. Could you let me know if it looks good now? Thanks
Co-authored-by: Ratish P <[email protected]>
Improve Miles Server Args docs
Does this not affect disaggregated mode?
Updated the description of the `--true-on-policy-mode` parameter for clarity and added a reference link.
> Disable weights backuper to save host memory. By default, this feature is enabled.

Please explain in one or two lines what the weights backuper is and its trade-off.
Updated descriptions for several server arguments to improve clarity and added references for better understanding.
What do you mean by "manage the data by yourself"? Please make it clearer.
Did you check the consistency with https://github.com/radixark/miles/blob/main/docs/en/get_started/quick_start.md? If not, please check it 😂
Two more suggestions: based on our discussion, here are the documentation optimization requirements for the
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--use-dynamic-batch-size` | Dynamically packs variable-length samples into micro-batches to maximize GPU utilization, ensuring the total token count per batch does not exceed `--max-tokens-per-gpu`. For example, with a 300-token limit, samples of lengths 100, 200, and 300 would be packed into two batches: `[100, 200]` and `[300]`. **Note:** Miles ensures that enabling this optimization does not affect the mathematical correctness of per-sample or per-token loss calculation. It is **strongly recommended** to enable this for maximum efficiency. | `False` | bool flag (set to enable) | Miles Native |
| `--max-tokens-per-gpu` | The maximum number of tokens (prompt + response combined) per GPU for dynamic batch size. This parameter defines the total sequence-length budget for packing samples into micro-batches during training. Note that when context parallelism (CP) is enabled, the effective capacity is shared, so the value should be approximately `(Total_Sequence_Length) // cp_size`. | `None` | Type: int | Miles Native |
| `--log-probs-max-tokens-per-gpu` | The maximum number of tokens per GPU for calculating log probs. This is used to calculate the log probs of the responses during rollout, and should be set to a larger value than `--max-tokens-per-gpu` for better performance. | `None` | Type: int | Miles Native |
| `--balance-data` | Balance the number of tokens between data parallel ranks with `karmarkar_karp` for verl. Note that this may allocate different responses of the same prompt into different training steps. | `False` | Type: bool | Megatron-LM |
```diff
-| `--balance-data` | Balance the number of tokens between data parallel ranks with `karmarkar_karp` for verl. Note that this may allocate different responses of the same prompt into different training steps. | `False` | Type: bool | Megatron-LM |
+| `--balance-data` | Repartition each rollout batch so each data-parallel rank gets a similar total token count via the Karmarkar-Karp method. It may be beneficial for training speed but changes per-rank sample grouping and adds a small CPU scheduling overhead. | `False` | Type: bool | Miles Native |
```
Hi @Ratish1, can you also change the help in `arguments.py` accordingly?
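As an aside for readers, the packing behavior described for `--use-dynamic-batch-size` above can be illustrated with a minimal greedy sketch; Miles' actual packing algorithm may differ:

```python
def pack_samples(lengths: list[int], max_tokens_per_gpu: int) -> list[list[int]]:
    """Greedily pack sample lengths into micro-batches whose total token
    count stays within the budget. Assumes no single sample exceeds it."""
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for length in lengths:
        # Start a new micro-batch when the next sample would blow the budget.
        if used + length > max_tokens_per_gpu and current:
            batches.append(current)
            current, used = [], 0
        current.append(length)
        used += length
    if current:
        batches.append(current)
    return batches

# Matches the worked example in the table: a 300-token budget packs
# lengths [100, 200, 300] into [[100, 200], [300]].
assert pack_samples([100, 200, 300], 300) == [[100, 200], [300]]
```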
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--true-on-policy-mode` | Strictly align SGLang's log probs and the training engine's log probs to be bit-wise equal. This parameter is only used for FSDP right now. [Ref](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/mismatch/blog-en.md#truly-on-policy-training) | `False` | bool flag (set to enable) | Miles Native |
| `--train-env-vars` | Extra environment variables for the training process, e.g., PyTorch memory-management ones. | `{}` | Type: JSON / Dict | Miles Native |
| `--train-memory-margin-bytes` | Reserved memory margin for training in bytes. Defaults to 1 GB. | `1073741824` | Type: int | Miles Native |
| `--disable-weights-backuper` | Disables the system that backs up model weights (Actor, Ref, Old Actor) to CPU RAM. Disabling saves significant host memory but prevents weight-swapping features like KL-divergence. | `False` | bool flag (set to disable) | Miles Native |
```diff
-| `--disable-weights-backuper` | Disables the system that backs up model weights (Actor, Ref, Old Actor) to CPU RAM. Disabling saves significant host memory but prevents weight-swapping features like KL-divergence. | `False` | bool flag (set to disable) | Miles Native |
+| `--disable-weights-backuper` | Applies to the `megatron` training backend only. Disables the system that backs up model weights (Actor, Ref, Old Actor) to CPU RAM. Disabling saves significant host memory but prevents features that rely on weight-swapping, such as computing KL-divergence against a reference model. **Note**: do not set `--ref-load` or `--keep-old-actor` if the weights backuper is disabled. | `False` | bool flag (set to disable) | Miles Native |
```
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--prompt-data` | Path to the prompt dataset (JSONL format); each line should contain `--input-key` and `--label-key`, which will be used as the prompt and the label respectively. If you want to use a custom template, you can set `--apply-chat-template` to true. | `None` | Type: str | Miles Native |
```diff
-| `--prompt-data` | Path to the prompt dataset (JSONL format); each line should contain `--input-key` and `--label-key`, which will be used as the prompt and the label respectively. If you want to use a custom template, you can set `--apply-chat-template` to true. | `None` | Type: str | Miles Native |
+| `--prompt-data` | Path to the prompt dataset (JSONL format); each line should contain `--input-key` and `--label-key`, which will be used as the prompt and the label respectively. | `None` | Type: str | Miles Native |
```
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--prompt-data` | Path to the prompt dataset (JSONL format); each line should contain `--input-key` and `--label-key`, which will be used as the prompt and the label respectively. If you want to use a custom template, you can set `--apply-chat-template` to true. | `None` | Type: str | Miles Native |
| `--disable-rollout-global-dataset` | Disable the global dataset for rollout. If set, the rollout will use the `--prompt-data` as the prompt dataset, and the prompts for rollout will be sampled from the dataset. If not set, you need to manage the data by yourself. | `False` | bool flag (set to disable) | Miles Native |
```diff
-| `--disable-rollout-global-dataset` | Disable the global dataset for rollout. If set, the rollout will use the `--prompt-data` as the prompt dataset, and the prompts for rollout will be sampled from the dataset. If not set, you need to manage the data by yourself. | `False` | bool flag (set to disable) | Miles Native |
+| `--disable-rollout-global-dataset` | Disable the global dataset for rollout. By default, Miles loads `--prompt-data` into a global dataset and samples from it for rollout. Setting this flag turns off this behavior. Use this flag only when providing a custom `--rollout-function-path` (and usually a custom `--data-source-path`) that handles data loading independently. | `False` | bool flag (set to disable) | Miles Native |
```
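To make the expected `--prompt-data` format concrete, here is a hypothetical JSONL line and a minimal loader; the key names `prompt` and `label` are assumptions standing in for whatever `--input-key` and `--label-key` are set to:

```python
import json

# Each line of the prompt dataset might look like:
#   {"prompt": "What is 2 + 2?", "label": "4"}
# where "prompt" matches --input-key and "label" matches --label-key.

def load_prompt_data(path: str, input_key: str = "prompt", label_key: str = "label"):
    """Read a JSONL prompt dataset into (prompt, label) pairs."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            samples.append((record[input_key], record[label_key]))
    return samples
```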
| Argument | Description | Default | Options | Source |
| :--- | :--- | :--- | :--- | :--- |
| `--lr` | Learning rate for the Actor. | `1e-6` | Type: float | Megatron-LM |
| `--lr-warmup-init` | Initial learning rate for warmup. | `0.0` | Type: float | Megatron-LM |
| `--min-lr` | Minimum learning rate after decay. | `0.0` | Type: float | Megatron-LM |
| `--lr-decay-style` | Learning rate decay style. | `constant` (FSDP), `linear` (Megatron) | Type: str | Megatron-LM |
Hi @Ratish1, I think most of the arguments I put outside the Megatron/FSDP sections should trace back to both backends; that's why I marked both FSDP and Megatron defaults here. Could you help me double-check? Thanks!
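As a rough illustration of how `--lr`, `--lr-warmup-init`, `--min-lr`, and `--lr-decay-style` interact, here is a generic warmup-plus-decay schedule; this is a sketch only, and the actual Megatron/FSDP schedulers differ in details:

```python
def learning_rate(step: int, total_steps: int, warmup_steps: int,
                  lr: float = 1e-6, lr_warmup_init: float = 0.0,
                  min_lr: float = 0.0, decay_style: str = "linear") -> float:
    """Generic warmup + decay schedule illustrating the four arguments."""
    if step < warmup_steps:
        # Ramp linearly from the warmup-init LR up to the peak LR.
        frac = step / max(warmup_steps, 1)
        return lr_warmup_init + (lr - lr_warmup_init) * frac
    if decay_style == "constant":
        return lr
    # Linear decay from the peak LR down to --min-lr.
    frac = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return max(lr - (lr - min_lr) * frac, min_lr)
```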
This PR adds complete docs for Miles server arguments.