Address VDR feedback for NeMo FW evaluations #13701
Signed-off-by: Abhishree <[email protected]>
Completed Tech Pubs review of docs/source/evaluation/evaluation-doc.rst and provided a few copyedits, formatting updates, and suggested text revisions.
@@ -51,15 +51,18 @@ facilitates easier debugging. However, for running evaluations on clusters, it i
ease of use.
Here is a list of line-by-line edits to read-only text. These edits should also be addressed.
Lines 4/4 to 13/13
Please fix bullet list syntax (I think it is causing the following paragraphs to be bold). See the reStructuredText Guide here: https://aschilling.gitlab-master-pages.nvidia.com/documentation/repo_docs/latest/rst-guide.html. Replace asterisk with hyphen, delete leading indent, and separate sentence from bulleted list with a blank line.
This guide provides detailed instructions on evaluating NeMo 2.0 checkpoints using the `NVIDIA Evals Factory <https://pypi.org/project/nvidia-lm-eval/>`__ within the NeMo Framework. Supported benchmarks include:

- GPQA
- GSM8K
- IFEval
- MGSM
- MMLU
- MMLU-Pro
- MMLU-Redux
- Wikilingua
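
The list fix requested above (hyphen markers, no leading indent, a blank line between the introductory sentence and the list) could be applied mechanically. The following is a minimal sketch, not part of the PR: `normalize_bullets` is a hypothetical helper, and it assumes flat, single-line list items, so it would flatten nested lists.

```python
import re

# Hypothetical helper, not part of NeMo or the PR under review.
# Assumes flat lists whose items each fit on one line.
BULLET = re.compile(r"^\s*[*-]\s+(.*)$")


def normalize_bullets(rst_text: str) -> str:
    """Rewrite '*' bullets as unindented '-' bullets with a blank line before the list."""
    out = []
    for line in rst_text.splitlines():
        match = BULLET.match(line)
        if match is None:
            out.append(line)
            continue
        # Separate the list from a preceding non-blank, non-bullet line.
        if out and out[-1].strip() and not out[-1].startswith("- "):
            out.append("")
        out.append("- " + match.group(1))
    return "\n".join(out)


sample = "Supported benchmarks include:\n  * GPQA\n  * GSM8K"
print(normalize_bullets(sample))
```

Running it on the sample prints the sentence, a blank line, and then `- GPQA` and `- GSM8K` flush left.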
Line 29/29 - 34/34
Same comment about bulleted list
The NVIDIA Evals Factory provides the following predefined configurations for evaluating the completions endpoint:

- gsm8k
- mgsm
- mmlu
- mmlu_pro
- mmlu_redux
Line 36/36 - Line 44/44
same comment about bulleted list
It also provides the following configurations for evaluating the chat endpoint:

- gpqa_diamond_cot
- gsm8k_cot_instruct
- ifeval
- mgsm_cot
- mmlu_instruct
- mmlu_pro_instruct
- mmlu_redux_instruct
- wikilingua
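
As a quick reference, the two configuration lists quoted above can be captured in a lookup table. This is only an illustrative sketch: the configuration names are copied from the lists, while the dictionary layout and the `endpoint_for` helper are assumptions and are not part of NeMo or the NVIDIA Evals Factory.

```python
# Configuration names copied from the lists above; the structure is illustrative.
CONFIGS_BY_ENDPOINT = {
    "completions": ["gsm8k", "mgsm", "mmlu", "mmlu_pro", "mmlu_redux"],
    "chat": [
        "gpqa_diamond_cot",
        "gsm8k_cot_instruct",
        "ifeval",
        "mgsm_cot",
        "mmlu_instruct",
        "mmlu_pro_instruct",
        "mmlu_redux_instruct",
        "wikilingua",
    ],
}


def endpoint_for(config_name: str) -> str:
    """Return the endpoint type ('completions' or 'chat') a configuration targets."""
    for endpoint, names in CONFIGS_BY_ENDPOINT.items():
        if config_name in names:
            return endpoint
    raise KeyError(f"unknown evaluation configuration: {config_name}")


print(endpoint_for("ifeval"))
print(endpoint_for("mmlu"))
```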
Line 67/70 - 70/73
revise sentence (no "killed")
The entrypoint for evaluation is the ``evaluate`` method defined in ``nemo/collections/llm/api.py``. To run evaluations on the deployed model, use the following command. Make sure to open a new terminal within the same container to execute it. For longer evaluations, it is advisable to run both the deploy and evaluate commands in tmux sessions to prevent the processes from being terminated unexpectedly and aborting the runs.
Line 86/89
revise note
.. note::
   Please refer to the ``deploy`` and ``evaluate`` methods in ``nemo/collections/llm/api.py`` to review all available argument options, as the provided commands are only examples and do not include all arguments or their default values. For more detailed information on the arguments used in the ``ApiEndpoint`` and ``ConfigParams`` classes for evaluation, see the source code at `nemo/collections/llm/evaluation/api.py <https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/evaluation/api.py>`__.
Line 96/99
Make sure the # characters match the exact length of the heading text.
Line 98/101 - 99/102
remove period
The `evaluation.py <https://github.com/NVIDIA/NeMo/blob/main/scripts/llm/evaluation.py>`__ script serves as a reference for launching evaluations with NeMo-Run.
Line 120/123
Make sure the # characters match the exact length of the heading text.
Line 137/140
Make sure the - characters match the exact length of the heading text.
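
The heading comments above all ask for the same thing: the underline should be exactly as long as the heading text. A small check along these lines could catch mismatches before review. This is a hedged sketch, not part of the PR: `find_mismatched_underlines` is a hypothetical helper, and it only recognizes underlines made of a single repeated character from an assumed set of adornment characters.

```python
# Hypothetical checker, not part of NeMo or its docs tooling.
# Assumes headings use one repeated character from ADORNMENTS as the underline.
ADORNMENTS = "#=-~^*+"


def find_mismatched_underlines(rst_text: str) -> list[tuple[int, str]]:
    """Return (1-based underline line number, heading text) for each length mismatch."""
    lines = rst_text.splitlines()
    problems = []
    for i in range(1, len(lines)):
        title = lines[i - 1].rstrip()
        underline = lines[i].rstrip()
        is_underline = (
            len(underline) >= 2
            and underline[0] in ADORNMENTS
            and underline == underline[0] * len(underline)
        )
        # Skip adornment lines that follow a blank line (overlines, transitions).
        if is_underline and title.strip() and len(underline) != len(title):
            problems.append((i + 1, title))
    return problems


sample = "Evaluate\n########\n\nDeploy\n----\n"
print(find_mismatched_underlines(sample))
```

In the sample, `Evaluate` has a matching eight-character underline, while `Deploy` has a four-character underline and is reported.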
Line 164/167 - Line 168/178
revise sentence (no "killed")
The ``evaluate`` method defined in ``nemo/collections/llm/api.py`` supports the legacy way of evaluating models. To run evaluations on the deployed model, use the following command. Make sure to pass the ``nemo_checkpoint_path`` and ``url`` parameters, as they are required for using the legacy evaluation code. Open a new terminal within the same container to execute it. For longer evaluations, it is advisable to run both the deploy and evaluate commands in tmux sessions to prevent the processes from being interrupted or terminated unexpectedly, which could cause the runs to abort.
@jgerh could you please review again? I have addressed all your comments. Thank you!
LGTM, thank you!
Co-authored-by: jgerh <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Completed tech pubs review of latest copyedits and approved.
Fast merging since this is a docs change only.
Important
The "Update branch" button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do?
Adds details to evaluation docs addressing the VDR feedback for evaluations in NeMo FW.
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list the specific people who can review PRs in various areas.