Fix incorrect prompt token count due to HF API update #1264
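For context, a minimal sketch of the Hugging Face `apply_chat_template` behaviour that can throw a prompt token count off; whether this is exactly the upstream change the title refers to is an assumption, and the model name below is only illustrative, not the one used in the PR's test:

```python
from transformers import AutoTokenizer

# Illustrative model with a chat template (assumption, not from the PR).
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
messages = [{"role": "user", "content": "hello"}]

# Token-id list: len() really is the prompt token count.
ids = tokenizer.apply_chat_template(messages, tokenize=True)
print(len(ids))

# Dict-like return (e.g. with return_dict=True): len(result) counts the keys
# ("input_ids", "attention_mask"), not the tokens, so the count has to be
# taken from result["input_ids"] instead.
result = tokenizer.apply_chat_template(messages, tokenize=True, return_dict=True)
print(len(result["input_ids"]))
```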
```diff
@@ -13,7 +13,35 @@
 # limitations under the License.
 
-from nemo_skills.prompt.utils import get_prompt
+from transformers import AutoTokenizer
+
+from nemo_skills.prompt.utils import get_prompt, get_token_count
+
+
+def test_get_token_count():
+    tokenizer = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16", trust_remote_code=True)
+    messages = [{"role": "user", "content": "hello"}]
+
+    tools = [
+        {
+            "type": "function",
+            "function": {
+                "name": "get_weather",
+                "description": "Get the weather",
+                "parameters": {
+                    "type": "object",
+                    "properties": {"location": {"type": "string"}},
+                    "required": ["location"],
+                },
+            },
+        }
+    ]
+
+    assert get_token_count(tokenizer, "hello") == 1
+    assert get_token_count(tokenizer, messages) == 17
+    assert get_token_count(tokenizer, messages, tools=tools) == 266
+    assert get_token_count(None, "hello") is None
+    assert get_token_count(tokenizer, None) is None
 
 
 def test_generic_math_problem_augmentation_prompt():
```
Comment on lines +21 to +44

Contributor

Hard-coded token counts and a network-bound tokenizer make this test fragile. Two concerns:

- The expected values (1, 17, 266) are pinned to one specific model's tokenizer and chat template, so any upstream template change breaks the assertions.
- `AutoTokenizer.from_pretrained(...)` has to download the tokenizer from the Hugging Face Hub, so the test cannot run offline.

The core intent, verifying that the dict-return path of `get_token_count` produces the correct token count rather than a miscount, could be covered without depending on a remote, model-specific tokenizer.
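One way to reduce that fragility, sketched under the assumption that `get_token_count` delegates chat-message counting to `apply_chat_template` and reads `input_ids` on the dict path; the stub and the test name below are hypothetical, not part of the PR:

```python
from unittest.mock import MagicMock

from nemo_skills.prompt.utils import get_token_count


def test_get_token_count_dict_return_path():
    # Hypothetical stub: emulates a tokenizer whose apply_chat_template
    # returns a dict-like object with "input_ids", so the test needs no
    # network access and no model-specific hard-coded counts.
    tokenizer = MagicMock()
    tokenizer.apply_chat_template.return_value = {"input_ids": list(range(17))}

    messages = [{"role": "user", "content": "hello"}]

    # Assumes get_token_count takes len(result["input_ids"]) on the dict path.
    assert get_token_count(tokenizer, messages) == 17
    assert get_token_count(None, messages) is None
    assert get_token_count(tokenizer, None) is None
```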
`except Exception` masks the new `KeyError` path; add `raise … from e`.

After the new lines 400-401, if `result` is neither a `list` nor a dict-like object that contains `input_ids`, the resulting `KeyError` falls into the catch-all handler at line 404 and surfaces as `"Invalid chat message format: 'input_ids'"`, a misleading message that hides the real cause. More broadly, the pre-existing `except Exception` block also violates the project guideline to let code fail with clear errors instead of silently misbehaving.

At minimum, preserve the exception chain with `from e` (static analysis B904) so the original traceback is not swallowed.
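A sketch of what the chained-raise variant could look like; the function name, arguments, and surrounding logic are assumptions reconstructed from the comment, not the reviewer's committable suggestion or the project's actual code:

```python
def _count_chat_tokens(tokenizer, messages, tools=None):
    # Hypothetical stand-in for the code around lines 400-405 discussed above.
    try:
        result = tokenizer.apply_chat_template(messages, tools=tools, tokenize=True)
        # Newer transformers releases may hand back a dict-like object here.
        if isinstance(result, list):
            return len(result)
        return len(result["input_ids"])
    except Exception as e:
        # "from e" (B904) keeps the original traceback, so a KeyError is no
        # longer masked as a generic chat-message format error.
        raise ValueError(f"Invalid chat message format: {e}") from e
```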
Ideally, drop the wrapper entirely and let `apply_chat_template` (and `result["input_ids"]`) fail with their own clear errors, in line with the guideline to avoid silently misbehaving.
🧰 Tools

🪛 Ruff (0.15.1)

[warning] 404-404: Do not catch blind exception: `Exception` (BLE001)

[warning] 405-405: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

[warning] 405-405: Avoid specifying long messages outside the exception class (TRY003)