
Fix incorrect prompt tokens count due to HF api update #1264

Merged

Kipok merged 5 commits into main from igitman/fix-prompt-count on Feb 20, 2026
Conversation

@Kipok Kipok (Collaborator) commented Feb 20, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Restored compatibility with newer HuggingFace tokenizer outputs so prompt handling, inference prefix measurement, and token counting work correctly across varied tokenizer return formats.
  • Tests

    • Added tests using a real tokenizer to validate token counting for strings, message lists, optional tool data, and null inputs.

Signed-off-by: Igor Gitman <igitman@nvidia.com>
@Kipok Kipok requested review from ekmb and jiacheng-xu February 20, 2026 20:34
@coderabbitai coderabbitai bot (Contributor) commented Feb 20, 2026

📝 Walkthrough

Updates three modules and tests to handle HuggingFace tokenizer versions that may return a dict (BatchEncoding) from apply_chat_template; code now extracts input_ids from dicts while retaining support for list returns. Adds tests for token counting with a real tokenizer.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Tokenizer Compatibility: `nemo_skills/inference/model/utils.py`, `nemo_skills/inference/prover.py`, `nemo_skills/prompt/utils.py` | Add conditional extraction of `input_ids` when `apply_chat_template` returns a dict/`BatchEncoding` instead of a list; preserves backward compatibility with list returns for encoding and token counting. |
| Token Count Tests: `tests/test_prompts.py` | Add `test_get_token_count` using `AutoTokenizer` and `get_token_count`, covering strings, message lists, tools schema, and `None` inputs; adjust imports accordingly. |
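
The change itself is small. A minimal sketch of the normalization described above, mirroring the committed diff shown later in this thread; the helper name `count_prompt_tokens` is hypothetical:

```python
def count_prompt_tokens(tokenizer, messages, tools=None):
    result = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, tools=tools
    )
    # Newer HF tokenizers may return a dict-like BatchEncoding rather than a
    # plain list of token ids; extract input_ids so len() counts tokens either way.
    if not isinstance(result, list):
        result = result["input_ids"]
    return len(result)
```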

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


🚥 Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately describes the main change: fixing token count issues caused by HuggingFace tokenizer API updates across multiple files. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
nemo_skills/prompt/utils.py (1)

404-405: Broad except Exception now silences KeyError from the newly-added result["input_ids"] access.

Because lines 400–402 are inside the try block, a KeyError (e.g., the tokenizer returns a dict without an "input_ids" key) is caught here and re-raised as ValueError("Invalid chat message format: 'input_ids'") — a misleading message that hides the real failure.

Additionally, the missing from e in the re-raise discards the original traceback.

Recommended fix: move the dict-normalization step outside the try so only apply_chat_template errors are wrapped, and add from e for proper exception chaining.

♻️ Proposed refactor

```diff
     elif isinstance(messages, list):
         messages = [
             message if isinstance(message, dict) else message_to_dict(copy.deepcopy(message)) for message in messages
         ]
         try:
             result = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, tools=tools)
-            # Handle newer HF tokenizer versions that return a dict instead of a list
-            if isinstance(result, dict):
-                result = result["input_ids"]
-            return len(result)
-
         except Exception as e:
-            raise ValueError(f"Invalid chat message format: {e}")
+            raise ValueError(f"Invalid chat message format: {e}") from e
+        # Handle newer HF tokenizer versions that return a dict instead of a list
+        if isinstance(result, dict):
+            result = result["input_ids"]
+        return len(result)
```

As per coding guidelines: "Do not catch exceptions when they are not normally expected to be raised; let code fail with clear errors instead of silently misbehaving" and "Follow the Zen of Python principles: ensure errors never pass silently unless explicitly silenced."
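
For illustration, a minimal standalone example of the chaining behavior the guideline and the `from e` suggestion refer to; the function and config key here are hypothetical:

```python
def read_required_key(config: dict) -> str:
    try:
        return config["key"]
    except KeyError as e:
        # "from e" stores the original KeyError in __cause__, so the traceback
        # shows both exceptions instead of silently discarding the root cause.
        raise ValueError(f"Invalid config: {e}") from e
```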

🤖 Prompt for AI Agents

```text
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/prompt/utils.py` around lines 404-405: the current broad except
around apply_chat_template is catching KeyError from the new dict-normalization
(accessing result["input_ids"]) and re-raising a misleading ValueError without
chaining. Fix by moving the dict normalization/validation (the access of
result["input_ids"] and any token dict checks) outside the try/except so only
apply_chat_template() is wrapped, change the except to catch only expected
exceptions from apply_chat_template (e.g., ValueError or TemplateError if one
exists), and when re-raising include exception chaining (use "from e") so the
original traceback is preserved.
```

Comment on lines +21 to +44

```python
def test_get_token_count():
    tokenizer = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16", trust_remote_code=True)
    messages = [{"role": "user", "content": "hello"}]

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the weather",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }
    ]

    assert get_token_count(tokenizer, "hello") == 1
    assert get_token_count(tokenizer, messages) == 17
    assert get_token_count(tokenizer, messages, tools=tools) == 266
    assert get_token_count(None, "hello") is None
    assert get_token_count(tokenizer, None) is None
```

⚠️ Potential issue | 🟡 Minor

Hard-coded token counts and a network-bound tokenizer make this test fragile.

Two concerns:

  1. Hard-coded expected values (== 1, == 17, == 266) are tied to the exact current state of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16's chat template and vocabulary. Any tokenizer patch from the model owner will silently break these assertions, requiring manual inspection to distinguish a real regression from a tokenizer update.

  2. Network dependency. AutoTokenizer.from_pretrained(...) downloads the tokenizer files from HuggingFace Hub at test time. This makes the test slow and may fail in air-gapped or rate-limited CI environments. The existing prompt tests (e.g., test_generic_math_prompt) also download tokenizers, but they at least assert deterministic string rendering rather than numeric token counts that can drift.

The core intent — verifying that the dict-return path produces the correct token count rather than len(dict) — is sound. Consider asserting count > 10 (sanity-bound) instead of an exact value, or mock apply_chat_template to explicitly return a dict and assert the extracted length is correct.
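
A minimal sketch of the mocked variant suggested above, assuming get_token_count is importable from nemo_skills.prompt.utils as this review implies; the dummy token ids are illustrative:

```python
from unittest.mock import MagicMock

from nemo_skills.prompt.utils import get_token_count


def test_get_token_count_dict_return():
    # Dummy tokenizer whose apply_chat_template returns a BatchEncoding-style
    # dict, mimicking newer HF versions; no network access is needed.
    tokenizer = MagicMock()
    tokenizer.apply_chat_template.return_value = {"input_ids": [1, 2, 3, 4, 5]}

    messages = [{"role": "user", "content": "hello"}]
    # The count must come from len(result["input_ids"]), not len(dict).
    assert get_token_count(tokenizer, messages) == 5
```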

🤖 Prompt for AI Agents

```text
Verify each finding against the current code and only fix it if needed.

In `@tests/test_prompts.py` around lines 21-44: the test uses network-bound
AutoTokenizer.from_pretrained and brittle hard-coded token counts. Remove the
exact numeric assertions and either (a) mock AutoTokenizer.from_pretrained (or
inject a dummy tokenizer) so the test does not hit the network and assert a
sanity bound (e.g., token count > 10) for get_token_count(tokenizer, messages)
and get_token_count(tokenizer, messages, tools), or (b) mock apply_chat_template
to return a deterministic dict and assert get_token_count correctly computes
length from that dict (and assert get_token_count(None, ...) remains None).
Focus fixes around get_token_count, AutoTokenizer.from_pretrained, and
apply_chat_template to eliminate network calls and fragile exact-count checks.
```

Signed-off-by: Igor Gitman <igitman@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1


Comment on lines +398 to 405

```python
            result = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, tools=tools)
            # Handle newer HF tokenizer versions that return a BatchEncoding instead of a list
            if not isinstance(result, list):
                result = result["input_ids"]
            return len(result)

        except Exception as e:
            raise ValueError(f"Invalid chat message format: {e}")
```

⚠️ Potential issue | 🟡 Minor

except Exception masks the new KeyError path; add raise … from e.

After the new lines 400-401, if result is neither a list nor a dict-like object that contains input_ids, the resulting KeyError falls into the catch-all handler at line 404 and surfaces as "Invalid chat message format: 'input_ids'" — a misleading message that hides the real cause. More broadly, the pre-existing except Exception block also violates the project guideline to let code fail with clear errors instead of silently misbehaving.

At minimum, preserve the exception chain with from e (static analysis B904) so the original traceback is not swallowed:

🔗 Proposed fix: preserve exception chain

```diff
         except Exception as e:
-            raise ValueError(f"Invalid chat message format: {e}")
+            raise ValueError(f"Invalid chat message format: {e}") from e
```

Ideally, drop the wrapper entirely and let apply_chat_template (and result["input_ids"]) fail with their own clear errors, in line with the guideline to avoid silently misbehaving.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
             result = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, tools=tools)
             # Handle newer HF tokenizer versions that return a BatchEncoding instead of a list
             if not isinstance(result, list):
                 result = result["input_ids"]
             return len(result)
         except Exception as e:
-            raise ValueError(f"Invalid chat message format: {e}")
+            raise ValueError(f"Invalid chat message format: {e}") from e
```
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 404-404: Do not catch blind exception: Exception (BLE001)

[warning] 405-405: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)

[warning] 405-405: Avoid specifying long messages outside the exception class (TRY003)

🤖 Prompt for AI Agents

```text
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/prompt/utils.py` around lines 398-405: the current try/except
around tokenizer.apply_chat_template(...) swallows and masks underlying errors
(e.g. KeyError from accessing result["input_ids"]). Either remove the outer
try/except so underlying exceptions surface with their original tracebacks, or,
if you must keep the wrapper, re-raise while preserving the exception chain by
using raise ValueError(f"Invalid chat message format: {e}") from e; update the
handler that references result["input_ids"] and the call to
tokenizer.apply_chat_template to ensure errors are not hidden.
```

@Kipok Kipok merged commit 1f1a2e7 into main on Feb 20, 2026
5 checks passed
@Kipok Kipok deleted the igitman/fix-prompt-count branch February 20, 2026 23:58
