Update prompt_config to work with openai format + inline setup #1210
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the repository's CodeRabbit settings.
📝 Walkthrough
Generation prompt handling updated: GenerationTaskConfig accepts …
Changes
Sequence Diagram(s)
sequenceDiagram
participant Data as DataPoint
participant Gen as GenerationTask
participant Prompt as PromptUtils
participant Model as Model
Data->>Gen: request generation with data
Gen->>Gen: validate prompt_format & prompt_config
alt template-based (prompt_config provided)
Gen->>Prompt: get_prompt(prompt_config, user_message, system_message)
Prompt-->>Gen: template-generated messages
Gen->>Gen: _merge_audio_from_data(data -> generated messages)
else data-driven (prompt_config is None, OpenAI-style)
Gen->>Prompt: build prompt from data + user_message
Prompt-->>Gen: messages derived from data
end
Gen->>Gen: apply prompt_suffix (if present)
Gen->>Model: send assembled messages
Model-->>Gen: response
Gen-->>Data: attach outputs and metadata
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@nemo_skills/inference/generate.py`:
- Around line 614-620: The loop that injects audio into user messages currently
uses msg.get("role") which can silently ignore malformed messages; change it to
use direct key access (msg["role"]) so missing "role" keys raise an exception
and fail fast; update the block in nemo_skills/inference/generate.py where
variables prompt, msg, and data_point are used (the for msg in prompt loop that
checks role and assigns msg["audio"] / msg["audios"]) to replace msg.get("role")
with msg["role"] and leave the rest of the injection logic unchanged.
- Around line 621-631: The code returns a raw string from self.cfg.new_prompt
which will break chat-style OpenAI payloads when prompt_format == "openai";
update the logic in the block handling string overrides (where prompt is
assigned from self.cfg.new_prompt and audio/audios are added from data_point) to
detect if self.cfg.prompt_format == "openai" and, in that case, wrap the string
into a chat message list like [{"role":"user","content": self.cfg.new_prompt}]
while preserving any "audio"/"audios" fields on that message (or alternatively
raise a clear error if wrapping is not desired); ensure the change touches the
branch that currently sets prompt = self.cfg.new_prompt so chat endpoints always
receive the proper list structure.
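For reference, a minimal sketch of the wrapping this comment asks for (the cfg/data_point names come from the comment itself; the helper and its placement are assumptions, not code from the PR):

```python
def wrap_prompt_override(new_prompt: str, data_point: dict) -> list[dict]:
    """Wrap a raw string override into an OpenAI-style chat message list.

    Hypothetical helper: preserves audio metadata from the data point so chat
    endpoints always receive a proper list structure.
    """
    message = {"role": "user", "content": new_prompt}
    # "audio"/"audios" keys are assumed from the review discussion.
    if "audio" in data_point:
        message["audio"] = data_point["audio"]
    if "audios" in data_point:
        message["audios"] = data_point["audios"]
    return [message]
```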
🧹 Nitpick comments (1)
tests/test_prompts.py (1)
1174-1222: Consider exercising the production YAML-load path to avoid drift. The test re-implements the loading logic; a minimal GenerationTask init (with heavy pieces mocked) would keep this tied to real behavior.
ca4a297 to 4a183e2 (Compare)
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
4a183e2 to 9f49f58 (Compare)
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Kipok
left a comment
There was a problem hiding this comment.
I'm confused about the motivation. Why can't we use an existing prompt_config for this?
@Jorjeous does this PR cover cases with /no_think tag in the prompt that we had before?
@Kipok That's for the openai prompt format; for some reason prompt config is disabled for it
@melllinia it fully replaces the existing prompt, so it's kinda covered
@Kipok I see, it's better to refactor prompt_config rather than creating a new field.
Ability to override user_message; prompt config works with openai format; inline prompt override. Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
nemo_skills/inference/generate.py
Outdated
for msg in data_point["messages"]:
    if msg["role"] == "user":
        msg["content"] = self.cfg.user_message
        break
There was a problem hiding this comment.
Modifying data_point["messages"] in place without deepcopy causes mutation of the original data.
The openai path without prompt_config (lines 608-622) modifies data_point["messages"] directly, but doesn't use deepcopy like the other paths do (lines 626, 649). This means the original data gets mutated, which could cause issues if data_point is reused elsewhere.
Suggested change:
-for msg in data_point["messages"]:
-    if msg["role"] == "user":
-        msg["content"] = self.cfg.user_message
-        break
+if self.cfg.prompt_format == "openai" and self.prompt is None:
+    # Pure openai path -- messages come from the data
+    data_point = deepcopy(data_point)
+    if self.cfg.user_message:
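To make the aliasing concern concrete, here is a small standalone illustration (not code from the PR; variable names are made up):

```python
from copy import deepcopy

data_point = {"messages": [{"role": "user", "content": "original question"}]}

# In-place override: the change leaks back into the caller's data.
for msg in data_point["messages"]:
    if msg["role"] == "user":
        msg["content"] = "overridden instruction"
print(data_point["messages"][0]["content"])  # "overridden instruction" -- original mutated

# Copy first, then override: the caller's data stays intact.
data_point = {"messages": [{"role": "user", "content": "original question"}]}
local_point = deepcopy(data_point)
for msg in local_point["messages"]:
    if msg["role"] == "user":
        msg["content"] = "overridden instruction"
print(data_point["messages"][0]["content"])  # still "original question"
```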
Copy data from messages; added assert for a single user message. Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
23d0330 to 6bad619 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_skills/inference/generate.py (1)
407-422: ⚠️ Potential issue | 🟡 Minor
setup_prompt: Missing validation for code_execution with prompt_format="openai".
When prompt_format="openai" and prompt_config is omitted, setup_prompt returns None. The code then calls self.prompt.get_code_execution_args() at line 370 if code_execution=True, raising AttributeError: 'NoneType' object has no attribute 'get_code_execution_args'.
This combination is not prevented by validation (unlike prompt_format="ns", which requires prompt_config). Add explicit validation in _post_init_validate_params to prevent code_execution=True when using prompt_format="openai" without prompt_config, or guard the call at line 370.
🤖 Fix all issues with AI agents
In `@nemo_skills/inference/generate.py`:
- Around line 255-256: BFCLGenerationConfig._post_init_validate_params currently
asserts that prompt_config must be None when prompt_format == "openai"; remove
or relax that assertion so BFCLGenerationConfig accepts an optional
prompt_config with prompt_format "openai" (matching the parent class behavior).
Update the validation in BFCLGenerationConfig._post_init_validate_params to
allow prompt_config to be non-None for prompt_format == "openai" (or only
validate conflicting combinations still unsupported), ensuring tests like
test_prompt_config_with_openai_format and
test_prompt_config_with_openai_and_suffix pass.
🧹 Nitpick comments (2)
nemo_skills/inference/generate.py (1)
584-598: Positional index matching for audio merge is fragile when message counts diverge.
_merge_audio_from_data aligns template-generated messages with original data messages by list index. If the template emits a different number of messages (e.g., adds/removes a system message), audio metadata will be attached to the wrong message or silently dropped when i >= len(original_messages). Consider matching by role instead of by position to make this more robust:
♻️ Proposed role-based matching alternative
 def _merge_audio_from_data(self, messages, data_point):
-    """Copy audio metadata from original data messages into template-generated messages.
-
-    Normalizes to a single "audios" list on each message. Openai-format data always
-    has messages; top-level audio is not used in practice.
-    """
+    """Copy audio metadata from original data messages into template-generated messages."""
     if "messages" not in data_point or not isinstance(data_point["messages"], (list, ListConfig)):
         return
     original_messages = data_point["messages"]
-    for i, msg in enumerate(messages):
-        if i < len(original_messages):
-            orig_msg = original_messages[i]
-            audios = orig_msg.get("audios") or ([orig_msg["audio"]] if "audio" in orig_msg else None)
-            if audios:
-                msg["audios"] = audios
+    # Build a mapping from role to audio metadata (first match per role)
+    role_audio = {}
+    for orig_msg in original_messages:
+        audios = orig_msg.get("audios") or ([orig_msg["audio"]] if "audio" in orig_msg else None)
+        if audios and orig_msg["role"] not in role_audio:
+            role_audio[orig_msg["role"]] = audios
+    for msg in messages:
+        if msg["role"] in role_audio:
+            msg["audios"] = role_audio[msg["role"]]

tests/test_prompts.py (1)
1173-1195: Consider adding a test where message counts differ between template output and original data.
This would validate (or document) the behavior of _merge_audio_from_data when the template produces fewer or more messages than the original data — e.g., template produces only [user] while original has [system, user_with_audio].
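A hedged sketch of what such a test could look like (pytest style; the generation_task fixture and the expected role-based behavior are assumptions, since only the index-based version appears in the diff):

```python
def test_merge_audio_when_template_drops_system_message(generation_task):
    # Original data: [system, user-with-audio]; template output: only [user].
    data_point = {
        "messages": [
            {"role": "system", "content": "You are a transcriber."},
            {"role": "user", "content": "Transcribe this.", "audio": "sample.wav"},
        ]
    }
    template_filled_messages = [{"role": "user", "content": "Transcribe this."}]

    generation_task._merge_audio_from_data(template_filled_messages, data_point)

    # Index-based matching would pair the user message with the system message
    # (which has no audio); role-based matching should attach the audio correctly.
    assert template_filled_messages[0].get("audios") == ["sample.wav"]
```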
nemo_skills/inference/generate.py
Outdated
if self.prompt_format == "ns":
    assert self.prompt_config is not None, "prompt_config is required when prompt_format == 'ns'"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for any validation or conditional logic that checks prompt_config alongside prompt_format == "openai"
rg -n --type=py -C3 'prompt_config.*openai|openai.*prompt_config'
Repository: NVIDIA-NeMo/Skills
Length of output: 5175
🏁 Script executed:
#!/bin/bash
# Check if bfcl.py is a key downstream caller and look for any configs that use it
rg -n --type=py 'class.*BFCL|from.*bfcl|import.*bfcl' | head -20
Repository: NVIDIA-NeMo/Skills
Length of output: 2283
🏁 Script executed:
#!/bin/bash
# Look for any benchmark or eval configs that might pass prompt_config with openai format
fd -e yaml -e json | xargs rg -l 'prompt_format.*openai|prompt_config' 2>/dev/null | head -10
Repository: NVIDIA-NeMo/Skills
Length of output: 643
🏁 Script executed:
#!/bin/bash
# Check the actual GenerationTask class to see if it's inherited by BFCL
rg -n --type=py 'class BFCL' -A 3
Repository: NVIDIA-NeMo/Skills
Length of output: 1493
🏁 Script executed:
#!/bin/bash
# View the exact validation code in bfcl.py
sed -n '100,115p' nemo_skills/inference/eval/bfcl.py
Repository: NVIDIA-NeMo/Skills
Length of output: 869
Update BFCLGenerationConfig validation in nemo_skills/inference/eval/bfcl.py (lines 106-107).
The assertion assert self.prompt_config is None for prompt_format == "openai" conflicts with the parent class change allowing optional prompt_config with OpenAI format. Tests confirm this combination is now supported (test_prompt_config_with_openai_format, test_prompt_config_with_openai_and_suffix). Remove or update this validation in BFCLGenerationConfig._post_init_validate_params() to align with the parent class behavior.
🤖 Prompt for AI Agents
In `@nemo_skills/inference/generate.py` around lines 255 - 256,
BFCLGenerationConfig._post_init_validate_params currently asserts that
prompt_config must be None when prompt_format == "openai"; remove or relax that
assertion so BFCLGenerationConfig accepts an optional prompt_config with
prompt_format "openai" (matching the parent class behavior). Update the
validation in BFCLGenerationConfig._post_init_validate_params to allow
prompt_config to be non-None for prompt_format == "openai" (or only validate
conflicting combinations still unsupported), ensuring tests like
test_prompt_config_with_openai_format and
test_prompt_config_with_openai_and_suffix pass.
Kipok
left a comment
a few small comments, but hopefully can merge this soon!
nemo_skills/inference/generate.py
Outdated
        return remaining_data

    def _merge_audio_from_data(self, messages, data_point):
can we do this automatically inside prompt/utils? If this is called after prompt.fill anyway, maybe we can do it inside, so that anywhere prompt.fill is used, this functionality is available?
nemo_skills/inference/generate.py
Outdated
    return
original_messages = data_point["messages"]
for i, msg in enumerate(messages):
    if i < len(original_messages):
shouldn't we just assert that len(messages) == len(original_messages)? Maybe we can also rename "messages" to "template_filled_messages" or something like this for clarity?
Isn't this too strict?
I mean, the template can change the len?
Renaming - ok, will be done
 if isinstance(filled_prompt, list) and filled_prompt:
     filled_prompt[-1]["content"] += self.cfg.prompt_suffix
-else:
+elif isinstance(filled_prompt, str):
what's the other possible case? Why do we need elif?
No other case, but that's the "no silent failures" approach
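For context, a minimal standalone version of the fail-loud suffix handling under discussion (the helper itself is hypothetical; only the list and str branches appear in the diff):

```python
def apply_prompt_suffix(filled_prompt, prompt_suffix):
    """Append a suffix to chat-style messages or a plain string prompt; reject anything else."""
    if isinstance(filled_prompt, list) and filled_prompt:
        filled_prompt[-1]["content"] += prompt_suffix
    elif isinstance(filled_prompt, str):
        filled_prompt += prompt_suffix
    else:
        # Fail loudly instead of silently passing an unexpected prompt type downstream.
        raise TypeError(f"Unexpected prompt type: {type(filled_prompt)!r}")
    return filled_prompt
```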
Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_skills/inference/generate.py (1)
369-370: ⚠️ Potential issue | 🔴 Critical
AttributeError at startup when code_execution=True and prompt_config=None.
setup_prompt() now returns None when prompt_config is None, but line 370 unconditionally dereferences self.prompt. With the new openai data-driven path (prompt_format="openai", prompt_config=None), setting code_execution=True crashes immediately in __init__ with AttributeError: 'NoneType' object has no attribute 'get_code_execution_args'. A guard is needed in _post_init_validate_params.
🛡️ Proposed fix
 def _post_init_validate_params(self):
     """Validate that certain parameters are restricted to certain values"""
     if self.prompt_format not in ["ns", "openai"]:
         raise ValueError(f"prompt_format must be either 'ns' or 'openai', got '{self.prompt_format}'")
     if self.prompt_format == "ns":
         assert self.prompt_config is not None, "prompt_config is required when prompt_format == 'ns'"
+
+    if self.code_execution and self.prompt_config is None:
+        raise ValueError("prompt_config is required when code_execution is True")
+
     for param, default_value in self._get_disallowed_params():

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/generate.py` around lines 369 - 370, The startup crash occurs because setup_prompt can return None and code attempts to call self.prompt.get_code_execution_args() when self.cfg.code_execution is True; update _post_init_validate_params (or the block containing the shown lines) to guard against self.prompt being None: only call self.prompt.get_code_execution_args() and assign self.extra_generate_params when both self.cfg.code_execution is True and self.prompt is not None (otherwise leave extra_generate_params unset or default). Ensure you reference the symbols self.cfg.code_execution, self.prompt, get_code_execution_args, and self.extra_generate_params when making the change.
🧹 Nitpick comments (1)
nemo_skills/inference/generate.py (1)
601-601: Suppress Ruff ARG002 for the intentionally unused data parameter.
Static analysis flags data as unused. The docstring explains it's reserved for subclass overrides, which is the correct design. Acknowledge this explicitly to silence the warning without changing the API.
♻️ Proposed fix
-def fill_prompt(self, data_point, data):
-    """Passing in full data in case it's needed to fill the prompt in subclasses."""
+def fill_prompt(self, data_point, data):  # noqa: ARG002
+    """Passing in full data in case it's needed to fill the prompt in subclasses."""

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/generate.py` at line 601, The method fill_prompt(self, data_point, data) intentionally keeps the data parameter for subclass overrides but Ruff flags it as ARG002; silence that linter warning by adding a per-line noqa for ARG002 on the function definition (e.g., append "# noqa: ARG002" to the def line for fill_prompt) so the API stays unchanged while acknowledging the intentionally unused parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/inference/generate.py`:
- Around line 584-598: _merge_audio_from_data currently copies audio by index
which misaligns when templates add/remove messages; change it to a role-based
merge: iterate through template-generated messages in _merge_audio_from_data and
for each msg, find the corresponding original message from
data_point["messages"] that has the same "role" (and same "name" if present),
matching the next unmatched original with that role to handle repeated roles; if
a matching original is found, extract audios via orig.get("audios") or
([orig["audio"]] if "audio" in orig else None) and assign to msg["audios"] only
if msg does not already have audios, ensuring existing msg audios are preserved.
- Around line 608-610: Replace the runtime input validation that uses assert on
user_msgs with a ValueError so it cannot be stripped in optimized mode: change
the assert checking len(user_msgs) == 1 (the "user_message override expects
exactly 1 user message..." check) to a raise ValueError with the same message;
also update the other pre-existing assert at the nearby check (the earlier
assert around line 256 that validates similar input) to raise ValueError as well
so all input validation in this module uses consistent, non-optimizable
exceptions.
- Around line 606-613: The override currently replaces or concatenates message
content assuming it's a string, which drops multimodal items or raises TypeError
when content is a list; update the logic around user_msgs and prompt_suffix so:
when applying self.cfg.user_message, if user_msgs[0]["content"] is a list,
replace or set only the text element (e.g., find an item with type "text" and
set its "text" to self.cfg.user_message, or prepend a {"type":"text","text":...}
item if none) and if content is a string keep the current assignment; when
applying self.cfg.prompt_suffix to data_point["messages"][-1]["content"], if
that content is a list append a {"type":"text","text": self.cfg.prompt_suffix}
item, and if it's a string concatenate as before; apply the same list-handling
fix to the other prompt_suffix sites referenced (the blocks using
data_point["messages"] and any similar concatenation operations).
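As a sketch of the list-aware handling described above (hypothetical helpers; the "type"/"text" item layout follows the OpenAI multimodal content convention and is not code from this PR):

```python
def set_user_text(message: dict, new_text: str) -> None:
    """Replace only the text portion of a message, preserving multimodal content items."""
    content = message["content"]
    if isinstance(content, str):
        message["content"] = new_text
    elif isinstance(content, list):
        for item in content:
            if isinstance(item, dict) and item.get("type") == "text":
                item["text"] = new_text
                return
        # No text item present: prepend one so the override is not silently dropped.
        content.insert(0, {"type": "text", "text": new_text})
    else:
        raise TypeError(f"Unsupported content type: {type(content)!r}")


def append_suffix(message: dict, suffix: str) -> None:
    """Append a suffix to string content, or add a text item when content is a list."""
    content = message["content"]
    if isinstance(content, str):
        message["content"] = content + suffix
    elif isinstance(content, list):
        content.append({"type": "text", "text": suffix})
    else:
        raise TypeError(f"Unsupported content type: {type(content)!r}")
```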
---
Outside diff comments:
In `@nemo_skills/inference/generate.py`:
- Around line 369-370: The startup crash occurs because setup_prompt can return
None and code attempts to call self.prompt.get_code_execution_args() when
self.cfg.code_execution is True; update _post_init_validate_params (or the block
containing the shown lines) to guard against self.prompt being None: only call
self.prompt.get_code_execution_args() and assign self.extra_generate_params when
both self.cfg.code_execution is True and self.prompt is not None (otherwise
leave extra_generate_params unset or default). Ensure you reference the symbols
self.cfg.code_execution, self.prompt, get_code_execution_args, and
self.extra_generate_params when making the change.
---
Nitpick comments:
In `@nemo_skills/inference/generate.py`:
- Line 601: The method fill_prompt(self, data_point, data) intentionally keeps
the data parameter for subclass overrides but Ruff flags it as ARG002; silence
that linter warning by adding a per-line noqa for ARG002 on the function
definition (e.g., append "# noqa: ARG002" to the def line for fill_prompt) so
the API stays unchanged while acknowledging the intentionally unused parameter.
Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
🧹 Nitpick comments (1)
nemo_skills/inference/generate.py (1)
601-601: Suppress intentional unused-parameter warning for data.
Ruff reports ARG002 (unused method argument) on data. Since the parameter is intentionally kept for subclass overrides (as stated in the docstring), suppress the warning in place rather than removing it:
♻️ Proposed fix
-def fill_prompt(self, data_point, data):
-    """Passing in full data in case it's needed to fill the prompt in subclasses."""
+def fill_prompt(self, data_point, data):  # noqa: ARG002
+    """Passing in full data in case it's needed to fill the prompt in subclasses."""

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/generate.py` at line 601, The method fill_prompt(self, data_point, data) intentionally keeps the data parameter for subclass overrides but Ruff flags ARG002; suppress the unused-parameter warning in-place by adding a noqa for ARG002 on the function definition (e.g., append "# noqa: ARG002" to the def line for fill_prompt) so the parameter stays for subclasses while silencing the lint error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@nemo_skills/inference/generate.py`:
- Around line 255-256: Replace the runtime-unsafe assert in the prompt config
validation with an explicit exception: check self.prompt_format == "ns" and if
so validate that self.prompt_config is not None, raising a ValueError (with a
clear message like "prompt_config is required when prompt_format == 'ns'")
instead of using assert; ensure this change is applied where the current assert
appears so that downstream code (e.g., fill_prompt / self.prompt.fill) fails
with a clear configuration error rather than an AttributeError.
- Around line 584-598: _merge_audio_from_data currently copies audio by index
which misassigns audio when the template adds or removes messages; change the
merge to align messages by role instead of position: iterate template messages
(messages) and for each msg try to find the next unused original message in
data_point["messages"] whose "role" equals msg.get("role") (track consumed
original indices) and copy its audios (orig_msg.get("audios") or
orig_msg.get("audio")). If role is missing in either side, fall back to the
existing positional assignment only for remaining unmatched originals; update
references to original_messages, orig_msg, msg, and audios accordingly so audio
is only copied from correctly-matched originals.
---
Nitpick comments:
In `@nemo_skills/inference/generate.py`:
- Line 601: The method fill_prompt(self, data_point, data) intentionally keeps
the data parameter for subclass overrides but Ruff flags ARG002; suppress the
unused-parameter warning in-place by adding a noqa for ARG002 on the function
definition (e.g., append "# noqa: ARG002" to the def line for fill_prompt) so
the parameter stays for subclasses while silencing the lint error.
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
b875c2d to 601d92d (Compare)
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@melllinia plz re-review and let's proceed
Example
ns eval ... ++prompt_config=/path/to/prompt.yaml
ns eval ... ++user_message="Transcribe this audio"
++prompt_config=... — Path or name of a prompt config YAML (e.g. eval/speechlm/numb3rs_text). Used for both NS and openai format; with openai, the template is filled and audio from the data is merged into the generated messages.
++user_message=... — Overrides only the user message. For openai-format data with a single user message, this is a quick way to try a different instruction without adding a YAML (e.g. for audio benchmarks).
++system_message=... — Overrides the system message
Summary by CodeRabbit
New Features
Tests