Update alert triage agent to work with nat optimizer #992
base: develop
Signed-off-by: Hsin Chen <[email protected]>
Walkthrough
Adds an offline optimizer YAML and README updates; converts several agent prompt fields to OptimizableField/SearchSpace with optimizer prompt-purpose metadata; adds an optimizer_prompts module; and wires optimizer/eval settings for an end-to-end offline alert-triage optimization workflow.
Changes
Sequence Diagram(s)
sequenceDiagram
    autonumber
    actor Dev as Developer (CLI)
    participant NAT as nat optimize
    participant CFG as Config YAML
    participant DS as Offline Dataset
    participant WF as Alert Triage Workflow
    participant LLM as LLM(s)
    participant EVAL as Evaluators
    Dev->>NAT: nat optimize -c config_offline_optimizer.yml
    NAT->>CFG: Load workflow, tools, llms, optimizer settings
    NAT->>DS: Load offline dataset & benign fallbacks
    loop GA generations
        NAT->>NAT: Generate/mutate parameter sets (including prompts)
        par Parallel evaluations
            NAT->>WF: Instantiate workflow with parameter set (offline_mode)
            WF->>DS: Read sample input / fallbacks
            WF->>LLM: Invoke configured LLMs/tools per sample
            WF-->>NAT: Return outputs
            NAT->>EVAL: Score outputs (rag_accuracy, classification_accuracy)
            EVAL-->>NAT: Return scores
        end
        NAT->>NAT: Select/retain best parameter sets
    end
    NAT-->>Dev: Persist best params, reports to optimizer output_path
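The generate, evaluate, select loop in the diagram can be sketched in a few lines of Python. This is only an illustration of a generic elitist genetic algorithm, not the NAT optimizer's code: `score`, `mutate`, `optimize`, the single `temperature` parameter, and the toy fitness function are all hypothetical stand-ins for the workflow run and the evaluators.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def score(params: dict[str, float]) -> float:
    """Toy fitness (hypothetical): higher is better, best at temperature 0.3."""
    return -abs(params["temperature"] - 0.3)


def mutate(params: dict[str, float], rng: random.Random) -> dict[str, float]:
    """Return a perturbed copy of a parameter set, clamped to [0, 1]."""
    value = params["temperature"] + rng.uniform(-0.1, 0.1)
    return {"temperature": min(1.0, max(0.0, value))}


def optimize(generations: int = 20, population_size: int = 8, seed: int = 0) -> dict[str, float]:
    """Run a small elitist GA and return the best parameter set found."""
    rng = random.Random(seed)
    population = [{"temperature": rng.random()} for _ in range(population_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(generations):
            # Evaluate every candidate in parallel (the "par" block in the diagram).
            scores = list(pool.map(score, population))
            ranked = [p for _, p in sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)]
            # Retain the better half unchanged, then refill by mutating survivors.
            survivors = ranked[:population_size // 2]
            children = [mutate(rng.choice(survivors), rng) for _ in range(population_size - len(survivors))]
            population = survivors + children
        final_scores = list(pool.map(score, population))
    best_index = max(range(len(population)), key=final_scores.__getitem__)
    return population[best_index]
```

Because the survivors are carried over unchanged, the best score in this sketch never gets worse from one generation to the next.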
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
dnandakumar-nv
left a comment
This looks great! Could we also add a section to the README letting developers know that an optimizable config is available and how to run optimization for this workflow?
Actionable comments posted: 3
🧹 Nitpick comments (5)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompt.py (1)
2-12: Consider using `Final` for immutable constants.
The class constants `AGENT_PROMPT_PURPOSE` and `TELEMETRY_AGENT_PROMPT_PURPOSE` are immutable and could benefit from explicit type annotations.
+from typing import Final
+
 class OptimizerPrompts:
-    AGENT_PROMPT_PURPOSE = """This is the system prompt that instructs the Alert Triage Agent on how to behave and respond to system alerts. It is used as a SystemMessage that's prepended to every LLM conversation, providing the agent with its role and behavior guidelines.
+    AGENT_PROMPT_PURPOSE: Final[str] = """This is the system prompt that instructs the Alert Triage Agent on how to behave and respond to system alerts. It is used as a SystemMessage that's prepended to every LLM conversation, providing the agent with its role and behavior guidelines.
 The prompt should be well-structured and provide specific instructions to help the agent:
 - Analyze incoming alerts and identify their type (e.g., InstanceDown, HighCPUUsage)
 - Select and use the appropriate diagnostic tools for each alert type (hardware_check, host_performance_check, network_connectivity_check, telemetry_metrics_analysis_agent, monitoring_process_check)
 - Avoid calling the same tool repeatedly during a single alert investigation
 - Correlate collected data from multiple tools to determine root causes
 - Distinguish between true issues, false positives, and benign anomalies
 - Generate structured markdown triage reports with clear sections: Alert Summary, Collected Metrics, Analysis, Recommended Actions, and Alert Status
 The prompt should give the agent clear security context and explicit instructions on the expected final report format to ensure consistent, actionable output for system analysts."""
-    TELEMETRY_AGENT_PROMPT_PURPOSE = """This is the system prompt for the Telemetry Metrics Analysis Agent, a specialized sub-agent within the alert triage system. It is used as a SystemMessage for a nested agent that the main Alert Triage Agent can call to analyze remotely collected telemetry data.
+    TELEMETRY_AGENT_PROMPT_PURPOSE: Final[str] = """This is the system prompt for the Telemetry Metrics Analysis Agent, a specialized sub-agent within the alert triage system. It is used as a SystemMessage for a nested agent that the main Alert Triage Agent can call to analyze remotely collected telemetry data.
Also applies to: 13-28
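For readers unfamiliar with `typing.Final`, here is a minimal, self-contained sketch of the pattern the suggestion describes. The class name `PromptPurposes`, its two constants, and the helper `purpose_for` are hypothetical and much shorter than the real prompt-purpose strings.

```python
from typing import Final


class PromptPurposes:
    """Hypothetical container for prompt-purpose constants (illustration only)."""

    # `Final` tells static type checkers that these names must not be rebound.
    AGENT: Final[str] = "Describes the purpose of the main agent's system prompt."
    TELEMETRY: Final[str] = "Describes the purpose of the telemetry sub-agent's system prompt."


def purpose_for(name: str) -> str:
    """Look up a purpose string by constant name, e.g. "AGENT"."""
    return getattr(PromptPurposes, name)
```

Note that `Final` is enforced by static checkers such as pyright or mypy, not at runtime, so the annotation documents intent rather than preventing reassignment when the code executes.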
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py (1)
67-75: Remove redundant `prompt` parameter from `SearchSpace`.
The `OptimizableField` implementation automatically uses the field's default value as the base prompt when `space.prompt` is not specified. Since both `default` and `space.prompt` are set to `ALERT_TRIAGE_AGENT_PROMPT`, the explicit `prompt` parameter is redundant.
Apply this diff to simplify the configuration:
 agent_prompt: str = OptimizableField(
     default=ALERT_TRIAGE_AGENT_PROMPT,
     description="The system prompt to use for the alert triage agent.",
     space=SearchSpace(
         is_prompt=True,
-        prompt=ALERT_TRIAGE_AGENT_PROMPT,
         prompt_purpose=OptimizerPrompts.AGENT_PROMPT_PURPOSE,
     )
 )
Based on the `OptimizableField` implementation in `src/nat/data_models/optimizable.py` (lines 78-82), which automatically falls back to the field's default when `space.prompt` is None.
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (2)
35-43: Remove redundant `prompt` parameter from `SearchSpace`.
Similar to the main agent configuration, the `space.prompt` parameter is redundant since `OptimizableField` automatically uses the field's default value as the base prompt.
Apply this diff:
 prompt: str | None = OptimizableField(
     default=TelemetryMetricsAnalysisAgentPrompts.PROMPT,
     description="The system prompt to use for the alert triage agent.",
     space=SearchSpace(
         is_prompt=True,
-        prompt=TelemetryMetricsAnalysisAgentPrompts.PROMPT,
         prompt_purpose=OptimizerPrompts.TELEMETRY_AGENT_PROMPT_PURPOSE,
     )
 )
Based on the `OptimizableField` implementation in `src/nat/data_models/optimizable.py`.
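The fallback rule these two comments rely on (use the field default when `space.prompt` is None) can be illustrated with a small stand-in. This is not the code in `src/nat/data_models/optimizable.py`: `SketchSearchSpace` and `resolve_base_prompt` are hypothetical names that only mirror the behaviour as the review describes it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchSearchSpace:
    """Hypothetical stand-in for a prompt search space (not NAT's `SearchSpace`)."""

    is_prompt: bool = False
    prompt: str | None = None
    prompt_purpose: str | None = None


def resolve_base_prompt(default: str, space: SketchSearchSpace) -> str:
    """Return the explicit base prompt if one is set, otherwise the field default."""
    return default if space.prompt is None else space.prompt
```

Under that rule, passing the same string as both the default and the explicit prompt resolves to the same base prompt as omitting the explicit one, which is why the reviewer calls the parameter redundant.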
33-33: Pre-existing issue: Mutable default for `tool_names`.
Ruff correctly flags that `tool_names: list[str] = []` uses a mutable default, which can cause unexpected behavior if the list is modified. While this is a pre-existing issue not introduced by this PR, consider addressing it.
Apply this diff to fix the issue:
+from typing import ClassVar
+
 class TelemetryMetricsAnalysisAgentConfig(FunctionBaseConfig, name="telemetry_metrics_analysis_agent"):
     description: str = Field(default=TelemetryMetricsAnalysisAgentPrompts.TOOL_DESCRIPTION,
                              description="Description of the tool for the triage agent.")
-    tool_names: list[str] = []
+    tool_names: list[str] = Field(default_factory=list)
Based on static analysis hints.
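The general hazard behind RUF012 can be reproduced with the standard library alone. The sketch below is a plain-Python analog with hypothetical class names; Pydantic-based configs may handle defaults differently, so it illustrates what the lint rule guards against rather than the actual behaviour of `FunctionBaseConfig`.

```python
from dataclasses import dataclass, field


class SharedDefault:
    """Plain class: the list is a class attribute shared by every instance."""

    tool_names: list[str] = []


@dataclass
class FreshDefault:
    """Dataclass: `default_factory` builds a new list for each instance."""

    tool_names: list[str] = field(default_factory=list)


a, b = SharedDefault(), SharedDefault()
a.tool_names.append("hardware_check")  # mutates the single shared list, so b sees it too

c, d = FreshDefault(), FreshDefault()
c.tool_names.append("hardware_check")  # only c's own list changes
```

After this runs, `b.tool_names` contains the appended entry while `d.tool_names` is still empty.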
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/configs/config_offline_optimizer.yml (1)
69-69: Consider extracting the duplicated system objective.
The `system_objective` string is duplicated between `prompt_init` (line 69) and `prompt_recombination` (line 73). This long description could be extracted to a YAML anchor for better maintainability.
Apply this diff to use YAML anchors:
+  system_objective_description: &system_objective The alert triage agent autonomously investigates infrastructure monitoring alerts, performs root cause analysis, and generates structured diagnostic reports by dynamically selecting and orchestrating diagnostic tools including IPMI hardware checks, network connectivity tests, host performance monitoring, process status verification, and telemetry analysis, then correlating multi-source data through LLM-powered reasoning to classify issues into predefined categories (hardware, software, network, false positive, or requiring investigation), helping security analysts reduce manual triage workload, accelerate incident response times, and maintain consistent investigation quality through standardized evidence collection and automated documentation of findings and recommended remediation actions.
+
   prompt_init:
     _type: prompt_init
     optimizer_llm: optimizer_llm
-    system_objective: The alert triage agent autonomously investigates infrastructure monitoring alerts, performs root cause analysis, and generates structured diagnostic reports by dynamically selecting and orchestrating diagnostic tools including IPMI hardware checks, network connectivity tests, host performance monitoring, process status verification, and telemetry analysis, then correlating multi-source data through LLM-powered reasoning to classify issues into predefined categories (hardware, software, network, false positive, or requiring investigation), helping security analysts reduce manual triage workload, accelerate incident response times, and maintain consistent investigation quality through standardized evidence collection and automated documentation of findings and recommended remediation actions.
+    system_objective: *system_objective
   prompt_recombination:
     _type: prompt_recombiner
     optimizer_llm: optimizer_llm
-    system_objective: The alert triage agent autonomously investigates infrastructure monitoring alerts, performs root cause analysis, and generates structured diagnostic reports by dynamically selecting and orchestrating diagnostic tools including IPMI hardware checks, network connectivity tests, host performance monitoring, process status verification, and telemetry analysis, then correlating multi-source data through LLM-powered reasoning to classify issues into predefined categories (hardware, software, network, false positive, or requiring investigation), helping security analysts reduce manual triage workload, accelerate incident response times, and maintain consistent investigation quality through standardized evidence collection and automated documentation of findings and recommended remediation actions.
+    system_objective: *system_objective
Also applies to: 73-73
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/configs/config_offline_optimizer.yml (1 hunks)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompt.py (1 hunks)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py (3 hunks)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompt.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/configs/config_offline_optimizer.yml
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompt.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.
# Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file, words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.
# Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompt.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/configs/config_offline_optimizer.yml
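The exception-handling rule in the review instructions above (log with `logger.error()` before a bare `raise`, and with `logger.exception()` when the error is handled locally) can be sketched as follows. The logger name, the function names, and the default port value are hypothetical; only the standard `logging` module is used.

```python
import logging

logger = logging.getLogger("triage_sketch")


def parse_port(raw: str) -> int:
    """Parse a port number, re-raising on bad input."""
    try:
        return int(raw)
    except ValueError:
        # Re-raising: log a short message only, so the traceback is printed once by the caller.
        logger.error("Invalid port value: %r", raw)
        raise


def parse_port_or_default(raw: str, default: int = 9090) -> int:
    """Parse a port number, falling back to a default on bad input."""
    try:
        return int(raw)
    except ValueError:
        # Not re-raising: this is the only place the traceback can be captured.
        logger.exception("Invalid port value %r; using default %d", raw, default)
        return default
```

The bare `raise` keeps the original traceback intact, while `logger.exception()` attaches the traceback to the log record in the case where the exception would otherwise be swallowed.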
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit, at a minimum an example should contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named `src/` and should contain a `pyproject.toml` file. Optionally, it might also contain scripts in a `scripts/` directory.
- If an example contains YAML files, they should be placed in a subdirectory named `configs/`.
- If an example contains sample data files, they should be placed in a subdirectory named `data/`, and should be checked into git-lfs.
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompt.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/configs/config_offline_optimizer.yml
**/*.{yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
In workflow/config YAML, set llms.._type: nat_test_llm to stub responses.
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/configs/config_offline_optimizer.yml
**/configs/**
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Configuration files consumed by code must be stored next to that code in a configs/ folder
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/configs/config_offline_optimizer.yml
🧬 Code graph analysis (2)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (3)
src/nat/data_models/optimizable.py (2)
OptimizableField(68-107)
SearchSpace(33-65)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompt.py (1)
OptimizerPrompts(1-28)
src/nat/data_models/function.py (1)
FunctionBaseConfig(26-27)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py (2)
src/nat/data_models/optimizable.py (2)
OptimizableField(68-107)
SearchSpace(33-65)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompt.py (1)
OptimizerPrompts(1-28)
🪛 Ruff (0.14.0)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
33-33: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (5)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py (1)
27-28: LGTM: Imports are correctly added.
The new imports for optimization support (`OptimizableField`, `SearchSpace`, `OptimizerPrompts`) are properly organized and align with the changes to make the agent prompt optimizable.
Also applies to: 46-46
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (1)
24-25: LGTM: Imports are correctly added.
The new imports for optimization support are properly organized and necessary for the prompt optimization functionality.
Also applies to: 28-28
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/configs/config_offline_optimizer.yml (3)
1-15: LGTM: License header is correct.
The Apache License 2.0 header is properly formatted and includes the correct copyright year (2025).
As per coding guidelines.
45-45: Placeholder URL in metrics configuration.
The `metrics_url` fields contain placeholder values (`http://your-monitoring-server:9090`) with a comment indicating they should be replaced when running in live mode. Since `offline_mode: true` is set for these functions, these placeholders are safe but could be clarified.
The placeholder URLs are acceptable given that offline mode is enabled. However, verify that the system gracefully handles these placeholder URLs if offline mode is accidentally disabled.
Also applies to: 50-50
151-189: LGTM: Optimizer configuration is well-structured.
The optimizer section properly configures:
- Both numeric and prompt optimization
- Evaluation metrics with appropriate directions (maximize)
- GA parameters for prompt optimization
- References to the prompt initialization and recombination functions
The configuration aligns with the code changes that make prompts optimizable via `OptimizableField`.
…ments Signed-off-by: Hsin Chen <[email protected]>
Actionable comments posted: 3
🧹 Nitpick comments (3)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py (1)
24-50: Consider adding constant docstrings.
While the class docstring provides context, individual docstrings for `AGENT_PROMPT_PURPOSE` and `TELEMETRY_AGENT_PROMPT_PURPOSE` would improve clarity and follow best practices for public constants.
Example:
    """
    AGENT_PROMPT_PURPOSE = """This is the system prompt that instructs the Alert Triage Agent on how to behave and respond to system alerts. It is used as a SystemMessage that's prepended to every LLM conversation, providing the agent with its role and behavior guidelines.
could become:
    """
    #: Prompt purpose for the main Alert Triage Agent system prompt.
    AGENT_PROMPT_PURPOSE = """This is the system prompt that instructs the Alert Triage Agent on how to behave and respond to system alerts. It is used as a SystemMessage that's prepended to every LLM conversation, providing the agent with its role and behavior guidelines.
examples/advanced_agents/alert_triage_agent/README.md (2)
531-542: Fix list indentation.
The unordered list items should use 0 spaces of indentation instead of 3 for consistency with markdown best practices.
Apply this diff to fix the indentation:
-#### 1. **Set required environment variables**
+#### 1. Set required environment variables
 Make sure `offline_mode: true` is set in both the `workflow` section and individual tool sections of your config file (see [Understanding the configuration](#understanding-the-configuration) section).
-#### 2. **How offline mode works:**
+#### 2. How offline mode works:
-   - The **main CSV offline dataset** (`offline_data_path`) provides both alert details and a mock environment. For each alert, expected tool return values are included. These simulate how the environment would behave if the alert occurred on a real system.
-   - The **JSON offline dataset** (`eval.general.dataset.filepath` in the config) contains a subset of the information from the main CSV: the alert inputs and their associated ground truth root causes. It is used to run `nat eval`, focusing only on the essential data needed for running the workflow, while the full CSV retains the complete mock environment context.
-   - At runtime, the system links each alert in the JSON dataset to its corresponding context in the CSV using the unique host IDs included in both datasets.
-   - The **benign fallback dataset** fills in tool responses when the agent calls a tool not explicitly defined in the alert's offline data. These fallback responses mimic healthy system behavior and help provide the "background scenery" without obscuring the true root cause.
+- The **main CSV offline dataset** (`offline_data_path`) provides both alert details and a mock environment. For each alert, expected tool return values are included. These simulate how the environment would behave if the alert occurred on a real system.
+- The **JSON offline dataset** (`eval.general.dataset.filepath` in the config) contains a subset of the information from the main CSV: the alert inputs and their associated ground truth root causes. It is used to run `nat eval`, focusing only on the essential data needed for running the workflow, while the full CSV retains the complete mock environment context.
+- At runtime, the system links each alert in the JSON dataset to its corresponding context in the CSV using the unique host IDs included in both datasets.
+- The **benign fallback dataset** fills in tool responses when the agent calls a tool not explicitly defined in the alert's offline data. These fallback responses mimic healthy system behavior and help provide the "background scenery" without obscuring the true root cause.
-#### 3. **Run the agent in offline mode**
+#### 3. Run the agent in offline mode
631-642: Fix list indentation for optimization workflow.
The list items under the optimization section should also use 0 spaces of indentation.
Apply this diff:
 The agent will:
-   - Load alerts from the JSON dataset specified in the config `eval.general.dataset.filepath`
-   - Run optimization for the metrics specified in the config `optimizer.eval_metrics`
-   - Save the optimization results to the path specified by `optimizer.output_dir`
+- Load alerts from the JSON dataset specified in the config `eval.general.dataset.filepath`
+- Run optimization for the metrics specified in the config `optimizer.eval_metrics`
+- Save the optimization results to the path specified by `optimizer.output_dir`
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
examples/advanced_agents/alert_triage_agent/README.md (5 hunks)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py (1 hunks)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
**/*
⚙️ CodeRabbit configuration file
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/README.md
examples/**/*
⚙️ CodeRabbit configuration file
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/README.md
**/README.@(md|ipynb)
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Ensure READMEs follow the naming convention; avoid deprecated names; use “NeMo Agent Toolkit” (capital T) in headings
Files:
examples/advanced_agents/alert_triage_agent/README.md
🧬 Code graph analysis (1)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (2)
src/nat/data_models/optimizable.py (2)
OptimizableField (68-107)
SearchSpace (33-65)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py (1)
OptimizerPrompts (16-50)
🪛 markdownlint-cli2 (0.18.1)
examples/advanced_agents/alert_triage_agent/README.md
537-537: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
538-538: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
539-539: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
540-540: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
639-639: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
640-640: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
641-641: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
🪛 Ruff (0.14.0)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
33-33: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (2)
examples/advanced_agents/alert_triage_agent/README.md (1)
298-393: LGTM! Excellent optimization documentation.
The new Optimization section is comprehensive, well-structured, and provides clear guidance on both numeric and prompt optimization. The examples and configuration snippets effectively demonstrate how to enable and configure the optimizer.
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (1)
35-43: LGTM! Well-structured optimization configuration.
The conversion to OptimizableField with proper SearchSpace configuration is correct. The field maintains the non-None type as requested in past reviews, and properly wires the prompt purpose from OptimizerPrompts.
Signed-off-by: Hsin Chen <[email protected]>
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py (1 hunks)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py (3 hunks)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py
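The response_seq and delay_ms behavior described in the rule above can be illustrated with a small stand-in. This is only a sketch of the described semantics (values cycle per call, an empty list yields an empty string, and a per-call delay is injected). The class `CyclingStubLLM` and its `ainvoke` method are hypothetical names made up for this illustration; this is not the real nat_test_llm implementation or its API.

```python
import asyncio
import itertools


class CyclingStubLLM:
    """Hypothetical stand-in that mimics the described behavior; not nat_test_llm."""

    def __init__(self, response_seq: list[str], delay_ms: int = 0) -> None:
        # An empty sequence means every call yields an empty string.
        self._responses = itertools.cycle(response_seq) if response_seq else None
        self._delay_s = delay_ms / 1000

    async def ainvoke(self, prompt: str) -> str:
        # Inject the configured artificial latency before answering.
        if self._delay_s > 0:
            await asyncio.sleep(self._delay_s)
        # The prompt is ignored; responses depend only on call order.
        return next(self._responses) if self._responses is not None else ""


async def demo() -> list[str]:
    llm = CyclingStubLLM(response_seq=["alpha", "beta"], delay_ms=5)
    # Three calls against a two-item sequence wrap around to the first item.
    return [await llm.ainvoke("ignored") for _ in range(3)]


results = asyncio.run(demo())
```

A deterministic stub of this kind is useful in unit tests because the output depends only on call order, not on the prompt content.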
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py
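The exception-handling rule in the instructions above (bare raise plus logger.error() when re-raising, logger.exception() when catching without re-raising) can be sketched with the standard logging module. The function names `parse_threshold` and `parse_threshold_or_default` are hypothetical and exist only for this illustration; they do not appear in the PR.

```python
import logging

logger = logging.getLogger(__name__)


def parse_threshold(raw: str) -> float:
    """Re-raise path: log with `logger.error()` and use a bare `raise`."""
    try:
        return float(raw)
    except ValueError:
        # logger.error() records the message without a traceback; whoever
        # finally handles the exception can log the stack trace once.
        logger.error("Invalid threshold value: %r", raw)
        raise  # Bare raise keeps the original traceback intact.


def parse_threshold_or_default(raw: str, default: float = 0.0) -> float:
    """Swallow path: log with `logger.exception()` to keep the traceback."""
    try:
        return float(raw)
    except ValueError:
        # The exception stops here, so this is the only place the
        # traceback can be captured.
        logger.exception("Falling back to default threshold %s", default)
        return default
```

The split avoids printing the same stack trace at every level of a call chain while still guaranteeing that it is logged at least once.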
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values.
Example: def my_function(param1: int, param2: str) -> bool: pass
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare raise statements to maintain the original stack trace,
    and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use logger.exception()
    to capture the full stack trace information.
Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.
Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named src/ and should
contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
- If an example contains YAML files, they should be placed in a subdirectory named configs/.
- If an example contains sample data files, they should be placed in a subdirectory named data/, and should
be checked into git-lfs.
Files:
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py
🧬 Code graph analysis (2)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (2)
src/nat/data_models/optimizable.py (2)
OptimizableField (68-107)
SearchSpace (33-65)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py (1)
OptimizerPrompts (22-56)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py (2)
src/nat/data_models/optimizable.py (2)
OptimizableField (68-107)
SearchSpace (33-65)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py (1)
OptimizerPrompts (22-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (3)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/optimizer_prompts.py (1)
1-56: LGTM - all documentation requirements met.
The file now includes the required Apache License header, module-level docstring, and class docstring. The prompt purpose constants are detailed and self-documenting, providing clear guidance for the optimizer.
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py (1)
67-73: OptimizableField configuration looks correct.
The agent_prompt field is properly configured with SearchSpace metadata for prompt optimization, including the prompt purpose from OptimizerPrompts. The implementation follows the pattern established in the codebase.
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/telemetry_metrics_analysis_agent.py (1)
24-25: All previous review issues have been addressed.
The changes correctly implement optimizer support:
- Import path uses .optimizer_prompts (plural) ✓
- tool_names uses Field(default_factory=list) to avoid mutable default ✓
- prompt type is non-optional (str) ✓
- OptimizableField configuration with SearchSpace is properly structured ✓
Also applies to: 28-28, 35-36, 38-44
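For context on the mutable-default point, here is a minimal sketch of the general hazard using only the standard library. It uses `dataclasses.field(default_factory=list)` as a stand-in for the Pydantic `Field(default_factory=list)` mentioned in the review; Pydantic treats defaults differently from plain classes, so this illustrates the pattern rather than the exact behavior of the PR's config class. The class names and the tool name `example_tool` are hypothetical.

```python
from dataclasses import dataclass, field


class SharedToolsConfig:
    # Class attribute: a single list object shared by every instance.
    tool_names: list[str] = []


@dataclass
class IsolatedToolsConfig:
    # default_factory builds a fresh list for each new instance.
    tool_names: list[str] = field(default_factory=list)


first_shared, second_shared = SharedToolsConfig(), SharedToolsConfig()
first_shared.tool_names.append("example_tool")
# The append is visible through the second instance as well.
print(second_shared.tool_names)

first_isolated, second_isolated = IsolatedToolsConfig(), IsolatedToolsConfig()
first_isolated.tool_names.append("example_tool")
# The second instance keeps its own, still empty, list.
print(second_isolated.tool_names)
```

Using a factory keeps each config instance independent, which matters when an optimizer instantiates the workflow many times with different parameter sets.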
Signed-off-by: Hsin Chen <[email protected]>
This PR updates the alert triage agent to include a new config for the optimizer, and makes the system prompts (for the core agent and the sub-agent) optimizable.
Test run of the optimizer config succeeded.
All unit tests passed.
Summary by CodeRabbit
New Features
Documentation