[trajectory,logger]add metadata to trajectory and add llm_logger#805
Conversation
Summary of Changes: Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the observability and debugging capabilities for Large Language Model (LLM) interactions within the system. It establishes a logging framework specifically for LLM calls, capturing detailed request and response data, including timing and parameters. It also enriches the system's internal memory and trajectory records by embedding raw LLM outputs, providing a more complete and traceable history of LLM usage.
Code Review
This pull request introduces significant enhancements for logging and data tracking, adding a new llm_logger to record detailed LLM API call information and enriching trajectories with raw LLM responses. However, these changes introduce several security risks around data handling: the llm_logger writes full conversation history, and potentially sensitive API keys, in plain text to local log files, and storing raw LLM responses in persistent storage increases the risk of sensitive data leakage. Additionally, a serialization failure in the logger can make the entire LLM call fail, which amounts to a Denial of Service. My review also includes suggestions to improve maintainability by refactoring duplicated logging code, a fix for a potential bug where a dictionary is modified in-place, a minor code simplification, and a correction to an outdated docstring.
```python
    "max_tokens": max_tokens,
    "stop": stop
})
log_llm_record("OPENAI_PARAMS", model_name, llm_params, {"request_id": llm_params.pop("llm_request_id", None)})
```
The get_openai_params method logs the llm_params dictionary, which is updated with all keyword arguments passed to the completion methods. If an api_key is provided as a keyword argument (e.g., to override the default configuration for a specific call), it will be logged in plain text to the llm.log file. This poses a significant security risk as API keys should never be stored in logs.
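One way to address this is to redact sensitive values before the params dictionary ever reaches the logger. A minimal sketch, assuming llm_params is a plain dict; the helper name, key list, and "***REDACTED***" marker are illustrative, not existing project conventions:

```python
# Illustrative sketch: mask sensitive values before handing params to the logger.
# The key set and placeholder are assumptions, not project conventions.
SENSITIVE_KEYS = {"api_key", "authorization", "token"}

def redact_params(params: dict) -> dict:
    """Return a copy of params with values for sensitive keys masked."""
    return {
        k: "***REDACTED***" if k.lower() in SENSITIVE_KEYS else v
        for k, v in params.items()
    }
```

The logger would then receive redact_params(llm_params) instead of the raw dictionary, so an api_key passed as a keyword argument never reaches llm.log.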
```python
    "request_id": request_id,
}
kwargs["llm_request_id"] = request_id
log_llm_record("INPUT", self.provider.model_name, messages, log_params, context.trace_id)
```
The LLMModel class logs the full content of LLM input messages, output responses, and streaming chunks to llm.log. This introduces a security risk as these logs may contain Personally Identifiable Information (PII) or other sensitive data. It is recommended to implement a mechanism to mask sensitive data or allow disabling LLM call logging in production environments. Furthermore, there is significant code duplication for logging the start and output of LLM calls across acompletion, completion, and astream_completion methods. Consider refactoring this logic into a helper method or context manager to reduce redundancy and improve maintainability.
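The suggested refactor could take the shape of a context manager that owns the INPUT/OUTPUT bookkeeping. A minimal sketch, assuming log_llm_record keeps the signature shown in the diff; the helper name and the result-dict convention are invented for illustration:

```python
import time
from contextlib import contextmanager

@contextmanager
def llm_call_logging(log_func, model_name, messages, log_params, trace_id=None):
    """Log INPUT before the call and OUTPUT (with time_cost) after it.

    The caller stores the response under result["resp"] inside the block.
    """
    log_func("INPUT", model_name, messages, log_params, trace_id)
    start = time.time()
    result = {}
    try:
        yield result
    finally:
        # OUTPUT is logged even if the provider call raises.
        log_params["time_cost"] = round(time.time() - start, 3)
        log_func("OUTPUT", model_name, result.get("resp"), log_params, trace_id)
```

acompletion, completion, and astream_completion could then wrap their provider calls in `with llm_call_logging(...) as result:` instead of repeating the start/output logging in each method.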
```python
    "request_id": request_id,
}
kwargs["llm_request_id"] = request_id
log_llm_record("INPUT", self.provider.model_name, messages, log_params, context.trace_id)
```
The LLMModel class logs the full content of LLM input messages, output responses, and streaming chunks. These logs are written to a local file (llm.log) and may contain Personally Identifiable Information (PII) or other sensitive data present in the conversation. It is recommended to implement a mechanism to mask sensitive data or allow disabling LLM call logging in production environments.
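Both mitigations can be sketched in a few lines. The regex, the placeholder, and the environment-variable name below are illustrative assumptions, not existing project settings:

```python
import os
import re

# Illustrative: mask one common PII pattern (email addresses) before logging.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_pii(text: str) -> str:
    """Replace email addresses with a placeholder."""
    return EMAIL_RE.sub("<email>", text)

def log_llm_content(log_func, *args):
    """Skip content logging entirely when disabled (hypothetical env flag)."""
    if os.getenv("AWORLD_DISABLE_LLM_LOG") != "1":
        log_func(*args)
```

A production deployment would extend the masking to whatever PII patterns matter for its data, and set the flag to turn content logging off entirely.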
```python
resp = await self.llm_response_parser.parse(resp, **response_parse_args)

log_params["time_cost"] = round(time.time() - start_ms, 3)
log_llm_record("OUTPUT", self.provider.model_name, resp, log_params, context.trace_id)
```
The LLMModel class logs the full content of LLM input messages, output responses, and streaming chunks. These logs are written to a local file (llm.log) and may contain Personally Identifiable Information (PII) or other sensitive data present in the conversation. It is recommended to implement a mechanism to mask sensitive data or allow disabling LLM call logging in production environments.
```python
resp = sync_exec(self.llm_response_parser.parse, resp, **response_parse_args)

log_params["time_cost"] = round(time.time() - start_ms, 3)
log_llm_record("OUTPUT", self.provider.model_name, resp, log_params, context.trace_id)
```
The LLMModel class logs the full content of LLM input messages, output responses, and streaming chunks. These logs are written to a local file (llm.log) and may contain Personally Identifiable Information (PII) or other sensitive data present in the conversation. It is recommended to implement a mechanism to mask sensitive data or allow disabling LLM call logging in production environments.
aworld/logs/util.py (Outdated)
```python
    e,
    traceback.format_exc(),
)
raise RuntimeError(f"{e}")
```
The _safe_serialize function raises a RuntimeError if an object cannot be JSON serialized. Since this function is called within log_llm_record, which is used for logging LLM inputs and outputs, a serialization failure will cause the entire LLM call to fail. This can lead to a Denial of Service (DoS) if an unexpected or non-serializable object is passed to the logger. Logging should ideally be a non-blocking, side-effect-free operation that does not impact the main application flow.
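A non-raising variant could look like the sketch below. The fall-back-to-repr behavior is a suggested convention, not current code:

```python
import json

def safe_serialize(obj) -> str:
    """Serialize for logging without ever raising.

    Non-JSON-serializable values are stringified via default=str; anything
    json.dumps still rejects (e.g. circular references) falls back to repr().
    """
    try:
        return json.dumps(obj, ensure_ascii=False, default=str)
    except (TypeError, ValueError):
        return repr(obj)
```

With this shape, a pathological payload degrades to a less readable log line instead of aborting the LLM call.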
```python
if history.role == "assistant":
    ext_info = history.metadata.get("ext_info", {}) if history.metadata else {}
    raw_response = ext_info.get("raw_response", "")
    openai_msg["raw_response"] = raw_response
```
The application stores the raw LLM response in trajectory datasets and memory metadata. If the LLM response contains sensitive information or secrets, these will be persisted in the dataset or memory store, increasing the risk of data exposure. It is recommended to sanitize LLM responses before storage if they are not strictly necessary for the application's functionality.
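One possible shape for that sanitization, assuming the raw response serializes to a dict; the field whitelist below is an illustrative guess at what replay and debugging actually need, not a project-defined set:

```python
# Illustrative: persist only structural fields, dropping free-form text
# that may carry secrets echoed by the model.
ALLOWED_FIELDS = {"model", "finish_reason", "usage", "tool_calls"}

def sanitize_raw_response(raw: dict) -> dict:
    """Keep only whitelisted fields before storing in trajectory/memory."""
    return {k: v for k, v in raw.items() if k in ALLOWED_FIELDS}
```

Applying this at the point where raw_response is written into metadata keeps token accounting and tool-call structure while dropping the message text itself.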
```diff
 ext_info={
-    "tools": agent.tools
+    "tools": agent.tools,
+    "raw_response": llm_response.to_dict() if hasattr(llm_response, 'to_dict') else llm_response,
```
The application stores the raw LLM response in trajectory datasets and memory metadata. If the LLM response contains sensitive information or secrets, these will be persisted in the dataset or memory store, increasing the risk of data exposure. It is recommended to sanitize LLM responses before storage if they are not strictly necessary for the application's functionality.
```diff
-messages.append(history.to_openai_message())
+openai_msg = history.to_openai_message()
+if history.role == "assistant":
+    ext_info = history.metadata.get("ext_info", {}) if history.metadata else {}
```
The if history.metadata check is redundant. The MemoryItem base class ensures that metadata is always a dictionary, so history.metadata.get("ext_info", {}) is safe to call directly.
```diff
-ext_info = history.metadata.get("ext_info", {}) if history.metadata else {}
+ext_info = history.metadata.get("ext_info", {})
```
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>