
[trajectory, logger] Add metadata to trajectory and add llm_logger #805

Merged
ahgpt merged 5 commits into main from feat/add_metadata_to_traj
Mar 6, 2026
Conversation


@SzekiHou SzekiHou commented Mar 5, 2026

No description provided.

@SzekiHou SzekiHou requested a review from ahgpt March 5, 2026 04:06
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability and debugging capabilities for Large Language Model (LLM) interactions within the system. It establishes a robust logging framework specifically for LLM calls, capturing detailed request and response data, including timing and parameters. Furthermore, it enriches the system's internal memory and trajectory records by embedding raw LLM outputs, providing a more comprehensive and traceable history of LLM usage.

Highlights

  • LLM Logging Infrastructure: Introduced a dedicated llm_logger with a custom formatter and a _safe_serialize utility in aworld/logs/util.py to enable structured logging of Large Language Model (LLM) interactions.
  • LLM Request/Response Tracking: Implemented log_llm_record function and integrated it into LLMModel's acompletion, completion, and astream_completion methods to log LLM inputs, outputs, and streaming chunks, including unique request IDs, task IDs, and time costs.
  • Enhanced Trajectory Metadata: Modified aworld/dataset/trajectory_strategy.py to include the raw LLM response within the ext_info metadata of assistant messages, providing richer context for trajectory analysis.
  • Improved OpenAI Parameter Handling: Refactored get_openai_params in aworld/models/openai_provider.py to explicitly handle and log common LLM parameters like temperature, max_tokens, and stop, ensuring accurate parameter passing and logging.
  • Memory Enrichment with Raw LLM Responses: Updated aworld/runners/handler/memory.py to store the full raw LLM response in the ext_info of MemoryMessage when adding LLM responses to memory, enhancing the detail available for debugging and analysis.


Changelog
  • aworld/dataset/trajectory_strategy.py
    • Added 'raw_response' from assistant messages to 'ext_info' metadata when converting 'MemoryMessage' to OpenAI message format.
  • aworld/logs/util.py
    • Imported 'asyncio', 'json', 'traceback', 'enum', and 'Optional' for new functionalities.
    • Removed extraneous whitespace in docstrings and path concatenation.
    • Added 'llm_logger' with a custom formatter for LLM-specific logging.
    • Introduced '_safe_serialize' function for robust JSON serialization of various Python objects, handling non-serializable types.
    • Added 'log_llm_record' function to log LLM interactions with direction, model name, data, parameters, and trace ID.
    • Included 'llm_logger' in the 'update_logger_level' function.
  • aworld/models/llm.py
    • Imported 'time' and 'uuid' for generating unique request IDs.
    • Imported 'log_llm_record' from 'aworld.logs.util'.
    • Added '_generate_llm_request_id' static method to create unique request IDs.
    • Integrated 'log_llm_record' calls into 'acompletion', 'completion', and 'astream_completion' methods to log LLM inputs, outputs, and chunks, including request IDs, task IDs, and time costs.
    • Passed 'llm_request_id' to provider calls.
  • aworld/models/openai_provider.py
    • Imported 'log_llm_record' from 'aworld.logs.util'.
    • Modified 'get_openai_params' to explicitly include 'temperature', 'max_tokens', and 'stop' in 'llm_params' before filtering.
    • Logged the final OpenAI parameters using 'log_llm_record' before returning them.
    • Updated condition for adding parameters to 'openai_params' to check for 'is not None' instead of truthiness.
  • aworld/runners/handler/memory.py
    • Added 'raw_response' of the LLM response to the 'ext_info' dictionary when adding an LLM response to memory.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant enhancements for logging and data tracking, adding a new llm_logger to record detailed LLM API call information and enriching trajectories with raw LLM responses. However, these changes introduce several security risks related to insecure data handling. The llm_logger records full conversation history and potentially sensitive API keys in plain text to local log files, and storing raw LLM responses in persistent storage increases the risk of sensitive data leakage. Additionally, serialization logic failures in the logger could lead to a Denial of Service. My review also includes suggestions to improve maintainability by refactoring duplicated logging code, a fix for a potential bug where a dictionary is modified in-place, a minor code simplification, and a correction to an outdated docstring.

"max_tokens": max_tokens,
"stop": stop
})
log_llm_record("OPENAI_PARAMS", model_name, llm_params, {"request_id": llm_params.pop("llm_request_id", None)})

security-high

The get_openai_params method logs the llm_params dictionary, which is updated with all keyword arguments passed to the completion methods. If an api_key is provided as a keyword argument (e.g., to override the default configuration for a specific call), it will be logged in plain text to the llm.log file. This poses a significant security risk as API keys should never be stored in logs.
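One way to mitigate this, sketched with a hypothetical `mask_sensitive` helper (the key names and placeholder string are illustrative, not part of the PR), is to redact secret-bearing keys before the params dictionary reaches `log_llm_record`:

```python
# Keys whose values should never appear in logs (illustrative list).
SENSITIVE_KEYS = {"api_key", "authorization", "token"}

def mask_sensitive(params: dict) -> dict:
    """Return a copy of params with secret-bearing values redacted for logging."""
    return {
        k: "***REDACTED***" if k.lower() in SENSITIVE_KEYS else v
        for k, v in params.items()
    }
```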

"request_id": request_id,
}
kwargs["llm_request_id"] = request_id
log_llm_record("INPUT", self.provider.model_name, messages, log_params, context.trace_id)

security-medium

The LLMModel class logs the full content of LLM input messages, output responses, and streaming chunks to llm.log. This introduces a security risk as these logs may contain Personally Identifiable Information (PII) or other sensitive data. It is recommended to implement a mechanism to mask sensitive data or allow disabling LLM call logging in production environments. Furthermore, there is significant code duplication for logging the start and output of LLM calls across acompletion, completion, and astream_completion methods. Consider refactoring this logic into a helper method or context manager to reduce redundancy and improve maintainability.
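The duplication could be factored into a context manager along these lines. `llm_call_log` and its `log_fn` parameter are hypothetical names standing in for the PR's `log_llm_record`; this is a sketch of the refactoring idea, not code from the PR:

```python
import time
import uuid
from contextlib import contextmanager

@contextmanager
def llm_call_log(log_fn, model_name, messages, task_id=None):
    """Log INPUT on entry and OUTPUT (with time_cost) on exit around an LLM call."""
    params = {"request_id": uuid.uuid4().hex, "task_id": task_id}
    log_fn("INPUT", model_name, messages, params)
    start = time.time()
    result = {}
    try:
        yield result  # caller stores the response under result["resp"]
    finally:
        params["time_cost"] = round(time.time() - start, 3)
        log_fn("OUTPUT", model_name, result.get("resp"), params)
```

Each of `acompletion`, `completion`, and `astream_completion` could then wrap its provider call in `with llm_call_log(...)` instead of repeating the start/output logging inline.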

"request_id": request_id,
}
kwargs["llm_request_id"] = request_id
log_llm_record("INPUT", self.provider.model_name, messages, log_params, context.trace_id)

security-medium

The LLMModel class logs the full content of LLM input messages, output responses, and streaming chunks. These logs are written to a local file (llm.log) and may contain Personally Identifiable Information (PII) or other sensitive data present in the conversation. It is recommended to implement a mechanism to mask sensitive data or allow disabling LLM call logging in production environments.

resp = await self.llm_response_parser.parse(resp, **response_parse_args)

log_params["time_cost"] = round(time.time() - start_ms, 3)
log_llm_record("OUTPUT", self.provider.model_name, resp, log_params, context.trace_id)

security-medium

The LLMModel class logs the full content of LLM input messages, output responses, and streaming chunks. These logs are written to a local file (llm.log) and may contain Personally Identifiable Information (PII) or other sensitive data present in the conversation. It is recommended to implement a mechanism to mask sensitive data or allow disabling LLM call logging in production environments.

resp = sync_exec(self.llm_response_parser.parse, resp, **response_parse_args)

log_params["time_cost"] = round(time.time() - start_ms, 3)
log_llm_record("OUTPUT", self.provider.model_name, resp, log_params, context.trace_id)

security-medium

The LLMModel class logs the full content of LLM input messages, output responses, and streaming chunks. These logs are written to a local file (llm.log) and may contain Personally Identifiable Information (PII) or other sensitive data present in the conversation. It is recommended to implement a mechanism to mask sensitive data or allow disabling LLM call logging in production environments.

e,
traceback.format_exc(),
)
raise RuntimeError(f"{e}")

security-medium

The _safe_serialize function raises a RuntimeError if an object cannot be JSON serialized. Since this function is called within log_llm_record, which is used for logging LLM inputs and outputs, a serialization failure will cause the entire LLM call to fail. This can lead to a Denial of Service (DoS) if an unexpected or non-serializable object is passed to the logger. Logging should ideally be a non-blocking, side-effect-free operation that does not impact the main application flow.
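One way to address this, sketched below as a hypothetical `log_llm_record_safe` wrapper (the name and return-value convention are illustrative, not from the PR), is to catch serialization failures inside the logging path so they can never propagate into the LLM call:

```python
import json
import logging

llm_logger = logging.getLogger("llm")

def log_llm_record_safe(direction, model_name, data, params=None, trace_id=None) -> bool:
    """Log an LLM record; a serialization failure degrades to a warning, never an exception."""
    try:
        payload = json.dumps(data, default=repr, ensure_ascii=False)
    except Exception:
        # e.g. circular references still raise despite the repr fallback
        llm_logger.warning("[%s] model=%s: record could not be serialized",
                           direction, model_name)
        return False
    llm_logger.info("[%s] model=%s trace=%s params=%s data=%s",
                    direction, model_name, trace_id, params, payload)
    return True
```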

if history.role == "assistant":
ext_info = history.metadata.get("ext_info", {}) if history.metadata else {}
raw_response = ext_info.get("raw_response", "")
openai_msg["raw_response"] = raw_response

security-medium

The application stores the raw LLM response in trajectory datasets and memory metadata. If the LLM response contains sensitive information or secrets, these will be persisted in the dataset or memory store, increasing the risk of data exposure. It is recommended to sanitize LLM responses before storage if they are not strictly necessary for the application's functionality.
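A sanitization pass could be as simple as the sketch below. `sanitize_response` and the token patterns (OpenAI-style `sk-` keys, AWS-style `AKIA` access key IDs) are illustrative assumptions; a real deployment would tune the patterns to the secrets it handles:

```python
import re

# Illustrative patterns for token-like substrings (OpenAI-style and AWS-style).
SECRET_PATTERN = re.compile(r"(sk-[A-Za-z0-9]{8,}|AKIA[0-9A-Z]{16})")

def sanitize_response(text: str) -> str:
    """Redact token-like substrings before persisting a raw LLM response."""
    return SECRET_PATTERN.sub("[REDACTED]", text)
```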

ext_info={
"tools": agent.tools,
"raw_response": llm_response.to_dict() if hasattr(llm_response, 'to_dict') else llm_response,

security-medium

The application stores the raw LLM response in trajectory datasets and memory metadata. If the LLM response contains sensitive information or secrets, these will be persisted in the dataset or memory store, increasing the risk of data exposure. It is recommended to sanitize LLM responses before storage if they are not strictly necessary for the application's functionality.

openai_msg = history.to_openai_message()
if history.role == "assistant":
ext_info = history.metadata.get("ext_info", {}) if history.metadata else {}

medium

The `if history.metadata` check is redundant: the MemoryItem base class ensures that metadata is always a dictionary, so history.metadata.get("ext_info", {}) is safe to call directly.

Suggested change
ext_info = history.metadata.get("ext_info", {}) if history.metadata else {}
ext_info = history.metadata.get("ext_info", {})

SiqiHouCk and others added 2 commits March 5, 2026 19:29
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ahgpt ahgpt merged commit a72a91f into main Mar 6, 2026
1 check passed
@SzekiHou SzekiHou deleted the feat/add_metadata_to_traj branch March 6, 2026 03:00