@RONAK-AI647
Contributor

Description

This PR fixes a bug in the GmailSearch tool's message parsing logic that caused the tool to crash on real-world email data. The crash occurred in two situations: decoding message bodies that are not valid UTF-8, and reading the Subject/From headers when they are missing.

Relevant issues

Fixes #1030

Type

🐛 Bug Fix
🧹 Refactoring

Changes (optional)

  1. Fix 1: UnicodeDecodeError (fixes #1030, "UnicodeDecodeError when decoding message body with non-UTF-8 encoding")

Added errors="replace" to the .decode() calls for the message body, in both the multipart and non-multipart branches. Invalid bytes are now replaced with the Unicode replacement character (U+FFFD) instead of raising a fatal UnicodeDecodeError.
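The following minimal sketch shows the difference. It uses a made-up Latin-1 payload rather than a real Gmail message. Strict UTF-8 decoding raises, while errors="replace" returns a string with U+FFFD in place of the undecodable byte.

```python
# Made-up payload: "café" encoded as Latin-1, i.e. b"caf\xe9".
# 0xE9 is a UTF-8 lead byte for a three-byte sequence, so ending the input there is invalid.
payload = "café".encode("latin-1")

try:
    payload.decode("utf-8")  # "strict" is the default error handler
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# errors="replace" substitutes U+FFFD (REPLACEMENT CHARACTER) for the bad byte.
lenient = payload.decode("utf-8", errors="replace")
print(repr(lenient))
```

In this example the lenient result is "caf\ufffd", so the tool keeps running, but the original character is lost.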

  2. Fix 2: Header NoneType crash (a separate issue I found while testing this one)

The code read headers with email_msg["Subject"], which returns None when the header is missing. A downstream function (either clean_email_body or later string processing) then called a string method such as .replace() on that None value, raising AttributeError: 'NoneType' object has no attribute 'replace'.

Solution: convert any missing header value (None) to an empty string ("") at the point where it is retrieved.
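Here is a minimal reproduction of the crash and the fix. It assumes the message is parsed with the standard library's email.message_from_string. The raw message is made up and has no Subject header.

```python
import email

# Made-up raw message with a From header but no Subject header.
msg = email.message_from_string("From: alice@example.com\n\nhello\n")

raw_subject = msg["Subject"]  # a missing header returns None, not ""
try:
    raw_subject.replace("\n", " ")  # the kind of string call made downstream
except AttributeError as exc:
    print("crash:", exc)

# The fix described above: coerce a missing header to "" when reading it.
subject = msg["Subject"] or ""
sender = msg["From"] or ""
print(repr(subject), repr(sender))
```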

Testing (optional)

To be added after the maintainers' initial review.

Note (optional)

  1. PR only touches code within the community package.
  2. Changes are backwards compatible.

@mdrxy requested a review from Copilot on October 6, 2025 01:08

Copilot AI left a comment


Pull Request Overview

This PR fixes critical error handling in the GmailSearch tool's message parsing logic to prevent crashes when processing real-world email data with encoding issues or missing headers.

  • Adds graceful Unicode decoding error handling with errors="replace" parameter
  • Ensures header safety by converting None values to empty strings
  • Maintains backward compatibility while improving robustness


Comment on lines 101 to 118
             try:
-                message_body = part.get_payload(decode=True).decode("utf-8")  # type: ignore[union-attr]
+                message_body = part.get_payload(decode=True).decode(  # type: ignore[union-attr]
+                    "utf-8", errors="replace"
+                )
             except UnicodeDecodeError:
                 message_body = part.get_payload(decode=True).decode(  # type: ignore[union-attr]
-                    "latin-1"
+                    "latin-1", errors="replace"
                 )
             break
 else:
     try:
-        message_body = email_msg.get_payload(decode=True).decode("utf-8")  # type: ignore[union-attr]
+        message_body = email_msg.get_payload(decode=True).decode(  # type: ignore[union-attr]
+            "utf-8", errors="replace"
+        )
     except UnicodeDecodeError:
         message_body = email_msg.get_payload(decode=True).decode(  # type: ignore[union-attr]
-            "latin-1"
+            "latin-1", errors="replace"
         )

Copilot AI Oct 6, 2025


The fallback to latin-1 encoding with errors='replace' is redundant since the primary utf-8 decode now also uses errors='replace'. Consider removing this exception handling block or using a different fallback encoding strategy.
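The sketch below illustrates Copilot's point using a hypothetical decode_body helper that mirrors the patched try/except. Because bytes.decode(..., errors="replace") does not raise UnicodeDecodeError, every sample takes the UTF-8 branch.

This has a side effect worth noting. Before this change, a Latin-1 body such as "café" fell through to the latin-1 fallback and decoded correctly. With this change, it decodes to "caf\ufffd" instead.

```python
def decode_body(payload: bytes) -> tuple[str, str]:
    """Hypothetical mirror of the patched logic; returns (text, branch_taken)."""
    try:
        return payload.decode("utf-8", errors="replace"), "utf-8"
    except UnicodeDecodeError:
        # With errors="replace", the UTF-8 decode above no longer raises
        # UnicodeDecodeError, so this Latin-1 fallback is effectively dead code.
        return payload.decode("latin-1", errors="replace"), "latin-1"


# Made-up payloads: ASCII, Latin-1 "café", and arbitrary invalid bytes.
samples = [b"plain ascii", "café".encode("latin-1"), b"\xff\xfe\x00garbage"]
for sample in samples:
    text, branch = decode_body(sample)
    print(branch, repr(text))
```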

Comment on lines 101 to 118

Copilot AI Oct 6, 2025


Similar to the multipart case, this fallback exception handling is now redundant since the primary utf-8 decode uses errors='replace'. Consider removing this block or implementing a different fallback strategy.
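The following sketches one possible "different fallback strategy". It is an assumption, not what this PR implemented. The approach tries three encodings in order:

1. The charset declared in the part's Content-Type header.
2. Strict UTF-8.
3. Latin-1, which accepts every byte.

The helper name decode_part and the sample message are hypothetical.

```python
import email
from email.message import Message


def decode_part(part: Message) -> str:
    """Hypothetical helper: declared charset first, then strict UTF-8, then Latin-1."""
    payload = part.get_payload(decode=True)
    if not isinstance(payload, bytes):
        # Multipart containers return None here, so there is nothing to decode.
        return ""
    declared = part.get_content_charset()  # lowercased charset name, or None if absent
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            # Strict decoding, so a wrong guess raises instead of silently inserting U+FFFD.
            return payload.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers charset names Python does not recognize.
            continue
    # Latin-1 maps every byte value to a code point, so this cannot raise.
    return payload.decode("latin-1")


# Usage: a made-up single-part message whose body is Latin-1 encoded.
raw = (
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
) + "café\n".encode("latin-1")
msg = email.message_from_bytes(raw)
print(decode_part(msg))
```

Keeping the first two attempts strict means a wrong guess fails loudly and moves on to the next candidate, instead of returning replacement characters.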


-subject = email_msg["Subject"]
-sender = email_msg["From"]
+subject = email_msg["Subject"] or ""
Collaborator


Nit: you can just use email_msg.get("Subject", "") here.
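The following quick check confirms that the suggested email_msg.get("Subject", "") behaves like the or "" version in three cases: a missing, an empty, and a populated Subject header. The sample messages are made up, and parsing uses email.message_from_string with its default compat32 policy.

```python
import email

# Made-up messages: Subject absent, present but empty, and present with text.
raw_messages = [
    "From: alice@example.com\n\nbody\n",
    "From: alice@example.com\nSubject:\n\nbody\n",
    "From: alice@example.com\nSubject: Quarterly report\n\nbody\n",
]

results = []
for raw in raw_messages:
    msg = email.message_from_string(raw)
    via_get = msg.get("Subject", "")  # the default is used only when the header is absent
    via_or = msg["Subject"] or ""     # the approach originally used in this PR
    assert via_get == via_or
    results.append(via_get)
    print(repr(via_get))
```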

Contributor Author


I agree, this is robust and more Pythonic. I will update it.

@lkuligin merged commit 33f015b into langchain-ai:main on Oct 10, 2025
16 checks passed

Development

Successfully merging this pull request may close these issues.

UnicodeDecodeError when decoding message body with non-UTF-8 encoding
