fix(community): Handle UnicodeDecodeError in GmailSearch and ensure header robustness
#1226
Pull Request Overview
This PR fixes critical error handling in the GmailSearch tool's message parsing logic to prevent crashes when processing real-world email data with encoding issues or missing headers.
- Adds graceful Unicode decoding error handling with the errors="replace" parameter
- Ensures header safety by converting None values to empty strings
- Maintains backward compatibility while improving robustness
                  try:
-                     message_body = part.get_payload(decode=True).decode("utf-8")  # type: ignore[union-attr]
+                     message_body = part.get_payload(decode=True).decode(  # type: ignore[union-attr]
+                         "utf-8", errors="replace"
+                     )
                  except UnicodeDecodeError:
                      message_body = part.get_payload(decode=True).decode(  # type: ignore[union-attr]
-                         "latin-1"
+                         "latin-1", errors="replace"
                      )
                  break
      else:
          try:
-             message_body = email_msg.get_payload(decode=True).decode("utf-8")  # type: ignore[union-attr]
+             message_body = email_msg.get_payload(decode=True).decode(  # type: ignore[union-attr]
+                 "utf-8", errors="replace"
+             )
          except UnicodeDecodeError:
              message_body = email_msg.get_payload(decode=True).decode(  # type: ignore[union-attr]
-                 "latin-1"
+                 "latin-1", errors="replace"
              )
Copilot AI, Oct 6, 2025
The fallback to latin-1 encoding with errors='replace' is redundant since the primary utf-8 decode now also uses errors='replace'. Consider removing this exception handling block or using a different fallback encoding strategy.
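The redundancy noted in this comment can be checked in isolation. The snippet below is a minimal sketch using a made-up byte string (not data from the PR): strict UTF-8 decoding raises, but once errors="replace" is passed the call cannot raise UnicodeDecodeError, so the except branch is never reached.

```python
# Hypothetical sample: 0xE9 is "é" in latin-1 but is not valid UTF-8 here.
raw = b"caf\xe9 au lait"

# Strict decoding raises, which is what the original except branch guarded against.
try:
    raw.decode("utf-8")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# With errors="replace", each undecodable byte becomes U+FFFD and no
# UnicodeDecodeError is raised, so a following except branch is dead code.
replaced = raw.decode("utf-8", errors="replace")

# latin-1 maps every byte value to a code point, so it never fails either.
latin = raw.decode("latin-1")

assert strict_raised
assert replaced == "caf\ufffd au lait"
assert latin == "caf\u00e9 au lait"
```

The trade-off is that the latin-1 fallback would have produced readable text for this input, whereas errors="replace" on UTF-8 silently substitutes a placeholder character.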
Copilot AI, Oct 6, 2025
Similar to the multipart case, this fallback exception handling is now redundant since the primary utf-8 decode uses errors='replace'. Consider removing this block or implementing a different fallback strategy.
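One possible "different fallback strategy" is to try the charset the message itself declares before falling back to a lossy decode. The sketch below is not code from this PR: decode_body is a hypothetical helper and the sample message is invented. It relies only on the standard library's Message.get_content_charset() and get_payload(decode=True).

```python
from email import message_from_bytes
from email.message import Message


def decode_body(part: Message) -> str:
    """Hypothetical helper: decode a part, preferring its declared charset."""
    payload = part.get_payload(decode=True)
    if not isinstance(payload, bytes):
        # get_payload(decode=True) returns None for multipart containers.
        return ""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Mislabelled or unknown charset: fall back to a decode that cannot raise.
        return payload.decode("utf-8", errors="replace")


# Invented sample message that declares a latin-1 body.
raw = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)
body = decode_body(message_from_bytes(raw))
assert body.strip() == "caf\u00e9"
```

With this approach the lossy errors="replace" path is only taken when the declared charset is missing, unknown, or wrong, so correctly labelled non-UTF-8 mail keeps its original characters.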
- subject = email_msg["Subject"]
- sender = email_msg["From"]
+ subject = email_msg["Subject"] or ""
Nit: you can just use email_msg.get("Subject", "") here.
I agree, this is more robust and more Pythonic. I will update it.
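For reference, the two spellings discussed in this thread behave the same way for a missing header. The snippet is a small standalone sketch with an invented message, not the tool's actual code: indexing a Message with a header name that is absent returns None rather than raising, and both `or ""` and `.get(name, "")` turn that into an empty string.

```python
from email import message_from_string

# Invented message with a From header but no Subject header.
msg = message_from_string("From: alice@example.com\n\nbody text\n")

missing = msg["Subject"]            # None: Message never raises KeyError here
with_or = msg["Subject"] or ""      # approach from the diff
with_get = msg.get("Subject", "")   # approach suggested in review
sender = msg.get("From", "")

assert missing is None
assert with_or == "" and with_get == ""
assert sender == "alice@example.com"

# Either result is safe to pass to string methods such as .replace().
assert with_get.replace("\n", " ") == ""
```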
Description
This PR fixes a bug in the GmailSearch tool's message parsing logic that caused the tool to crash on real-world email data, specifically when decoding message bodies that are not valid UTF-8 and when reading a missing Subject or From header.
Relevant issues
Fixes #1030
Type
🐛 Bug Fix
🧹 Refactoring
Changes (optional)
UnicodeDecodeError (Fixes "UnicodeDecodeError when decoding message body with non-UTF-8 encoding" #1030)
Added errors="replace" to the decoding call for non-multipart message bodies. This gracefully replaces invalid bytes with a placeholder character instead of raising a fatal UnicodeDecodeError.
NoneType crash (a new issue I found while testing this fix)
The code fetched headers (email_msg["Subject"]), which returned None when the header was missing. A downstream function (either clean_email_body or later string processing) then attempted to call a string method (like .replace()) on that None object, causing an AttributeError: 'NoneType' object has no attribute 'replace'.
Solution: Added a check to safely convert any missing header value (None) to an empty string ("") upon retrieval.
Testing (optional)
To be added after maintainers' initial review.
Note (optional)