fix(community): Handle `UnicodeDecodeError` in GmailSearch and ensure header robustness #1226

RONAK-AI647 · 2025-10-02T18:07:23Z

Description

This PR fixes a bug in the GmailSearch tool's message parsing logic that crashed the tool on real-world email data, both when decoding message bodies that contain non-UTF-8 bytes and when reading the Subject/From headers.

Relevant issues

Fixes #1030

Type

🐛 Bug Fix
🧹 Refactoring

Changes(optional)

Fix 1: UnicodeDecodeError (fixes "UnicodeDecodeError when decoding message body with non-UTF-8 encoding", #1030)

Added errors="replace" to the decoding call for non-multipart message bodies. Invalid bytes are now replaced with a placeholder character instead of raising a fatal UnicodeDecodeError.
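As a minimal illustration of the behavior described above (not the PR's code; the sample bytes are made up for the example), decoding Latin-1 bytes as UTF-8 fails in strict mode but succeeds with errors="replace":

```python
# Hypothetical payload: "café" encoded as Latin-1, which is not valid UTF-8.
raw = "café".encode("latin-1")  # b'caf\xe9'

# Strict decoding (the default) raises UnicodeDecodeError on the byte 0xE9.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD (the replacement character) for the bad byte.
decoded = raw.decode("utf-8", errors="replace")
print(strict_failed, repr(decoded))
```

The trade-off is that the original character is lost: the result contains U+FFFD where the "é" was.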

Fix 2: Header NoneType crash (a new issue I found while testing this fix)

The code read headers with email_msg["Subject"], which returns None when the header is missing. A downstream function (either clean_email_body or later string processing) then attempted to call a string method (like .replace()) on that None object, causing an AttributeError: 'NoneType' object has no attribute 'replace'.

Solution: Added a check that converts any missing header value (None) to an empty string ("") upon retrieval.
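The failure mode can be reproduced with the standard library alone. This is a sketch with a made-up message, not the tool's actual code path; the .replace() call stands in for whatever downstream string processing runs on the header:

```python
from email import message_from_string

# Hypothetical message that has a From header but no Subject header.
msg = message_from_string("From: sender@example.com\n\nHello")

# Indexing a missing header returns None rather than raising KeyError.
missing = msg["Subject"]

# Any string method called on that None value raises AttributeError.
try:
    missing.replace("\n", " ")
    error_text = ""
except AttributeError as exc:
    error_text = str(exc)

# Coercing None to "" at retrieval keeps later string processing safe.
safe_subject = msg["Subject"] or ""
cleaned = safe_subject.replace("\n", " ")
print(error_text, repr(cleaned))
```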

Testing(optional)

To be added after the maintainers' initial review.

Note(optional)

PR only touches code within the community package.
Changes are backwards compatible.

Copilot

Pull Request Overview

This PR fixes critical error handling in the GmailSearch tool's message parsing logic to prevent crashes when processing real-world email data with encoding issues or missing headers.

Adds graceful Unicode decoding error handling with errors="replace" parameter
Ensures header safety by converting None values to empty strings
Maintains backward compatibility while improving robustness

Copilot · 2025-10-06T01:09:00Z

libs/community/langchain_google_community/gmail/search.py

                        try:
-                            message_body = part.get_payload(decode=True).decode("utf-8")  # type: ignore[union-attr]
+                            message_body = part.get_payload(decode=True).decode(  # type: ignore[union-attr]
+                                "utf-8", errors="replace"
+                            )
                        except UnicodeDecodeError:
                            message_body = part.get_payload(decode=True).decode(  # type: ignore[union-attr]
-                                "latin-1"
+                                "latin-1", errors="replace"
                            )
                        break
            else:
                try:
-                    message_body = email_msg.get_payload(decode=True).decode("utf-8")  # type: ignore[union-attr]
+                    message_body = email_msg.get_payload(decode=True).decode(  # type: ignore[union-attr]
+                        "utf-8", errors="replace"
+                    )
                except UnicodeDecodeError:
                    message_body = email_msg.get_payload(decode=True).decode(  # type: ignore[union-attr]
-                        "latin-1"
+                        "latin-1", errors="replace"
                    )


The fallback to latin-1 encoding with errors='replace' is redundant since the primary utf-8 decode now also uses errors='replace'. Consider removing this exception handling block or using a different fallback encoding strategy.
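The redundancy can be checked directly. In this sketch, decode_with_fallback is a hypothetical helper that mirrors the shape of the try/except in the diff and reports whether the except branch ran; it is not a function from the PR:

```python
from typing import Tuple


def decode_with_fallback(payload: bytes) -> Tuple[str, bool]:
    """Decode like the patched code; the bool is True if the fallback ran."""
    try:
        # With errors="replace", invalid bytes become U+FFFD instead of raising.
        return payload.decode("utf-8", errors="replace"), False
    except UnicodeDecodeError:
        return payload.decode("latin-1", errors="replace"), True


# Try every possible single byte, plus a deliberately malformed sequence.
samples = [bytes([b]) for b in range(256)] + [b"\xff\xfe\xc3\x28"]
used_fallback = any(decode_with_fallback(s)[1] for s in samples)
print(used_fallback)
```

Because the first decode never raises UnicodeDecodeError for these inputs, the latin-1 branch is never reached.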

Copilot · 2025-10-06T01:09:00Z

libs/community/langchain_google_community/gmail/search.py

                        try:
-                            message_body = part.get_payload(decode=True).decode("utf-8")  # type: ignore[union-attr]
+                            message_body = part.get_payload(decode=True).decode(  # type: ignore[union-attr]
+                                "utf-8", errors="replace"
+                            )
                        except UnicodeDecodeError:
                            message_body = part.get_payload(decode=True).decode(  # type: ignore[union-attr]
-                                "latin-1"
+                                "latin-1", errors="replace"
                            )
                        break
            else:
                try:
-                    message_body = email_msg.get_payload(decode=True).decode("utf-8")  # type: ignore[union-attr]
+                    message_body = email_msg.get_payload(decode=True).decode(  # type: ignore[union-attr]
+                        "utf-8", errors="replace"
+                    )
                except UnicodeDecodeError:
                    message_body = email_msg.get_payload(decode=True).decode(  # type: ignore[union-attr]
-                        "latin-1"
+                        "latin-1", errors="replace"
                    )


Similar to the multipart case, this fallback exception handling is now redundant since the primary utf-8 decode uses errors='replace'. Consider removing this block or implementing a different fallback strategy.
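One possible reading of "a different fallback strategy" is to honor the charset declared in the Content-Type header before defaulting to UTF-8. The sketch below is an assumption about what that could look like, not what the PR implements; decode_payload is a hypothetical helper, and the sample message is made up:

```python
import base64
from email import message_from_string
from email.message import Message


def decode_payload(part: Message) -> str:
    """Decode a non-multipart part using its declared charset when possible."""
    payload = part.get_payload(decode=True)
    if not isinstance(payload, bytes):
        return ""
    # get_content_charset() returns None when no charset is declared.
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # The declared charset is unknown to Python; fall back to UTF-8.
        return payload.decode("utf-8", errors="replace")


# Hypothetical Latin-1 message body, base64-encoded for transport.
encoded = base64.b64encode("café".encode("iso-8859-1")).decode("ascii")
raw = (
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: base64\n"
    "\n" + encoded + "\n"
)
body = decode_payload(message_from_string(raw))
print(repr(body))
```

With this approach the "é" survives, whereas decoding the same bytes as UTF-8 with errors="replace" would yield a replacement character.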

lkuligin · 2025-10-09T12:07:11Z

libs/community/langchain_google_community/gmail/search.py


-            subject = email_msg["Subject"]
-            sender = email_msg["From"]
+            subject = email_msg["Subject"] or ""


nits: you can just use email_msg.get("Subject", "") here

I agree, this is more robust and more Pythonic. I will update it.
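For reference, a small sketch of the suggested form using a made-up message (the variable names follow the diff, but the message content is illustrative only):

```python
from email import message_from_string

# Hypothetical message with a From header and no Subject header.
email_msg = message_from_string("From: sender@example.com\n\nHello")

# Message.get() takes a default that is returned when the header is absent.
subject = email_msg.get("Subject", "")
sender = email_msg.get("From", "")
print(repr(subject), repr(sender))
```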

RONAK-AI647 added 3 commits October 2, 2025 23:18

update search.py

f16776f

mypy error fix for .decode

c53d6b9

Update search.py

05f2c22

mdrxy requested a review from Copilot October 6, 2025 01:08

Copilot AI reviewed Oct 6, 2025

lkuligin reviewed Oct 9, 2025

lkuligin approved these changes Oct 9, 2025

RONAK-AI647 added 2 commits October 9, 2025 22:25

update serach.py to a better pythonic approach

654cfdc

Merge branch 'main' into RONAK-A1647/fix(gmail)-Handle-UnicodeDecodeE…

8d8a26a

…rror-in-GmailSearch-and-ensure-header-robustness-(Fixes-langchain-ai#1030)-

lkuligin merged commit 33f015b into langchain-ai:main Oct 10, 2025
16 checks passed
