add workaround for CPython's incorrect handling of native utf-8 with 8-bit CTE #5

Open
bastidest wants to merge 1 commit into plenaerts:main from bastidest:fix/native-utf8

@bastidest

I noticed that some of my emails, which render correctly in email clients (Thunderbird, Outlook), show character-encoding errors when converted to PDF with this tool. This pull request fixes that.

The affected emails contain the following headers, followed by a UTF-8 encoded payload:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

I added a test case for this.

This seems to be caused by incorrect handling of this case in the CPython implementation of message.get_payload(decode=True).
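For illustration, here is a minimal sketch of the header combination above, assuming the message is parsed from bytes with the standard library `email` package. The sample text and variable names are made up and are not taken from the test case in this PR. The sketch only shows how the declared charset and CTE can be read and applied to the raw payload bytes; it does not reproduce the failure itself.

```python
from email import message_from_bytes

# Illustrative message: the two headers from the report, then a raw UTF-8 body.
raw = (
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
) + "Grüße aus München".encode("utf-8")

msg = message_from_bytes(raw)

# Read the declared transfer encoding and charset from the headers.
cte = msg.get("Content-Transfer-Encoding", "").lower()
charset = msg.get_content_charset() or "ascii"

# With decode=True the payload comes back as bytes; turning those bytes
# into text with the declared charset is still the caller's job.
payload = msg.get_payload(decode=True)
text = payload.decode(charset, errors="replace")
```

Whether the bytes returned by `get_payload(decode=True)` match the original body can depend on how the message was parsed (bytes versus text input), so the parsing path is worth checking when debugging this kind of error.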

My expectation of this software is that it produces output similar to an email client's "export as PDF" function.

@plenaerts
Owner

plenaerts commented Feb 15, 2026

I was wondering when charset / conversion errors would pop up.

Thanks for including the test case and not making me ask for it ;-)

One thing on my mind: you suggest an argument for one specific CTE string, is_8bit_cte. Wouldn't it be more general, and more in line with the other argument content_charset, to have a cte or content_transfer_encoding argument which we test in line 263? I don't know if we will get other similar issues, but this seems the better approach to me.

What do you think?

@bastidest
Author

I think that depends on how much effort you are willing to put into handling the long tail of encoding errors. As the commit says, this is just a workaround for the incorrect CPython implementation, hence the quite specific is_8bit_cte flag.

Since I suspect that this will not be the last encoding error, in my opinion we should implement the get_payload method ourselves (based on the existing CPython implementation) and handle the encoding quirks directly there.
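As a rough sketch of what such a self-implemented decoder could look like, assuming only non-multipart text parts need to be handled: the helper name `decode_text_payload` is hypothetical and not part of this project or of CPython, and the code is a simplified illustration rather than a port of CPython's `get_payload`.

```python
import base64
import quopri
from email.message import Message


def decode_text_payload(part: Message) -> str:
    """Hypothetical helper: return the text of a non-multipart part."""
    if part.is_multipart():
        raise ValueError("expected a non-multipart message part")

    cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
    charset = part.get_content_charset() or "ascii"
    raw = part.get_payload(decode=False)  # str for non-multipart parts

    if cte == "base64":
        data = base64.b64decode(raw)
    elif cte == "quoted-printable":
        data = quopri.decodestring(raw.encode("ascii", errors="replace"))
    else:
        # 7bit, 8bit and binary are identity encodings, so there is no
        # transfer decoding to undo and the payload is used as text as-is.
        return raw

    try:
        return data.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name in the header: fall back to ASCII.
        return data.decode("ascii", errors="replace")
```

Keeping the CTE dispatch in one place like this would also cover the more general cte argument suggested above, since each transfer encoding gets its own branch instead of a single boolean flag.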
