add workaround for CPython's incorrect handling of native utf-8 with 8-bit CTE #5
bastidest wants to merge 1 commit into plenaerts:main
I was wondering when charset / conversion errors would pop up. Thanks for including the test case and not making me ask for it ;-) One thing on my mind: you suggest an argument for one specific CTE string, is_8bit_cte. Wouldn't it be more general, and more in line with the existing content_charset argument, to have a cte or content_transfer_encoding argument that we test on in line 263? I don't know if we will get other similar issues, but this seems like the better approach to me. What do you think?
I think that depends on how much effort you are willing to put into handling the tail end of encoding errors. As the commit says, this is just a workaround for the incorrect CPython implementation, hence the quite specific Since I suspect that this will not be the last encoding error, in my opinion we should implement the
I noticed that some of my emails, which can be correctly rendered in email clients (Thunderbird, Outlook), contain charset encoding errors when converted to a PDF with this tool. This pull request fixes this issue.
The emails with this error contain the following headers, followed by a utf-8 encoded payload:
I added a test case for this.
This seems to be caused by incorrect handling of this case in the CPython implementation of message.get_payload(decode=True).
My expectation of this software is that it produces output similar to an email client's "export as PDF" function.
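
For illustration, here is a minimal sketch of one way to sidestep `get_payload(decode=True)` for parts that need no transfer decoding. It is not the code from this pull request: the helper name `decode_text_part` is hypothetical, and the sample headers (`Content-Type: text/plain; charset=utf-8` with `Content-Transfer-Encoding: 8bit`) are an assumption about what the affected emails look like. It only handles non-multipart text parts.

```python
from email import message_from_bytes, message_from_string
from email.message import Message

# CTE values that mean "the body is not transfer-encoded".
PASS_THROUGH_CTES = {"", "7bit", "8bit", "binary"}


def decode_text_part(part: Message) -> str:
    """Return the text of a non-multipart part (hypothetical helper)."""
    cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
    charset = part.get_content_charset() or "utf-8"  # assumed default

    raw = part.get_payload(decode=False)
    if cte in PASS_THROUGH_CTES and isinstance(raw, str):
        # No base64 / quoted-printable step is needed, so use the text
        # as returned instead of round-tripping it through bytes.
        return raw

    # Transfer-encoded parts still go through the stdlib decoder.
    data = part.get_payload(decode=True)
    return data.decode(charset, errors="replace")


# Assumed sample message: utf-8 payload with an 8-bit CTE.
sample = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Grüße, 日本語\n"
)
text_from_str = decode_text_part(message_from_string(sample))
text_from_bytes = decode_text_part(message_from_bytes(sample.encode("utf-8")))
```

Checking the CTE value itself, rather than passing a boolean such as is_8bit_cte, is in the spirit of the cte / content_transfer_encoding argument suggested in the conversation, but the exact interface here is only a guess.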