Summary
gog gmail read garbles emails whose MIME header declares charset="iso-2022-jp" (and likely other non-UTF-8 charsets).
Environment: gog v0.12.0, Linux arm64
Steps to Reproduce
- Receive an email with Content-Type: text/plain; charset="iso-2022-jp".
- Run gog gmail read <threadId> -a <account>.
- Body text is garbled (\ufffd replacement characters).
Root Cause
In internal/cmd/gmail_thread.go, decodeBodyCharset() checks the MIME charset and, if it's not UTF-8, re-decodes the bytes using that charset.
However, the Gmail API (format=full) always normalizes body.data to UTF-8 before base64url-encoding, while preserving the original MIME headers verbatim. So after decodeBase64URLBytes() the bytes are already valid UTF-8, but decodeBodyCharset re-interprets them as ISO-2022-JP via golang.org/x/text/encoding/ianaindex, producing garbage.
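The double decoding described above can be reproduced outside gog. The following is a minimal Python sketch, not gog's code: the `part` dictionary is a hypothetical stand-in for a Gmail API message part, built on the report's assumption that `body.data` carries UTF-8 bytes while the header still declares iso-2022-jp.

```python
import base64

original = "こんにちは"

# Hypothetical message part, modelled on the behavior described in this report:
# the header keeps the original charset, the body bytes are already UTF-8.
part = {
    "headers": [{"name": "Content-Type",
                 "value": 'text/plain; charset="iso-2022-jp"'}],
    "body": {"data": base64.urlsafe_b64encode(original.encode("utf-8")).decode("ascii")},
}

raw = base64.urlsafe_b64decode(part["body"]["data"])

# Treating the bytes as UTF-8 recovers the text.
as_utf8 = raw.decode("utf-8")

# Trusting the declared charset re-interprets UTF-8 bytes as ISO-2022-JP.
as_declared = raw.decode("iso2022_jp", errors="replace")

print(repr(as_utf8))
print(repr(as_declared))
```

Every byte of the UTF-8 text has its high bit set, which a 7-bit ISO-2022-JP decoder rejects, so the second result consists of replacement characters.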
Proposed Fix
Add a utf8.Valid() guard before charset conversion:
import "unicode/utf8"

func decodeBodyCharset(data []byte, contentType string) []byte {
	charsetLabel := charsetLabelFromContentType(contentType)
	normalized := strings.ToLower(strings.ReplaceAll(strings.TrimSpace(charsetLabel), "_", "-"))
	// No charset, or UTF-8 already declared: nothing to convert.
	if charsetLabel == "" || normalized == "utf-8" || normalized == "utf8" {
		return data
	}
	// New guard: bytes that are already valid UTF-8 must not be re-decoded.
	if utf8.Valid(data) {
		return data
	}
	if decoded, ok := decodeWithCharsetLabel(data, charsetLabel); ok {
		return decoded
	}
	return data
}

For Gmail API JSON responses (the normal path), the bytes are always valid UTF-8, so the re-decoding is correctly skipped. On non-API paths, genuine Shift-JIS or EUC-JP raw bytes are almost never valid UTF-8, so the guard will not fire for them. ISO-2022-JP is the exception: it is a 7-bit encoding, so raw ISO-2022-JP bytes are also valid UTF-8, and the guard would skip decoding them if any code path passes them in unnormalized.
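The guard's behavior can be checked in isolation. The following Python sketch mirrors the logic of the proposed fix. It is not gog's implementation: `decode_body_charset` is a hypothetical helper that takes the charset label directly rather than parsing a Content-Type header.

```python
def decode_body_charset(data: bytes, charset_label: str) -> str:
    """Hypothetical Python mirror of the proposed guard (not gog's code)."""
    normalized = charset_label.strip().lower().replace("_", "-")
    if normalized in ("", "utf-8", "utf8"):
        return data.decode("utf-8", errors="replace")
    # Guard: if the bytes are already valid UTF-8, ignore the declared charset.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Otherwise fall back to the declared charset, if Python knows it.
    try:
        return data.decode(normalized)
    except (LookupError, UnicodeDecodeError):
        return data.decode("utf-8", errors="replace")


text = "こんにちは"

# API-style input: UTF-8 bytes with a stale iso-2022-jp label.
from_api = decode_body_charset(text.encode("utf-8"), "iso-2022-jp")

# Raw 8-bit input: real Shift-JIS bytes, which are not valid UTF-8.
from_raw_sjis = decode_body_charset(text.encode("shift_jis"), "shift_jis")

print(from_api == text, from_raw_sjis == text)
```

Both inputs decode to the original text: the first because the guard skips conversion, the second because the guard does not fire and the declared charset is used.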
Affected Encodings
Confirmed: iso-2022-jp. Likely also: shift_jis, euc-jp, gb2312, gbk, euc-kr, windows-1252, iso-8859-1.
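As a rough check of the list above, this Python sketch decodes the same UTF-8 bytes under each label and records the result. It only shows that double decoding corrupts the text in Python's codecs. It does not confirm how gog behaves for the unconfirmed charsets.

```python
original = "こんにちは"
utf8_bytes = original.encode("utf-8")

labels = [
    "iso-2022-jp", "shift_jis", "euc-jp", "gb2312",
    "gbk", "euc-kr", "windows-1252", "iso-8859-1",
]

# Decode already-UTF-8 bytes as if they were in each declared charset.
results = {label: utf8_bytes.decode(label, errors="replace") for label in labels}

for label, decoded in results.items():
    print(f"{label:14} {decoded!r}")
```

The corruption differs by charset. Single-byte encodings such as iso-8859-1 tend to produce mojibake rather than replacement characters.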
Workaround (Python script)
Use gog gmail read -j <threadId> (JSON mode) and decode the base64 body as UTF-8 manually:
#!/usr/bin/env python3
import json, sys, subprocess, base64

# Usage: script.py <threadId> [account]
cmd = ['gog', 'gmail', 'read', '-j', sys.argv[1]]
if len(sys.argv) > 2:
    cmd += ['-a', sys.argv[2]]
data = json.loads(subprocess.run(cmd, capture_output=True).stdout)

def find_text(p):
    # Depth-first search for the first text/plain part.
    if 'text/plain' in p.get('mimeType', ''):
        return p
    for sub in p.get('parts', []):
        r = find_text(sub)
        if r:
            return r
    return None

for msg in data.get('thread', data).get('messages', []):
    hdrs = {h['name']: h['value'] for h in msg['payload'].get('headers', [])}
    print(f"From: {hdrs.get('From','')} Date: {hdrs.get('Date','')}")
    print(f"Subject: {hdrs.get('Subject','')}\n")
    tp = find_text(msg['payload'])
    if tp and tp.get('body', {}).get('data'):
        # Pad the base64url data, then decode as UTF-8 regardless of the declared charset.
        print(base64.urlsafe_b64decode(tp['body']['data'] + '==').decode('utf-8', errors='replace'))
    print('-' * 40)