UnicodeDecodeError on headers with bad characters #7749
-
Checklist
Steps to reproduceThis happened to one of our users; he was accessing our APIs with a browser, and it was sending an HTTP I did't include a failing test as a pull request, but I did pull the code, wrote a test and verified that it fails; I'm not sending a pull-request because I find it weird to submit a pull-request with only one (failing) test and no actual solution; the test is the following:
We're using Expected behaviorRest framework should not throw a Actual behaviorRest framework throws a |
Beta Was this translation helpful? Give feedback.
Replies: 16 comments 5 replies
-
I'm not 100% sure about this on first pass. Feel free to stop me if I get any of this wrong, but here's my take... It's not possible to include a "zero width space" in an HTTP header, because HTTP header values are to be treated as iso-8859-1 encoded bytestrings, which doesn't include a zero wide space codepoint. Attempting to test the expected behavior with the test client gets awkward because we're suddenly in fuzzy land of being able to pass either unicode or bytes as header values to the test client, which isn't really correct (I assume conferment WSGI clients only pass through bytestrings) However, if you're seeing a UnicodeDecodeError caused by a client submitting an incorrectly formed Accept header, then that does sound like a bug. We ought to be gracefully handling them in all cases. Are you able to replicate the issue using curl or python requests? |
Beta Was this translation helpful? Give feedback.
-
I'll tend to forward this one to Django. Is it possible to have the flu traceback ? |
Beta Was this translation helpful? Give feedback.
-
@tomchristie here's a requests example to show the exception:
@xordoquy full traceback:
I was also unsure about where to submit this, actually. |
Beta Was this translation helpful? Give feedback.
-
@armisael thanks. With the traceback, I'm not sure either where this should be handled. need to think further about. |
Beta Was this translation helpful? Give feedback.
-
Arguable that django's multipart parser should prefer
Given iso-8859-1 being historically treated as the header encoding, it'd seem reasonable to use it as a more lenient behavior. It's a bit fuzzy, as...
But having Django use iso-8859-1 would be more tolerant there (cannot raise a |
Beta Was this translation helpful? Give feedback.
-
Similarly this behavior here... https://github.com/django/django/blob/master/django/http/request.py#L388-L392 Seems unnecessary. The request headers are being decoded as |
Beta Was this translation helpful? Give feedback.
-
This one is tangential, but might also be worth exploring... That Django uses
Which is perfectly reasonable, but isn't at all how it's being used in the In summary: I believe there may be 2 or 3 independent PRs/issues to work through with the Django team here.
|
Beta Was this translation helpful? Give feedback.
-
Alternate... Use Eg:
|
Beta Was this translation helpful? Give feedback.
-
This might be related. So _MediaType doesn't produce similar result when converted to string as given input was. This is because >>> str(_MediaType('test/test; foobar=2; cat=dog'))
"test/test; cat=b'dog'; foobar=b'2'" Not sure if This in init seems to fix my case: self.params = dict(((k, v.decode(HTTP_HEADER_ENCODING)) for k, v in self.params.items())) |
Beta Was this translation helpful? Give feedback.
-
To add to my previous comment, my point is kind of fixed in default content negotiation class locally: https://github.com/tomchristie/django-rest-framework/blob/10a080d395b8f1fc5cef0778cd2a40c3d81dfe19/rest_framework/negotiation.py#L70 |
Beta Was this translation helpful? Give feedback.
-
I'm going to de-milestone this for now. We can reassess after v3.7 |
Beta Was this translation helpful? Give feedback.
-
Came here searching for a similar issue. I'm using rest framework to return JSON response. If I use a unicode character in
I get a similar error. Here's the traceback:
|
Beta Was this translation helpful? Give feedback.
-
Have the same problem as above on ancient Django 1.7.1 with DRF 3.2.4 running on Python 2.7.12. Unfortunately didn't test on newer versions. |
Beta Was this translation helpful? Give feedback.
-
Related : https://code.djangoproject.com/ticket/30737
Is it possible to override
@raphendyr Can you share your fix please |
Beta Was this translation helpful? Give feedback.
-
@bitcity There were a hack here, but that was removed later when the code was rewrote to use ASCII values. I guess if errors raise on |
Beta Was this translation helpful? Give feedback.
-
I just observed users sending such invalid headers to our service, causing 500. |
Beta Was this translation helpful? Give feedback.
This one is tangential, but might also be worth exploring...
That Django uses
settings.DEFAULT_CHARSET
forQueryDict
https://github.com/django/django/blob/master/django/http/request.py#L377 seems odd too - the server settings shouldn't be affecting how we parse incoming requests, nor does it match up with the documented meaning...Which is perfectly reasonable, but isn't at all how it's being used in the
QueryDict
context.In summary: I believe there may be 2 or 3 independent PRs/issues to work through with the Django team here.