Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/1 parsing #1156

Open
annevk opened this issue Feb 1, 2021 · 4 comments
Open

HTTP/1 parsing #1156

annevk opened this issue Feb 1, 2021 · 4 comments
Labels
interop Implementations are not interoperable with each other topic: http

Comments

@annevk
Copy link
Member

annevk commented Feb 1, 2021

HTTP/1 was poorly tested for error handling and as a result implementations are not interoperable. This lack of interoperability reaches Fetch at times. Where HTTP/1 itself takes no stance or implementations cannot follow it, it might make sense to document it in Fetch or at least document it in this issue.

cc @whatwg/http

(Previously there was https://wiki.whatwg.org/wiki/HTTP.)

@annevk annevk added interop Implementations are not interoperable with each other topic: http labels Feb 1, 2021
@MattMenke2
Copy link
Contributor

Unfortunately, removing old behavior here is now pretty resource intensive, in terms of launch process in Chrome:

  • Add code to gather data, wait months for data.
  • Get feedback from other browsers [I don't think I've ever been in contact with a single Safari person]
  • Go through intent process (Tag review can probably be skipped, at least)
  • Usher through release process

As I'm not longer on the network team, I have limited time to work on this sort of stuff. That having been said, I certainly support adding tests and working towards both tighter parsing and unifying behavior here, and I can probably find time to address what I view as most concerning (which, here, is status code handling/overflow).

@kinu @yutakahirano: Kinuko, Yutaka, would your team be interested in pursuing this?

annevk added a commit that referenced this issue Feb 1, 2021
@annevk annevk mentioned this issue Feb 1, 2021
3 tasks
annevk added a commit that referenced this issue Feb 1, 2021
@MattMenke2
Copy link
Contributor

Another fun one: Chrome allows 4 bytes of random slop between requests, before the start of headers. I believe this was copied from FireFox of IE's behavior at the time. I don't believe all browsers currently do this, so it may be easier to remove. My main concern is servers sending 0-length compressed gzip bodies with "Content-Length: 0", which aren't actually of length 0. In this case, there's slop on the end, which I suspect this logic deals with.

annevk added a commit that referenced this issue Mar 1, 2021
As the body concept is refactored for #604, XMLHttpRequest will need to use this algorithm for its events. We also want to require browsers to use this algorithm instead of the one defined by HTTP as part of #1156.

Tests: web-platform-tests/wpt#10548 & web-platform-tests/wpt#27837.
annevk added a commit that referenced this issue Mar 3, 2021
As the body concept is refactored for #604, XMLHttpRequest will need to use this algorithm for its events. We also want to require browsers to use this algorithm instead of the one defined by HTTP as part of #1156.

Tests: web-platform-tests/wpt#10548.
@JannisBush
Copy link

Several other differences I discovered:

  1. Space (or tab) between header-name and :, e.g., x-frame-options : DENY:
  • HTTP spec says that servers MUST reject such responses and proxies must remove such whitespace. It does not say what clients should do.
  • Firefox ignores only the invalid header
  • Chromium and Safari ignore the space, even the invalid header gets parsed and in the example here: XFO is applied
  1. NULL bytes in header names
  • Invalid according to the spec (token) but unclear what clients should do
  • Firefox and Safari only ignore the invalid header
  • Chromium treats the whole response as invalid (network error), similar to NULL in header values
  1. NULL in header values
  • Agreement seems to be network error
  • Tests do not test for all kind of inclusions: e.g., no tests for fetch where currently browser behavior diverges
  1. Lone LF (0x0A) in responses
  • Similar to lone CR
  • Currently behavior diverges for both browsers and within browsers (fetch (simple vs complex) vs document vs subresource, ...)
  • Behavior also depends on where the lone LF is encountered, adding a bunch of tests is probably helpful
  • Probably related: Acting on incomplete headers #472
  1. Lone CR (0x0D) in responses

@MattMenke2
Copy link
Contributor

A couple thoughts (note that I'm no longer on Chrome's network team, though still care about this area).

Several other differences I discovered:

  1. Space (or tab) between header-name and :, e.g., x-frame-options : DENY:
  • HTTP spec says that servers MUST reject such responses and proxies must remove such whitespace. It does not say what clients should do.

It would seem very weird if proxies remove it, but clients were expected to reject it. Then some sites would only work with a compliant browser connecting through a compliant proxy, but not with a compliant browser directly making the requests.

  1. NULL bytes in header names
  2. NULL in header values

Chrome currently rejects responses with NULLs anywhere in the headers (including in status text). Not claiming this is the correct behavior.

Other weirdness that may be worth thinking about, if you haven't already. Only know how Chrome handles these:

  • Chrome allows 4 bytes of slop before the "http" (including NULLs). I assume this is for buggy servers that send 0-byte compressed bodies when they should be sending nothing, but not positive of that. Think this was copied from another browser's behavior before Chrome was released.
  • Chrome allows truncated headers (Server closes socket after only sending partial headers - that is, no double terminating CRLF, or possibly no terminating CRLF at all). Chrome only does this for HTTP (not HTTPS), for security reasons, since truncating an http-only or whatnot from a Set-Cookie header is a security issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interop Implementations are not interoperable with each other topic: http
Development

No branches or pull requests

4 participants
@annevk @MattMenke2 @JannisBush and others