Deduplicate identical multipart/form-data request payloads #938

Open
BHSPitMonkey opened this issue May 21, 2020 · 4 comments

@BHSPitMonkey

Is your feature request related to a problem? Please describe.

Currently, if you start a capture and make the following two requests:

curl localhost -F foo=bar
curl localhost -F foo=bar

... Hoverfly will not recognize these as the same request, and it will record two request/response pairs in the simulation output.

This is because multipart/form-data POST request bodies separate the form fields with a delimiter string (the boundary). The client chooses the boundary, usually at random, and declares it as the boundary parameter of the Content-Type header. MDN has a concise description of how these requests are structured (with examples) here.

For the example above, the simulation shows the request bodies for these two requests as:

"body": [
	{
		"matcher": "exact",
		"value": "--------------------------9c2c1f1950ea2fff\r\nContent-Disposition: form-data; name=\"foo\"\r\n\r\nbar\r\n--------------------------9c2c1f1950ea2fff--\r\n"
	}
]

"body": [
	{
		"matcher": "exact",
		"value": "--------------------------400771b7006844db\r\nContent-Disposition: form-data; name=\"foo\"\r\n\r\nbar\r\n--------------------------400771b7006844db--\r\n"
	}
]

This means that if I run the same suite of requests multiple times, the simulation data set will grow in size with redundant copies instead of skipping the ones it's already seen.

Describe the solution you'd like

Ideally Hoverfly would be aware of multipart/form-data request bodies, and provide a mechanism that would allow these two requests to be considered identical (or use this behavior by default).

Describe alternatives you've considered

I suspect that these boundaries could be removed using a custom Middleware script, but I would prefer not to have to bring in a second library to parse HTTP in a spec-compliant way (or to rely on a home-grown parsing strategy).

Additional context

Brief mention of this problem on Gitter: Permalink

@kapishmalik
Collaborator

@tommysitu you can assign this to me. I would like to work on this issue.

@kapishmalik
Collaborator

kapishmalik commented Jan 25, 2023

@tommysitu I looked at this issue and read about multipart/form-data. This encoding is mainly used for sending files, though it can carry non-file (text) fields as well. I feel we should not compare file contents at the proxy layer, for two reasons: 1) security: user files would have to be stored on the proxy and then used for comparison; 2) abuse: someone could send huge files to the Hoverfly proxy server repeatedly and exhaust its storage. A user who hosts Hoverfly on their own server may not abuse it or care about the security aspect, but both points apply to Hoverfly Cloud if it runs the same version.

I personally feel that we should limit this feature to non-file or text fields encoded with multipart/form-data.

What do you suggest?

@kapishmalik
Collaborator

@tommysitu could you please help me with the above concern? ^^

@tommysitu
Member

I think Hoverfly should be used for matching structured data only. Multipart form data is usually used for uploading files, so it's probably not worth supporting.
