Routing percent-encoded paths #2678

Open
1 task done
mladedav opened this issue Mar 25, 2024 · 3 comments · May be fixed by #2729
Labels
A-axum C-musings Category: musings about a better world

Comments

@mladedav
Collaborator

mladedav commented Mar 25, 2024

  • I have looked for existing issues (including closed) about this

Feature Request

Motivation

Parts of a URI can be percent-encoded, but that should not change which resource is accessed.

More specifically, RFC 3986 says:

URIs that differ in the replacement of an unreserved character with
its corresponding percent-encoded US-ASCII octet are equivalent: they
identify the same resource.

In practice, some characters are sometimes percent-encoded and sometimes sent verbatim; e.g. hyper will probably make its handling of {, }, and " in paths configurable.

Proposal

We can decode percent-encoded unreserved characters inside the path, since these can be decoded at any point without changing the meaning. E.g. the requests /axum and /%61xum can be treated as the same request. We can also normalize reserved characters in the path by percent-encoding them (except for those that already have a special meaning, like %, ?, #, and so on; we can, however, encode {, ", , etc. to normalize them before routing).
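A minimal sketch of this request-side normalization in plain Rust (std only), assuming it runs on the path component before routing. The function names and the exact set of characters to encode ({, }, ", and space) are assumptions for illustration, not existing axum API. Escapes that are kept get their hex digits uppercased, since RFC 3986 treats %7b and %7B as equivalent.

```rust
/// RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Value of a single hex digit (either case), if it is one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Appends `b` as an uppercase `%XX` escape.
fn push_escaped(out: &mut Vec<u8>, b: u8) {
    out.extend_from_slice(format!("%{:02X}", b).as_bytes());
}

/// Hypothetical pre-routing normalization of a request path:
/// - escapes of unreserved characters are decoded (`%61` -> `a`),
/// - other escapes are kept, with their hex digits uppercased,
/// - `{`, `}`, `"` and space are percent-encoded (assumed list),
/// - everything else, including a `%` not followed by two hex digits, is copied as-is.
fn normalize_path(path: &str) -> String {
    let bytes = path.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b == b'%' {
            let hi = bytes.get(i + 1).copied().and_then(hex_val);
            let lo = bytes.get(i + 2).copied().and_then(hex_val);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                let decoded = hi * 16 + lo;
                if is_unreserved(decoded) {
                    out.push(decoded);
                } else {
                    push_escaped(&mut out, decoded);
                }
                i += 3;
                continue;
            }
        }
        match b {
            b'{' | b'}' | b'"' | b' ' => push_escaped(&mut out, b),
            _ => out.push(b),
        }
        i += 1;
    }
    // Only ASCII bytes were added or replaced, so valid UTF-8 stays valid.
    String::from_utf8(out).expect("normalization preserves UTF-8")
}
```

With this, /axum and /%61xum reach the same route, while %2F (an encoded /) is not unreserved, stays encoded, and therefore cannot introduce a new path segment.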

We can also encode special characters when a route is registered, e.g. by internally turning the current .route("/100%", ...) into .route("/100%25", ...) (%25 is the percent-encoding of %). A special case of this: a user can currently write .route("/what?", ...), which can never match any request, because a ? in a request URI always starts the query.
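The registration-side counterpart, again as a hypothetical sketch: encode_route_literal is not an existing axum or matchit function, and it assumes its input is only the static text of a route (no parameter or wildcard syntax).

```rust
/// Hypothetical helper: percent-encodes characters in the static text of a
/// route so that it lines up with normalized request paths.
fn encode_route_literal(route: &str) -> String {
    let mut out = String::with_capacity(route.len());
    for c in route.chars() {
        match c {
            // A verbatim `%` must become `%25`, otherwise "/100%" never matches.
            '%' => out.push_str("%25"),
            // `?` and `#` start the query and fragment, so a request path
            // can never contain them verbatim.
            '?' => out.push_str("%3F"),
            '#' => out.push_str("%23"),
            // Not valid unencoded in a URI path, but possibly accepted by
            // hyper verbatim (assumed list).
            '"' => out.push_str("%22"),
            '{' => out.push_str("%7B"),
            '}' => out.push_str("%7D"),
            ' ' => out.push_str("%20"),
            _ => out.push(c),
        }
    }
    out
}
```

The encoded forms are what normalizing the request path as proposed above would produce for the same characters, so e.g. a request for /100%25 would match a route registered as "/100%".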

With these two changes, users can use special characters in routes like r#"/"quotes"/etc"# and match them against both the encoded and unencoded variants.

Alternatives

Use a middleware in front of axum::Router that normalizes the percent-encoding (rather than fully decoding it). Not every percent-decoded path is a valid URI path, so the middleware would most likely have to percent-decode unreserved characters and percent-encode reserved characters.

This can be combined with encoding special characters when they are registered with matchit (e.g. braces, percent, question mark, ...), either by providing something like route_encoded or by having users encode any needed characters themselves.

I mention this mainly in case we want to support people who want control over percent-decoding themselves, but I think this could also just be built into axum itself.
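To illustrate why such a middleware cannot simply decode everything (a sketch; decode_all and path_component are hypothetical helpers): fully decoding turns an encoded ? or / into a real delimiter, so the result no longer denotes the same path.

```rust
/// Hypothetical naive decoder: replaces every well-formed `%XX` escape with
/// the byte it encodes (invalid UTF-8 is replaced lossily).
fn decode_all(s: &str) -> String {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let hi = bytes.get(i + 1).and_then(|&c| (c as char).to_digit(16));
        let lo = bytes.get(i + 2).and_then(|&c| (c as char).to_digit(16));
        match (bytes[i], hi, lo) {
            (b'%', Some(hi), Some(lo)) => {
                out.push((hi * 16 + lo) as u8);
                i += 3;
            }
            (b, _, _) => {
                out.push(b);
                i += 1;
            }
        }
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// The path as it would be seen after re-parsing: everything before the first `?`.
fn path_component(uri: &str) -> &str {
    uri.split('?').next().unwrap_or(uri)
}
```

Decoding /a%3Fb yields /a?b, whose path component is just /a; decoding /a%2Fb yields /a/b, which has one segment more than the original.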

@jplatte jplatte added C-musings Category: musings about a better world A-axum labels Mar 25, 2024
@jplatte jplatte added this to the 0.8 milestone Mar 25, 2024
@mladedav mladedav linked a pull request May 3, 2024 that will close this issue
@jplatte
Member

jplatte commented Aug 9, 2024

I wonder if it would be possible to instead do partial percent-decoding before routing: only translating (pointlessly) percent-encoded unreserved characters back to the regular characters.

Notably, this avoids any issues with % signs that are themselves percent-encoded, because these will only be decoded in the second and last decoding step, as part of path parameter deserialization.

The percent_encoding crate does not offer an API that allows partial decoding, but it should be fairly simple to cherry-pick the relevant things.
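A sketch of the two-step approach in plain Rust (std only). decode_where stands in for the cherry-picked routine and is hypothetical, not part of the percent_encoding crate's API. Step one decodes only unreserved characters before routing; step two decodes everything when a path parameter is deserialized (in practice that step could use percent_encoding::percent_decode_str).

```rust
/// RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Value of a single hex digit (either case), if it is one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Hypothetical cherry-picked decoder: decodes a `%XX` escape only when
/// `should_decode` accepts the byte it encodes; all other input is copied as-is.
fn decode_where(input: &str, should_decode: impl Fn(u8) -> bool) -> Vec<u8> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hi = bytes.get(i + 1).copied().and_then(hex_val);
            let lo = bytes.get(i + 2).copied().and_then(hex_val);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                let decoded = hi * 16 + lo;
                if should_decode(decoded) {
                    out.push(decoded);
                    i += 3;
                    continue;
                }
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    out
}

/// Step 1, before routing: decode only escapes of unreserved characters.
fn decode_unreserved(path: &str) -> String {
    // ASCII escapes are only ever replaced by ASCII bytes, so UTF-8 stays valid.
    String::from_utf8(decode_where(path, is_unreserved)).expect("UTF-8 is preserved")
}

/// Step 2, when deserializing a path parameter: decode every escape.
fn decode_param(segment: &str) -> String {
    String::from_utf8_lossy(&decode_where(segment, |_| true)).into_owned()
}
```

Because %25 decodes to %, which is not unreserved, a path like /files/%2561 reaches the router unchanged, and the parameter only becomes %61 in step two, so nothing is decoded twice.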

W.r.t. what we allow in .route() calls, I haven't thought about it much yet, but I think we can be relatively strict there. I definitely want it to be possible to have non-/ separators in the future. Maybe we should only support reserved characters for that? It would be a little sad to not allow separating by ., but not the end of the world... Would be useful to collect real use-cases.

@mladedav
Collaborator Author

We could decode just the unreserved characters, but then I think we should also encode reserved characters (like ?) in route() and similar calls. Otherwise, matching /*path, /path?, or /path% is impossible (unless the user encodes the special characters by hand). And the question of how to handle {, ", and } would still stand, since these should always be percent-encoded but hyper accepts them verbatim.

Regarding separators inside path segments, I assumed this would come from matchit. I know there were some issues about matching based on file extensions and such, and I believe the change in its grammar was partly meant to support things like that later. I'm not exactly sure how this is a consideration here, but it's been some time since I've looked at this.

Or we can just decode the unreserved characters and ignore the rest for now of course.

@jplatte
Member

jplatte commented Aug 19, 2024

Maybe just forbid *, ? and % verbatim in route calls? (Though we would need to check for a percent-encoded % as a special case.)

Not sure about the rest of your comment; I also don't have the entire context right now.
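One possible reading of the check suggested above, sketched as a hypothetical validate_route_literal. It assumes parameter and wildcard syntax has already been stripped from the route, and it handles % as the special case by allowing it only at the start of a well-formed %XX escape such as %25.

```rust
/// Hypothetical check for the static text of a route. Assumes parameter and
/// wildcard syntax has already been removed. Rejects a verbatim `*` or `?`,
/// and a `%` that does not start a well-formed `%XX` escape (so `%25` is allowed).
fn validate_route_literal(literal: &str) -> Result<(), String> {
    let bytes = literal.as_bytes();
    for (i, &b) in bytes.iter().enumerate() {
        match b {
            b'*' | b'?' => {
                return Err(format!("verbatim `{}` at byte {}", b as char, i));
            }
            b'%' => {
                // True if the byte at index `j` exists and is a hex digit.
                let is_hex = |j: usize| bytes.get(j).map_or(false, |c| c.is_ascii_hexdigit());
                if !(is_hex(i + 1) && is_hex(i + 2)) {
                    return Err(format!("`%` at byte {} does not start a %XX escape", i));
                }
            }
            _ => {}
        }
    }
    Ok(())
}
```

Rejecting at registration time keeps the router simple, at the cost of making users write %25 or %3F themselves instead of having axum encode them.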
