Skip to content

8359808: JavaRuntimeURLConnection should only connect to non-directory resources #25871

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

david-beaumont
Copy link
Contributor

@david-beaumont david-beaumont commented Jun 18, 2025

Simplifying JavaRuntimeURLConnection to avoid accidentally returning non-resource data to users.

This change has the following distinct parts:

  1. Refactor code to use Node instead of directly accessing low level ImageLocation type.
  2. Remove unnecessary use of "Resource" interface and related URL generation code (completely unreachable).
  3. Adding comments explaining why there's a non-obvious distinction in how module and resource names are treated with respect to URL percent encoding.
  4. Small constructor logic simplification (module name cannot be null anymore)
  5. Small simplification around 'READER' use, since it is impossible for that to ever be null (other users of ImageReaderFactory already assume it could never be null, and code path analysis agrees).
  6. Adding tests for the non-resource cases.
  7. Adding extra test data to check the behaviour with respect to things like percent escaping (previously untested).
  8. Adding TODO comments for things I could do in this PR or later (reviewer opinions welcome).

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8359808: JavaRuntimeURLConnection should only connect to non-directory resources (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25871/head:pull/25871
$ git checkout pull/25871

Update a local copy of the PR:
$ git checkout pull/25871
$ git pull https://git.openjdk.org/jdk.git pull/25871/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25871

View PR using the GUI difftool:
$ git pr show -t 25871

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25871.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 18, 2025

👋 Welcome back david-beaumont! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 18, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 18, 2025
@openjdk
Copy link

openjdk bot commented Jun 18, 2025

@david-beaumont The following label will be automatically applied to this pull request:

  • net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge
Copy link

mlbridge bot commented Jun 18, 2025

Webrevs

Copy link
Contributor Author

@david-beaumont david-beaumont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preloading a few explanatory comments.

@@ -45,99 +44,79 @@
*/
public class JavaRuntimeURLConnection extends URLConnection {

// ImageReader to access resources in jimage
private static final ImageReader reader = ImageReaderFactory.getImageReader();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examination of code and other users shows that this could never be null. Other users also name it "READER".

private final String name;

// the Resource when connected
private volatile Resource resource;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Resource API was never needed here and adds uncalled methods such as getURL() which are then implemented with code that could never be invoked.

throw new MalformedURLException(url + " missing path or /");
if (path.length() == 1) {
this.module = null;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason for the module==null case, since the only thing caring that this might be null just converts null to the empty string anyway.

throw new IOException(module + "/" + name + " not found");
}
this.resource = node;
super.connected = true;
Copy link
Contributor Author

@david-beaumont david-beaumont Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I don't need to use super here, but it documents the fact that this is not a field of this subclass, without readers having to go check. Mutable protected fields are weird and error prone, so I felt calling it out a bit was worth it. Happy to replace with a comment if people feel that's better though.

}

@Override
public long getContentLengthLong() {
try {
connect();
return resource.getContentLength();
return getResourceNode().size();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having getResourceNode() return the (lazily fetched) node avoids the reader needing to know/reason about how "connect()" has the side-effect of making the "resource" field non-null.

@david-beaumont david-beaumont changed the title 8356645: JavaRuntimeURLConnection should only connect to non-directory resources 8359808: JavaRuntimeURLConnection should only connect to non-directory resources Jun 18, 2025
// URL's path part, and there's no requirement for there to be distinct rules
// about percent encoding. However, we choose to treat the module name is if
// it were a URL authority (since Java package/module names are historically
// strongly associated with internet domain names).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"However, we choose to treat the module name is if URL authority" - I don't think we should be put in the comment as it is confusing to suggest the URL authority component. The URL scheme documented in JEP 220 always put the module and resource name in the URL path component. It's just historical that it allowed for encoding of the resource name, wasn't given enough thought in JDK 9 when this url stream handler was added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to accept rewording here. I do want to pull out that there is a conceptual reason for treating module names like domain authorities though, or just make the code treat the whole path the same.
Having unexplained weirdness like this just ends up being a drain on future maintainers otherwise.

Copy link
Contributor

@AlanBateman AlanBateman Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to accept rewording here. I do want to pull out that there is a conceptual reason for treating module names like domain authorities though, or just make the code treat the whole path the same. Having unexplained weirdness like this just ends up being a drain on future maintainers otherwise.

I don't disagree on the weirdness but I don't want to mislead readers that this has anything to do with the URL authority component (the jrt scheme does not have this component). However to explain the weirdness requires digging into history, probably the jake repo where the changes for JEP 220 were accumulated before the JEP was integrated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is to permit module names to use percent encoding, since package names (and thus module names) can include non-ASCII characters, and doing so would simplify the code, but it also allows a level of obfuscateability to the URLs which we might not desire (unless we reject any over-encoding of ASCII).

Copy link
Contributor Author

@david-beaumont david-beaumont Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment and some TODOs. I'll likely raise a separate PR for it since I suspect we need to end up:

  1. allowing percent encoding everywhere (to support non-ASCII Unicode).
  2. disallowing over-encoding (e.g. of '/' or '$', or just any ASCII) to prevent obfuscation.
    But as it's a behaviour change, I'd rather separate it out.

Copy link
Contributor

@AlanBateman AlanBateman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes looks mostly okay, just minor comments on comments, TODOs and naming.

// URL's path part, and there's no requirement for there to be distinct rules
// about percent encoding, and it is likely that any differences between how
// module names and resource names are treated is unintentional. The rules
// about percent encoding may well be tightened up in the future.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to drop this paragraph from the comment, it just begs too many questions.


// the module and resource name in the URL
// ImageReader to access resources in jimage (never null).
private static final ImageReader READER = ImageReaderFactory.getImageReader();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it simpler if you drop "(never null)".

String path = url.getPath();
if (path.isEmpty() || path.charAt(0) != '/')
// TODO: Allow percent encoding in module names.
// TODO: Consider rejecting URLs with fragments, queries or authority.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not include all the TODO messages.

// ImageReader to access resources in jimage (never null).
private static final ImageReader READER = ImageReaderFactory.getImageReader();

// The module and resource name in the URL ("jrt:/<module-name>/<resource-name>").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In JEP 220 we use jrt:/[$MODULE[/$PATH]] to make it clear that the resource is optionally.

if (pathSep == -1) {
// No trailing resource path. This can never "connect" or return a
// resource, but might be useful as a representation to pass around.
// The module name *can* be empty here (e.g. "jrt:/") but not null.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the URL scheme from JEP 220 goes into the first comment then it will be clear that the resource is optionally. This will allow the comment here to be reduced.

// the Resource when connected
private volatile Resource resource;
// The resource node (when connected).
private volatile Node resource;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to rename to node, resourceNode, or imageNode here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net [email protected] rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

2 participants