Skip to content

Conversation

jaikiran
Copy link
Member

@jaikiran jaikiran commented Oct 4, 2025

Can I please get a review of this change which proposes to fix a file descriptor leak in sun.net.www.protocol.file.FileURLConnection?

The bug affects all JDK versions since Java 8 through mainline JDK. Furthermore, the issue isn't in any way related to "jar:" URLs or the JAR resources caching involved in the JarURLConnection.

sun.net.www.protocol.file.FileURLConnection is a java.net.URLConnection. For this issue, the APIs on URLConnection that are of interest are connect(), header related APIs (the getHeaderXXXX(), getLastModified() and similar APIs) and getInputStream().

The existing implementation in FileURLConnection in its connect() implementation does readability checks on the underlying File instance. If the File is a directory then the readability check is done by verifying that a File.list() call does not return null. On the other hand, if the File is not a directory, then connect() constructs a temporary java.io.FileInputStream for the File and lets the FileInputStream's constructor implementation do the necessary readability checks. In either case, if the readability checks fail, then an IOException is thrown from this method and the FileURLConnection stays unconnected.

One important detail of the implementation in FileURLConnection.connect() is that the FileInputStream that it creates for the regular-file readability check, it keeps it open when it returns from connect(). connect() itself doesn't return any value, so this FileInputStream that was created for readability check remains open without the application having access to it, unless the application calls (File)URLConnection.getInputStream(). The implementation of FileURLConnection.getInputStream() returns this previously constructed InputStream if connect() had previously suceeded with its readability checks. The application, as usual, is then expected to close() that InputStream that was returned from URLConnection.getInputStream(). This is the "normal" case, where the application at some point calls the URLConnection.getInputStream() and closes that stream.

As noted previously, URLConnection has a few other APIs, for example the APIs that provide header values. An example:

Path regularFile = Path.of("hello.txt");
URLConnection conn = regularFile.toUri().toURL().openConnection();
// either of the following header related APIs
long val = conn.getLastModified();
String val = conn.getHeaderField("foobar");
// ... some other header related APIs

The issue/leak arises in these cases. Application code constructs a URLConnection and invokes these trivial APIs and neither does an explicit connect() nor calls getInputStream(), because it doesn't have to. In code like this, the implementation of FileURLConnection internally in the implementation of getLastModified() and other similar APIs, does a connect(). As explained above, the implementation of connect() then creates and leaves around a FileInputStream after doing readability checks. In this above application code, the InputStream never reaches the application code nor does it closed, thus leading to the leak.

The specification of URLConnection states:

The actual connection to the remote object is made, using the {@link #connect() connect} method.
The remote object becomes available. The header fields and the contents of the remote object can be accessed.
...
Invoking the {@code close()} methods on the {@code InputStream} or {@code OutputStream} of an
{@code URLConnection} after a request may free network resources associated with this
instance, unless particular protocol specifications specify different behaviours
for it.

It can be argued that it seems to expect that the application call URLConnection.getInputStream() and then close() it to close the resources, even in the case where it isn't interested in the InputStream or its contents. But, I think, it's a bit odd to be expecting applications to be doing that, if at all that's what is expected. So the change in this PR proposes to address the leak that can arise when the application never calls URLConnection.getInputStream().

The change in this PR, keeps the readability checks in the connect() method. However, the FileInputStream that is opened to do the readability check is closed before returning from the connect() method. That way, there is no InputStream lying around. When/if the application calls URLConnection.getInputStream() then the implementation in that method has been updated to construct and return the FileInputStream. This does mean that there's a change in behaviour for the following case (I consider it a corner case):

  • URLConnection is constructed for a "foo.txt" file which is readable
  • URLConnection.getLastModified() (or similar header related APIs) is invoked on that URLConnection. Internally that results in a connect() and file readability check and that succeeds.
  • Behind the scenes on the filesystem (or from some other part of the application), "foo.txt" file's permissions are modified to remove "read" permission.
  • URLConnection.getInputStream() is invoked on that previously constructed (and connected) URLConnection.

Before the changes in this PR, the above scenario would result in the URLConnection.getInputStream() returning normally a InputStream instance (that was cached during the connect() call). The application would then be able to normally/successfully read content from that InputStream. With the proposed change in this PR, the URLConnection.getInputStream() call will throw a FileNotFoundException, when it attempts to construct the FileInputStream and noticing that the file doesn't have "read" permission. URLConnection.getInputStream() is specified to throw an IOException, so the exception (and its type) itself isn't a problem. It however is still a change in behaviour. Having said that, my opinion is that changing file permissions in between API calls on a URLConnection is a corner case and I think it might be OK if there is a change in behaviour here (along with a release note maybe), especially since I think the proposed change prevents a leak in a "normal" usage of the URLConnection APIs in the application code (and also helps the internal implementation in FileURLConnection to have a more clearly demarcated lifetime for the InputStream instance).

Additional care has been taken to make sure that no other change in behaviour is introduced. 2 new jtreg tests have been introduced. The FileURLConnStreamLeakTest is a regression test which reproduces the leak and verifies the fix - several test methods in this test will fail on Windows without the fix and will pass with the fix. The GetInputStreamTest is a new test which has been introduced to help improve the coverage of testing for the InputStream returned from the FileURLConnection.

tier1, tier2 and tier3 tests pass with this change.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8367561: Getting some "header" property from a file:// URL causes a file descriptor leak (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27633/head:pull/27633
$ git checkout pull/27633

Update a local copy of the PR:
$ git checkout pull/27633
$ git pull https://git.openjdk.org/jdk.git pull/27633/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27633

View PR using the GUI difftool:
$ git pr show -t 27633

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27633.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 4, 2025

👋 Welcome back jpai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 4, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Oct 4, 2025

@jaikiran The following label will be automatically applied to this pull request:

  • net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 4, 2025
@mlbridge
Copy link

mlbridge bot commented Oct 4, 2025

Webrevs

@Override
public String getHeaderField(String name) {
initializeHeaders();
return super.getHeaderField(name);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several header related method implementations in this class have been updated to avoid the super.XXXX() calls. The super.XXX() implementation was calling getInputStream() and ignoring/not using the returned InputStream. That call to getInputStream() was a way to initiate a readability check and if the File wasn't readable then it would return a different result compared to when it was readable. The use of getInputStream() has been replaced by a call to isReadable() which does the same checks without leaving around a InputStream instance. Apart from that change, the rest of the code that resided in the super.XXX() implementation has been literally copy/pasted in these methods to avoid any kind of behavioural change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net [email protected] rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

1 participant