Skip to content

Conversation

@kieran-farrell
Copy link
Contributor

@kieran-farrell kieran-farrell commented Oct 24, 2025

Currently, it is only possible to read the number of open file descriptors of a Java process via the UnixOperatingSystemMXBean which is only accessible via JMX enabled tools. To improve servicability, it would be benifical to be able to view this information from jcmd VM.info output or hs_err_pid crash logs. This could help diagnose resource exhaustion and troubleshoot "too many open files" errors in Java processes on Unix platforms.

This PR adds reporting the current open file descriptor count to both jcmd VM.info output or hs_err_pid crash logs by refactoring the native JNI logic from Java_com_sun_management_internal_OperatingSystemImpl_getOpenFileDescriptorCount0 of the UnixOperatingSystemMXBean into hotspot. Apple's API for retrieving open file descriptor count provides an array of the actual FDs to determine the count. To avoid using malloc to store this array in a potential signal handling context where stack space may be limited, the apple implementation instead allocates a fixed 32KB struct on the stack to store the open FDs and only reports the result if the struct is less than the max (1024 FDs). This should cover the majoirty of use cases.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8359706: Add file descriptor count and maximum limit to VM.info (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27971/head:pull/27971
$ git checkout pull/27971

Update a local copy of the PR:
$ git checkout pull/27971
$ git pull https://git.openjdk.org/jdk.git pull/27971/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27971

View PR using the GUI difftool:
$ git pr show -t 27971

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27971.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 24, 2025

👋 Welcome back kfarrell! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 24, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Oct 24, 2025

@kieran-farrell The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 24, 2025
@mlbridge
Copy link

mlbridge bot commented Oct 24, 2025

Webrevs

Copy link
Member

@jsikstro jsikstro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure about the placement in the VMError::print_vm_info, but some comments so far.

Comment on lines 2507 to 2544
#ifdef __APPLE__
void os::print_open_file_descriptors(outputStream* st) {
const int MAX_SAFE_FDS = 1024;
struct proc_fdinfo fds[MAX_SAFE_FDS];
struct proc_bsdinfo bsdinfo;
int nfiles;
kern_return_t kres;
int res;
size_t fds_size;
pid_t my_pid;

kres = pid_for_task(mach_task_self(), &my_pid);
if (kres != KERN_SUCCESS) {
st->print_cr("OpenFileDescriptorCount = unknown");
return;
}

res = proc_pidinfo(my_pid, PROC_PIDLISTFDS, 0, fds, MAX_SAFE_FDS * sizeof(struct proc_fdinfo));
if (res <= 0) {
st->print_cr("OpenFileDescriptorCount = unknown");
return;
}

nfiles = res / sizeof(struct proc_fdinfo);
if (nfiles >= MAX_SAFE_FDS) {
st->print_cr("OpenFileDescriptorCount = unknown");
return;
}

st->print_cr("OpenFileDescriptorCount = %d", nfiles);
}
#endif // __APPLE__

#if defined(_ALLBSD_SOURCE) && !defined(__APPLE__)
long os::get_open_file_descriptor_count() {
st->print_cr("OpenFileDescriptorCount = unknown");
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is defined(_ALLBSD_SOURCE) needed? The return type for the non-apple variant is incorrect. Maybe we could do something like the following here instead:

void os::print_open_file_descriptors(outputStream* st) {
#ifdef __APPLE__
// do Apple stuff
#else
  st->print_cr("OpenFileDescriptorCount = unknown");
#endif
}

Comment on lines +1435 to +1440
// STEP("printing number of open file descriptors")
#ifndef _WIN32
os::print_open_file_descriptors(st);
st->cr();
#endif

Copy link
Member

@jsikstro jsikstro Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in VMError::print_vm_info which is only printed via the VM.info jcmd and the -XX:+PrintVMInfoAtExit flag. To add it to hs_err files, you should also add the code to VMError::report.

}
}
closedir(dirp);
st->print_cr("OpenFileDescriptorCount = %d", fds - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment on why we do -1 here, like you have for the AIX implementation.

@tstuefe
Copy link
Member

tstuefe commented Oct 27, 2025

Hmm. Not 100% convinced we should do it this way, at least not in hs-err files where speed is somewhat important. But let's see.

File descriptor counting scales linearly with the number of open files. Process can have in the 100000 range of open files in case of leaks, and those cases are the interesting ones. On slow boxes this can hurt.

The problem can be somewhat mitigated by making the counting dependent on kernel FDSize in case of Linux. If that is very large, maybe don't count. But that is Linux only. AIX file ops can be pretty slow. Plus, AIX does not show all file descriptors in /proc.

@jsikstro
Copy link
Member

To improve servicability, it would be benifical to be able to view this information from jcmd VM.info output or hs_err_pid crash logs. This could help diagnose resource exhaustion and troubleshoot "too many open files" errors in Java processes on Unix

I'm trying to understand what functionality you're after here. Is the specific number of open file descriptors important or is it enough to just report that there are "a lot" of file descriptors open? If so, @tstuefe's suggested approach of looking at /proc/self/status and FDSize and don't report the exact number if over some limit is a good compromise I think.

I couldn't find any good documentation for Mac's proc_pidinfo, but looking at the source code it's not an expensive operation at all (https://github.com/apple/darwin-xnu/blob/2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32/bsd/kern/proc_info.c#L486). I know very little of AIX to have an opinion, but if it's an issue maybe we can skip reporting it for now?

Copy link
Member

@jsikstro jsikstro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, you can test what's being printed in hs_err files easily by using the -XX:ErrorHandlerTest=14 flag, which crashes the VM in a "controlled way".

#ifdef __APPLE__
const int MAX_SAFE_FDS = 1024;
struct proc_fdinfo fds[MAX_SAFE_FDS];
struct proc_bsdinfo bsdinfo;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused.

int nfiles;
kern_return_t kres;
int res;
size_t fds_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused.

@tstuefe
Copy link
Member

tstuefe commented Oct 29, 2025

Just to be clear, number of open files is a very useful metric. I would see no problem adding this to VM.info and hs-err file, but we need some failsafe in case iteration takes too long. Can be very simple as in "every n iterations check time for limit).

@kieran-farrell
Copy link
Contributor Author

Thanks @jsikstro and @tstuefe for the insightful comments. Just to note, the maximum possible FDs (NOFILE) is already printed in VM.info (and maybe should be added to the hs_err file), the main idea of this patch is to provide the current number of open file descriptors for diagnosing FD exhaustion (compared against the max FD) and FD leaks.

Would a timed loop for both AIX and Linux (exiting the scan if it takes more than a set threshold) work and what would be an acceptable threshold? Maybe we could report that the FD count above the current counted value if we reach the threshold or fall back to reporting the FDSize field from /proc/self/status on Linux as a rough estimate?

Also, the Mac API is undocumented - https://zameermanji.com/blog/2021/8/1/counting-open-file-descriptors-on-macos/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot [email protected] rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

3 participants