- 
                Notifications
    You must be signed in to change notification settings 
- Fork 6.1k
8359706: Add file descriptor count and maximum limit to VM.info #27971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
# Conflicts: # src/hotspot/os/windows/os_windows.cpp
| 👋 Welcome back kfarrell! A progress list of the required criteria for merging this PR into  | 
| ❗ This change is not yet ready to be integrated. | 
| @kieran-farrell The following label will be automatically applied to this pull request: 
 When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. | 
| Webrevs
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure about the placement in the VMError::print_vm_info, but some comments so far.
        
          
                src/hotspot/os/bsd/os_bsd.cpp
              
                Outdated
          
        
      | #ifdef __APPLE__ | ||
| void os::print_open_file_descriptors(outputStream* st) { | ||
| const int MAX_SAFE_FDS = 1024; | ||
| struct proc_fdinfo fds[MAX_SAFE_FDS]; | ||
| struct proc_bsdinfo bsdinfo; | ||
| int nfiles; | ||
| kern_return_t kres; | ||
| int res; | ||
| size_t fds_size; | ||
| pid_t my_pid; | ||
|  | ||
| kres = pid_for_task(mach_task_self(), &my_pid); | ||
| if (kres != KERN_SUCCESS) { | ||
| st->print_cr("OpenFileDescriptorCount = unknown"); | ||
| return; | ||
| } | ||
|  | ||
| res = proc_pidinfo(my_pid, PROC_PIDLISTFDS, 0, fds, MAX_SAFE_FDS * sizeof(struct proc_fdinfo)); | ||
| if (res <= 0) { | ||
| st->print_cr("OpenFileDescriptorCount = unknown"); | ||
| return; | ||
| } | ||
|  | ||
| nfiles = res / sizeof(struct proc_fdinfo); | ||
| if (nfiles >= MAX_SAFE_FDS) { | ||
| st->print_cr("OpenFileDescriptorCount = unknown"); | ||
| return; | ||
| } | ||
|  | ||
| st->print_cr("OpenFileDescriptorCount = %d", nfiles); | ||
| } | ||
| #endif // __APPLE__ | ||
|  | ||
| #if defined(_ALLBSD_SOURCE) && !defined(__APPLE__) | ||
| long os::get_open_file_descriptor_count() { | ||
| st->print_cr("OpenFileDescriptorCount = unknown"); | ||
| } | ||
| #endif | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is defined(_ALLBSD_SOURCE) needed? The return type for the non-apple variant is incorrect. Maybe we could do something like the following here instead:
void os::print_open_file_descriptors(outputStream* st) {
#ifdef __APPLE__
// do Apple stuff
#else
  st->print_cr("OpenFileDescriptorCount = unknown");
#endif
}
| // STEP("printing number of open file descriptors") | ||
| #ifndef _WIN32 | ||
| os::print_open_file_descriptors(st); | ||
| st->cr(); | ||
| #endif | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in VMError::print_vm_info which is only printed via the VM.info jcmd and the -XX:+PrintVMInfoAtExit flag. To add it to hs_err files, you should also add the code to VMError::report.
        
          
                src/hotspot/os/linux/os_linux.cpp
              
                Outdated
          
        
      | } | ||
| } | ||
| closedir(dirp); | ||
| st->print_cr("OpenFileDescriptorCount = %d", fds - 1); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a comment on why we do -1 here, like you have for the AIX implementation.
| Hmm. Not 100% convinced we should do it this way, at least not in hs-err files where speed is somewhat important. But let's see. File descriptor counting scales linearly with the number of open files. Process can have in the 100000 range of open files in case of leaks, and those cases are the interesting ones. On slow boxes this can hurt. The problem can be somewhat mitigated by making the counting dependent on kernel FDSize in case of Linux. If that is very large, maybe don't count. But that is Linux only. AIX file ops can be pretty slow. Plus, AIX does not show all file descriptors in /proc. | 
| 
 I'm trying to understand what functionality you're after here. Is the specific number of open file descriptors important or is it enough to just report that there are "a lot" of file descriptors open? If so, @tstuefe's suggested approach of looking at  I couldn't find any good documentation for Mac's  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, you can test what's being printed in hs_err files easily by using the -XX:ErrorHandlerTest=14 flag, which crashes the VM in a "controlled way".
        
          
                src/hotspot/os/bsd/os_bsd.cpp
              
                Outdated
          
        
      | #ifdef __APPLE__ | ||
| const int MAX_SAFE_FDS = 1024; | ||
| struct proc_fdinfo fds[MAX_SAFE_FDS]; | ||
| struct proc_bsdinfo bsdinfo; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused.
        
          
                src/hotspot/os/bsd/os_bsd.cpp
              
                Outdated
          
        
      | int nfiles; | ||
| kern_return_t kres; | ||
| int res; | ||
| size_t fds_size; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused.
| Just to be clear, number of open files is a very useful metric. I would see no problem adding this to VM.info and hs-err file, but we need some failsafe in case iteration takes too long. Can be very simple as in "every n iterations check time for limit). | 
| Thanks @jsikstro and @tstuefe for the insightful comments. Just to note, the maximum possible FDs (NOFILE) is already printed in VM.info (and maybe should be added to the hs_err file), the main idea of this patch is to provide the current number of open file descriptors for diagnosing FD exhaustion (compared against the max FD) and FD leaks. Would a timed loop for both AIX and Linux (exiting the scan if it takes more than a set threshold) work and what would be an acceptable threshold? Maybe we could report that the FD count above the current counted value if we reach the threshold or fall back to reporting the FDSize field from /proc/self/status on Linux as a rough estimate? Also, the Mac API is undocumented - https://zameermanji.com/blog/2021/8/1/counting-open-file-descriptors-on-macos/. | 
Currently, it is only possible to read the number of open file descriptors of a Java process via the
UnixOperatingSystemMXBeanwhich is only accessible via JMX enabled tools. To improve servicability, it would be benifical to be able to view this information from jcmd VM.info output or hs_err_pid crash logs. This could help diagnose resource exhaustion and troubleshoot "too many open files" errors in Java processes on Unix platforms.This PR adds reporting the current open file descriptor count to both jcmd VM.info output or hs_err_pid crash logs by refactoring the native JNI logic from
Java_com_sun_management_internal_OperatingSystemImpl_getOpenFileDescriptorCount0of theUnixOperatingSystemMXBeaninto hotspot. Apple's API for retrieving open file descriptor count provides an array of the actual FDs to determine the count. To avoid usingmallocto store this array in a potential signal handling context where stack space may be limited, the apple implementation instead allocates a fixed 32KB struct on the stack to store the open FDs and only reports the result if the struct is less than the max (1024 FDs). This should cover the majoirty of use cases.Progress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27971/head:pull/27971$ git checkout pull/27971Update a local copy of the PR:
$ git checkout pull/27971$ git pull https://git.openjdk.org/jdk.git pull/27971/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27971View PR using the GUI difftool:
$ git pr show -t 27971Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27971.diff
Using Webrev
Link to Webrev Comment