[Linux] Fix random test failures of browser tests #1523 #1788

Contributor

@HeikoKlare HeikoKlare commented Feb 1, 2025

Browser tests randomly fail during Jenkins execution because of too many opened file descriptors. The reason seems to be that the build uses parallel execution and Maven plugins executed for other bundles may have opened and closed file descriptors in parallel, which are erroneously taken into account by the browser tests evaluating the number of file descriptors left open after a test execution.

This change excludes open file descriptors for Maven artifacts used in parallel by other Maven plugins by not considering file descriptors with their path containing ".m2" or "target/classes".
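
The exclusion described above can be sketched roughly as follows. This is a minimal illustration, not the code of this PR: the class name `DescriptorFilterSketch` and the helpers `isRelevant` and `resolve` are hypothetical, and only the two path fragments are taken from the description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from the SWT test code.
final class DescriptorFilterSketch {

    // Path fragments named in the PR description. Descriptors whose resolved
    // path contains one of them are assumed to stem from parallel Maven plugins.
    static final List<String> IGNORED_FRAGMENTS = List.of(".m2", "target/classes");

    // Returns true if the descriptor should count towards the leak analysis.
    static boolean isRelevant(String resolvedPath) {
        return IGNORED_FRAGMENTS.stream().noneMatch(resolvedPath::contains);
    }

    // Entries such as /proc/self/fd/<n> are symbolic links to the opened file;
    // follow the link if there is one, otherwise keep the entry path itself.
    static String resolve(Path descriptor) throws IOException {
        return Files.isSymbolicLink(descriptor)
                ? Files.readSymbolicLink(descriptor).toString()
                : descriptor.toString();
    }
}
```

With this sketch, a path like `/home/jenkins/.m2/repository/org/hamcrest/hamcrest/2.2/_remote.repositories` would be ignored, while an unresolved entry like `/proc/self/fd/727` would still be counted.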

Fixes #1523

Example

In previous logs, you find lists of analyzed file descriptor deltas like this:

11:06:00  /home/jenkins/.m2/repository/jakarta/el/jakarta.el-api/3.0.3/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/jakarta/el/jakarta.el-api/3.0.3/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/jakarta/servlet/jakarta.servlet-api/4.0.4/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/jakarta/servlet/jakarta.servlet-api/4.0.4/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/javax/servlet/jsp/javax.servlet.jsp-api/2.3.3/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/javax/servlet/jsp/javax.servlet.jsp-api/2.3.3/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/bouncycastle/bcutil-jdk18on/1.80/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/bouncycastle/bcutil-jdk18on/1.80/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/ee8/jetty-ee8-apache-jsp/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/ee8/jetty-ee8-apache-jsp/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/ee8/jetty-ee8-nested/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/ee8/jetty-ee8-nested/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/ee8/jetty-ee8-security/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/ee8/jetty-ee8-security/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/ee8/jetty-ee8-servlet/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/ee8/jetty-ee8-servlet/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-http/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-http/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-io/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-io/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-security/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-security/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-server/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-server/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-session/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-session/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-util-ajax/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-util-ajax/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-util/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/jetty-util/12.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/toolchain/jetty-servlet-api/4.0.6/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/eclipse/jetty/toolchain/jetty-servlet-api/4.0.6/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/glassfish/jakarta.el/3.0.4/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/glassfish/jakarta.el/3.0.4/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/hamcrest/hamcrest/2.2/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/hamcrest/hamcrest/2.2/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/jsoup/jsoup/1.18.3/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/jsoup/jsoup/1.18.3/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/mortbay/jasper/apache-el/9.0.96/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/mortbay/jasper/apache-el/9.0.96/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/mortbay/jasper/apache-jsp/9.0.96/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/mortbay/jasper/apache-jsp/9.0.96/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/mortbay/jasper/apache-jsp/9.0.96/apache-jsp-9.0.96-sources.jar
11:06:00  	/home/jenkins/.m2/repository/org/ow2/sat4j/org.ow2.sat4j.core/2.3.6/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/ow2/sat4j/org.ow2.sat4j.core/2.3.6/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/ow2/sat4j/org.ow2.sat4j.pb/2.3.6/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/ow2/sat4j/org.ow2.sat4j.pb/2.3.6/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/slf4j/slf4j-api/2.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/slf4j/slf4j-api/2.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/slf4j/slf4j-simple/2.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/slf4j/slf4j-simple/2.0.16/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/tukaani/xz/1.10/_remote.repositories
11:06:00  	/home/jenkins/.m2/repository/org/tukaani/xz/1.10/_remote.repositories

Remark

The validation approach currently used by the browser tests on Linux relies on global system state (the file descriptors open at system level). This makes the approach error-prone in general; it would require far more isolation of test execution than is currently achieved. A further improvement would be to ensure that, at least during the Maven build, no other task (such as the compilation of another bundle) is executed in parallel. Such parallel tasks are the reason for the open descriptors of files in the .m2 directory, but they can of course also open other file descriptors, such as in the target folders of the bundles. That would not prevent completely independent tasks from opening file descriptors concurrently, but since the analysis is very weak anyway (it allows a delta of 50 + test number additional open file descriptors between the state before test execution and the state after a single test), that should rarely matter. That is also the reason why it seems to be sufficient to exclude .m2, as that involves a high number of files (more than the 50 + test number allowed).
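
To make the tolerance mentioned above concrete, here is a small sketch of the check as I understand it. The class and method names are hypothetical, and whether the real comparison is strict and how exactly the test number is counted are assumptions; only the value 50 and the "50 + test number" rule come from the remark.

```java
// Hypothetical sketch of the tolerance rule; not the actual test code.
final class LeakThresholdSketch {

    // Fixed tolerance mentioned in the remark above.
    static final int BASE_TOLERANCE = 50;

    // Allowed growth in open file descriptors for the given test number.
    static int allowedDelta(int testNumber) {
        return BASE_TOLERANCE + testNumber;
    }

    // Assumption: the check fails only if the growth is strictly above the tolerance.
    static boolean exceedsTolerance(int openBefore, int openAfter, int testNumber) {
        return openAfter - openBefore > allowedDelta(testNumber);
    }
}
```

For test number 10, for example, this sketch tolerates up to 60 additional descriptors, which is why a batch of unrelated .m2 descriptors from a parallel plugin can be enough to exceed the limit.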

I am not even sure whether the overall analysis is that useful in its current implementation (in particular as it is so weak, basically allowing every test to leave behind another open file descriptor). So it may even be necessary to revise or remove that analysis.

At least, this should point us to the cause of the frequent Jenkins build failures.

Contributor

github-actions bot commented Feb 1, 2025

Test Results

   494 files  ±0     494 suites  ±0   11m 18s ⏱️ + 1m 40s
 4 333 tests ±0   4 320 ✅ ±0   13 💤 ±0  0 ❌ ±0 
16 574 runs  ±0  16 466 ✅ ±0  108 💤 ±0  0 ❌ ±0 

Results for commit f19242f. ± Comparison against base commit 4a201e2.

♻️ This comment has been updated with latest results.

@HeikoKlare HeikoKlare force-pushed the file-descriptors-browser-tests branch from 5a43b26 to 4a338c6 Compare February 1, 2025 12:00
@HeikoKlare
Contributor Author

Further remark: I have integrated that kind of change in #1706 for several builds of that PR and the tests always succeeded on Jenkins.

@HeikoKlare HeikoKlare marked this pull request as ready for review February 1, 2025 12:15
Contributor

@laeubi laeubi left a comment

Thanks for the analysis and good catch!

// by other Maven plugins executed in parallel build (such as parallel
// compilation of the swt.tools bundle etc.)
String resolvedPath = Files.isSymbolicLink(f) ? Files.readSymbolicLink(f).toString() : f.toString();
if (!resolvedPath.contains(".m2")) {
Contributor

Should we extract this to a method (isValidPath), maybe? Then it could be enhanced later on.
We most likely want to exclude Java files or JARs in general, so thinking further, maybe it would be even better to have a whitelist of the kinds of descriptors we expect a browser can leave open at all?

Contributor Author

Yes, that makes sense. I have extended the PR with a minor refactoring of the file descriptor processing and extracted the filtering into a separate method.

Regarding an extensive whitelist: generally, having such a list of the kinds of descriptors we should not consider sounds reasonable to me. However, I am not sure what that list would have to look like in order not to exclude files that would actually be of interest. Since the current implementation of the file descriptor analysis is rather weak anyway, I am not even sure how much sense it makes to aim for a "precise" whitelist. The analysis currently contains workarounds for surefire leaving behind open file descriptors (I am not sure whether that is still the case) and uses a rather arbitrary limit of open file descriptors (50) anyway. See also #108 for this.

So I would summarize as follows:

  • The analysis is capable of catching high numbers of file descriptors being left open after a browser test was executed
  • The analysis is imprecise, possibly leading to false positives
  • We want to keep the number of false positives as low as possible by reducing the number of unrelated file descriptors being taken into account

@HeikoKlare HeikoKlare force-pushed the file-descriptors-browser-tests branch 6 times, most recently from 47f8e1d to f7e9819 Compare February 3, 2025 18:53
@HeikoKlare
Contributor Author

I just saw this in a recent build, so this change may not catch all "open file descriptors" failures, as there may be file descriptors for which we cannot decide whether they are related to the test or not. But this change will hopefully at least significantly reduce the number of false "positives":

Delta to previous test: 151
19:24:00  	/proc/self/fd/727
19:24:00  	/proc/self/fd/728
19:24:00  	/proc/self/fd/729
19:24:00  	/proc/self/fd/741
19:24:00  	/proc/self/fd/743
19:24:00  	/proc/self/fd/744
19:24:00  	/proc/self/fd/745
19:24:00  	/proc/self/fd/746
19:24:00  	/proc/self/fd/747
19:24:00  	/proc/self/fd/748
19:24:00  	/proc/self/fd/749
19:24:00  	/proc/self/fd/750
19:24:00  	/proc/self/fd/751
19:24:00  	/proc/self/fd/752
19:24:00  	/proc/self/fd/753
19:24:00  	/proc/self/fd/754
19:24:00  	/proc/self/fd/756
19:24:00  	/proc/self/fd/760
19:24:00  	/proc/self/fd/761
19:24:00  	/proc/self/fd/762
19:24:00  	/proc/self/fd/763
19:24:00  	/proc/self/fd/764
19:24:00  	/proc/self/fd/765
19:24:00  	/proc/self/fd/766
19:24:00  	/proc/self/fd/767
19:24:00  	/proc/self/fd/768
19:24:00  	/proc/self/fd/769
19:24:00  	/proc/self/fd/770
19:24:00  	/proc/self/fd/771
19:24:00  	/proc/self/fd/772
19:24:00  	/proc/self/fd/773
19:24:00  	/proc/self/fd/775
19:24:00  	/proc/self/fd/776
19:24:00  	/proc/self/fd/778
19:24:00  	/proc/self/fd/779
19:24:00  	/proc/self/fd/781
19:24:00  	/proc/self/fd/786
19:24:00  	/proc/self/fd/788
19:24:00  	/proc/self/fd/789
19:24:00  	/proc/self/fd/801
19:24:00  	/proc/self/fd/803
19:24:00  	/proc/self/fd/804
19:24:00  	/proc/self/fd/805
19:24:00  	/proc/self/fd/806
19:24:00  	/proc/self/fd/807
19:24:00  	/proc/self/fd/808
19:24:00  	/proc/self/fd/809
19:24:00  	/proc/self/fd/810
19:24:00  	/proc/self/fd/811
19:24:00  	/proc/self/fd/812
19:24:00  	/proc/self/fd/813
19:24:00  	/proc/self/fd/814
19:24:00  	/proc/self/fd/815
19:24:00  	/proc/self/fd/816
19:24:00  	/proc/self/fd/817
19:24:00  	/proc/self/fd/818
19:24:00  	/proc/self/fd/820
19:24:00  	/proc/self/fd/821
19:24:00  	/proc/self/fd/822
19:24:00  	/proc/self/fd/823
19:24:00  	/proc/self/fd/824
19:24:00  	/proc/self/fd/825
19:24:00  	/proc/self/fd/826
19:24:00  	/proc/self/fd/827
19:24:00  	/proc/self/fd/828
19:24:00  	/proc/self/fd/829
19:24:00  	/proc/self/fd/830
19:24:00  	/proc/self/fd/831
19:24:00  	/proc/self/fd/832
19:24:00  	/proc/self/fd/833
19:24:00  	/proc/self/fd/834
19:24:00  	/proc/self/fd/835
19:24:00  	/proc/self/fd/836
19:24:00  	/proc/self/fd/837
19:24:00  	/proc/self/fd/838
19:24:00  	/proc/self/fd/839
19:24:00  	/proc/self/fd/840
19:24:00  	/proc/self/fd/841
19:24:00  	/proc/self/fd/842
19:24:00  	/proc/self/fd/843
19:24:00  	/proc/self/fd/844
19:24:00  	/proc/self/fd/845
19:24:00  	/proc/self/fd/846
19:24:00  	/proc/self/fd/847
19:24:00  	/proc/self/fd/848
19:24:00  	/proc/self/fd/849
19:24:00  	/proc/self/fd/850
19:24:00  	/proc/self/fd/851
19:24:00  	/proc/self/fd/852
19:24:00  	/proc/self/fd/853
19:24:00  	/proc/self/fd/854
19:24:00  	/proc/self/fd/855
19:24:00  	/proc/self/fd/856
19:24:00  	/proc/self/fd/857
19:24:00  	/proc/self/fd/858
19:24:00  	/proc/self/fd/859
19:24:00  	/proc/self/fd/860
19:24:00  	/proc/self/fd/861
19:24:00  	/proc/self/fd/862
19:24:00  	/proc/self/fd/863
19:24:00  	/proc/self/fd/864
19:24:00  	/proc/self/fd/865
19:24:00  	/proc/self/fd/866
19:24:00  	/proc/self/fd/867
19:24:00  	/proc/self/fd/868
19:24:00  	/proc/self/fd/869
19:24:00  	/proc/self/fd/870
19:24:00  	/proc/self/fd/871
19:24:00  	/proc/self/fd/872
19:24:00  	/proc/self/fd/873
19:24:00  	/proc/self/fd/874
19:24:00  	/proc/self/fd/875
19:24:00  	/proc/self/fd/876
19:24:00  	/proc/self/fd/877
19:24:00  	/proc/self/fd/878
19:24:00  	/proc/self/fd/879
19:24:00  	/proc/self/fd/880
19:24:00  	/proc/self/fd/881
19:24:00  	/proc/self/fd/882
19:24:00  	/proc/self/fd/883
19:24:00  	/proc/self/fd/884
19:24:00  	/proc/self/fd/885
19:24:00  	/proc/self/fd/886
19:24:00  	/proc/self/fd/887
19:24:00  	/proc/self/fd/888
19:24:00  	/proc/self/fd/889
19:24:00  	/proc/self/fd/890
19:24:00  	/proc/self/fd/891
19:24:00  	/proc/self/fd/892
19:24:00  	/proc/self/fd/893
19:24:00  	/proc/self/fd/894
19:24:00  	/proc/self/fd/895
19:24:00  	/proc/self/fd/896
19:24:00  	/proc/self/fd/897
19:24:00  	/proc/self/fd/898
19:24:00  	/proc/self/fd/899
19:24:00  	/proc/self/fd/900
19:24:00  	/proc/self/fd/901
19:24:00  	/proc/self/fd/902
19:24:00  	/proc/self/fd/903
19:24:00  	/proc/self/fd/904
19:24:00  	/proc/self/fd/905
19:24:00  	/proc/self/fd/906
19:24:00  	/proc/self/fd/907
19:24:00  	/proc/self/fd/908
19:24:00  	/proc/self/fd/909
19:24:00  	/proc/self/fd/910
19:24:00  	/proc/self/fd/911
19:24:00  	/proc/self/fd/912
19:24:00  	/proc/self/fd/913
19:24:00  	/proc/self/fd/914
19:24:00  	/proc/self/fd/915

@HeikoKlare HeikoKlare force-pushed the file-descriptors-browser-tests branch from f7e9819 to f19242f Compare February 3, 2025 19:49
@HeikoKlare HeikoKlare merged commit f746154 into eclipse-platform:master Feb 3, 2025
11 of 14 checks passed
@HeikoKlare HeikoKlare deleted the file-descriptors-browser-tests branch February 3, 2025 20:05
@HannesWell
Member

But this change will hopefully at least significantly reduce the number of false "positives":

It looks like it does it quite successfully. I saw many more builds pass today. Thanks a lot!

Successfully merging this pull request may close these issues.

Random "Too many leaked file descriptors" in Test_org_eclipse_swt_browser_Browser
3 participants