Skip to content

Conversation

@donoghuc
Copy link
Member

@donoghuc donoghuc commented Oct 23, 2025

Release notes

Removal of duplicated gems in logstash artifacts.

What does this PR do?

Bundler is used to manage a gem environment that is shipped with logstash
artifacts. By default, bundler will install newer/duplicate gems than shipped
with ruby distributions (in logstash's case jruby). Duplicate gems in the
shipped environment can cause issues with code loading with ambiguous gem specs
or gem activation issues. This commit adds a step to compute the duplicate gems
managed with bundler (and therefore direct/transitive dependencies of
logstash/plugins) and removes copies shipped with jruby. Note that there are
two locations to do the deduplication at. Both the stdlib gems as well as what
jruby refers to as "bundled" gems. The existing pattern for excluding files from
artifacts is used to implement the deduplication.

Why is it important/What is the impact to the user?

In some cases security scanners would pick up vendored/standard lib gems which typically trail in version shipped with the jruby distrubuted with logstash artifacts. While the newer code was loaded for logstash (and therefore not a practical threat) the scanner would still produce noise and require justifications. By removing old/duplicated gems we remove the false positives on the scanners.

How to test this PR locally

Build a container artifact and look for duplicated gems:

➜  logstash git:(deduplicate-gem-env) ✗ ARCH="aarch64" rake artifact:docker
Using system java: /Users/cas/.jenv/shims/java
Skipping bundler install...
Building logstash-core using gradle
./gradlew assemble
To honour the JVM settings for this build a single-use Daemon process will be forked. For more on this, please refer to https://docs.gradle.org/8.11.1/userguide/gradle_daemon.html#sec:disabling_the_daemon in the Gradle documentation.
Daemon will be stopped at the end of the build

> Task :downloadJRuby UP-TO-DATE
Download https://repo1.maven.org/maven2/org/jruby/jruby-dist/9.4.13.0/jruby-dist-9.4.13.0-bin.tar.gz

BUILD SUCCESSFUL in 4s
33 actionable tasks: 2 executed, 31 up-to-date
[plugin:install-default] Installing default plugins
Installing logstash-codec-avro, logstash-codec-cef, logstash-codec-collectd, logstash-codec-dots, logstash-codec-edn, logstash-codec-edn_lines, logstash-codec-es_bulk, logstash-codec-fluent, logstash-codec-graphite, logstash-codec-json, logstash-codec-json_lines, logstash-codec-line, logstash-codec-msgpack, logstash-codec-multiline, logstash-codec-netflow, logstash-codec-plain, logstash-codec-rubydebug, logstash-filter-aggregate, logstash-filter-anonymize, logstash-filter-cidr, logstash-filter-clone, logstash-filter-csv, logstash-filter-date, logstash-filter-de_dot, logstash-filter-dissect, logstash-filter-dns, logstash-filter-drop, logstash-filter-elastic_integration, logstash-filter-elasticsearch, logstash-filter-fingerprint, logstash-filter-geoip, logstash-filter-grok, logstash-filter-http, logstash-filter-json, logstash-filter-kv, logstash-filter-memcached, logstash-filter-metrics, logstash-filter-mutate, logstash-filter-prune, logstash-filter-ruby, logstash-filter-sleep, logstash-filter-split, logstash-filter-syslog_pri, logstash-filter-throttle, logstash-filter-translate, logstash-filter-truncate, logstash-filter-urldecode, logstash-filter-useragent, logstash-filter-uuid, logstash-filter-xml, logstash-input-azure_event_hubs, logstash-input-beats, logstash-input-couchdb_changes, logstash-input-dead_letter_queue, logstash-input-elasticsearch, logstash-input-exec, logstash-input-file, logstash-input-ganglia, logstash-input-gelf, logstash-input-generator, logstash-input-graphite, logstash-input-heartbeat, logstash-input-http, logstash-input-http_poller, logstash-input-jms, logstash-input-pipe, logstash-input-redis, logstash-input-stdin, logstash-input-syslog, logstash-input-tcp, logstash-input-twitter, logstash-input-udp, logstash-input-unix, logstash-input-elastic_serverless_forwarder, logstash-integration-jdbc, logstash-integration-kafka, logstash-integration-logstash, logstash-integration-rabbitmq, logstash-integration-snmp, logstash-integration-aws, logstash-output-csv, logstash-output-elasticsearch, logstash-output-email, logstash-output-file, logstash-output-graphite, logstash-output-http, logstash-output-lumberjack, logstash-output-nagios, logstash-output-null, logstash-output-pipe, logstash-output-redis, logstash-output-stdout, logstash-output-tcp, logstash-output-udp, logstash-output-webhdfs
Installation successful
[artifact:archives] Building tar.gz/zip of default plugins for OS: linux, arch: arm64
Adding duplicate gems to exclude path: base64, bigdecimal, cgi, date, ffi, fileutils, jar-dependencies, jruby-openssl, json, logger, net-http, net-imap, net-pop, net-protocol, net-smtp, psych, racc, rake, rexml, ruby2_keywords, timeout, uri
Full exclude_paths list:
 - **/*.gem
 - **/test/files/slow-xpath.xml
 - **/logstash-*/spec
 - bin/bundle
 - bin/rspec
 - bin/rspec.bat
 - vendor/**/gems/*/test/**/*
 - vendor/**/gems/*/spec/**/*
 - vendor/**/gems/**/Gemfile.lock
 - vendor/**/gems/**/Gemfile
 - vendor/jruby/lib/ruby/gems/shared/gems/jar-dependencies-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/jar-dependencies-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/jar-dependencies-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/net-imap-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/net-imap-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/net-imap-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/net-pop-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/net-pop-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/net-pop-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/net-smtp-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/net-smtp-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/net-smtp-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/racc-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/racc-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/racc-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/rake-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/rake-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/rake-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/rexml-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/rexml-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/rexml-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/base64-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/base64.rb
 - vendor/jruby/lib/ruby/stdlib/base64/**/*
 - vendor/jruby/lib/ruby/stdlib/base64
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/bigdecimal-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/bigdecimal.rb
 - vendor/jruby/lib/ruby/stdlib/bigdecimal/**/*
 - vendor/jruby/lib/ruby/stdlib/bigdecimal
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/cgi-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/cgi.rb
 - vendor/jruby/lib/ruby/stdlib/cgi/**/*
 - vendor/jruby/lib/ruby/stdlib/cgi
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/date-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/date.rb
 - vendor/jruby/lib/ruby/stdlib/date/**/*
 - vendor/jruby/lib/ruby/stdlib/date
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/ffi-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/ffi.rb
 - vendor/jruby/lib/ruby/stdlib/ffi/**/*
 - vendor/jruby/lib/ruby/stdlib/ffi
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/fileutils-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/fileutils.rb
 - vendor/jruby/lib/ruby/stdlib/fileutils/**/*
 - vendor/jruby/lib/ruby/stdlib/fileutils
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/jar-dependencies-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/jar-dependencies.rb
 - vendor/jruby/lib/ruby/stdlib/jar-dependencies/**/*
 - vendor/jruby/lib/ruby/stdlib/jar-dependencies
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/jruby-openssl-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/jruby-openssl.rb
 - vendor/jruby/lib/ruby/stdlib/jruby-openssl/**/*
 - vendor/jruby/lib/ruby/stdlib/jruby-openssl
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/json-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/json.rb
 - vendor/jruby/lib/ruby/stdlib/json/**/*
 - vendor/jruby/lib/ruby/stdlib/json
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/logger-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/logger.rb
 - vendor/jruby/lib/ruby/stdlib/logger/**/*
 - vendor/jruby/lib/ruby/stdlib/logger
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/net-http-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/net-http.rb
 - vendor/jruby/lib/ruby/stdlib/net-http/**/*
 - vendor/jruby/lib/ruby/stdlib/net-http
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/net-protocol-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/net-protocol.rb
 - vendor/jruby/lib/ruby/stdlib/net-protocol/**/*
 - vendor/jruby/lib/ruby/stdlib/net-protocol
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/psych-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/psych.rb
 - vendor/jruby/lib/ruby/stdlib/psych/**/*
 - vendor/jruby/lib/ruby/stdlib/psych
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/racc-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/racc.rb
 - vendor/jruby/lib/ruby/stdlib/racc/**/*
 - vendor/jruby/lib/ruby/stdlib/racc
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/ruby2_keywords-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/ruby2_keywords.rb
 - vendor/jruby/lib/ruby/stdlib/ruby2_keywords/**/*
 - vendor/jruby/lib/ruby/stdlib/ruby2_keywords
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/timeout-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/timeout.rb
 - vendor/jruby/lib/ruby/stdlib/timeout/**/*
 - vendor/jruby/lib/ruby/stdlib/timeout
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/uri-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/uri.rb
 - vendor/jruby/lib/ruby/stdlib/uri/**/*
 - vendor/jruby/lib/ruby/stdlib/uri
[artifact:tar] building build/logstash-9.3.0-SNAPSHOT-linux-aarch64.tar.gz
[docker] Building docker image
➜  logstash git:(deduplicate-gem-env) ✗ docker image ls
REPOSITORY                                 TAG              IMAGE ID       CREATED          SIZE
docker.elastic.co/logstash/logstash-full   9.3.0-SNAPSHOT   fa5a1591bf02   54 seconds ago   1.48GB
docker.elastic.co/logstash/logstash        9.3.0-SNAPSHOT   fa5a1591bf02   54 seconds ago   1.48GB
python                                     3                671d8548cfc6   2 weeks ago      1.61GB
➜  logstash git:(deduplicate-gem-env) ✗ docker run -it fa5a1591bf02 /bin/bash
bash-5.1$ find / -name *rexml*
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/doc/rexml
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/lib/rexml
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/lib/rexml/rexml.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/lib/rexml.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-filter-xml-4.3.2/lib/logstash/filters/xml/patch_rexml.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/aws-sdk-core-3.234.0/lib/aws-sdk-core/xml/parser/rexml_engine.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/specifications/rexml-3.4.4.gemspec
/usr/share/logstash/vendor/jruby/lib/ruby/gems/shared/gems/rss-0.2.9/lib/rss/rexmlparser.rb
find: ‘/root’: Permission denied
find: ‘/var/cache/ldconfig’: Permission denied
find: ‘/proc/tty/driver’: Permission denied
bash-5.1$ find / -name *uri*
/sys/kernel/security
/sys/module/spurious
/usr/lib64/security
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/twitter-6.2.0/lib/twitter/entity/uri.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/lib/rexml/security.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/http-cookie-1.1.0/lib/http/cookie/uri_parser.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/addressable-2.8.7/lib/addressable/uri.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/sequel-5.97.0/lib/sequel/plugins/blacklist_security.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/sequel-5.97.0/lib/sequel/plugins/whitelist_security.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/tzinfo-data-1.2025.2/lib/tzinfo/data/definitions/Indian/Mauritius.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/tzinfo-data-1.2025.2/lib/tzinfo/data/definitions/Europe/Zurich.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/uri-1.0.4
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/uri-1.0.4/lib/uri
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/uri-1.0.4/lib/uri.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rack-session-2.1.1/security.md
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rack-protection-4.2.1/lib/rack/protection/content_security_policy.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/http-3.3.0/lib/http/uri.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/elasticsearch-api-8.19.1/lib/elasticsearch/api/actions/security
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/elasticsearch-api-8.19.1/lib/elasticsearch/api/namespace/security.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/mustermann-3.0.4/bench/uri_parser_object.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/mustermann-3.0.4/bench/capturing.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/nio4r-2.7.4-java/ext/libev/ev_iouring.c
/usr/share/logstash/vendor/bundle/jruby/3.1.0/specifications/uri-1.0.4.gemspec
/usr/share/logstash/vendor/jruby/lib/ruby/gems/shared/specifications/default/open-uri-0.3.0.gemspec
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/vendor/uri
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/vendor/uri/lib/uri
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/vendor/uri/lib/uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/vendor/optparse/lib/optparse/uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/security.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/security
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/security_option.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/s3_uri_signer.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/uri_formatter.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/vendor/uri
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/vendor/uri/lib/uri
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/vendor/uri/lib/uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/vendored_uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/uri_credentials_filter.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/uri_normalizer.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/open-uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/optparse/uri.rb
/usr/share/logstash/lib/pluginmanager/pack_fetch_strategy/uri.rb
/usr/share/logstash/logstash-core/lib/logstash/util/safe_uri.rb

Related issues

@github-actions
Copy link
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Copy link
Contributor

mergify bot commented Oct 23, 2025

This pull request does not have a backport label. Could you fix it @donoghuc? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

@donoghuc
Copy link
Member Author

@donoghuc donoghuc added the backport-active-all Automated backport with mergify to all the active branches label Oct 23, 2025
Bundler is used to manage a gem environment that is shipped with logstash
artifacts. By default, bundler will install newer/duplicate gems than shipped
with ruby distributions (in logstash's case jruby). Duplicate gems in the
shipped environment can cause issues with code loading with ambiguous gem specs
or gem activation issues. This commit adds a step to compute the duplicate gems
managed with bundler (and therefore direct/transitive dependencies of
logstash/plugins) and *removes* copies shipped with jruby. Note that there are
two locations to do the deduplication at. Both the stdlib gems as well as what
jruby refers to as "bundled" gems. The existing pattern for excluding files from
artifacts is used to implement the deduplication.
@donoghuc donoghuc force-pushed the deduplicate-gem-env branch from f6ba5bd to 6efb420 Compare October 23, 2025 22:13
@donoghuc
Copy link
Member Author

@donoghuc donoghuc marked this pull request as ready for review October 23, 2025 22:22
@donoghuc donoghuc changed the title WIP: Test pattern for detecting and excluding duplicated gems Remove duplicate gems when producting logstash artifacts Oct 23, 2025
@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 23, 2025

💔 Build Failed

Failed CI Steps

History

@donoghuc donoghuc marked this pull request as draft October 23, 2025 23:09
@donoghuc
Copy link
Member Author

Moving back to draft form. Need to track down a gem loading issue. Somehow removal of psych is breaking at least the plugin manager. Need to trace where the GEM_HOME/GEM_PATH are getting set to point to the bundled gems.

@donoghuc
Copy link
Member Author

After further investigation it seems that removing stdlib gems is going to be more trouble than its worth. Digging in to the example failures we see a case where logstash code does something like require 'yaml'. This returns true because something in the ruby internals already assumes psych is loaded. Logstash then blows up with unitialized constant errors. If we manually activate psych we still get a warning emitted from ruby itself saying it thinks we've deleted a standard gem.

WIth that in mind I did validate that when we do have a bundled gem that is the code that is loaded/used during logstash exectuion (this sounds obvious, but wanted to double check).

I think we can safely remove the duplicate "bundled" gems still, but not move forward with the removal of the standard lib gems. Practically, I imagine that CVEs in the standard lib gems wont last too long as they are shipped with the interpreter. We still have the ability to mitigate by shipping newer versions in the lag time between being able to take up latest jruby.

I am curious in this comment #17873 (comment) @jsvd were you indicating to remove just the gemspecs from the stdlib location?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-active-all Automated backport with mergify to all the active branches

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants