feat: custom product versions for Hadoop, HBase, Phoenix, hbase-operator-tools, Druid, Hive and Spark #1173


Merged 9 commits into main on Jun 18, 2025

Conversation

dervoeti (Member) commented Jun 13, 2025

Description

Part of #1068

This PR enables custom versions (with a suffix like -stackable0.0.0-dev) for components we apply custom patches to. It also makes the following products use our patched Hadoop libraries:

  • HBase
  • Phoenix
  • hbase-operator-tools
  • Hive
  • Druid
  • Spark

And these components use our patched HBase libraries now:

  • Phoenix
  • hbase-operator-tools
  • Spark

In the SBOMs, the custom version is replaced by the original one (using sed), so vulnerabilities filed directly against e.g. Hadoop 3.3.6 are still found when scanning the SBOMs (which we do in our vulnerability management pipeline). Otherwise, vulnerabilities against Hadoop 3.3.6 might be missed, because the version would be something like 3.3.6-stackable25.7.0 and vulnerability databases don't contain entries for that particular version.

The patched libraries are used by overriding, for instance, the Hadoop version when running Maven (like -Dhadoop.compile.version=3.3.6-stackable0.0.0-dev). Since they are not found in our Nexus Maven repo, we copy them into the local Maven repo from the Hadoop builder:

COPY --from=hadoop-builder --chown=${STACKABLE_USER_UID}:0 /stackable/patched-libs /stackable/patched-libs
cp -r /stackable/patched-libs/maven/* /stackable/.m2/repository

We can't COPY directly into /stackable/.m2/repository since that directory is usually cached, and the copied libraries would be overwritten by the cache.
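The SBOM rewrite described above could be sketched like this (the file name and the exact sed pattern are assumptions for illustration; the actual invocation in the PR may differ):

```shell
# Hypothetical sketch: strip the Stackable suffix from versions in an SBOM
# fragment, so scanners match CVEs filed against the upstream version.
echo '"version": "3.3.6-stackable25.7.0"' > sbom-fragment.json

# Turn "3.3.6-stackable25.7.0" (or "3.3.6-stackable0.0.0-dev") back into "3.3.6"
sed -i 's/-stackable[0-9][0-9.]*\(-dev\)\?//g' sbom-fragment.json

cat sbom-fragment.json   # "version": "3.3.6"
```

With the suffix removed, a scanner looking up Hadoop 3.3.6 in a vulnerability database matches the component again.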

Further changes:

  • The patch that fixes CVE-2023-34455 in Druid was removed, since Druid now uses the patched Hadoop version which does not contain the vulnerability
  • Building HBase now happens in a separate Dockerfile (hbase/hbase/Dockerfile) to differentiate the final HBase image (hbase/Dockerfile) from HBase the application. This was needed because, for example, Phoenix now depends on our patched HBase version, but the HBase image depends on Phoenix, which would be a circular dependency if the HBase container image and application were the same image. Now all components of the HBase image (except hadoop-s3-builder) are built in separate Dockerfiles instead of inline.
  • hbase-operator-tools and Phoenix now have the HBase version they were built with as a suffix in their Docker target names (but not in their versions). The reason is that while the Phoenix version itself is the same (5.2.1-stackable25.7.0), in our case Phoenix can be built against HBase 2.6.1 or HBase 2.6.2, which also each pull in a different version of Hadoop. We don't want e.g. the HBase 2.6.2 image to include a static Phoenix 5.2.1 that was built with HBase 2.6.1 and Hadoop 3.3.6. So when we include Phoenix in the HBase image, we need to specify which variant of 5.2.1 we want. The same applies to hbase-operator-tools.
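A minimal sketch of the split-build pattern described above (stage and path names are illustrative, extrapolated from the COPY line earlier in this description; they are not copied from the repo):

```dockerfile
# hbase/hbase/Dockerfile — builds HBase the application, against the
# patched Hadoop libraries (illustrative names).
FROM java-devel AS hbase-builder
COPY --from=hadoop-builder /stackable/patched-libs /stackable/patched-libs
# ... run Maven with -Dhadoop.compile.version=<patched version> ...

# hbase/Dockerfile — assembles the final HBase image. Phoenix was itself
# built against the patched HBase in its own Dockerfile, so pulling it in
# here no longer creates a circular image dependency. The variant suffix
# picks the Phoenix build matching this image's HBase version.
FROM hbase-builder AS final
COPY --from=phoenix-2.6.2 /stackable/phoenix /stackable/phoenix
```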

I successfully built these products and tested them with the smoke tests:

  • Hive 3.1.3
  • Hive 4.0.1
  • Druid 30.0.1
  • HBase 2.6.1
  • HBase 2.6.2
  • Hadoop 3.3.6
  • Hadoop 3.4.1
  • Spark-k8s 3.5.5

Definition of Done Checklist

Note

Not all of these items are applicable to all PRs; the author should update this template to leave in only the boxes that are relevant.

Please make sure all these things are done and tick the boxes

  • Changes are OpenShift compatible
  • All added packages (via microdnf or otherwise) have a comment on why they are added
  • Things not downloaded from Red Hat repositories should be mirrored in the Stackable repository and downloaded from there
  • All packages should have (if available) signatures/hashes verified
  • Add an entry to the CHANGELOG.md file
  • Integration tests ran successfully
TIP: Running integration tests with a new product image

The image can be built and uploaded to the kind cluster with the following commands:

bake --product <product> --image-version <stackable-image-version>
kind load docker-image <image-tagged-with-the-major-version> --name=<name-of-your-test-cluster>

See the output of bake to retrieve the image tag for <image-tagged-with-the-major-version>.

@dervoeti dervoeti moved this to Development: Waiting for Review in Stackable Engineering Jun 13, 2025
@dervoeti dervoeti self-assigned this Jun 13, 2025
@adwk67 adwk67 self-requested a review June 17, 2025 07:31
@adwk67 adwk67 moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Jun 17, 2025
@dervoeti dervoeti added this pull request to the merge queue Jun 18, 2025
Merged via the queue into main with commit f59f715 Jun 18, 2025
3 checks passed
@dervoeti dervoeti deleted the feat/custom-product-versions-hadoop branch June 18, 2025 08:27
@dervoeti dervoeti mentioned this pull request Jun 18, 2025