@steveloughran
Contributor

This moves all the cloud connector libraries to common/lib. There are specific build options to control which libraries to include. The hadoop-* JARs of the modules are included, but their dependencies are only included when the build-time options specify it.

Available package profiles:
hadoop-aliyun-package
hadoop-aws-package
hadoop-azure-datalake-package
hadoop-cos-package
hadoop-huaweicloud-package
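As a rough illustration, each profile above is selected at build time by passing it as a `-D` property, the same form as the `mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package` command quoted later in this PR. The helper below is a hypothetical sketch (the name `dist_build_command` is invented here): it assembles that command line and rejects profile names that are not in the branch-3.4 list. It assumes every listed profile is activated by a bare `-D<name>` property.

```python
"""Sketch: assemble a Hadoop -Pdist build command that bundles chosen connector dependencies.

Assumption: every package profile is activated by a bare -D<profile> property,
as in the example command from this PR. dist_build_command is hypothetical.
"""
import shlex
from typing import Iterable, List

# Package profiles listed in the PR description for branch-3.4.
KNOWN_PROFILES = frozenset({
    "hadoop-aliyun-package",
    "hadoop-aws-package",
    "hadoop-azure-datalake-package",
    "hadoop-cos-package",
    "hadoop-huaweicloud-package",
})


def dist_build_command(profiles: Iterable[str], skip_tests: bool = True) -> List[str]:
    """Return the mvn argv for a dist build with the given package profiles enabled."""
    chosen = list(dict.fromkeys(profiles))  # drop duplicates, keep caller's order
    unknown = [p for p in chosen if p not in KNOWN_PROFILES]
    if unknown:
        raise ValueError("unknown package profile(s): " + ", ".join(unknown))
    cmd = ["mvn", "package", "-Pdist"]
    if skip_tests:
        cmd.append("-DskipTests")
    # One -D property per selected connector profile.
    cmd.extend("-D" + p for p in chosen)
    return cmd


if __name__ == "__main__":
    print(shlex.join(dist_build_command(
        ["hadoop-aws-package", "hadoop-azure-datalake-package"])))
```

Run directly, it prints the same command line as the one in the PR. Leaving a connector out of the list only drops its third-party dependencies; per the description, its hadoop-* JAR is still packaged.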

This means that by default the AWS bundle.jar is no longer included in the distribution: to add it, users must drop their chosen version of the SDK into share/hadoop/common/lib.
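After unpacking such a distribution, one quick way to confirm that the SDK has been added is to look for it next to the hadoop-aws JAR. The sketch below assumes the AWS SDK v2 bundle keeps its Maven artifact name, so the file is named `bundle-<version>.jar`, and that `hadoop_home` points at the unpacked release; the function name `find_aws_jars` is hypothetical.

```python
"""Sketch: check whether share/hadoop/common/lib has both hadoop-aws and an AWS SDK bundle.

Assumptions: the SDK file is named bundle-<version>.jar (Maven artifact
software.amazon.awssdk:bundle) and hadoop_home is an unpacked Hadoop release.
find_aws_jars is a hypothetical helper.
"""
import re
import sys
from pathlib import Path
from typing import Dict, List

BUNDLE_RE = re.compile(r"^bundle-(\d[\w.\-]*)\.jar$")


def find_aws_jars(hadoop_home: str) -> Dict[str, List[str]]:
    """Return the hadoop-aws JAR names and SDK bundle versions found in common/lib."""
    lib = Path(hadoop_home) / "share" / "hadoop" / "common" / "lib"
    names = sorted(p.name for p in lib.glob("*.jar")) if lib.is_dir() else []
    return {
        "hadoop_aws": [n for n in names if n.startswith("hadoop-aws-")],
        # Capture the version part of bundle-<version>.jar.
        "bundle_versions": [m.group(1) for n in names if (m := BUNDLE_RE.match(n))],
    }


if __name__ == "__main__":
    found = find_aws_jars(sys.argv[1] if len(sys.argv) > 1 else ".")
    if found["hadoop_aws"] and not found["bundle_versions"]:
        print("hadoop-aws is present but no bundle-*.jar; copy an SDK bundle into common/lib")
    else:
        print(found)
```

Reporting the bundle version, rather than just its presence, matters because the user now picks the SDK version instead of inheriting the one the release was built with.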

Anyone building their own release now has a choice of which connectors to bundle. The ASF releases will stay fairly lean, both to reduce the CVE attack surface and to keep package size under control.

This is the branch-3.4 variant, which cuts out the connectors that are not present in this branch (tos, gcp).

How was this patch tested?

Manual builds; another in progress.

LICENSE-binary was validated by looking at the dependencies of hadoop-cloud-storage, making sure the needed ones were there and deleting some that no longer appear.
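The LICENSE-binary check described above boils down to a set comparison between resolved dependencies and the entries in the file. A minimal sketch, assuming the dependency list is the text output of `mvn dependency:list` (lines such as `group:artifact:jar:version:scope`) and that LICENSE-binary names artifacts as `group:artifact:version`, like the `aopalliance:aopalliance:1.0` entry discussed in review. The helpers `deps_from_list`, `license_entries` and `compare_license` are hypothetical, not part of Hadoop's tooling.

```python
"""Sketch: compare resolved dependencies with group:artifact:version lines in LICENSE-binary.

Assumptions: dependency lines come from `mvn dependency:list` text output
(group:artifact:type[:classifier]:version:scope), and LICENSE-binary lists
artifacts one per line as group:artifact:version. All helpers are hypothetical.
"""
import re
import sys
from typing import Iterable, Set, Tuple

GAV_RE = re.compile(r"^[\w.\-]+:[\w.\-]+:[\w.\-]+$")


def deps_from_list(lines: Iterable[str]) -> Set[str]:
    """Extract group:artifact:version coordinates from dependency:list output."""
    result = set()
    for line in lines:
        # Tokens skip log prefixes like "[INFO]" and suffixes like "-- module x".
        for token in line.split():
            parts = token.split(":")
            if len(parts) in (5, 6):  # version is second-to-last in both layouts
                result.add(f"{parts[0]}:{parts[1]}:{parts[-2]}")
                break
    return result


def license_entries(lines: Iterable[str]) -> Set[str]:
    """Keep only LICENSE-binary lines that look like group:artifact:version."""
    return {line.strip() for line in lines if GAV_RE.match(line.strip())}


def compare_license(deps: Set[str], licensed: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing from LICENSE-binary, listed but no longer a dependency)."""
    return deps - licensed, licensed - deps


if __name__ == "__main__":
    with open(sys.argv[1]) as deps_file, open(sys.argv[2]) as lic_file:
        missing, stale = compare_license(deps_from_list(deps_file), license_entries(lic_file))
    print("missing:", sorted(missing))
    print("stale:", sorted(stale))
```

The "stale" side is only meaningful when the dependency list covers the whole distribution; if it covers hadoop-cloud-storage alone, treat those entries as candidates to inspect by hand rather than to delete.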

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

…hadoop common/lib (apache#7980)

Contributed by Steve Loughran
mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package

Available package profiles:
hadoop-aws-package
Contributor Author

restore hadoop-aliyun-package

-------------

aopalliance:aopalliance:1.0

Contributor Author

cut

build with -Dhadoop-aws-package -Dhadoop-azure-datalake-package
Available package profiles:
hadoop-aws-package
Contributor Author

restore hadoop-aliyun-package docs

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 58s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 shelldocs 0m 0s Shelldocs was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ branch-3.4 Compile Tests _
+0 🆗 mvndep 2m 12s Maven dependency ordering for branch
+1 💚 mvninstall 21m 55s branch-3.4 passed
+1 💚 compile 9m 5s branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 8m 0s branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 1m 56s branch-3.4 passed
+1 💚 mvnsite 14m 38s branch-3.4 passed
+1 💚 javadoc 4m 47s branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 4m 39s branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+0 🆗 spotbugs 0m 13s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+0 🆗 spotbugs 0m 11s branch/hadoop-assemblies no spotbugs output file (spotbugsXml.xml)
-1 ❌ spotbugs 16m 43s /branch-spotbugs-root-warnings.html root in branch-3.4 has 1 extant spotbugs warnings.
+0 🆗 spotbugs 0m 15s branch/hadoop-tools/hadoop-tools-dist no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 18m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 23s Maven dependency ordering for patch
+1 💚 mvninstall 18m 34s the patch passed
+1 💚 compile 8m 45s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 8m 45s the patch passed
+1 💚 compile 8m 12s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 javac 8m 12s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 58s the patch passed
+1 💚 mvnsite 11m 35s the patch passed
+1 💚 shellcheck 0m 0s No new issues.
-1 ❌ javadoc 4m 42s /results-javadoc-javadoc-root-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt root-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 generated 10 new + 5675 unchanged - 10 fixed = 5685 total (was 5685)
-1 ❌ javadoc 4m 40s /results-javadoc-javadoc-root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 generated 10 new + 1429 unchanged - 10 fixed = 1439 total (was 1439)
+0 🆗 spotbugs 0m 14s hadoop-project has no data from spotbugs
+0 🆗 spotbugs 0m 13s hadoop-assemblies has no data from spotbugs
+0 🆗 spotbugs 0m 14s hadoop-tools/hadoop-tools-dist has no data from spotbugs
+0 🆗 spotbugs 0m 16s hadoop-cloud-storage-project/hadoop-cloud-storage-dist has no data from spotbugs
+1 💚 shadedclient 18m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 634m 43s /patch-unit-root.txt root in the patch passed.
-1 ❌ asflicense 0m 53s /results-asflicense.txt The patch generated 1 ASF License warnings.
845m 29s
Reason Tests
Failed junit tests hadoop.mapred.gridmix.TestSleepJob
hadoop.mapred.gridmix.TestGridmixSubmission
hadoop.mapred.gridmix.TestLoadJob
Subsystem Report/Notes
Docker ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8094/1/artifact/out/Dockerfile
GITHUB PR #8094
Optional Tests dupname asflicense codespell detsecrets shellcheck shelldocs compile javac javadoc mvninstall mvnsite unit shadedclient xmllint spotbugs checkstyle
uname Linux 5698d08ecfea 5.15.0-156-generic #166-Ubuntu SMP Sat Aug 9 00:02:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.4 / 026c9ba
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8094/1/testReport/
Max. process+thread count 4336 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-assemblies hadoop-common-project/hadoop-common hadoop-tools/hadoop-tools-dist hadoop-tools/hadoop-aws hadoop-cloud-storage-project/hadoop-huaweicloud hadoop-cloud-storage-project . hadoop-cloud-storage-project/hadoop-cloud-storage-dist U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8094/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2 shellcheck=0.7.0
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@pan3793
Member

pan3793 commented Nov 20, 2025

Hmm... I feel this is a surprising change for branch-3.4

@steveloughran
Contributor Author

steveloughran commented Nov 20, 2025

@pan3793 I understand your concerns, but this is actually a packaging improvement.

The current build

  • contains a tools/lib/bundle.jar for AWS, which is too big to distribute
  • doesn't put anyone else's cloud connector on the classpath. People doing their own distros end up moving the JARs or doing other classpath workarounds.
  • forces a release process where we have to take the full tarball, untar it, remove that bundle.jar, then repack and sign it again: https://github.com/apache/hadoop-release-support/blob/main/build.xml#L1315
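The untar, remove and repack step in the last bullet can be pictured with Python's standard `tarfile` module. This is a hypothetical sketch, not the logic in the linked hadoop-release-support build.xml: it assumes the SDK sits in a `share/hadoop/tools/lib/` directory under a name matching `bundle-*.jar`, copies every other member into a new gzipped tarball, and leaves checksums and signing to the existing release tooling. The names `is_sdk_bundle` and `strip_bundle` are invented for illustration.

```python
"""Sketch: rebuild a release tarball without the AWS SDK bundle JAR.

Assumptions: the bundle lives in .../share/hadoop/tools/lib/ and is named
bundle-*.jar; checksums and signing happen elsewhere. Helper names are hypothetical.
"""
import fnmatch
import posixpath
import tarfile
from typing import List


def is_sdk_bundle(member_name: str) -> bool:
    """True if a tar member path looks like the tools/lib SDK bundle."""
    directory, base = posixpath.split(member_name)
    return directory.endswith("share/hadoop/tools/lib") and fnmatch.fnmatch(base, "bundle-*.jar")


def strip_bundle(src: str, dst: str) -> List[str]:
    """Copy src to dst (both .tar.gz), skipping SDK bundle JARs; return skipped names."""
    skipped = []
    with tarfile.open(src, "r:gz") as tin, tarfile.open(dst, "w:gz") as tout:
        for member in tin.getmembers():
            if is_sdk_bundle(member.name):
                skipped.append(member.name)
                continue
            # Regular files need their data stream; dirs and links carry none.
            data = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, data)
    return skipped
```

Because each `TarInfo` is copied across unchanged, member order, permissions and timestamps survive; the new tarball still needs fresh checksums and a signature.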

Moving the hadoop-* cloud libs into common/lib but leaving out their dependencies means the JARs are in the right place, provided users manually add the dependencies (which I'm not going to build with, except for hadoop-azure, as its dependencies are the httpclient and wildfly libs we use elsewhere).

This makes releasing easier, and it also makes adding the dependencies easier: the current setup requires a user to add the specific bundle-jar version the release was built with, and the same goes for the other components.

That's why this change has been a blocker for a 3.4.3 release: the release process itself is what needs fixing.

@pan3793
Member

pan3793 commented Nov 21, 2025

@steveloughran I understand it's a little tricky to create the lean tarball (presumably similar to the aarch64 tarball?), but given that there's already a working script for it, I don't think it's a blocker for the 3.4 patch releases. TBH, I think Hadoop is currently kind of an abuse of patch releases, the recent patch releases contain more features than bug fixes, and even breaking changes.

@steveloughran
Contributor Author

> I think Hadoop is currently kind of an abuse of patch releases, the recent patch releases contain more features than bug fixes, and even breaking changes.

I've been trying to keep the 3.4.3 diff against 3.4.2 small, while balancing the need for a lot of those transitive CVE fixes. Other than anything related to the Avro updates, nothing API-wise should cause regressions.

I'd like 3.4.3 to be the last Java 8 release, though I suspect we may need some dependency-update releases next year. It includes a stabilisation of the AWS analytics reader, but the only breaking change there is that we change the default to "on"; people can switch it back.

Maybe we should discuss this on common-dev?
