HADOOP-19696. hadoop binary distribution to move cloud connectors to hadoop common/lib (#7980) #8094
base: branch-3.4
…hadoop common/lib (apache#7980). Contributed by Steve Loughran
mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package
Available package profiles:
hadoop-aws-package
restore hadoop-aliyun-package
-------------
aopalliance:aopalliance:1.0
cut
build with -Dhadoop-aws-package -Dhadoop-azure-datalake-package
Available package profiles:
hadoop-aws-package
restore hadoop-aliyun-package docs
💔 -1 overall
This message was automatically generated.
Hmm... I feel this is a surprising change for branch-3.4.
@pan3793 I understand your concerns, but this is actually a packaging improvement. The current build
Moving the hadoop-* cloud libs into common/lib but leaving out their dependencies means the stuff is in the right place, provided users manually add the dependencies (which I'm not going to build with, except for hadoop-azure, as its dependencies are the httpclient and wildfly libs we use elsewhere). This makes releasing easier, and it also makes adding the dependencies easier: the current setup requires a user to add the specific bundle JAR version the release was built with, and the same goes for the other components. That's why this change has been a blocker for a 3.4.3 release: the release process itself is what needs fixing.
@steveloughran I understand it's a little tricky to create the lean tarball (should it be similar to the aarch64 tarball?). Given that there is already a working script for it, I don't think it's a blocker for the 3.4 patch releases. TBH, I think Hadoop is currently abusing patch releases somewhat: the recent patch releases contain more features than bug fixes, and even breaking changes.
I've been trying to keep the 3.4.3 diff against 3.4.2 small, balanced against the need for a lot of those transitive-dependency CVE fixes. Other than anything related to the Avro updates, nothing API-wise should cause regressions. I'd like 3.4.3 to be the last Java 8 release, though I suspect we may need some dependency-update releases next year. It includes a stabilisation of the AWS analytics reader, but the only breaking change there is that we change the default to "on", and people can switch back. Maybe we should discuss this on common-dev?
This moves all the cloud connector libraries to common/lib. There are specific build options to control which libraries to include: the hadoop-* JARs of the modules are included, but their dependencies are only included when the build-time options specify it.
Available package profiles:
hadoop-aliyun-package
hadoop-aws-package
hadoop-azure-datalake-package
hadoop-cos-package
hadoop-huaweicloud-package
This means that by default the AWS SDK bundle.jar is no longer included in the distribution: to add it, users must drop their chosen version of the SDK into share/hadoop/common/lib.
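A minimal sketch of how a user might confirm that step was done, assuming the SDK JAR keeps its Maven Central file name `bundle-<version>.jar` (artifact `software.amazon.awssdk:bundle`) and that `hadoop_home` points at an unpacked distribution. `find_aws_sdk_bundle` is a hypothetical helper, not part of Hadoop.

```python
# Hypothetical check: is an AWS SDK bundle JAR present in
# share/hadoop/common/lib of an unpacked Hadoop distribution?
# Assumption: the JAR keeps its Maven file name, bundle-<version>.jar.
import re
import sys
from pathlib import Path
from typing import List

BUNDLE_JAR = re.compile(r"^bundle-\d[\w.\-]*\.jar$")


def find_aws_sdk_bundle(hadoop_home: str) -> List[str]:
    """Return the sorted names of candidate AWS SDK bundle JARs in common/lib."""
    lib_dir = Path(hadoop_home) / "share" / "hadoop" / "common" / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(
        p.name for p in lib_dir.iterdir() if p.is_file() and BUNDLE_JAR.match(p.name)
    )


if __name__ == "__main__":
    jars = find_aws_sdk_bundle(sys.argv[1] if len(sys.argv) > 1 else ".")
    if not jars:
        print("no AWS SDK bundle JAR in share/hadoop/common/lib")
    elif len(jars) > 1:
        # Several SDK versions on the classpath are likely to conflict.
        print("more than one bundle JAR, keep a single version:", ", ".join(jars))
    else:
        print("AWS SDK bundle:", jars[0])
```

Flagging more than one match is deliberate: since users now choose the SDK version themselves, leaving an old bundle next to a new one is an easy mistake to make.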
Anyone building their own release now has a choice of which connectors to bundle. The ASF releases will stay fairly lean, to reduce the CVE attack surface and keep the package size under control.
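Choosing connectors comes down to which `-D<profile>` properties are passed to the dist build, as in the `mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package` command quoted in the review. The sketch below is a hypothetical `build_dist_command` helper (not part of Hadoop's build) that checks requested names against the branch-3.4 profile list and assembles that command line, assuming each profile is activated by a property of the same name.

```python
# Hypothetical helper: assemble the distribution build command for a chosen
# subset of connector package profiles.
# Assumption: each profile is switched on with -D<profile-name>.
from typing import Iterable, List

AVAILABLE_PROFILES = (
    "hadoop-aliyun-package",
    "hadoop-aws-package",
    "hadoop-azure-datalake-package",
    "hadoop-cos-package",
    "hadoop-huaweicloud-package",
)


def build_dist_command(profiles: Iterable[str]) -> List[str]:
    """Return the mvn argument list for a dist build bundling the given connectors."""
    requested = list(dict.fromkeys(profiles))  # drop duplicates, keep order
    unknown = [p for p in requested if p not in AVAILABLE_PROFILES]
    if unknown:
        raise ValueError(f"unknown package profile(s): {', '.join(unknown)}")
    # With no profiles selected, only the hadoop-* connector JARs are packaged;
    # their third-party dependencies are left out.
    return ["mvn", "package", "-Pdist", "-DskipTests"] + [f"-D{p}" for p in requested]


if __name__ == "__main__":
    cmd = build_dist_command(["hadoop-aws-package", "hadoop-azure-datalake-package"])
    print(" ".join(cmd))
```

Rejecting unknown names up front is the main point: a mistyped `-D` property would presumably just activate nothing, silently producing a tarball without the intended dependencies.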
This is the branch-3.4 variant, which cuts out the connectors that are not present on this branch (tos, gcp).
How was this patch tested?
Manual builds; another in progress.
LICENSE-binary was validated by looking at the dependencies of hadoop-cloud-storage, making sure the needed ones were there and deleting some which no longer appear.
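The validation described above is essentially a set comparison. A minimal sketch, assuming LICENSE-binary lists dependencies one per line as `group:artifact:version` (like the `aopalliance:aopalliance:1.0` entry in the review hunk) and that the shipped dependencies have already been normalised to the same form; `diff_license` and `listed_coordinates` are hypothetical helpers.

```python
# Hypothetical sketch: compare group:artifact:version entries in LICENSE-binary
# with the dependencies actually shipped (e.g. those of hadoop-cloud-storage).
# Assumption: entries appear one per line, e.g. "aopalliance:aopalliance:1.0".
import re
from typing import Iterable, Set, Tuple

COORDINATE = re.compile(r"^\s*([\w.\-]+:[\w.\-]+:[\w.\-]+)\s*$")


def listed_coordinates(license_text: str) -> Set[str]:
    """Collect the group:artifact:version lines from LICENSE-binary text."""
    return {
        m.group(1)
        for line in license_text.splitlines()
        if (m := COORDINATE.match(line))
    }


def diff_license(license_text: str, shipped: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, stale): shipped but unlisted, and listed but no longer shipped."""
    listed = listed_coordinates(license_text)
    shipped_set = set(shipped)
    return shipped_set - listed, listed - shipped_set
```

The shipped list could come from Maven's dependency plugin run against hadoop-cloud-storage, after trimming its output down to `group:artifact:version`; "missing" entries need adding and "stale" ones are candidates for deletion.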
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?