[core] format table: jindo io use upload part to support two-phase commit #6470
Pull Request Overview
This PR adds two-phase commit support for Jindo filesystem using multipart upload, aligning it with the OSS implementation. The change enables Jindo to handle large file uploads by splitting them into parts and supporting transactional commits.
Key Changes:
- Implemented multipart upload infrastructure for Jindo filesystem with new classes for upload handling and committing
- Standardized part size threshold to 8MB across OSS and Jindo implementations
- Enhanced the base multipart upload stream to properly handle writes that exceed buffer thresholds by splitting data across multiple parts
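The splitting behavior described in the last bullet, combined with the two-phase commit, can be illustrated with a minimal sketch. Every name below (`PartSplitSketch`, `InMemoryStore`, `SplittingStream`, `Committer`) is hypothetical and is not taken from the Paimon or Jindo code; an in-memory list stands in for the remote multipart upload store, and a tiny threshold replaces the 8MB value so the behavior is easy to follow.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; these are not the Paimon classes. */
public class PartSplitSketch {

    /** In-memory stand-in for a remote multipart upload store (assumption). */
    static class InMemoryStore {
        final List<byte[]> parts = new ArrayList<>();
        boolean completed = false;

        void uploadPart(byte[] data) { parts.add(data); }
        void complete() { completed = true; }
        void abort() { parts.clear(); }
    }

    /** Second phase: makes the uploaded parts visible, or throws them away. */
    static class Committer {
        private final InMemoryStore store;

        Committer(InMemoryStore store) { this.store = store; }

        void commit() { store.complete(); }
        void discard() { store.abort(); }
    }

    /** Buffers writes and uploads one part each time the buffer reaches the threshold. */
    static class SplittingStream {
        private final InMemoryStore store;
        private final int threshold;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        SplittingStream(InMemoryStore store, int threshold) {
            if (threshold <= 0) {
                throw new IllegalArgumentException("threshold must be positive");
            }
            this.store = store;
            this.threshold = threshold;
        }

        void write(byte[] b, int off, int len) {
            // A single write larger than the remaining room is split across parts.
            while (len > 0) {
                int room = threshold - buffer.size();
                int n = Math.min(room, len);
                buffer.write(b, off, n);
                off += n;
                len -= n;
                if (buffer.size() >= threshold) {
                    flushPart();
                }
            }
        }

        private void flushPart() {
            if (buffer.size() > 0) {
                store.uploadPart(buffer.toByteArray());
                buffer.reset();
            }
        }

        /** First phase: upload any remaining bytes, but do not complete the upload. */
        Committer closeForCommit() {
            flushPart();
            return new Committer(store);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        SplittingStream out = new SplittingStream(store, 8);
        out.write(new byte[20], 0, 20);
        Committer committer = out.closeForCommit();
        System.out.println("parts=" + store.parts.size() + " completed=" + store.completed);
        committer.commit();
        System.out.println("completed=" + store.completed);
    }
}
```

In this sketch nothing becomes "complete" until `commit()` is called, so a failed job can call `discard()` instead and leave no visible file behind. That is the general idea of a two-phase commit over multipart upload; the actual classes in the PR may differ in structure and naming.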
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| paimon-filesystems/paimon-oss-impl/src/main/java/org/apache/paimon/oss/OssTwoPhaseOutputStream.java | Changed part size threshold from 10MB to 8MB for consistency |
| paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/JindoTwoPhaseOutputStream.java | New class implementing two-phase output stream for Jindo using 8MB part threshold |
| paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/JindoMultiPartUploadCommitter.java | New committer class for finalizing Jindo multipart uploads |
| paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/JindoMultiPartUpload.java | New implementation of multipart upload store using Jindo APIs |
| paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/JindoFileIO.java | Added newTwoPhaseOutputStream method to enable two-phase writes |
| paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/HadoopCompliantFileIO.java | Changed getFileSystemPair visibility from private to protected for subclass access |
| paimon-common/src/test/java/org/apache/paimon/fs/MultiPartUploadTwoPhaseOutputStreamTest.java | Updated test expectations and added new test for threshold-based data splitting |
| paimon-common/src/main/java/org/apache/paimon/fs/MultiPartUploadTwoPhaseOutputStream.java | Refactored write logic to handle data exceeding threshold by splitting across multiple parts |
+1
…write
* github/master: (71 commits)
- [core] format table: update format-table.implementation default value from engine to paimon (apache#6474)
- [core] Introduce 'commit.discard-duplicate-files' to make append safe (apache#6464)
- [spark] Refactor for spark format table (apache#6477)
- [doc] Update batch partition mark done doc (apache#6478)
- [core] Refactor dv index cache in SnapshotReaderImpl
- [typo] fix typo in distinct (apache#6475)
- [doc] Fix typo in flink/sql-ddl.md (apache#6476)
- [core] Introduce deletion vector meta cache at bucket level (apache#6407)
- [rest] Support external paimon table in rest catalog (apache#6446)
- [core] format table: jindo io use upload part to support two-phase commit (apache#6470)
- [doc] add view doc (apache#6469)
- [doc] fix MarkPartitionDoneProcedure doc. (apache#6473)
- [arrow] Fix java.lang.IllegalArgumentException in ArrowFormatCWriter. (apache#6459)
- [core] Enable manifest filter in data evolution table (apache#6455)
- [doc] Refactor names in python-api
- [doc] Supplementary the document of python REST API (apache#6466)
- [core] Do not use checkArgument in complex error message (apache#6468)
- [core] spark format table support overwrite (apache#6442)
- [Python] Blob read supports with_shard (apache#6465)
- [Python] Add basic tests for schema evolution read (apache#6463)
- ...
Purpose
format table: jindo io use upload part to support two-phase commit
Tests
API and Format
Documentation