@jerry-024 jerry-024 (Contributor) commented Oct 24, 2025

Purpose

Format table: Jindo IO uses multipart upload (upload part) to support two-phase commit.

Tests

API and Format

Documentation

@jerry-024 jerry-024 marked this pull request as draft October 24, 2025 07:57
@jerry-024 jerry-024 force-pushed the jindo_two_phase_commit branch from bc1d1fa to baae9ea October 24, 2025 08:00
@jerry-024 jerry-024 changed the title [core] format table: jindo io support two phase commit [core] format table: jindo io use upload part to support two phase commit Oct 24, 2025
@jerry-024 jerry-024 changed the title [core] format table: jindo io use upload part to support two phase commit [core] format table: jindo io use upload part to support two-phase commit Oct 24, 2025
@jerry-024 jerry-024 requested a review from Copilot October 24, 2025 08:36
Copilot AI left a comment

Pull Request Overview

This PR adds two-phase commit support to the Jindo filesystem using multipart upload, aligning it with the OSS implementation. Large files are uploaded in parts, and the written file becomes visible only when the upload is committed.

Key Changes:

  • Implemented multipart upload infrastructure for the Jindo filesystem, with new classes for uploading parts and committing the upload
  • Standardized the part size threshold to 8MB across the OSS and Jindo implementations
  • Enhanced the base multipart upload stream to handle writes that exceed the buffer threshold by splitting the data across multiple parts
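The splitting behavior described in the last bullet can be illustrated with a minimal sketch. This is a hypothetical example, not the code in MultiPartUploadTwoPhaseOutputStream: the class PartSplittingBuffer and its methods are invented here, and "uploading" a part simply stores it in an in-memory list. It shows how a single write call larger than the threshold is cut into several full parts, with any remainder kept in the buffer until the next write or close.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: buffers bytes and emits a "part" each time the buffer
 * reaches the threshold, so one large write(...) call can span several parts.
 * Parts are kept in memory here instead of being uploaded to an object store.
 */
class PartSplittingBuffer {
    // 8MB, the threshold the PR overview says OSS and Jindo now share.
    static final int DEFAULT_PART_SIZE = 8 * 1024 * 1024;

    private final int partSizeThreshold;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<byte[]> uploadedParts = new ArrayList<>();

    PartSplittingBuffer(int partSizeThreshold) {
        if (partSizeThreshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.partSizeThreshold = partSizeThreshold;
    }

    void write(byte[] data, int off, int len) {
        while (len > 0) {
            // Copy only as many bytes as still fit into the current part.
            int room = partSizeThreshold - buffer.size();
            int chunk = Math.min(room, len);
            buffer.write(data, off, chunk);
            off += chunk;
            len -= chunk;
            if (buffer.size() >= partSizeThreshold) {
                flushPart();
            }
        }
    }

    /** Emits whatever is left as the final, possibly smaller, part. */
    void close() {
        if (buffer.size() > 0) {
            flushPart();
        }
    }

    private void flushPart() {
        // A real stream would issue an upload-part request here.
        uploadedParts.add(buffer.toByteArray());
        buffer.reset();
    }

    List<byte[]> parts() {
        return uploadedParts;
    }
}
```

With a threshold of 4 bytes, writing 10 bytes in one call produces two full 4-byte parts immediately, and closing the buffer emits the remaining 2 bytes as a third part.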

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

  • paimon-filesystems/paimon-oss-impl/src/main/java/org/apache/paimon/oss/OssTwoPhaseOutputStream.java: Changed the part size threshold from 10MB to 8MB for consistency
  • paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/JindoTwoPhaseOutputStream.java: New class implementing a two-phase output stream for Jindo with an 8MB part threshold
  • paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/JindoMultiPartUploadCommitter.java: New committer class for finalizing Jindo multipart uploads
  • paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/JindoMultiPartUpload.java: New implementation of the multipart upload store using Jindo APIs
  • paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/JindoFileIO.java: Added the newTwoPhaseOutputStream method to enable two-phase writes
  • paimon-filesystems/paimon-jindo/src/main/java/org/apache/paimon/jindo/HadoopCompliantFileIO.java: Changed getFileSystemPair visibility from private to protected for subclass access
  • paimon-common/src/test/java/org/apache/paimon/fs/MultiPartUploadTwoPhaseOutputStreamTest.java: Updated test expectations and added a new test for threshold-based data splitting
  • paimon-common/src/main/java/org/apache/paimon/fs/MultiPartUploadTwoPhaseOutputStream.java: Refactored the write logic to split data exceeding the threshold across multiple parts
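The division of labor between the two-phase output stream and the committer in the file summary above can be sketched with an in-memory model. This is a hypothetical illustration under stated assumptions: none of the class names (InMemoryMultipartStore, UploadCommitter, TwoPhaseWriter) come from Paimon or the Jindo SDK, and the store only mimics multipart-upload semantics. Phase one initiates an upload and uploads parts, which stay invisible; phase two either completes the upload, making the object visible under its key, or aborts it and discards the parts.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory stand-in for an object store with multipart upload. */
class InMemoryMultipartStore {
    private final Map<String, List<byte[]>> pendingUploads = new HashMap<>();
    private final Map<String, byte[]> visibleObjects = new HashMap<>();

    String initiate(String key) {
        String uploadId = UUID.randomUUID().toString();
        pendingUploads.put(uploadId, new ArrayList<>());
        return uploadId;
    }

    void uploadPart(String uploadId, byte[] part) {
        pendingUploads.get(uploadId).add(part);
    }

    /** Concatenates the uploaded parts and publishes the object under its key. */
    void complete(String key, String uploadId) {
        List<byte[]> parts = pendingUploads.remove(uploadId);
        if (parts == null) {
            throw new IllegalStateException("unknown or already finished upload");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        visibleObjects.put(key, out.toByteArray());
    }

    /** Drops the pending parts; nothing ever becomes visible. */
    void abort(String uploadId) {
        pendingUploads.remove(uploadId);
    }

    byte[] get(String key) {
        return visibleObjects.get(key);
    }
}

/** Phase-two handle: holds what is needed to finish or discard the upload later. */
class UploadCommitter {
    private final InMemoryMultipartStore store;
    private final String key;
    private final String uploadId;

    UploadCommitter(InMemoryMultipartStore store, String key, String uploadId) {
        this.store = store;
        this.key = key;
        this.uploadId = uploadId;
    }

    void commit() {
        store.complete(key, uploadId);
    }

    void discard() {
        store.abort(uploadId);
    }
}

class TwoPhaseWriter {
    /** Phase one: upload all parts, but return before anything is published. */
    static UploadCommitter writeParts(InMemoryMultipartStore store, String key, List<byte[]> parts) {
        String uploadId = store.initiate(key);
        for (byte[] p : parts) {
            store.uploadPart(uploadId, p);
        }
        return new UploadCommitter(store, key, uploadId);
    }
}
```

Returning a committer instead of publishing immediately is what lets a caller defer visibility until its own commit succeeds, and call discard() on failure.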

@jerry-024 jerry-024 force-pushed the jindo_two_phase_commit branch from baae9ea to 3ae00a3 October 24, 2025 08:55
@jerry-024 jerry-024 requested a review from Copilot October 24, 2025 08:56
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

@jerry-024 jerry-024 force-pushed the jindo_two_phase_commit branch 2 times, most recently from 5f2c57b to b77af94 October 24, 2025 09:25
@jerry-024 jerry-024 requested a review from Copilot October 24, 2025 09:27
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

@jerry-024 jerry-024 force-pushed the jindo_two_phase_commit branch from b77af94 to ced55e1 October 24, 2025 09:42
@jerry-024 jerry-024 force-pushed the jindo_two_phase_commit branch from ced55e1 to 0f78223 October 24, 2025 09:43
@jerry-024 jerry-024 requested a review from Copilot October 24, 2025 09:43
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

@jerry-024 jerry-024 requested a review from Copilot October 27, 2025 01:25
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

@jerry-024 jerry-024 force-pushed the jindo_two_phase_commit branch 2 times, most recently from 35ee180 to 26dbfaf October 27, 2025 02:12
@jerry-024 jerry-024 marked this pull request as ready for review October 27, 2025 03:03
@jerry-024 jerry-024 force-pushed the jindo_two_phase_commit branch 2 times, most recently from d418342 to e89b6e6 October 27, 2025 06:54
@jerry-024 jerry-024 force-pushed the jindo_two_phase_commit branch from e89b6e6 to 7f69157 October 27, 2025 06:56
@JingsongLi JingsongLi (Contributor) left a comment

+1

@JingsongLi JingsongLi merged commit bc64466 into apache:master Oct 27, 2025
24 checks passed
@jerry-024 jerry-024 deleted the jindo_two_phase_commit branch October 27, 2025 07:46
jerry-024 added a commit to jerry-024/paimon that referenced this pull request Oct 28, 2025
…write

* github/master: (71 commits)
  [core] format table: update format-table.implementation default value from engine to paimon (apache#6474)
  [core] Introduce 'commit.discard-duplicate-files' to make append safe (apache#6464)
  [spark] Refactor for spark format table (apache#6477)
  [doc] Update batch partition mark done doc (apache#6478)
  [core] Refactor dv index cache in SnapshotReaderImpl
  [typo] fix typo in distinct (apache#6475)
  [doc] Fix typo in flink/sql-ddl.md (apache#6476)
  [core] Introduce deletion vector meta cache at bucket level (apache#6407)
  [rest] Support external paimon table in rest catalog  (apache#6446)
  [core] format table: jindo io use upload part to support two-phase commit (apache#6470)
  [doc] add view doc (apache#6469)
  [doc] fix MarkPartitionDoneProcedure doc. (apache#6473)
  [arrow] Fix java.lang.IllegalArgumentException in ArrowFormatCWriter. (apache#6459)
  [core] Enable manifest filter in data evolution table (apache#6455)
  [doc] Refactor names in python-api
  [doc] Supplementary the document of python REST API (apache#6466)
  [core] Do not use checkArgument in complex error message (apache#6468)
  [core] spark format table support  overwrite (apache#6442)
  [Python] Blob read supports with_shard (apache#6465)
  [Python] Add basic tests for schema evolution read (apache#6463)
  ...