Skip to content

Conversation

@sryza
Copy link
Contributor

@sryza sryza commented Nov 11, 2025

What changes were proposed in this pull request?

  • Raises an error if the pipeline checkpoint storage dir is not an absolute path
  • Updated the init CLI to create and set a checkpoint storage dir as an absolute path

Why are the changes needed?

Prevent users from accidentally losing checkpoints.

Does this PR introduce any user-facing change?

Yes, but to unreleased functionality.

How was this patch tested?

  • New unit tests
  • Ran the init CLI and then ran pipeline with streaming table

Was this patch authored or co-authored using generative AI tooling?

@HyukjinKwon HyukjinKwon changed the title [SPARK-54280] Require pipeline checkpoint storage dir to be absolute path [SPARK-54280][SDP] Require pipeline checkpoint storage dir to be absolute path Nov 11, 2025
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make the CI happy?

[info] SinkExecutionSuite:
[info] - writing to external sink - memory sink *** FAILED *** (2 seconds, 65 milliseconds)

},
"PIPELINE_STORAGE_ROOT_INVALID" : {
"message" : [
"Pipeline storage root must be an absolute path with a URI scheme (e.g., file://, s3a://, hdfs://). Got: `<storage_root>`."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Pipeline storage root must be an absolute path with a URI scheme (e.g., file://, s3a://, hdfs://). Got: `<storage_root>`."
"Pipeline storage root must be an absolute path with an URI scheme (e.g., file://, s3a://, hdfs://). Got: `<storage_root>`."

Copy link
Contributor

@jaceklaskowski jaceklaskowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-picking 🤷‍♂️

@sryza sryza force-pushed the storage-location-absolute branch from 971c876 to dd8d70a Compare November 13, 2025 03:26
@dongjoon-hyun
Copy link
Member

Thank you for updating. It seems that we have only 1 failure, @sryza .

[info] - Pipeline execution cache *** FAILED *** (2 milliseconds)
[info]   org.apache.spark.SparkException: [PIPELINE_STORAGE_ROOT_INVALID] Pipeline storage root must be an absolute path with a URI scheme (e.g., file://, s3a://, hdfs://). Got: `test_storage_root`. SQLSTATE: 42K03

@sryza
Copy link
Contributor Author

sryza commented Nov 13, 2025

I keep thinking I've handled them all and then finding new ones 🤦 . Working on this.

@dongjoon-hyun
Copy link
Member

There is one remaining failure.

[error] Failed tests:
[error] 	org.apache.spark.network.RpcIntegrationSuite

I locally verified that it's irrelevant and passes locally.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @sryza . Merged to master/4.1.

dongjoon-hyun pushed a commit that referenced this pull request Nov 13, 2025
…lute path

### What changes were proposed in this pull request?

- Raises an error if the pipeline checkpoint storage dir is not an absolute path
- Updated the init CLI to create and set a checkpoint storage dir as an absolute path

### Why are the changes needed?

Prevent users from accidentally losing checkpoints.

### Does this PR introduce _any_ user-facing change?

Yes, but to unreleased functionality.

### How was this patch tested?

- New unit tests
- Ran the init CLI and then ran pipeline with streaming table

### Was this patch authored or co-authored using generative AI tooling?

Closes #52999 from sryza/storage-location-absolute.

Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 4020794)
Signed-off-by: Dongjoon Hyun <[email protected]>
@sryza
Copy link
Contributor Author

sryza commented Nov 14, 2025

Thanks @dongjoon-hyun !

@dongjoon-hyun
Copy link
Member

Hi, @sryza . This seems to be flaky and block other PRs. Please see the following.

dongjoon-hyun pushed a commit that referenced this pull request Nov 14, 2025
…t schema

### What changes were proposed in this pull request?

Fixes the `EndToEndAPISuite`, which was broken by #52999.

### Why are the changes needed?

For CI to pass

### Does this PR introduce _any_ user-facing change?

No - test-only change.

### How was this patch tested?

Ran tests.

### Was this patch authored or co-authored using generative AI tooling?

Closes #53069 from sryza/end-to-end-storage-root.

Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Nov 14, 2025
…t schema

### What changes were proposed in this pull request?

Fixes the `EndToEndAPISuite`, which was broken by #52999.

### Why are the changes needed?

For CI to pass

### Does this PR introduce _any_ user-facing change?

No - test-only change.

### How was this patch tested?

Ran tests.

### Was this patch authored or co-authored using generative AI tooling?

Closes #53069 from sryza/end-to-end-storage-root.

Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 7599b2f)
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants