Skip to content

Conversation

fabio-rizzo-01
Copy link
Contributor

@fabio-rizzo-01 fabio-rizzo-01 commented Jul 2, 2025

I followed the same pattern used in methods of the same class, like commit(...). This will return a 409 back to the caller instead of a 500 error

Solves (#1390)

Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the change @fabio-rizzo-01, can you please add UTs for the same ?

@fabio-rizzo-01
Copy link
Contributor Author

Thank you for the change @fabio-rizzo-01, can you please add UTs for the same ?

Test added

Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM ! Have one minor suggestion

cc @dimas-b as we discussed this before !

.updateEntityPropertiesIfNotChanged(any(), any(), any());
assertThrows(
CommitFailedException.class, () -> catalog.setProperties(NS, ImmutableMap.of("a", "b")));
assertThrows(CommitFailedException.class, () -> catalog.removeProperties(NS, Set.of("a", "b")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CommitFailedException would throw 409 from IcebergException mapper, but is there a way to just test the status code without relying on exception mapper on this case ?

Copy link
Contributor Author

@fabio-rizzo-01 fabio-rizzo-01 Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code is in the Exception mapper, I can add that specific case in the test:

Arguments.of(new RuntimeException("Error persisting entity"), 500)),

like this
Arguments.of(new CommitFailedException("Error persisting entity"), 409)),

but I think that at the catalog level we have to check for the exception.

@@ -756,7 +756,7 @@ public boolean removeProperties(Namespace namespace, Set<String> properties)
.map(PolarisEntity::new)
.orElse(null);
if (returnedEntity == null) {
throw new RuntimeException("Concurrent modification of namespace: " + namespace);
throw new CommitFailedException("Concurrent modification of namespace: %s", namespace);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with overloading the CommitFailedException as the iceberg exception for that doesn't say specifically anything about the entities for which commit failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, CommitFailedException is too generic to be used for reporting conflicts.

I'd prefer to have a Polaris-specific exception class specifically for reporting conflicts and mapping them to 409 at the REST API layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so should I add a new exception in polaris core and add the mapping in PolarisExceptionMapper? Any suggestion for the exception name is welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a polaris-specific exception for this case will simplify maintenance and allow fine-tuning response... so please so.

As for the name, I'd propose org.apache.polaris.core.exceptions.CommitConflictException extending PolarisException.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: please see my other comment.

Sorry, I was not really up-to-date on current code when I commented earlier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for chiming in late, and also apologies if I'm missing some context here.

I am not convinced that we should introduce CommitConflictException. I see many usages of CommitFailedException in similar situations, for example:

And so on for PrincipalRole and CatalogRole.

Besides, CommitFailedException is already being mapped to 409:

case CommitFailedException e -> Status.CONFLICT.getStatusCode();

In short, I don't mind introducing this new type, but unless I'm missing something obvious, I see little value in it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point against CommitFailedException is that it comes from Iceberg API jar and as such is not related to the Iceberg REST API directly. Moreover, Polaris needs to have flexibility in producing REST responses, which is not provided for by CommitFailedException in general.

I'd personally prefer to gradually migrate all REST error handling to Polaris exceptions in cases where the error is triggered by Polaris code.

I think it's fine to have a mix of exceptions during the transition period.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then... let's not forget to do the migration later, it would be nice to have a consistent handling of this kind of conflicts in Polaris.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@singhpk234 singhpk234 requested a review from dimas-b July 2, 2025 15:26
@@ -756,7 +756,7 @@ public boolean removeProperties(Namespace namespace, Set<String> properties)
.map(PolarisEntity::new)
.orElse(null);
if (returnedEntity == null) {
throw new RuntimeException("Concurrent modification of namespace: " + namespace);
Copy link
Contributor

@dimas-b dimas-b Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning 409 in this case is preferable to 500.

However, as far as #1390 is concerned, I believe Polaris can handle this case better. The Iceberg REST API defines property changes in terms or "removals" and "updates". Nothing is specified about checks for the previous state of Namespace properties. In that regard a read-then-write-if-not-modified pattern in the current Polaris code is prone to errors even if the property changes are non-conflicting in themselves.

I believe a better fix would be to add retries in case the entity is modified by another process in this middle of this operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: I do not think a 409 response is justified in this case from the client's perspective. Per #1390 clients' requests are not conflicting with each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add the retry logic but there could still be a case when everything fails, so what do we want to return in that case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use a configurable retry timeout and the 408 status code when the timeout expires.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually 408 has a different purpose... Maybe 504 (Gateway Timeout) fits this use case.

@fabio-rizzo-01 : Could you start a dev email discussion on this for visibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimas-b email sent

}
return switch (exception) {
case AlreadyExistsException alreadyExistsException -> Response.Status.CONFLICT;
case CommitConflictException commitConflictException -> Response.Status.REQUEST_TIMEOUT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe 408 does not match this use case. Per PRC 9110: The 408 (Request Timeout) status code indicates that the server did not receive a complete request message within the time that it was prepared to wait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the custom exception we can return 500 at this point, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current state of the PR (no retries on persistence-level conflicts) I believe the 409 (Conflict) response code is the most relevant.

However, as discussed on the dev email thread, I do not think 409 is the best possible implementation of the Iceberg REST API. I believe Polaris should retry on the server side and respond with 429 (Too Many Requests) if retries do not help.

That said, I understand that implementing retries is a much bigger effort, I'd be ok with 409 for now (as it is incrementally better than 500).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so for now I'll use 409 and we can have an enhancement for the retries, ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm :)

dimas-b
dimas-b previously approved these changes Jul 22, 2025
Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @fabio-rizzo-01 ! I'll leave this PR over for a couple of days for other people to review.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jul 22, 2025
@dimas-b dimas-b changed the title Concurrent Namespace Update Fails with RuntimeException and 500 Error… Respond with 409 in case of concurrent Namespace update failures instead of 500 Jul 22, 2025
eric-maynard
eric-maynard previously approved these changes Jul 22, 2025
@fabio-rizzo-01 fabio-rizzo-01 dismissed stale reviews from eric-maynard and dimas-b via 365bf4e July 23, 2025 16:07
@fabio-rizzo-01
Copy link
Contributor Author

good to be merged? I'll create a new task for the retry logic @dimas-b

@dimas-b dimas-b merged commit e9267b6 into apache:main Jul 24, 2025
12 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Jul 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants