-
Notifications
You must be signed in to change notification settings - Fork 305
Respond with 409 in case of concurrent Namespace update failures instead of 500 #1989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the change @fabio-rizzo-01, can you please add UTs for the same ?
…pache#1390: updated unit tests
…pache#1390: updated unit tests
Test added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM ! Have one minor suggestion
cc @dimas-b as we discussed this before !
.updateEntityPropertiesIfNotChanged(any(), any(), any()); | ||
assertThrows( | ||
CommitFailedException.class, () -> catalog.setProperties(NS, ImmutableMap.of("a", "b"))); | ||
assertThrows(CommitFailedException.class, () -> catalog.removeProperties(NS, Set.of("a", "b"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CommitFailedException would throw 409 from IcebergException mapper, but is there a way to just test the status code without relying on exception mapper on this case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That code is in the Exception mapper, I can add that specific case in the test:
Line 73 in eb6b6ad
Arguments.of(new RuntimeException("Error persisting entity"), 500)), |
like this
Arguments.of(new CommitFailedException("Error persisting entity"), 409)),
but I think that at the catalog level we have to check for the exception.
@@ -756,7 +756,7 @@ public boolean removeProperties(Namespace namespace, Set<String> properties) | |||
.map(PolarisEntity::new) | |||
.orElse(null); | |||
if (returnedEntity == null) { | |||
throw new RuntimeException("Concurrent modification of namespace: " + namespace); | |||
throw new CommitFailedException("Concurrent modification of namespace: %s", namespace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with overloading the CommitFailedException as the iceberg exception for that doesn't say specifically anything about the entities for which commit failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, CommitFailedException
is too generic to be used for reporting conflicts.
I'd prefer to have a Polaris-specific exception class specifically for reporting conflicts and mapping them to 409 at the REST API layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so should I add a new exception in polaris core and add the mapping in PolarisExceptionMapper? Any suggestion for the exception name is welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having a polaris-specific exception for this case will simplify maintenance and allow fine-tuning response... so please so.
As for the name, I'd propose org.apache.polaris.core.exceptions.CommitConflictException
extending PolarisException
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: please see my other comment.
Sorry, I was not really up-to-date on current code when I commented earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for chiming in late, and also apologies if I'm missing some context here.
I am not convinced that we should introduce CommitConflictException
. I see many usages of CommitFailedException
in similar situations, for example:
- Conflict when updating a
Catalog
:polaris/service/common/src/main/java/org/apache/polaris/service/admin/PolarisAdminService.java
Lines 927 to 937 in de351de
CatalogEntity returnedEntity = Optional.ofNullable( CatalogEntity.of( PolarisEntity.of( metaStoreManager.updateEntityPropertiesIfNotChanged( getCurrentPolarisContext(), null, updatedEntity)))) .orElseThrow( () -> new CommitFailedException( "Concurrent modification on Catalog '%s'; retry later", name)); return returnedEntity; - Conflict when updating a
Principal
:polaris/service/common/src/main/java/org/apache/polaris/service/admin/PolarisAdminService.java
Lines 1064 to 1074 in de351de
PrincipalEntity returnedEntity = Optional.ofNullable( PrincipalEntity.of( PolarisEntity.of( metaStoreManager.updateEntityPropertiesIfNotChanged( getCurrentPolarisContext(), null, updatedEntity)))) .orElseThrow( () -> new CommitFailedException( "Concurrent modification on Principal '%s'; retry later", name)); return returnedEntity;
And so on for PrincipalRole
and CatalogRole
.
Besides, CommitFailedException
is already being mapped to 409:
Line 177 in a303a1d
case CommitFailedException e -> Status.CONFLICT.getStatusCode(); |
In short, I don't mind introducing this new type, but unless I'm missing something obvious, I see little value in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point against CommitFailedException
is that it comes from Iceberg API jar and as such is not related to the Iceberg REST API directly. Moreover, Polaris needs to have flexibility in producing REST responses, which is not provided for by CommitFailedException
in general.
I'd personally prefer to gradually migrate all REST error handling to Polaris exceptions in cases where the error is triggered by Polaris code.
I think it's fine to have a mix of exceptions during the transition period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok then... let's not forget to do the migration later, it would be nice to have a consistent handling of this kind of conflicts in Polaris.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -756,7 +756,7 @@ public boolean removeProperties(Namespace namespace, Set<String> properties) | |||
.map(PolarisEntity::new) | |||
.orElse(null); | |||
if (returnedEntity == null) { | |||
throw new RuntimeException("Concurrent modification of namespace: " + namespace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think returning 409 in this case is preferable to 500.
However, as far as #1390 is concerned, I believe Polaris can handle this case better. The Iceberg REST API defines property changes in terms or "removals" and "updates". Nothing is specified about checks for the previous state of Namespace properties. In that regard a read-then-write-if-not-modified pattern in the current Polaris code is prone to errors even if the property changes are non-conflicting in themselves.
I believe a better fix would be to add retries in case the entity is modified by another process in this middle of this operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction: I do not think a 409 response is justified in this case from the client's perspective. Per #1390 clients' requests are not conflicting with each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add the retry logic but there could still be a case when everything fails, so what do we want to return in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use a configurable retry timeout and the 408 status code when the timeout expires.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually 408 has a different purpose... Maybe 504 (Gateway Timeout) fits this use case.
@fabio-rizzo-01 : Could you start a dev
email discussion on this for visibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimas-b email sent
…apache#1390 # Conflicts: # service/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java
4959c60
to
7b93681
Compare
} | ||
return switch (exception) { | ||
case AlreadyExistsException alreadyExistsException -> Response.Status.CONFLICT; | ||
case CommitConflictException commitConflictException -> Response.Status.REQUEST_TIMEOUT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe 408 does not match this use case. Per PRC 9110: The 408 (Request Timeout) status code indicates that the server did not receive a complete request message within the time that it was prepared to wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with the custom exception we can return 500 at this point, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current state of the PR (no retries on persistence-level conflicts) I believe the 409 (Conflict) response code is the most relevant.
However, as discussed on the dev email thread, I do not think 409 is the best possible implementation of the Iceberg REST API. I believe Polaris should retry on the server side and respond with 429 (Too Many Requests) if retries do not help.
That said, I understand that implementing retries is a much bigger effort, I'd be ok with 409 for now (as it is incrementally better than 500).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so for now I'll use 409 and we can have an enhancement for the retries, ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @fabio-rizzo-01 ! I'll leave this PR over for a couple of days for other people to review.
365bf4e
good to be merged? I'll create a new task for the retry logic @dimas-b |
I followed the same pattern used in methods of the same class, like commit(...). This will return a 409 back to the caller instead of a 500 error
Solves (#1390)