-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrap fatal TX errors in a new vterrors
code
#17669
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Florent Poinsard <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
vterrors
errorvterrors
code
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17669 +/- ##
==========================================
- Coverage 67.94% 67.45% -0.50%
==========================================
Files 1586 1592 +6
Lines 255224 258250 +3026
==========================================
+ Hits 173420 174198 +778
- Misses 81804 84052 +2248 ☔ View full report in Codecov by Sentry. |
go/vt/vterrors/code.go
Outdated
@@ -120,6 +120,8 @@ var ( | |||
VT14004 = errorWithoutState("VT14004", vtrpcpb.Code_UNAVAILABLE, "cannot find keyspace for: %s", "The specified keyspace could not be found.") | |||
VT14005 = errorWithoutState("VT14005", vtrpcpb.Code_UNAVAILABLE, "cannot lookup sidecar database for keyspace: %s", "Failed to read sidecar database identifier.") | |||
|
|||
VT15001 = errorWithNoCode("VT15001", "session invalidated: close/reopen connection if applicable: %s", "This error means that the opened transaction should be closed by the application and re-opened.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"session invalidated" may be misleading
- if you are in a transaction, that needs to be closed, and a new transaction opened.
- if you are not in a transaction, a retry is probably what you need to do.
@harshit-gangal wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the transaction which no longer exists. They can continue to use the same connection.
"rollback", followed by a retry is fine.
outside of transaction, retry is enough as Deepthi pointed.
All 3 error message should have different error code as they happen at different point in time of PRS/ERS
Notes from further offline discussion:
We need to spend some more time exploring these options. |
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
If there is a cluster event error, we will invalidate the session, the client is supposed to catch the error and issue a rollback, until then commit and other queries will fail until a rollback is received. |
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
proto/vtgate.proto
Outdated
|
||
bool tx_error_block_next_queries = 28; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's make it clear, options:
freeze_until_rollback
fail_until_rollback
block_until_rollback
rollback_expected
lock_until_rollback
rollback_required
reject_until_rollback
error_until_rollback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it via 812df8f, I picked error_until_rollback
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of changing in this file, we can do it in go/vt/vttablet/queryservice/wrapped.go
This is what scatter conn uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one step further, we can check the error in scatter_conn itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the wrapping to the wrapper via f5ee299
|
||
reparent(t, clusterInstance, tablets, tabletStopped, commitDone) | ||
|
||
_, err := conn.ExecuteFetch("delete from vt_insert_test", 0, false) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should validate by selecting from the table that no data exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue with selecting is that the commit can be a partial commit, leading to rows on the healthy shards.
go/test/endtoend/reparent/newfeaturetest/reparent_in_tx_test.go
Outdated
Show resolved
Hide resolved
// rollbackExecIfNeeded rollbacks the partial execution if earlier it was detected that it needs partial query execution to be rolled back. | ||
func (e *Executor) rollbackExecIfNeeded(ctx context.Context, safeSession *econtext.SafeSession, bindVars map[string]*querypb.BindVariable, logStats *logstats.LogStats, err error) error { | ||
if safeSession.InTransaction() && safeSession.IsRollbackSet() { | ||
if !safeSession.InTransaction() { | ||
return err | ||
} | ||
if e.rollbackOnFatalTxError(ctx, safeSession, err) { | ||
return err | ||
} | ||
|
||
if safeSession.IsRollbackSet() { | ||
rErr := e.rollbackPartialExec(ctx, safeSession, bindVars, logStats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StreamExecute calls this only when it is called for DML query. This is valid for Select query as well inside a transaction
// Check if there was partial DML execution. If so, rollback the effect of the partially executed query.
if err != nil {
if !canReturnRows(plan.Type) {
return e.rollbackExecIfNeeded(ctx, safeSession, bindVars, logStats, err)
}
return err
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we handle this at the scatter_conn level by deciding to rollback the shard session on a VT15001 error?
Otherwise, the responsibility falls on the callers of scatter_conn methods (Execute, ExecuteMultiShard, and StreamExecuteMulti).
This behavior should be independent of the executed query—it’s purely about whether the transaction is open or not.
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Description
This Pull Request adds a new
vterrors
code (VT15001
) that wraps common errors appearing when a cluster event (rollout, PRS, primary shutdown, ...) happens on a shard with an active transaction.The errors listed in #17668 now look like this:
In addition to wrap these errors, we also make sure to rollback and clear the transaction (if applicable) when such error happens, leaving the client responsible for catching the error and retrying the transaction.
Documentation
Related Issue(s)