Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add document about single region 1PC #57

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

MyonKeminta
Copy link
Contributor

Signed-off-by: MyonKeminta [email protected]

This PR adds a draft document for single region 1PC. This PR is also expected to be the place to do related discussions.

Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
@MyonKeminta
Copy link
Contributor Author

@cfzjywxk @sticnarf @youjiali1995 @coocood PTAL. How do you think the future plan should be?

@sticnarf
Copy link
Contributor

The performance improvement is promising. I think we can get started with dealing with the corner cases and compatibility issues.

@sticnarf
Copy link
Contributor

And I think we can commit your 1PC implementation to the codebase now as long as it's not enabled.


CDC syncs data from TiKV by observing applying events. Prewrites and commits are distinguished, and CDC will use these events to compose complete transactions and then send them to the downstream. So when 1PC is enabled, there need to be some way for CDC to distinguish if a apply event is caused by 1PC committing, otherwise CDC will expect that a commit event must has a corresponding prewrite event to compose a complete transaction.

Possible solutions:
Copy link

@coocood coocood Oct 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, we should have a way that disables 1PC when CDC is connected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will 1PC be shipped with the problem unsolved? Otherwise I think we can allow enabling 1PC only after resolving these problems, then we won't need to automatically disable 1PC when CDC is connected.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can enable 1PC by default on the master branch.
It makes POC much easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if enabling it by default is a good idea unless we've done enough tests and it's stable enough.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enable it by default can make the feature mature more quickly.

@youjiali1995
Copy link
Contributor

Land it in v5.0. There isn't any unsolvable problem. We can polish and test it with async-commit in the next few months.

@MyonKeminta
Copy link
Contributor Author

It's exciting that you agree landing it in 5.0. But I'm still afraid that there might still be other corner cases or compatibility issues I didn't notice.

@cfzjywxk
Copy link
Contributor

cfzjywxk commented Oct 13, 2020

Land it in v5.0. There isn't any unsolvable problem. We can polish and test it with async-commit in the next few months.

Agree to land it in v5.0, we could limit the usage scenarios for it and do enough tests. We should take care of compatible problems and do not create extra work for the tools team.

@MyonKeminta
Copy link
Contributor Author

@youjiali1995 Do you know if this affects TiFlash?

@youjiali1995
Copy link
Contributor

youjiali1995 commented Oct 13, 2020

TiFlash may have some protective actions like CDC doing, but it's easy to solve. Besides it, I think no compatibility issues with TiFlash.

No compatibility issues with TiFlash.


For transactions that affect only one region, or more strictly speaking, that can be prewritten with only one prewrite request, can be committed directly while the prewrite request is being handled, so the commit phase can be totally removed. Therefore we can get less latency and more throughput. For TiDB, indices and rows are unsally not in a same region, **so this optimization only works under a limited amount of scenarios, like sysbench oltp_update_non_index**. But in a suitable scenario, it gains significantly better performance.

We used to think about supporting 1PC before but stopped after meeting many difficulties. Now since async commit, which meets and solved mostly the same problem as 1PC, is implemented, we can continue supporting 1PC with much less effort to make.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From @cfzjywxk :

Maybe we could extend it to 1PC for single-store transactions with some placement usages in the future.

Copy link
Collaborator

@nrc nrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposal looks good. I'd be wary of shipping in 5.0 - async commit is already a really large, high risk change and I'd minimise other changes to the transaction system. Furthermore, although the perf improvements for 1pc are good, they are really good compared to 2pc and ok compared to async commit.


Since async commit is implemented, it's not difficult to implement a working 1PC. However, it's still hard to make it perfectly correct and compatible with other components.

* When committing a transaction, if TiDB finds that the prewrite phase can be done with only one single request, the transaction is allowed to be committed with 1PC protocol. A field named `try_one_pc` in the prewrite request will be set to let TiKV know that 1PC is available for this transaction.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this be set or not set? If and only if the transaction only touches one region/store?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be set iff the prewrite can be done with only one request. Typically it means the transaction affects only one region. But if there's too many affected keys in a region, it may need to be prewritten with more than one requests, in which case 1PC is not applicable.

CDC syncs data from TiKV by observing applying events. Prewrites and commits are distinguished, and CDC will use these events to compose complete transactions and then send them to the downstream. So when 1PC is enabled, there need to be some way for CDC to distinguish if a apply event is caused by 1PC committing, otherwise CDC will expect that a commit event must has a corresponding prewrite event to compose a complete transaction.

Possible solutions:
1. Passing a `is_1pc` flag from txn layer to apply, just like how `TxnExtra` (which is used to support CDC outputting old value) was written previously. It would be ugly.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the least ugly solution :-) 2 is very fragile, 3 seems wasteful and make write records more complex.

@@ -0,0 +1,59 @@
# Single Region 1PC

For transactions that affect only one region, or more strictly speaking, that can be prewritten with only one prewrite request, can be committed directly while the prewrite request is being handled, so the commit phase can be totally removed. Therefore we can get less latency and more throughput. For TiDB, indices and rows are unsally not in a same region, **so this optimization only works under a limited amount of scenarios, like sysbench oltp_update_non_index**. But in a suitable scenario, it gains significantly better performance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For transactions that affect only one region, or more strictly speaking, that can be prewritten with only one prewrite request, can be committed directly while the prewrite request is being handled, so the commit phase can be totally removed. Therefore we can get less latency and more throughput. For TiDB, indices and rows are unsally not in a same region, **so this optimization only works under a limited amount of scenarios, like sysbench oltp_update_non_index**. But in a suitable scenario, it gains significantly better performance.
For transactions that affect only one region, or more strictly speaking, that can be prewritten with only one prewrite request, can be committed directly while the prewrite request is being handled, so the commit phase can be totally removed. Therefore we can get less latency and more throughput. For TiDB, indices and rows are usually not in a same region, **so this optimization only works under a limited amount of scenarios, like sysbench oltp_update_non_index**. But in a suitable scenario, it gains significantly better performance.

@andylokandy
Copy link
Collaborator

@MyonKeminta Do we have any updates?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants