-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove operations pull completed tokens from txn-queue #301
base: development
Are you sure you want to change the base?
Remove operations pull completed tokens from txn-queue #301
Conversation
We are directly inserting the Queue, but instead of just blindly copying what was in the old document, apply the $pullAll that we would have applied if this were an Update. Because we are creating the stash document from scratch, we don't need to ask Mongo to do the work.
Essentially, we don't have to do both Insert and Remove, because one of them must have been done in order to do the other. This way, a series of only Insert and Remove operations won't cause the txn-queue to get out of hand.
I should also note that I added a -fast flag to the txn test suite, as the StashStressTest takes about 2minutes for me, while the rest of the test suite takes something like 15s total time. |
hm... I just had a failure on:FAIL: sim_test.go:110: S.TestSimReinsertCopy4Workers sim_test.go:308: I'll investigate as that is the sort of race this change could have introduced. |
It turns out that I missed we had just reread the document as part of the 'add txn-insert' or 'add txn-remove' to the document. That gives us an updated txn-queue that needs to be handled. Since we are now cleaning directly on the txn-queue instead of the cached queue docs, go ahead and do it on both Insert and Remove.
To follow up, I had missed that the insert that flags the item as marked for removal/marked for insert rereads the txn-queue, and that is the queue that we were using for the stash value. Not the queue that we started the function with. But that should be handled, and I ran |
@@ -998,6 +1004,23 @@ func tokensToPull(dqueue []tokenAndId, pull map[bson.ObjectId]*transaction, dont | |||
return result | |||
} | |||
|
|||
// cleanQueue takes an existing queue and removes all the items in pull from it, generating a new txn-queue. | |||
func cleanQueue(queue []token, pull map[bson.ObjectId]*transaction, dontPull token) ([]token) { | |||
cleaned := make([]token, 0, len(queue)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When filtering like this, a nice trick I like is:
cleaned := queue[:0]
Because the slice is created with the same underlying array, there's no re-allocation up to the original length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I almost went this way, but ended up avoiding it because it modifies the original object. We do use it for debug logging later on, so I didn't want to confuse the situation, especially since it doesn't rewrite info.Queue's length, it means an array that could end up with the tail doubled.
txn/flusher.go
Outdated
@@ -998,6 +1004,23 @@ func tokensToPull(dqueue []tokenAndId, pull map[bson.ObjectId]*transaction, dont | |||
return result | |||
} | |||
|
|||
// cleanQueue takes an existing queue and removes all the items in pull from it, generating a new txn-queue. | |||
func cleanQueue(queue []token, pull map[bson.ObjectId]*transaction, dontPull token) ([]token) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is filterQueue
a better name for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works for me. (done)
txn/txn_test.go
Outdated
var qdoc txnQueue | ||
err = s.accounts.FindId(0).One(&qdoc) | ||
c.Assert(err, IsNil) | ||
err = s.accounts.FindId(0).One(&qdoc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there are reason to find the doc twice? Or is this an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a copy paste error in the test, fixed.
This is the patch from juju/mgo#4 globalsign/mgo#301 It filters the txn-queue when copying a document from or to the stash, just like we would do during a normal Update operation. This prevents a sequence of Insert + Remove pairs from filling up the txn-queue.
#9486 ## Description of change This is the patch from juju/mgo#4 globalsign/mgo#301 It filters the txn-queue when copying a document from or to the stash, just like we would do during a normal Update operation. This prevents a sequence of Insert + Remove pairs from filling up the txn-queue. ## QA steps The associated test shows the basic change. To trigger this manually you can: ``` $ juju bootstrap lxd $ while true; do juju deploy cs:~jameinel/ubuntu-lite; sleep 90; juju remove-application ubuntu-lite; sleep 30 done ``` While doing that, you should be able to go into the juju controller and either ``` > db.constraints.find() > db.txns.stash.find({"_id.c": "constraints"}) ``` And see that there is a document whose txn-queue is growing quite large. With this patch, the txn-queue should stay around 1 or 2 entries. ## Documentation changes None. ## Bug reference https://bugs.launchpad.net/juju/+bug/1804197
We ran into an odd edge case in our system, where we had a document that was being created and then removed repeatedly. (People were testing if the system worked by creating an object and then immediately removing it.)
The code for Insert and Remove just copies whatever the txn-queue was from the collection to the stash and back. This changes it so that both clean up the txn-queue while it is creating the document in the stash. This should be functionally equivalent to the
$pullAll
that we do during Update. It should be safe to do this because we are expressly creating the document out of 'whole cloth' with an Insert, so it isn't possible for us to be racing with another thread that is trying to add another txn to the queue. (If it was, this doesn't change the race that we would have already had).There is an extra pass over the freshly read txn-queue, but in well behaved systems, the queue should always be kept small. This is mostly a case where we found we weren't behaving well.