WIP: Staged layer creation #378
✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6414
Podman PR containers/podman#27251 and the buildah test PR containers/buildah#6414 from the bot both look good, so I think we can remove the special case from ApplyDiff() in overlay; ref containers/podman#25862 (comment). I still need to work on the actual feature here, though: extracting while the store is unlocked.
ACK, simplifying ApplyDiff this way does look correct. (I didn’t carefully look at the tempdir addition yet.)
Add a new function to stage additions. This should be used to extract the layer content into a temp directory without holding the storage lock and then under the lock just rename the directory into the final location to reduce the lock contention. Signed-off-by: Paul Holzinger <[email protected]>
Used to create temporary files that can be "committed" at a later point by renaming them. Signed-off-by: Paul Holzinger <[email protected]>
That caller in create() already had the layer created in memory so another lookup roundtrip is unnecessary here. Signed-off-by: Paul Holzinger <[email protected]>
It is not clear to me when this code path is hit; during normal layer creation we always pass a valid parent, so this branch is never reached AFAICT. Let's remove it and see if all tests still pass in podman, buildah and others... Signed-off-by: Paul Holzinger <[email protected]>
Make it so the apply logic can be provided as an argument, which should help the future work of calling this function unlocked and letting it extract to a temp dir instead. Signed-off-by: Paul Holzinger <[email protected]>
Force-pushed from 348a11e to b7780f2.
I’m mostly looking because I was curious — feel free to disregard.
The tar-split comment might explain some of the “unexpected EOF” test failures.
}
td.counter++
if err := callback(tmpAddPath); err != nil {
	return nil, err
(Non-blocking: It might be useful to delete anything created inside at this point already, it’s clearly not going to be used. .Cleanup will eventually do it, so that’s fine; doing it earlier might make more of the disk space available again immediately. But then again, any users that care about such a difference are so out of space that they have bigger problems to worry about.
Applies similarly, but even less urgently, to StageFileAddition.)
storage/layers.go
Outdated
	return -1, err
}
return size, err
return applyFunc(layer.ID, layer.Parent, options, &tsdata)
This effectively moves the write of tsdata inside this closure, and I don’t think that works: we need compressor to be closed before consuming tsdata.
Yes that is what I figured out after some debugging as well, good catch.
storage/drivers/overlay/overlay.go
Outdated
tempDirRoot := d.getTempDirRoot(id)
t, err := tempdir.NewTempDir(tempDirRoot)
if err != nil {
	return nil, nil, 0, err
(Generally I’d prefer -1 or other clearly invalid “size” values on error paths, to be a tiny bit more likely to fail in hypothetical error handling mistakes… *grumble* Rust does this so much better.)
if _, idInUse := r.byid[id]; idInUse {
	return ErrDuplicateID
}
names = dedupeStrings(names)
(Absolutely non-blocking, and pre-existing: dedupeStrings does at least a hash table lookup, and potentially an allocation and more; AFAICS it would be more efficient to just do the r.byname[…] check for all entries of names, even if they were duplicates.
… and, anyway, the two callers provide de-duplicated names already.)
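A sketch of the direct check suggested above, without a de-duplication pass. The `layer` and `layerStore` types are trimmed-down, hypothetical stand-ins that keep only a `byname` index, and `firstNameInUse` is an illustrative helper name.

```go
package main

import "fmt"

// layer and layerStore are hypothetical stand-ins for the store's types;
// only the byname index matters for this sketch.
type layer struct{ ID string }

type layerStore struct {
	byname map[string]*layer
}

// firstNameInUse reports the first requested name that is already taken.
// A duplicate entry in names only costs one extra map lookup, so the
// check itself needs no de-duplication pass and no extra allocation.
func (r *layerStore) firstNameInUse(names []string) (string, bool) {
	for _, name := range names {
		if _, inUse := r.byname[name]; inUse {
			return name, true
		}
	}
	return "", false
}

func main() {
	r := &layerStore{byname: map[string]*layer{"base": {ID: "1"}}}

	name, inUse := r.firstNameInUse([]string{"new", "new", "base"})
	fmt.Println(name, inUse)

	name, inUse = r.firstNameInUse([]string{"new", "new"})
	fmt.Printf("%q %v\n", name, inUse)
}
```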
applyDiffTemporaryDriver, ok := r.driver.(drivers.ApplyDiffStaging)
if ok && diff != nil {
	// CRITICAL, this releases the lock so we can extract this unlocked
	r.stopWriting()
This kind of design rather worries me; it’s not transparent to callers who just see “// Requires startWriting.” in the documentation and assume that if they obtained a startWriting lock, their state will not change by the call to this create. It’s hard to reason about.
Conceptually, I think the overlay driver doesn’t really need to know the precise layer ID for a newly-created layer in order to determine the right getTempDirRoot, if this caller assures the driver that the ID is fresh and not conflicting with anything. (For image layers, the ID is deterministic, and we check that it doesn’t exist before trying to pull; but a concurrent process might create it before we finish, so conflicts can and do occur, and need to be carefully considered.) In such a design, I think most of the code in create before this point does not strictly need to run before the applyDiffUnlocked, but also I didn’t carefully read/check everything.
I guess I might have been too focused on doing the quick ID and name conflict lookup first, before doing the expensive part, to "fail fast" when possible.
I guess design-wise it makes sense to push this all the way up the stack. I do agree that the unlock/lock pattern is quite dangerous, and I have seen it fail too many times in podman already, so if we can avoid it then we should.
I agree that we probably want the “ID already exists” check to exist when creating image layers — so on the substance of the thing, this might be ~exactly right already.
Shaping the call stack is a maintainability concern that is really only worth worrying about after the code works.
I’m mentioning this early mostly in hope that it might avoid work on a “perfect” implementation of the current approach, some of which would need to be re-done afterwards; and because the “give me a staging directory for a future layer, I don’t know the ID yet” method would be a new concept not currently existing in the driver API.
BTW we can make the current design work: rename create to createTemporarilyUnlockingLock or something like that.
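To make the concern in this sub-thread concrete, here is a small simulation of a callee that temporarily drops a lock its caller believes it holds. The `store` type, `createUnlocking`, and `demo` are hypothetical names for the sketch, and a `sync.Mutex` stands in for the layer store lock.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in: a mutex plus some state it guards.
type store struct {
	mu     sync.Mutex
	layers map[string]bool
}

// createUnlocking must be called with s.mu held. It drops the lock for
// the slow work and re-acquires it afterwards, which is the pattern
// under discussion.
func (s *store) createUnlocking(id string, slowWork func()) {
	s.mu.Unlock()
	slowWork() // other writers may modify s.layers during this window
	s.mu.Lock()
	s.layers[id] = true
}

// demo plays a caller that holds the lock across the call and expects
// only its own layer to be added.
func demo() (before, after int) {
	s := &store{layers: map[string]bool{}}
	s.mu.Lock()
	defer s.mu.Unlock()

	before = len(s.layers)
	s.createUnlocking("new-layer", func() {
		// A concurrent writer gets in while the lock is released.
		done := make(chan struct{})
		go func() {
			defer close(done)
			s.mu.Lock()
			s.layers["intruder"] = true
			s.mu.Unlock()
		}()
		<-done
	})
	after = len(s.layers)
	return before, after
}

func main() {
	before, after := demo()
	fmt.Printf("layers before: %d, after: %d\n", before, after)
}
```

The caller sees two new entries even though it added one and never released the lock itself, which is why a name that advertises the unlock helps.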
}
slices.Sort(layer.GIDs)

err = r.saveFor(layer)
If the caller is required to do this afterward, that should be documented.
storage/layers.go
Outdated
	return err
}

applyDiff = func() error {
(I know it’s way too early to have an opinion.) The nested closures-returning-closures might be somewhat difficult to track.
Maybe some of this should be a stateful object / interface (driverLayerDiffAdapter???) that ~hides the difference between drivers that do and don’t support unlocked layer staging.
Add a function to apply the diff into a temporary directory so we can do that unlocked and only rename under the lock. Signed-off-by: Paul Holzinger <[email protected]>
Signed-off-by: Paul Holzinger <[email protected]>
The compressor must be closed before we write the bytes. However, overall I am not sure why we wrote all bytes fully into memory first. So change it to write directly to a file, but still use a buffer for that to avoid many small writes. Signed-off-by: Paul Holzinger <[email protected]>
Force-pushed from b7780f2 to bbb2266.
@mtrmac FYI I have not really addressed most of your comments yet; I am just trying to push things to see how much breaks. I am still seeing plenty of test failures.
Issue 1: because of the rename, the layer ends up with the 700 permission from the tmpdir instead of the proper diff dir creation permissions that are in the driver.create() code. I am not sure if I should expose that in the tmpdir creation logic; I guess that makes the most sense, since only the driver should know the exact permission to use.
The second problem I see is timeouts (in tests running in parallel), which I guess means I introduced a deadlock. Looking at the code, the unlock/lock-again thing I did is indeed completely broken and unsafe due to an ABBA deadlock: in putlayer we also hold the containerStore lock, so unlocking only the layer store makes it possible for another process to take the layer lock and then block on the still-held container store lock, leaving both processes hanging forever.
I think that could work. I was thinking
Per the locking hierarchy documented at the top of
Yeah, my thinking was that the callback provides a "lifetime" during which the path is safe to use; if I return a string/struct with the path, the caller can clean up or commit and then still use the path afterwards. This is really where I start to hate Go, because in Rust it would be trivial to enforce that there can only ever be one call to commit, which renders the object useless afterwards. But yes, usage-wise this callback is indeed getting quite ugly, to the point where just returning the path is much simpler and, well, how Go works in general. I do like the suggestion of just returning the path to consolidate both tmpdir functions into one, so I will go with that.
No description provided.