External hinting system for automatic layering #113

Open
mikepurvis opened this issue Jan 30, 2024 · 2 comments

Comments

@mikepurvis
Contributor

Some limitations of the "popularity" based approach for automatic layering:

  • It can only reason about individual store paths, rather than recognizing clusters that make sense to group together.
  • It doesn't have any temporal context, for example optimizing blobs for how much their contents change over time.
  • It doesn't have any awareness of dependency chains (other than for the popularity number).
  • It doesn't account for the size of the store paths.

None of these are huge issues with "small" images, but they start to really limit the effectiveness of the layering once there are thousands of store paths going into an image.

One possible way to improve this situation would be to have some kind of external scanner tool that could examine a bunch of related images, and maybe also instances of the images/closures over time, and produce an output that could be checked into source control and used to better optimize automatic layer generation for successive builds. By checking it in, builds remain pure and the developer stays in control of how frequently to update the hint file (likely in conjunction with dependency changes or flake updates).

If there's interest in such a thing, perhaps this ticket can be a place to discuss what such a file could look like and how it would be most effective to collect the data.
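To make the proposal a bit more concrete, here is one possible shape for such a hint file, sketched in Go. Everything in it is invented for illustration (the file name, the field names, the `ChurnScore` score), not an existing nix2container format; it just shows a checked-in JSON document listing named groups of store paths plus whatever scores the scanner computed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// LayerHint describes one grouping decision emitted by the hypothetical
// external scanner and checked into source control alongside the flake.
type LayerHint struct {
	Name       string   `json:"name"`        // human-readable label, e.g. "python-runtime"
	StorePaths []string `json:"store_paths"` // paths that should land in the same layer
	ChurnScore float64  `json:"churn_score"` // how often this group changed across scanned images
}

// HintFile is the top-level document the scanner would emit.
type HintFile struct {
	Version int         `json:"version"`
	Hints   []LayerHint `json:"hints"`
}

func loadHints(path string) (*HintFile, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var h HintFile
	if err := json.Unmarshal(data, &h); err != nil {
		return nil, err
	}
	return &h, nil
}

func main() {
	h, err := loadHints("layer-hints.json")
	if err != nil {
		fmt.Println("no hint file, falling back to popularity-only layering:", err)
		return
	}
	for _, hint := range h.Hints {
		fmt.Printf("%s: %d paths, churn %.2f\n", hint.Name, len(hint.StorePaths), hint.ChurnScore)
	}
}
```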

@nlewo
Owner

nlewo commented Feb 22, 2024

That would be really fun to implement: collecting all image graphs and finding common subgraphs to isolate into layers!

> some kind of external scanner tool that could examine a bunch of related images

I don't know exactly what you mean by "related images", but I think that to generate a pertinent profile we would need the whole closure graph of all images, which is not available in the built images. This means the profile would have to be generated by consuming the images' Nix expressions.
Maybe we could generate a useful profile from the image JSON file, but it would be suboptimal, and I don't see an advantage of consuming the image JSON file instead of the Nix expression.
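For illustration only (the closures, paths, and helper names below are made up), a first pass at finding common subgraphs could simply group store paths by the exact set of images whose closure contains them; each resulting group is a candidate shared layer. In a real scanner the closures would come from evaluating the images' Nix expressions, as discussed above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sharedGroups buckets store paths by the set of images that reference them.
// Paths referenced by the same set of images are candidates for one layer.
func sharedGroups(closures map[string][]string) map[string][]string {
	membership := map[string][]string{}
	for image, paths := range closures {
		for _, p := range paths {
			membership[p] = append(membership[p], image)
		}
	}
	groups := map[string][]string{}
	for path, images := range membership {
		sort.Strings(images)
		key := strings.Join(images, ",")
		groups[key] = append(groups[key], path)
	}
	return groups
}

func main() {
	// Hard-coded toy closures; a real profile would cover every image in the repo.
	closures := map[string][]string{
		"image-a": {"/nix/store/aaa-glibc", "/nix/store/bbb-python", "/nix/store/ccc-app-a"},
		"image-b": {"/nix/store/aaa-glibc", "/nix/store/bbb-python", "/nix/store/ddd-app-b"},
	}
	for images, paths := range sharedGroups(closures) {
		fmt.Printf("shared by [%s]: %v\n", images, paths)
	}
}
```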

> It doesn't account for the size of the store paths.

I think this should be added to the current algorithm, because generating lots of tiny layers (a deep layer stack) doesn't make sense.
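As a sketch of what taking size into account could look like (the threshold, sizes, and names are invented, not part of the current algorithm), small candidate layers could be greedily merged until each merged layer reaches a minimum size:

```go
package main

import (
	"fmt"
	"sort"
)

type candidate struct {
	Paths []string
	Bytes int64 // total size of the paths, e.g. as reported by `nix path-info -S`
}

// mergeSmall greedily merges candidates below minBytes into the next group,
// trading a little cache granularity for fewer, more useful layers.
func mergeSmall(cands []candidate, minBytes int64) []candidate {
	sort.Slice(cands, func(i, j int) bool { return cands[i].Bytes < cands[j].Bytes })
	var out []candidate
	var acc candidate
	for _, c := range cands {
		acc.Paths = append(acc.Paths, c.Paths...)
		acc.Bytes += c.Bytes
		if acc.Bytes >= minBytes {
			out = append(out, acc)
			acc = candidate{}
		}
	}
	if len(acc.Paths) > 0 {
		out = append(out, acc) // leftover group that never reached the threshold
	}
	return out
}

func main() {
	layers := mergeSmall([]candidate{
		{Paths: []string{"/nix/store/aaa-tzdata"}, Bytes: 2 << 20},
		{Paths: []string{"/nix/store/bbb-iana-etc"}, Bytes: 1 << 20},
		{Paths: []string{"/nix/store/ccc-glibc"}, Bytes: 30 << 20},
	}, 10<<20)
	for _, l := range layers {
		fmt.Printf("%d MiB: %v\n", l.Bytes>>20, l.Paths)
	}
}
```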

> It doesn't have any temporal context, for example optimizing blobs for how much their contents change over time.

In practice, I'm not sure this would be convenient, since the analyzer would have to check out several commits to compute a profile.
Alternatively, we could store the graph in the image filesystem or image metadata: this would also allow fetching a bunch of images from a registry to compute a profile.
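For what it's worth, once the per-commit (or per-tag) closures are available, the churn computation itself is small. A rough illustration, with shortened, invented store paths: count how many distinct store paths each package name had across the scanned versions; a high count means the package changes often and probably shouldn't share a layer with stable paths.

```go
package main

import (
	"fmt"
	"strings"
)

// pname strips the /nix/store/<hash>- prefix so the same package can be
// matched across rebuilds that only changed its hash.
func pname(storePath string) string {
	base := storePath[strings.LastIndex(storePath, "/")+1:]
	if i := strings.Index(base, "-"); i >= 0 {
		return base[i+1:]
	}
	return base
}

// churn counts, per package name, how many distinct store paths appeared
// across the scanned versions of an image.
func churn(versions [][]string) map[string]int {
	seen := map[string]map[string]bool{}
	for _, closure := range versions {
		for _, p := range closure {
			name := pname(p)
			if seen[name] == nil {
				seen[name] = map[string]bool{}
			}
			seen[name][p] = true
		}
	}
	out := map[string]int{}
	for name, paths := range seen {
		out[name] = len(paths)
	}
	return out
}

func main() {
	result := churn([][]string{
		{"/nix/store/aaa-glibc-2.38", "/nix/store/bbb-myapp-1.0"},
		{"/nix/store/aaa-glibc-2.38", "/nix/store/ccc-myapp-1.0"},
	})
	fmt.Println(result) // map[glibc-2.38:1 myapp-1.0:2]
}
```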

@adminy

adminy commented Feb 28, 2025

tvix-store keeps packages in its cool new object store, which now has a metadata service and a chunk service that might be OCI compliant, since they are also aiming at OCI builder backends. So I'm just wondering if that could be used to pull Nix packages directly as layers. The best layering is when someone else does it for you :)
