
DockerWorkspace: building large projects can be slower than ideal #495

@d4l3k

Description


🐛 Bug

Module (check all that apply):

  • [x] `torchx.workspace`

To Reproduce

For large projects, users may only care about a subset of the files. This means they either have to restructure their projects, build custom per-project Docker images, or put up with slow builds.

The patched layer can be a few hundred megabytes, and each such layer can take around 30 s to build and push.

This seems to be TorchX-specific (BuildKit builds the context faster), so it is likely due to how we walk the workspace and build/transfer the tar.

Options

Optimize Tar building for large files

There is likely some work we can do to optimize how we build the tarballs passed as the Docker build context. It's not yet clear whether we need a special path for on-disk images, fsspec changes, or different compression settings.
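As a starting point for this option, here is a minimal sketch of a hypothetical helper (`build_context_tar` is not the current TorchX code). It walks the workspace with `os.walk`, prunes directories such as `.git` before descending into them, adds only files that pass an `include` filter, and leaves compression off by default. It assumes the workspace is on local disk (no fsspec) and buffers the whole tar in memory; for a context of a few hundred megabytes, a temporary file would be the better target.

```python
import io
import os
import tarfile
from typing import Callable, Iterable


def build_context_tar(
    root: str,
    include: Callable[[str], bool] = lambda rel: True,
    skip_dirs: Iterable[str] = (".git", "__pycache__"),
    compression: str = "",
) -> io.BytesIO:
    """Hypothetical helper: tar only the files under `root` that `include` accepts.

    compression="" writes an uncompressed tar; "gz" gzips it, trading CPU
    time for a smaller transfer.
    """
    skip = set(skip_dirs)
    buf = io.BytesIO()
    mode = f"w:{compression}" if compression else "w"
    with tarfile.open(fileobj=buf, mode=mode) as tf:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune in place so os.walk never descends into skipped directories.
            dirnames[:] = sorted(d for d in dirnames if d not in skip)
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Tar member names use "/" regardless of the host OS.
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                if include(rel):
                    # recursive=False: add just this entry, never a whole subtree.
                    tf.add(full, arcname=rel, recursive=False)
    buf.seek(0)
    return buf
```

With the Docker SDK for Python, a tar like this (once it also contains a Dockerfile) can be passed to `client.images.build(fileobj=..., custom_context=True)`, adding `encoding="gzip"` when it is compressed. Timing the walk and the tar creation separately would show which of the two dominates.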

Differential Build Based Off of Git Hash

We could use git to diff against the previously added layer, so that we only apply a new layer on top of the image we've already built for that hash. There are some caveats, but a reasonable option might be to build an image tagged from the commit and then apply the local changes on top.
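A rough sketch of this option, assuming the base image was already built from a known commit and tagged with it (the `proj:abc123` tag, the `/app` workdir, and both helper names are hypothetical). `changed_files` asks git which files were modified, added, or deleted relative to that commit, and `overlay_dockerfile` writes a Dockerfile that copies a build context containing only the changed files on top of the base image and removes deleted paths. It assumes `repo` is the repository root and that the base image has a shell. One caveat shows up directly: `RUN rm` hides deleted files, but they still take up space in the lower layer.

```python
import shlex
import subprocess
from typing import List, Tuple


def changed_files(repo: str, base: str) -> Tuple[List[str], List[str]]:
    """Return (added_or_modified, deleted) paths relative to commit `base`.

    Compares `base` with the working tree, so uncommitted edits count, and
    untracked files not covered by .gitignore are treated as added.
    """

    def git(*args: str) -> List[str]:
        out = subprocess.run(
            ["git", "-C", repo, *args], check=True, capture_output=True, text=True
        ).stdout
        # -z output is NUL-separated, so unusual file names are not quoted.
        return [p for p in out.split("\0") if p]

    # Lowercase "d" excludes deletions; uppercase "D" selects only deletions.
    modified = git("diff", "--name-only", "-z", "--diff-filter=d", base)
    untracked = git("ls-files", "-z", "--others", "--exclude-standard")
    deleted = git("diff", "--name-only", "-z", "--diff-filter=D", base)
    return sorted(set(modified) | set(untracked)), sorted(deleted)


def overlay_dockerfile(base_image: str, workdir: str, deleted: List[str]) -> str:
    """Dockerfile that layers a context of only the changed files onto base_image."""
    lines = [f"FROM {base_image}", f"WORKDIR {workdir}", "COPY . ."]
    if deleted:
        # Quote each path so names with spaces survive the shell.
        lines.append("RUN rm -f -- " + " ".join(shlex.quote(p) for p in deleted))
    return "\n".join(lines) + "\n"
```

For example, `overlay_dockerfile("proj:abc123", "/app", ["old.py"])` yields a four-line Dockerfile ending in `RUN rm -f -- old.py`. Keeping it to a single `COPY` means one extra layer per build, however many files changed.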


Metadata


Assignees: No one assigned
Labels: bug (Something isn't working), docker
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
