RFC: Improve OCI Image Python Tooling #388

Open
@d4l3k

Description

Quite a few of the cloud services and cluster tools for running ML jobs use OCI/Docker containers, so I've been looking into how to make these easier to work with.

Container-based services:

TorchX currently supports patches on top of existing images to make it fast to iterate and then launch a training job. These patches simply overlay files from the local directory on top of a base image. Our current patching implementation relies on a local Docker daemon to build a patch layer and push it: https://github.com/pytorch/torchx/blob/main/torchx/schedulers/docker_scheduler.py#L437-L493

Ideally we could build a patch layer and push it in pure Python without requiring a local Docker instance, since that's an extra burden on ML researchers/users. Building a patch should be fairly straightforward since it's just appending a layer to the image. Pushing will require some ability to talk to the registry to download/upload containers.
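To illustrate the "build a patch layer in pure Python" part, here is a minimal sketch using only the standard library. The function name `build_patch_layer` and the in-memory `files` mapping are hypothetical and not part of TorchX. It assumes a gzip-compressed tar layer, where the registry digest is the SHA-256 of the compressed blob and the diff ID is the SHA-256 of the uncompressed tar.

```python
import gzip
import hashlib
import io
import tarfile
from typing import Dict, Tuple


def build_patch_layer(files: Dict[str, bytes]) -> Tuple[bytes, str, str]:
    """Pack files into a gzipped tar layer (hypothetical helper).

    Returns (layer_blob, digest, diff_id):
      - layer_blob: the gzip-compressed tar archive
      - digest: sha256 of the compressed blob (used in the manifest)
      - diff_id: sha256 of the uncompressed tar (used in the image config)
    """
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        # Sort paths so the same inputs always produce the same archive.
        for path in sorted(files):
            data = files[path]
            info = tarfile.TarInfo(name=path.lstrip("/"))
            info.size = len(data)
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    tar_bytes = tar_buf.getvalue()

    diff_id = "sha256:" + hashlib.sha256(tar_bytes).hexdigest()
    # mtime=0 keeps the gzip header stable across runs.
    layer_blob = gzip.compress(tar_bytes, mtime=0)
    digest = "sha256:" + hashlib.sha256(layer_blob).hexdigest()
    return layer_blob, digest, diff_id


blob, digest, diff_id = build_patch_layer({"app/train.py": b"print('hi')\n"})
```

A real implementation would walk the local directory instead of taking a dict, and would need to handle file modes, symlinks, and ignore rules.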

OCI containers seem like a logical choice for packaging ML training jobs/apps, but the current Python tooling is fairly lacking as far as I can see. Making them easier to work with will likely help with the cloud story.

Detailed Proposal

Create a library for Python to manipulate OCI images with the following subset of features:

  • download/upload images to OCI repos
  • append layers to OCI images
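As a rough sketch of what "append layers" could involve, the snippet below updates an image manifest and image config that are already loaded as Python dicts. The function name `append_layer` is hypothetical. It assumes the OCI image layout (a `layers` list of descriptors in the manifest, `rootfs.diff_ids` in the config) and the OCI gzip layer media type. Images built by Docker may use Docker-specific media types instead.

```python
import copy
import hashlib
import json
from typing import Any, Dict, Tuple

# Assumed media type for a gzip-compressed OCI layer.
LAYER_MEDIA_TYPE = "application/vnd.oci.image.layer.v1.tar+gzip"


def append_layer(
    manifest: Dict[str, Any],
    config: Dict[str, Any],
    layer_digest: str,
    layer_size: int,
    diff_id: str,
) -> Tuple[Dict[str, Any], bytes]:
    """Return (new_manifest, new_config_bytes) with one extra layer.

    The inputs are left unmodified. Hypothetical helper for illustration.
    """
    # The config records the uncompressed digest of every layer, in order.
    new_config = copy.deepcopy(config)
    new_config["rootfs"]["diff_ids"].append(diff_id)
    if "history" in new_config:
        # Keep the optional history list in step with the layer list.
        new_config["history"].append({"created_by": "patch layer"})
    config_bytes = json.dumps(new_config, sort_keys=True).encode("utf-8")

    new_manifest = copy.deepcopy(manifest)
    new_manifest["layers"].append(
        {
            "mediaType": LAYER_MEDIA_TYPE,
            "digest": layer_digest,
            "size": layer_size,
        }
    )
    # The config changed, so its descriptor in the manifest must change too.
    new_manifest["config"]["digest"] = (
        "sha256:" + hashlib.sha256(config_bytes).hexdigest()
    )
    new_manifest["config"]["size"] = len(config_bytes)
    return new_manifest, config_bytes
```

Pushing the result would then mean uploading the layer blob and the new config blob to the registry, followed by the new manifest. That upload step is not shown here.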

Non-goals:

  • Execute containers
  • Dockerfiles

Alternatives

Additional context/links

There is an existing oci-python library, but it's at a fairly early stage. We may be able to build on it to enable this.

I opened an issue there as well: vsoch/oci-python#15

Labels: RFC (Request for Feedback & Roadmaps), enhancement (New feature or request), kubernetes (kubernetes and volcano schedulers), slurm (slurm scheduler)
