clumsy (Contributor) commented Sep 25, 2025

This PR adds TORCHX_IMAGE to the environment variables so that the payload can record the image it ran with. The scheduler only retains this value for a limited time, so letting the payload persist it makes runs easier to reproduce.
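
A minimal sketch of how a payload might use this, assuming it simply reads `TORCHX_IMAGE` from its environment and writes the value next to its own outputs. Only the variable name comes from this PR; the `record_image` helper and the `run_metadata.json` file name are hypothetical and not part of TorchX:

```python
import json
import os
from pathlib import Path
from typing import Optional


def record_image(output_dir: str, filename: str = "run_metadata.json") -> Optional[str]:
    """Persist the value of TORCHX_IMAGE so it outlives the scheduler's own records.

    Hypothetical helper: only the TORCHX_IMAGE variable name comes from this PR.
    """
    # None if the variable is not set in this process's environment.
    image = os.environ.get("TORCHX_IMAGE")

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Store the value (or null) alongside the job's other outputs.
    (out / filename).write_text(json.dumps({"torchx_image": image}, indent=2))
    return image
```

Calling `record_image("/path/to/outputs")` at the start of the job would leave a small JSON file that can be consulted after the scheduler has dropped the job's details.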

Test plan:
[x] updated unit tests

[x] local_docker --workspace "" -> no image built or pushed -> we expect TORCHX_IMAGE to contain the base image

torchx run -s local_docker --workspace "" utils.sh --image alpine:latest -- env

torchx 2025-09-25 11:32:59 INFO     Pulling container image: alpine:latest (this may take a while)
...
sh/0 TORCHX_IMAGE=alpine:latest
...

[x] local_docker -> image built locally, but not pushed -> we expect TORCHX_IMAGE to contain the local image SHA

torchx run -s local_docker utils.sh --image alpine:latest -- env

...
torchx 2025-09-25 11:32:45 INFO     Built new image `sha256:7ab1cd30c98bd9ebea5760b86fdb9984ced914aace098589ccc986bb8dd1b508` based on original image `alpine:latest` and changes in workspace `/private/tmp/torchx-test` for role[0]=sh.
...
sh/0 TORCHX_IMAGE=sha256:7ab1cd30c98bd9ebea5760b86fdb9984ced914aace098589ccc986bb8dd1b508
...

[x] aws_batch --image_repo -> image built locally, tagged, and pushed to the image repo -> we expect TORCHX_IMAGE to contain the image tag

torchx run -s aws_batch --scheduler_args 'queue=<queue-name>,priority=<priority>,image_repo=<image-repo>,share_id=<share-id>' utils.sh -h g5.4xlarge --image alpine:latest

...
torchx 2025-09-25 11:37:02 INFO     pushing image <image-repo>:7ab1cd30c98bd9ebea5760b86fdb9984ced914aace098589ccc986bb8dd1b508...
...

Observe in Batch Job details:

TORCHX_IMAGE
<image-repo>:7ab1cd30c98bd9ebea5760b86fdb9984ced914aace098589ccc986bb8dd1b508
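
The three cases above produce two shapes of value: a bare `sha256:` ID for an image that was built locally but not pushed, and a regular image reference otherwise. A rough sketch of how a payload could tell them apart, assuming only the formats shown in this test plan. `describe_image` is a hypothetical helper and does not attempt full Docker reference parsing:

```python
import re

# A local image ID as printed in the local_docker case: "sha256:" plus 64 hex digits.
_SHA256_ID = re.compile(r"^sha256:[0-9a-f]{64}$")


def describe_image(value: str) -> str:
    """Classify a TORCHX_IMAGE value by the shapes seen in the test plan (hypothetical helper)."""
    if _SHA256_ID.match(value):
        # Built locally and not pushed, so the ID is only meaningful on the build host.
        return "local image id"
    # Anything else is treated as a pullable reference, e.g. "alpine:latest" or "<image-repo>:<tag>".
    return "image reference"
```

A payload that records the value might also record this classification, since a local image ID cannot be pulled from another machine.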

clumsy (Contributor, Author) commented Sep 25, 2025

For your consideration, @kiukchung @andywag @d4l3k

meta-cla bot added the CLA Signed label Sep 25, 2025
codecov-commenter commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.63%. Comparing base (b72ba03) to head (6091ca1).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1129   +/-   ##
=======================================
  Coverage   91.63%   91.63%           
=======================================
  Files          83       83           
  Lines        6392     6397    +5     
=======================================
+ Hits         5857     5862    +5     
  Misses        535      535           
Flag        Coverage Δ
unittests   91.63% <100.00%> (+<0.01%) ⬆️

clumsy (Contributor, Author) commented Sep 25, 2025

The lint failure seems unrelated, @kiukchung

Warning, treated as error:
/home/runner/work/torchx/torchx/docs/source/schedulers/kubernetes.rst:8:broken link: https://github.com/pytorch/torchx/issues/120 (404 Client Error: Not Found for url: https://github.com/pytorch/torchx/issues/120)
make: *** [Makefile:29: linkcheck] Error 2
(schedulers/kubernetes: line    8) 

kiukchung (Contributor) commented

> The lint failure seems unrelated, @kiukchung
>
> Warning, treated as error:
> /home/runner/work/torchx/torchx/docs/source/schedulers/kubernetes.rst:8:broken link: https://github.com/pytorch/torchx/issues/120 (404 Client Error: Not Found for url: https://github.com/pytorch/torchx/issues/120)
> make: *** [Makefile:29: linkcheck] Error 2
> (schedulers/kubernetes: line    8)

This is unrelated to your changes; I'll override it. We're seeing a few broken assets due to the migration of this repo from github.com/pytorch/torchx to github.com/meta-pytorch/torchx.

The fix for the doctest is #1131, but it seems that our AWS creds are broken at the moment.

clumsy (Contributor, Author) commented Sep 25, 2025

Sounds good, @kiukchung!

It looks like the aws ecr login command is failing, so Docker cannot get the credentials it needs to push the image. Let me know if I can help somehow.

facebook-github-bot (Contributor) commented

@kiukchung has imported this pull request. If you are a Meta employee, you can view this in D83282650.

kiukchung merged commit 3b5df3a into meta-pytorch:main Oct 1, 2025
20 of 22 checks passed
clumsy deleted the feat/docker_image_env_var branch October 1, 2025 22:27
6 participants