-
Notifications
You must be signed in to change notification settings - Fork 705
ci: OPS-2136: Setup caching for docker build #4535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Dillon Cullinan <[email protected]>
WalkthroughCentralized container registry authentication by introducing a new reusable docker-login GitHub Actions action. Refactored workflows and actions to use this action instead of inline login commands, eliminating distributed credential handling. Modified docker-build action to dynamically compute ECR hostname and extract architecture from platform for cache optimization. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/actions/docker-build/action.yml (2)
159-159: Consolidate duplicate architecture extraction logic.Line 159 extracts platform architecture using
sed 's/linux\///', but line 85 already extracts this same value into theARCHvariable using"${PLATFORM##*/}". The metrics section should reuse the earlierARCHvariable instead of re-extracting it. This eliminates duplication and potential inconsistency if either method is later modified.- PLATFORM_ARCH=$(echo "${{ inputs.platform }}" | sed 's/linux\///') + PLATFORM_ARCH="${ARCH}" # Already extracted at line 85
77-77: ECR_HOSTNAME construction is technically valid concern but all callers provide credentials.The action inputs
aws_account_idandaws_default_regionare optional with no defaults. Line 77 would create a malformed hostname like.dkr.ecr..amazonaws.comif either is missing. However, all five current invocations of this action—across container-validation-backends.yml and nightly-ci.yml—consistently provide both credentials via GitHub secrets. In practice, this is a CI/CD infrastructure action with implicit credential requirements that all callers satisfy.The defensive programming suggestion remains valid: consider either marking these inputs as
required: trueor adding graceful handling (e.g., conditionally skip cache-from flags when credentials are absent). This would prevent future misuse if the action is called outside these established workflows.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/actions/docker-build/action.yml(2 hunks).github/actions/docker-login/action.yml(1 hunks).github/actions/docker-tag-push/action.yml(0 hunks).github/workflows/container-validation-backends.yml(4 hunks).github/workflows/nightly-ci.yml(4 hunks)
💤 Files with no reviewable changes (1)
- .github/actions/docker-tag-push/action.yml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
Applied to files:
.github/actions/docker-login/action.yml
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
.github/actions/docker-login/action.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: sglang (amd64)
- GitHub Check: trtllm (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (7)
.github/workflows/nightly-ci.yml (3)
21-29: Approve Docker Login integration in vllm job.The Docker Login step is correctly positioned and all registry credentials are provided. This pattern is consistent with the PR objective of centralizing authentication.
92-92: Approve YAML anchor pattern for framework builds.The YAML anchor
&framework-build-stepsat line 92 is correctly defined and referenced via*framework-build-stepsin the sglang-framework (line 146) and trtllm-framework (lines 148-159) jobs. This reduces duplication and ensures consistent build steps across all framework variants. The Docker Login step is properly included in the shared definition.Also applies to: 146-146, 159-159
31-47: Approve removal of ngc_ci_access_token from docker-build inputs.The ngc_ci_access_token is correctly removed from the docker-build action call, as authentication is now centralized via the Docker Login step. The ci_token (GitHub token) and other credentials are properly passed to the build action.
.github/workflows/container-validation-backends.yml (2)
74-82: Approve consistent Docker Login integration across all build jobs.The Docker Login step is correctly and consistently placed before container build/push operations across operator, vllm, sglang, and trtllm jobs. All registry credentials are uniformly provided, aligning with the PR objective of centralizing authentication.
Also applies to: 150-158, 224-232, 294-302
86-86: Approve local ECR_HOSTNAME computation in operator job.The operator job computes ECR_HOSTNAME locally for use in its custom docker build commands (not using the docker-build action). While docker-login has already authenticated to ECR, computing the hostname locally is acceptable for reference purposes and keeps the build logic self-contained. This is a minor duplication that doesn't impact functionality.
Also applies to: 93-93, 104-104, 113-113
.github/actions/docker-login/action.yml (1)
32-48: Approve conditional registry login implementations.The three conditional login steps (ECR, NGC, ACR) are correctly implemented:
- Each step only runs if its required credentials are provided
- Credentials are passed via stdin (
--password-stdin) to avoid CLI argument exposure- ECR uses AWS CLI (
aws ecr get-login-password), which is standard on GitHub runners- NGC hardcodes the username as
'$oauthtoken', which is correct for NGC API- ACR accepts provided username/password
Ensure the GitHub runner environment includes AWS CLI for ECR login, or add a setup step if targeting self-hosted runners.
.github/actions/docker-build/action.yml (1)
112-113: Verify that cache-from image tags exist in ECR registry.The cache-from references (lines 112-113) use hardcoded image tags (
main-dynamo-base-${ARCH}andmain-${framework}-framework-${ARCH}) that don't appear elsewhere in this repository and aren't produced by any visible build or push operations here. These tags must pre-exist in your ECR registry at${ECR_HOSTNAME}/ai-dynamo/dynamoor Docker will silently skip the cache without warning, negating the build performance benefit.Confirm:
- These images are built and pushed by an external CI process (e.g., nightly builds)
- The tags follow the exact pattern specified (especially
main-prefix and-${ARCH}suffix)- The base image tag (
main-dynamo-base-${ARCH}) actually exists; it's less obvious whether this is actively maintained
| name: 'Docker Build' | ||
| description: 'Build Dynamo container images' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Fix action metadata: name, description, and remove unused outputs.
The action metadata contains errors copied from the docker-build action:
- Line 1:
nameshould be "Docker Login" not "Docker Build" - Line 2:
descriptionshould describe registry authentication, not building images - Lines 24-27: The
outputssection referencessteps.build.outputs.image_tag, but this action has no build step and shouldn't output an image tag
-name: 'Docker Build'
-description: 'Build Dynamo container images'
+name: 'Docker Login'
+description: 'Log in to container registries (ECR, NGC, ACR)'
-outputs:
- image_tag:
- description: 'Image Tag'
- value: ${{ steps.build.outputs.image_tag }}Run the following script to verify that no workflows depend on the docker-login action's image_tag output:
Also applies to: 24-27
🏁 Script executed:
#!/bin/bash
# Description: Check if any workflows reference docker-login outputs
rg -B3 -A3 'docker-login.*outputs' --type yamlLength of output: 0
🏁 Script executed:
#!/bin/bash
# Step 1: Verify the docker-login action file exists and show its content
echo "=== Checking docker-login action.yml ==="
if [ -f ".github/actions/docker-login/action.yml" ]; then
cat ".github/actions/docker-login/action.yml"
else
echo "File not found at .github/actions/docker-login/action.yml"
find . -name "action.yml" -type f 2>/dev/null | head -20
fiLength of output: 1638
🏁 Script executed:
#!/bin/bash
# Step 2: Search for workflows that use docker-login action
echo "=== Workflows using docker-login ==="
rg -n 'docker-login' --type yamlLength of output: 36
🏁 Script executed:
#!/bin/bash
# Step 3: Search for references to image_tag output more broadly
echo "=== References to image_tag ==="
rg -n 'image_tag' --type yamlLength of output: 31
🏁 Script executed:
#!/bin/bash
# Step 1: Verify the docker-login action file exists and show its content
echo "=== Checking docker-login action.yml ==="
if [ -f ".github/actions/docker-login/action.yml" ]; then
cat ".github/actions/docker-login/action.yml"
else
echo "File not found at .github/actions/docker-login/action.yml"
find . -name "action.yml" -type f 2>/dev/null | head -20
fiLength of output: 1638
🏁 Script executed:
#!/bin/bash
# Step 2: Search for workflows that use docker-login action
echo "=== Workflows using docker-login ==="
rg -n 'docker-login' --type yamlLength of output: 36
🏁 Script executed:
#!/bin/bash
# Step 3: Search for references to image_tag output more broadly in workflow files
echo "=== References to image_tag in workflows ==="
rg -n 'image_tag' --type yaml -A2 -B2Length of output: 44
Fix action metadata: name, description, and remove invalid outputs section.
The action metadata contains errors copied from the docker-build action:
- Line 1:
nameshould be "Docker Login" not "Docker Build" - Line 2:
descriptionshould describe registry authentication, not building images - Lines 24-27: The
outputssection referencessteps.build.outputs.image_tag, but this action has no build step (only login steps: ECR Login, NGC Login, ACR Login) and should not output an image tag
-name: 'Docker Build'
-description: 'Build Dynamo container images'
+name: 'Docker Login'
+description: 'Log in to container registries (ECR, NGC, ACR)'
-outputs:
- image_tag:
- description: 'Image Tag'
- value: ${{ steps.build.outputs.image_tag }}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
.github/actions/docker-login/action.yml lines 1-2 (and remove lines 24-27):
Update the action metadata to reflect that this is a Docker Login action: change
name from "Docker Build" to "Docker Login" and replace the description with a
short sentence about authenticating to registries (e.g., "Authenticate to
container registries (ECR, NGC, ACR)"). Remove the invalid outputs section that
references steps.build.outputs.image_tag (lines 24-27) since this action does
not perform a build and should not publish an image_tag output.
Signed-off-by: Dillon Cullinan <[email protected]>
5dee782 to
8cf3034
Compare
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Overview:
Summary:
--cache-fromargs to our docker build, to attempt to use pre-built images as a cache targetSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.