Skip to content

Conversation

@dillon-cullinan
Copy link
Contributor

@dillon-cullinan dillon-cullinan commented Nov 21, 2025

Overview:

Summary:

  • Moves all login steps into a shared action
  • All unneeded inputs are removed, and references to changed actions updated
  • Adds --cache-from args to our docker build, to attempt to use pre-built images as a cache target

Summary by CodeRabbit

  • Chores
    • Improved CI/CD infrastructure with centralized container registry authentication across workflows and architecture-specific build caching for faster deployments.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Dillon Cullinan <[email protected]>
@dillon-cullinan dillon-cullinan requested a review from a team as a code owner November 21, 2025 17:49
@github-actions github-actions bot added the ci Issues/PRs that reference CI build/test label Nov 21, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

Centralized container registry authentication by introducing a new reusable docker-login GitHub Actions action. Refactored workflows and actions to use this action instead of inline login commands, eliminating distributed credential handling. Modified docker-build action to dynamically compute ECR hostname and extract architecture from platform for cache optimization.

Changes

Cohort / File(s) Summary
New Docker Login Action
​.github/actions/docker-login/action.yml
New composite action providing unified login support for ECR, NGC, and ACR registries with optional credentials for each. Exposes image_tag output.
Docker Build Action
​.github/actions/docker-build/action.yml
Removed ngc_ci_access_token input and standalone login steps. Added ECR_HOSTNAME environment variable derivation from AWS credentials and ARCH extraction from PLATFORM. Added two cache-from registry flags for base and framework images.
Docker Tag/Push Action
​.github/actions/docker-tag-push/action.yml
Removed AWS and Azure credential inputs (aws_access_key_id, aws_secret_access_key, azure_acr_user, azure_acr_password) and corresponding ACR login step.
Container Validation Workflows
​.github/workflows/container-validation-backends.yml
Replaced inline ECR and Azure login commands with centralized docker-login action invocations across operator, vllm, sglang, trtllm, and deploy jobs.
Nightly CI Workflow
​.github/workflows/nightly-ci.yml
Added docker-login steps in vllm, vllm-framework, and framework build sections. Consolidated token handling by shifting ngc_ci_access_token usage from inline to docker-login action. Introduced sglang-framework and trtllm-framework sections with consistent auth.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • ARCH extraction logic in docker-build: verify "${PLATFORM##*/}" correctly extracts architecture from platform strings (e.g., linux/amd64 → amd64)
  • Cache-from registry references: ensure ECR_HOSTNAME and ARCH variables resolve correctly in cache paths for both base and framework images
  • Credential flow migration: confirm all workflows properly pass secrets to docker-login action and that no direct credential lines were left behind
  • New docker-login action contract: validate that image_tag output and all conditional inputs (ECR, NGC, ACR) align with downstream consumers

Poem

🐰 A login once scattered now hops as one,
Registry gates unified—the auth dance is done!
ARCH extracted clean, cache now wise,
Centralized credentials—a rabbit's reprise! 🚀

Pre-merge checks

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive The description covers Overview with key changes and summary, but is missing 'Details' section with comprehensive change breakdown and 'Where should the reviewer start' guidance with specific file locations. Add a 'Details' section describing changes across modified files and a 'Where should the reviewer start' section highlighting key files like docker-build and docker-login actions for reviewer focus.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly references the main change: adding caching setup for Docker builds, which aligns with the core objective of adding --cache-from arguments.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/actions/docker-build/action.yml (2)

159-159: Consolidate duplicate architecture extraction logic.

Line 159 extracts platform architecture using sed 's/linux\///', but line 85 already extracts this same value into the ARCH variable using "${PLATFORM##*/}". The metrics section should reuse the earlier ARCH variable instead of re-extracting it. This eliminates duplication and potential inconsistency if either method is later modified.

- PLATFORM_ARCH=$(echo "${{ inputs.platform }}" | sed 's/linux\///')
+ PLATFORM_ARCH="${ARCH}"  # Already extracted at line 85

77-77: ECR_HOSTNAME construction is technically valid concern but all callers provide credentials.

The action inputs aws_account_id and aws_default_region are optional with no defaults. Line 77 would create a malformed hostname like .dkr.ecr..amazonaws.com if either is missing. However, all five current invocations of this action—across container-validation-backends.yml and nightly-ci.yml—consistently provide both credentials via GitHub secrets. In practice, this is a CI/CD infrastructure action with implicit credential requirements that all callers satisfy.

The defensive programming suggestion remains valid: consider either marking these inputs as required: true or adding graceful handling (e.g., conditionally skip cache-from flags when credentials are absent). This would prevent future misuse if the action is called outside these established workflows.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c61e0dd and 96bc89c.

📒 Files selected for processing (5)
  • .github/actions/docker-build/action.yml (2 hunks)
  • .github/actions/docker-login/action.yml (1 hunks)
  • .github/actions/docker-tag-push/action.yml (0 hunks)
  • .github/workflows/container-validation-backends.yml (4 hunks)
  • .github/workflows/nightly-ci.yml (4 hunks)
💤 Files with no reviewable changes (1)
  • .github/actions/docker-tag-push/action.yml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • .github/actions/docker-login/action.yml
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • .github/actions/docker-login/action.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: sglang (amd64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (7)
.github/workflows/nightly-ci.yml (3)

21-29: Approve Docker Login integration in vllm job.

The Docker Login step is correctly positioned and all registry credentials are provided. This pattern is consistent with the PR objective of centralizing authentication.


92-92: Approve YAML anchor pattern for framework builds.

The YAML anchor &framework-build-steps at line 92 is correctly defined and referenced via *framework-build-steps in the sglang-framework (line 146) and trtllm-framework (lines 148-159) jobs. This reduces duplication and ensures consistent build steps across all framework variants. The Docker Login step is properly included in the shared definition.

Also applies to: 146-146, 159-159


31-47: Approve removal of ngc_ci_access_token from docker-build inputs.

The ngc_ci_access_token is correctly removed from the docker-build action call, as authentication is now centralized via the Docker Login step. The ci_token (GitHub token) and other credentials are properly passed to the build action.

.github/workflows/container-validation-backends.yml (2)

74-82: Approve consistent Docker Login integration across all build jobs.

The Docker Login step is correctly and consistently placed before container build/push operations across operator, vllm, sglang, and trtllm jobs. All registry credentials are uniformly provided, aligning with the PR objective of centralizing authentication.

Also applies to: 150-158, 224-232, 294-302


86-86: Approve local ECR_HOSTNAME computation in operator job.

The operator job computes ECR_HOSTNAME locally for use in its custom docker build commands (not using the docker-build action). While docker-login has already authenticated to ECR, computing the hostname locally is acceptable for reference purposes and keeps the build logic self-contained. This is a minor duplication that doesn't impact functionality.

Also applies to: 93-93, 104-104, 113-113

.github/actions/docker-login/action.yml (1)

32-48: Approve conditional registry login implementations.

The three conditional login steps (ECR, NGC, ACR) are correctly implemented:

  • Each step only runs if its required credentials are provided
  • Credentials are passed via stdin (--password-stdin) to avoid CLI argument exposure
  • ECR uses AWS CLI (aws ecr get-login-password), which is standard on GitHub runners
  • NGC hardcodes the username as '$oauthtoken', which is correct for NGC API
  • ACR accepts provided username/password

Ensure the GitHub runner environment includes AWS CLI for ECR login, or add a setup step if targeting self-hosted runners.

.github/actions/docker-build/action.yml (1)

112-113: Verify that cache-from image tags exist in ECR registry.

The cache-from references (lines 112-113) use hardcoded image tags (main-dynamo-base-${ARCH} and main-${framework}-framework-${ARCH}) that don't appear elsewhere in this repository and aren't produced by any visible build or push operations here. These tags must pre-exist in your ECR registry at ${ECR_HOSTNAME}/ai-dynamo/dynamo or Docker will silently skip the cache without warning, negating the build performance benefit.

Confirm:

  1. These images are built and pushed by an external CI process (e.g., nightly builds)
  2. The tags follow the exact pattern specified (especially main- prefix and -${ARCH} suffix)
  3. The base image tag (main-dynamo-base-${ARCH}) actually exists; it's less obvious whether this is actively maintained

Comment on lines +1 to +2
name: 'Docker Build'
description: 'Build Dynamo container images'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix action metadata: name, description, and remove unused outputs.

The action metadata contains errors copied from the docker-build action:

  • Line 1: name should be "Docker Login" not "Docker Build"
  • Line 2: description should describe registry authentication, not building images
  • Lines 24-27: The outputs section references steps.build.outputs.image_tag, but this action has no build step and shouldn't output an image tag
-name: 'Docker Build'
-description: 'Build Dynamo container images'
+name: 'Docker Login'
+description: 'Log in to container registries (ECR, NGC, ACR)'

-outputs:
-  image_tag:
-    description: 'Image Tag'
-    value: ${{ steps.build.outputs.image_tag }}

Run the following script to verify that no workflows depend on the docker-login action's image_tag output:

Also applies to: 24-27


🏁 Script executed:

#!/bin/bash
# Description: Check if any workflows reference docker-login outputs

rg -B3 -A3 'docker-login.*outputs' --type yaml

Length of output: 0



🏁 Script executed:

#!/bin/bash
# Step 1: Verify the docker-login action file exists and show its content
echo "=== Checking docker-login action.yml ==="
if [ -f ".github/actions/docker-login/action.yml" ]; then
  cat ".github/actions/docker-login/action.yml"
else
  echo "File not found at .github/actions/docker-login/action.yml"
  find . -name "action.yml" -type f 2>/dev/null | head -20
fi

Length of output: 1638


🏁 Script executed:

#!/bin/bash
# Step 2: Search for workflows that use docker-login action
echo "=== Workflows using docker-login ==="
rg -n 'docker-login' --type yaml

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Step 3: Search for references to image_tag output more broadly
echo "=== References to image_tag ==="
rg -n 'image_tag' --type yaml

Length of output: 31


🏁 Script executed:

#!/bin/bash
# Step 1: Verify the docker-login action file exists and show its content
echo "=== Checking docker-login action.yml ==="
if [ -f ".github/actions/docker-login/action.yml" ]; then
  cat ".github/actions/docker-login/action.yml"
else
  echo "File not found at .github/actions/docker-login/action.yml"
  find . -name "action.yml" -type f 2>/dev/null | head -20
fi

Length of output: 1638


🏁 Script executed:

#!/bin/bash
# Step 2: Search for workflows that use docker-login action
echo "=== Workflows using docker-login ==="
rg -n 'docker-login' --type yaml

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Step 3: Search for references to image_tag output more broadly in workflow files
echo "=== References to image_tag in workflows ==="
rg -n 'image_tag' --type yaml -A2 -B2

Length of output: 44


Fix action metadata: name, description, and remove invalid outputs section.

The action metadata contains errors copied from the docker-build action:

  • Line 1: name should be "Docker Login" not "Docker Build"
  • Line 2: description should describe registry authentication, not building images
  • Lines 24-27: The outputs section references steps.build.outputs.image_tag, but this action has no build step (only login steps: ECR Login, NGC Login, ACR Login) and should not output an image tag
-name: 'Docker Build'
-description: 'Build Dynamo container images'
+name: 'Docker Login'
+description: 'Log in to container registries (ECR, NGC, ACR)'

-outputs:
-  image_tag:
-    description: 'Image Tag'
-    value: ${{ steps.build.outputs.image_tag }}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
.github/actions/docker-login/action.yml lines 1-2 (and remove lines 24-27):
Update the action metadata to reflect that this is a Docker Login action: change
name from "Docker Build" to "Docker Login" and replace the description with a
short sentence about authenticating to registries (e.g., "Authenticate to
container registries (ECR, NGC, ACR)"). Remove the invalid outputs section that
references steps.build.outputs.image_tag (lines 24-27) since this action does
not perform a build and should not publish an image_tag output.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Nov 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci Issues/PRs that reference CI build/test size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants