feat(ci): Add pre-compiled test binaries for EC2 integration tests #2026
Add WarmTestCache workflow job that pre-compiles all test packages and uploads GOCACHE/GOMODCACHE to S3. EC2 integration test instances download this cache before running go test, eliminating redundant compilation.
Changes:
- Add .github/workflows/warm-test-cache.yml reusable workflow
- Add WarmTestCache job to test-artifacts.yml (runs in parallel with GenerateTestMatrix)
- Add cache_key input to ec2-integration-test.yml reusable workflow
- Pass cache_key to terraform for EC2 Linux, OnPrem, and SELinux tests
The cache is keyed by branch, Go version, OS, and architecture. It runs once per workflow execution and all EC2 test jobs consume it.
Companion PR: aws/amazon-cloudwatch-agent-test#650
🤖 Assisted by AI
Wire WarmTestCache job into PR-test.yml with the same pattern used in test-artifacts.yml. All EC2 Linux test pages (0-4) and SELinux tests now receive cache_key and depend on WarmTestCache. Also added WarmTestCache to verify-all gate check. 🤖 Assisted by AI
Terraform errors with 'undeclared variable' if cache_key is passed to a terraform config that doesn't declare it. This happens when the test repo hasn't been updated yet, or for terraform dirs that don't support caching. Conditionally build the -var flag so it's only included when cache_key is set, ensuring backward compatibility. 🤖 Assisted by AI
The test repo may not have the cache_key variable yet (PR pending). Check variables.tf for the declaration before passing -var to terraform to avoid "undeclared variable" errors during the rollout period. 🤖 Assisted by AI
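The guard described in the two commits above might look roughly like the following in a workflow step. This is a minimal sketch, not the step from this PR: the helper name `tf_var_flag` is hypothetical, and it assumes the terraform directory keeps its declarations in a file named `variables.tf`.

```shell
#!/usr/bin/env bash
# Sketch only: print a -var flag for terraform when (a) a value is set and
# (b) the target directory's variables.tf declares the variable.
# tf_var_flag is a hypothetical helper, not part of the workflow in this PR.
tf_var_flag() {
  local tf_dir="$1" name="$2" value="$3"

  # Nothing to pass when the value is empty.
  [ -n "$value" ] || return 0

  # Skip the flag when the terraform config has not declared the variable,
  # which would otherwise trigger an "undeclared variable" error.
  grep -Eq "^[[:space:]]*variable[[:space:]]+\"${name}\"" \
    "${tf_dir}/variables.tf" 2>/dev/null || return 0

  printf -- '-var=%s=%s\n' "$name" "$value"
}
```

A step could then call something like `terraform apply -auto-approve $(tf_var_flag . cache_key "$CACHE_KEY")`. The unquoted substitution expands to nothing when the flag is skipped, so this form assumes the value contains no whitespace.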
Add BuildTestBinaries workflow that compiles all test packages with go test -c and uploads static binaries to S3. EC2 test instances download and run these directly, eliminating ~3 min of compilation per test.
Changes:
- Add .github/workflows/build-test-binaries.yml reusable workflow
- Add BuildTestBinaries job to test-artifacts.yml and PR-test.yml (runs in parallel with GenerateTestMatrix and WarmTestCache)
- Add test_binaries_prefix input to ec2-integration-test.yml
- Conditionally pass -var=test_binaries_prefix to terraform
Binaries are cross-compiled with CGO_ENABLED=0 GOOS=linux GOARCH=amd64 for full portability across all Linux distros (AL2, AL2023, Ubuntu, RHEL, SLES, etc).
🤖 Assisted by AI
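As a rough illustration of the compile step described above, the loop below cross-compiles each package under `./test/...` with `go test -c`. It is a sketch under assumptions: the helper names `binary_name` and `build_test_binaries`, the flat `<path>.test` naming scheme, and the output layout are hypothetical, since the real workflow file is not shown in this conversation.

```shell
#!/usr/bin/env bash
# Sketch only: cross-compile each test package into a static test binary.
# binary_name and build_test_binaries are hypothetical helpers.

# Turn a package path such as ./test/metric_value into a flat file name.
binary_name() {
  local pkg="${1#./}"                 # drop a leading ./
  printf '%s.test\n' "${pkg//\//_}"   # replace every / with _
}

# Compile every package under ./test/... for one architecture.
# Must be run from the module root so that package dirs are below $PWD.
build_test_binaries() {
  local arch="$1" out_dir="$2" dir rel
  mkdir -p "${out_dir}/linux/${arch}"
  while read -r dir; do
    rel="${dir#"${PWD}"/}"            # absolute dir -> path relative to root
    CGO_ENABLED=0 GOOS=linux GOARCH="${arch}" \
      go test -c -o "${out_dir}/linux/${arch}/$(binary_name "${rel}")" \
      "./${rel}" || return 1
  done < <(go list -f '{{.Dir}}' ./test/...)
}
```

Calling `build_test_binaries amd64 ./out` from the repository root would place the binaries under `./out/linux/amd64/`. `CGO_ENABLED=0` is what keeps the binaries statically linked and portable across distros.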
The test matrix includes arm64 instances. Build binaries for both
architectures and output the prefix without arch so terraform can
append /linux/${arc} to select the correct binary.
🤖 Assisted by AI
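To make the relationship between the arch-less prefix and the per-architecture path concrete, here is a small sketch. The S3 layout follows the path given in the PR description; the helper names are hypothetical, and whether the real prefix includes the `s3://bucket` part is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers showing how an arch-less prefix is
# extended with /linux/<arch> by the consumer.

# test_binaries_prefix BUCKET SHA: the value a build job would output.
test_binaries_prefix() {
  printf 's3://%s/integration-test/test-binaries/%s\n' "$1" "$2"
}

# arch_binaries_path PREFIX ARCH: what the consumer appends per instance.
arch_binaries_path() {
  printf '%s/linux/%s/\n' "$1" "$2"
}
```

An arm64 instance could then fetch its binaries with something like `aws s3 sync "$(arch_binaries_path "$prefix" arm64)" ./test-binaries/`, while an amd64 instance uses the same prefix with a different suffix.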
The WarmTestCache job added complexity without significant benefit compared to pre-compiled binaries. Removing it simplifies the workflow.
Removed:
- .github/workflows/warm-test-cache.yml
- cache_key input from ec2-integration-test.yml
- WarmTestCache job references from PR-test.yml and test-artifacts.yml
The BuildTestBinaries job remains; that's where the real savings come from.
- Add ITAR/CN bucket inputs to build-test-binaries.yml
- Upload test binaries to all three regions (commercial, ITAR, CN)
- Pass test_binaries_prefix to ITAR and CN integration test jobs
- ITAR/CN will now use pre-compiled binaries instead of go test
Cross-partition assume-role doesn't work. Each partition needs its own OIDC authentication via configure-aws-credentials.
- Split ITAR/CN uploads into separate jobs
- Use GitHub artifacts to pass binaries between jobs
- Add Complete job to ensure all uploads finish before tests start
name: 'Complete'
needs: [BuildTestBinaries, UploadTestBinariesITAR, UploadTestBinariesCN]
if: ${{ always() }}
runs-on: ubuntu-latest
outputs:
  test_binaries_prefix: ${{ needs.BuildTestBinaries.outputs.test_binaries_prefix }}
steps:
  - name: Check results
    run: |
      echo "BuildTestBinaries: ${{ needs.BuildTestBinaries.result }}"
      echo "UploadTestBinariesITAR: ${{ needs.UploadTestBinariesITAR.result }}"
      echo "UploadTestBinariesCN: ${{ needs.UploadTestBinariesCN.result }}"
      if [[ "${{ needs.BuildTestBinaries.result }}" != "success" ]]; then
        exit 1
      fi
Check warning
Code scanning / CodeQL
Workflow does not contain permissions Medium
Copilot Autofix
AI about 1 month ago
To fix the problem, we should explicitly define a permissions block for the Complete job so it no longer relies on repository/organization defaults. Since the Complete job only echoes results and evaluates conditions without interacting with GitHub APIs or repository contents, it does not need any token permissions at all, so the least‑privilege configuration is to disable the GITHUB_TOKEN for this job using permissions: {}.
Concretely, in .github/workflows/build-test-binaries.yml, in the Complete job definition starting at line 182, add a permissions: {} entry alongside runs-on and outputs. For example, immediately after runs-on: ubuntu-latest insert a line permissions: {} with the same indentation as runs-on. No imports or additional methods are needed, as this is purely a YAML configuration change.
@@ -184,6 +184,7 @@
     needs: [BuildTestBinaries, UploadTestBinariesITAR, UploadTestBinariesCN]
     if: ${{ always() }}
     runs-on: ubuntu-latest
+    permissions: {}
     outputs:
       test_binaries_prefix: ${{ needs.BuildTestBinaries.outputs.test_binaries_prefix }}
     steps:
@@ -194,4 +195,3 @@
           echo "UploadTestBinariesCN: ${{ needs.UploadTestBinariesCN.result }}"
           if [[ "${{ needs.BuildTestBinaries.result }}" != "success" ]]; then
             exit 1
           fi
Is this warning still valid, or did you resolve it?
- Split test binary builds into parallel amd64/arm64 jobs
- Split MakeMacPkg into parallel matrix jobs for each architecture
- Add single-arch Makefile targets for darwin builds
Expected improvements:
- Test binary build: ~50% faster (parallel arch builds)
- MakeMacPkg: ~50% faster (parallel arch builds)
Add if conditions to skip jobs when their matrix arrays are empty, preventing 'Matrix vector does not contain any values' errors.
b7dc8ae to 9c15a56
PR tests will fail until the new parameter is added; see the linked test run.
The splitByTestFunc feature increased ec2_linux_matrix from ~306 to 939 entries (729KB). Combined with other matrices, total job outputs exceeded GitHub's 1MB limit. Write ec2_linux_matrix to a temp file instead of GITHUB_OUTPUT, then paginate from the file. Only the paginated pages are output as job outputs.
The splitByTestFunc feature increased matrix size significantly. Strip empty/zero/false/null fields from matrix JSON, reducing total output size from ~935KB to ~573KB (well under GitHub's 1MB limit).
…limit" This reverts commit 88210e6.
Even with omitempty in the generator, total output size (~648KB) plus overhead can exceed GitHub's 1MB job output limit. Add jq filtering as additional safety to strip any remaining empty/zero/false fields.
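A jq filter of the kind described here might look like the sketch below. The exact filter used in the workflow is not shown in this conversation, so the helper name `strip_empty_fields` and the precise set of values treated as empty are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: drop null, empty-string, zero, and false fields from every
# object in a JSON array read from stdin. strip_empty_fields is a
# hypothetical helper; the workflow's actual filter may differ.
strip_empty_fields() {
  jq -c 'map(with_entries(select(
    .value != null and .value != "" and .value != 0 and .value != false
  )))'
}
```

For example, `strip_empty_fields < matrix.json > matrix.min.json` would shrink the file before it is paginated. Dropping zero and false values is only safe when every consumer treats a missing field the same as its zero value, which is the same contract `omitempty` relies on in the Go generator.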
Upload ec2_linux matrix pages as artifacts instead of job outputs. Add loader jobs that download the artifact and output each page. This completely bypasses the 1MB job output limit for GenerateTestMatrix.
This PR was marked stale due to lack of activity.
Summary
Optimize EC2 integration test runtime by pre-compiling test binaries and uploading to S3. Saves 4.2 min per test on average (24% reduction) across 306 EC2 Linux tests.
Measured Results
Full unfiltered run (22117878901) vs baseline (21757853087):
Per-test (al2023, us-west-2)
Duration distribution shift
BuildTestBinaries runs in parallel with GenerateTestMatrix, so it adds zero latency to the critical path.
How this differs from the validator
The test repo already has a validator binary that pre-compiles and runs tests on EC2; it is used by performance, stress, and Windows tests. It was originally created to work around OOM issues on Windows hosts when running go test.
The validator requires each test to export a Validate() function and be registered in a switch statement in main.go. Only 7 tests are registered. Extending it to cover all ~50 test packages would require refactoring every test.
This PR uses go test -c instead, which compiles any standard Go test package into a standalone binary with zero code changes. The pre-compiled binaries are functionally identical to go test (same test framework, same flags, same behavior), just without the compilation step on the EC2 instance.
Changes
New: .github/workflows/build-test-binaries.yml
Compiles every ./test/... package with CGO_ENABLED=0 go test -c for both linux/amd64 and linux/arm64. Uploads to s3://${bucket}/integration-test/test-binaries/${sha}/linux/${arch}/.
Modified: .github/workflows/test-artifacts.yml
- Add BuildTestBinaries job (parallel with GenerateTestMatrix)
- Pass test_binaries_prefix to each test job
Modified: .github/workflows/PR-test.yml
Same integration as test-artifacts.yml.
Modified: .github/workflows/ec2-integration-test.yml
- Add test_binaries_prefix input
Companion PR
aws/amazon-cloudwatch-agent-test#650: terraform-side changes.
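For context on the consuming side, which lives in the companion PR and is not shown here, a binary built with go test -c accepts the usual test flags in their `-test.`-prefixed form (for example `-test.run` and `-test.v`). The sketch below only assembles such a command line; the helper name `test_binary_cmd` and the example binary name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print the command line that would run a pre-compiled test
# binary with the equivalents of `go test -v -run REGEX -timeout DURATION`.
# test_binary_cmd is a hypothetical helper; it does not execute anything.

# test_binary_cmd BINARY RUN_REGEX TIMEOUT
test_binary_cmd() {
  printf '%s -test.v -test.run %s -test.timeout %s\n' "$1" "$2" "$3"
}
```

On an instance, the printed command would be run from the directory the test expects as its working directory, since a compiled test binary does not change into the package directory the way go test does.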
Not in scope
Historical context (original PR also included Go cache warming)
The original PR included both pre-compiled binaries AND Go module/build cache warming. The cache warming was removed as it added complexity without significant additional benefit over pre-compiled binaries alone.
Removed in later commits:
- .github/workflows/warm-test-cache.yml
- cache_key input/variable throughout
- WarmTestCache job references