
feat(ci): Add pre-compiled test binaries for EC2 integration tests #2026

Open
the-mann wants to merge 21 commits into main from feat/go-build-cache-s3

@the-mann commented Feb 13, 2026

Summary

Optimize EC2 integration test runtime by pre-compiling test binaries and uploading them to S3. Saves 4.2 min per test on average (24% reduction) across 306 EC2 Linux tests.

Measured Results

Full unfiltered workflow run (22117878901) vs. baseline run (21757853087):

| Metric | Baseline (go test) | With pre-compiled binaries | Change |
|---|---|---|---|
| Average duration | 17.7 min | 13.5 min | -4.2 min (-24%) |
| Min duration | 6.7 min | 3.6 min | -3.1 min |
| Max duration | 61.2 min | 48.9 min | -12.3 min |

Per-test (al2023, us-west-2)

| Test | Baseline | With binaries | Saved |
|---|---|---|---|
| run_as_user_test | ~12 min | ~4 min | ~8 min |
| xray_test | ~15 min | ~6 min | ~9 min |
| metric_dimension_test | ~14 min | ~6 min | ~8 min |
| restart_test | ~18 min | ~8 min | ~10 min |
| cloudwatchlogs_test | ~25 min | ~17 min | ~8 min |
| entity_metrics_benchmark_test | ~36 min | ~26 min | ~10 min |
| otlp_test | ~24 min | ~15 min | ~9 min |

Duration distribution shift

| Bucket | Baseline | With binaries |
|---|---|---|
| < 5 min | 0 | 22 |
| 5-10 min | 36 | 133 |
| 10-15 min | 109 | 30 |
| 15-20 min | 46 | 62 |
| 20-30 min | 80 | 45 |
| 30+ min | 26 | 14 |

BuildTestBinaries runs in parallel with GenerateTestMatrix, so it adds no latency to the critical path unless it takes longer than matrix generation.

How this differs from the validator

The test repo already has a validator binary that pre-compiles and runs tests on EC2; it is used by the performance, stress, and Windows tests. It was originally created to work around out-of-memory (OOM) issues on Windows hosts when running go test.

The validator requires each test to export a Validate() function and be registered in a switch statement in main.go. Only 7 tests are registered. Extending it to cover all ~50 test packages would require refactoring every test.

This PR uses go test -c instead. It compiles any standard Go test package into a standalone binary with no code changes. The pre-compiled binaries behave the same as go test: same test framework, same flags (passed with the -test. prefix, e.g. -test.run), same behavior. The only difference is that nothing is compiled on the EC2 instance.
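When a compiled test binary is run directly instead of through go test, the standard test flags take a -test. prefix. For example, go test -run TestX -v becomes ./pkg.test -test.run TestX -test.v, where pkg.test is the default output name of go test -c. The helper below is a hypothetical sketch and is not part of this PR. It rewrites only the handful of common go test flags listed in its case statement into that form.

```sh
#!/bin/sh
# Hypothetical helper (not from this PR): translate a few common `go test`
# flags into the `-test.`-prefixed form that a `go test -c` binary expects.
# Only the flags listed below are rewritten; everything else passes through.
to_test_binary_flags() {
  for arg in "$@"; do
    case "$arg" in
      -run|-run=*|-v|-timeout|-timeout=*|-count|-count=*|-failfast|-parallel|-parallel=*)
        # -run -> -test.run, -timeout=30m -> -test.timeout=30m, etc.
        printf '%s\n' "-test.${arg#-}" ;;
      *)
        # Flag values and package-defined flags keep their original spelling.
        printf '%s\n' "$arg" ;;
    esac
  done
}
```

Flags that a test package registers itself through the flag package are not prefixed, which is why unrecognized arguments are passed through unchanged.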

Changes

New: .github/workflows/build-test-binaries.yml

Compiles every ./test/... package with CGO_ENABLED=0 go test -c for both linux/amd64 and linux/arm64. Uploads to s3://${bucket}/integration-test/test-binaries/${sha}/linux/${arch}/.
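A dry-run sketch of that build step for a single architecture follows. The S3 destination matches the layout above. The following are hypothetical choices for illustration, not necessarily what build-test-binaries.yml does:

- the function name plan_test_binaries;
- the out/ staging directory;
- the binary naming scheme (package path with / replaced by _, plus .test).

The function prints the commands instead of running them, so the resulting layout is easy to inspect.

```sh
#!/bin/sh
# Hypothetical dry-run: print the per-package build commands and the upload
# command for one architecture. Nothing is compiled or uploaded.
plan_test_binaries() {
  local bucket="$1" sha="$2" arch="$3"
  shift 3
  # S3 layout taken from the PR description.
  local dest="s3://${bucket}/integration-test/test-binaries/${sha}/linux/${arch}/"
  local pkg name
  for pkg in "$@"; do
    # Hypothetical naming scheme: ./test/xray -> test_xray.test
    name="$(printf '%s' "${pkg#./}" | tr '/' '_').test"
    echo "CGO_ENABLED=0 GOOS=linux GOARCH=${arch} go test -c -o out/${arch}/${name} ${pkg}"
  done
  echo "aws s3 cp --recursive out/${arch}/ ${dest}"
}
```

Calling it once with amd64 and once with arm64 covers both targets. Because CGO_ENABLED=0 disables cgo, the Go toolchain can cross-compile for either architecture without a C cross-compiler.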

Modified: .github/workflows/test-artifacts.yml

  • Added BuildTestBinaries job (parallel with GenerateTestMatrix)
  • Added as a dependency of the EC2 Linux test jobs (pages 0-4, OnPrem, SELinux)
  • Passes test_binaries_prefix to each test job

Modified: .github/workflows/PR-test.yml

Same integration as test-artifacts.yml.

Modified: .github/workflows/ec2-integration-test.yml

  • Added test_binaries_prefix input
  • Passes test_binaries_prefix to terraform as a -var only when the terraform config declares that variable (backward compatible)
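The backward-compatibility check can be sketched in shell. The sketch assumes, as a later commit message describes, that the terraform directory declares its inputs in variables.tf. The function name terraform_var_args and the exact grep pattern are hypothetical. If the variable is not declared or the value is empty, the function prints nothing, so terraform never receives an undeclared -var.

```sh
#!/bin/sh
# Hypothetical sketch: print a terraform -var flag only if the config in
# $1 declares variable $2 in variables.tf and a non-empty value $3 is given.
terraform_var_args() {
  local tf_dir="$1" name="$2" value="$3"
  # Nothing to pass when no value was provided.
  if [ -z "$value" ]; then
    return 0
  fi
  # Match a declaration like: variable "test_binaries_prefix" {
  # Passing an undeclared -var makes terraform fail, so skip it otherwise.
  if grep -Eq "^[[:space:]]*variable[[:space:]]+\"${name}\"" "${tf_dir}/variables.tf" 2>/dev/null; then
    printf '%s\n' "-var=${name}=${value}"
  fi
}
```

A caller could then run terraform apply $(terraform_var_args "$TF_DIR" test_binaries_prefix "$PREFIX") with its other arguments. The unquoted expansion is safe here only because S3 prefixes contain no spaces.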

Companion PR

aws/amazon-cloudwatch-agent-test#650 — terraform-side changes.

Not in scope

  • EKS/ECS local-exec tests
  • Windows/Mac tests
  • ITAR/CN region binaries

Historical context (original PR also included Go cache warming)

The original PR included both pre-compiled binaries AND Go module/build cache warming. The cache warming was removed as it added complexity without significant additional benefit over pre-compiled binaries alone.

Removed in later commits:

  • .github/workflows/warm-test-cache.yml
  • cache_key input/variable throughout
  • WarmTestCache job references

Add WarmTestCache workflow job that pre-compiles all test packages and
uploads GOCACHE/GOMODCACHE to S3. EC2 integration test instances
download this cache before running go test, eliminating redundant
compilation.

Changes:
- Add .github/workflows/warm-test-cache.yml reusable workflow
- Add WarmTestCache job to test-artifacts.yml (runs in parallel with
  GenerateTestMatrix)
- Add cache_key input to ec2-integration-test.yml reusable workflow
- Pass cache_key to terraform for EC2 Linux, OnPrem, and SELinux tests

The cache is keyed by branch, Go version, OS, and architecture. It
runs once per workflow execution and all EC2 test jobs consume it.

Companion PR: aws/amazon-cloudwatch-agent-test#650

🤖 Assisted by AI
@the-mann requested a review from a team as a code owner February 13, 2026 23:04
@the-mann marked this pull request as draft February 16, 2026 19:48
@the-mann added the "ready for testing" label (Indicates this PR is ready for integration tests to run) Feb 16, 2026
Wire WarmTestCache job into PR-test.yml with the same pattern used in
test-artifacts.yml. All EC2 Linux test pages (0-4) and SELinux tests
now receive cache_key and depend on WarmTestCache.

Also added WarmTestCache to verify-all gate check.

🤖 Assisted by AI
Terraform errors with 'undeclared variable' if cache_key is passed to a
terraform config that doesn't declare it. This happens when the test
repo hasn't been updated yet, or for terraform dirs that don't support
caching.

Conditionally build the -var flag so it's only included when cache_key
is set, ensuring backward compatibility.

🤖 Assisted by AI
The test repo may not have the cache_key variable yet (PR pending).
Check variables.tf for the declaration before passing -var to terraform
to avoid "undeclared variable" errors during the rollout period.

🤖 Assisted by AI
Add BuildTestBinaries workflow that compiles all test packages with
go test -c and uploads static binaries to S3. EC2 test instances
download and run these directly, eliminating ~3 min of compilation
per test.

Changes:
- Add .github/workflows/build-test-binaries.yml reusable workflow
- Add BuildTestBinaries job to test-artifacts.yml and PR-test.yml
  (runs in parallel with GenerateTestMatrix and WarmTestCache)
- Add test_binaries_prefix input to ec2-integration-test.yml
- Conditionally pass -var=test_binaries_prefix to terraform

Binaries are cross-compiled with CGO_ENABLED=0 GOOS=linux GOARCH=amd64
for full portability across all Linux distros (AL2, AL2023, Ubuntu,
RHEL, SLES, etc).

🤖 Assisted by AI
The test matrix includes arm64 instances. Build binaries for both
architectures and output the prefix without arch so terraform can
append /linux/${arch} to select the correct binary.

🤖 Assisted by AI
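On the instance side, selecting the binary for the current architecture means two steps:

1. Map the kernel's machine name to a GOARCH value.
2. Append /linux/<arch> to the prefix.

The terraform change lives in the companion PR and is not shown here. What follows is only a hypothetical shell equivalent; goarch_for and binaries_uri are illustrative names.

```sh
#!/bin/sh
# Hypothetical sketch: map `uname -m` output to a Go GOARCH value.
goarch_for() {
  case "$1" in
    x86_64|amd64) echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the per-arch S3 URI from an arch-less prefix, as the commit describes.
binaries_uri() {
  local prefix arch
  prefix="${1%/}"   # tolerate a trailing slash on the prefix
  arch=$(goarch_for "$2") || return 1
  echo "${prefix}/linux/${arch}/"
}
```

On an arm64 Linux instance, uname -m reports aarch64. So binaries_uri "$prefix" "$(uname -m)" yields "$prefix/linux/arm64/".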
@the-mann marked this pull request as ready for review February 18, 2026 19:15
The WarmTestCache job added complexity without significant benefit compared
to pre-compiled binaries. Removing it simplifies the workflow.

Removed:
- .github/workflows/warm-test-cache.yml
- cache_key input from ec2-integration-test.yml
- WarmTestCache job references from PR-test.yml and test-artifacts.yml

The BuildTestBinaries job remains - that's where the real savings come from.
@the-mann changed the title from "feat(ci): Add Go build cache warming for EC2 integration tests" to "feat(ci): Add pre-compiled test binaries for EC2 integration tests" Feb 19, 2026
- Add ITAR/CN bucket inputs to build-test-binaries.yml
- Upload test binaries to all three regions (commercial, ITAR, CN)
- Pass test_binaries_prefix to ITAR and CN integration test jobs
- ITAR/CN will now use pre-compiled binaries instead of go test
Cross-partition assume-role doesn't work. Each partition needs its own
OIDC authentication via configure-aws-credentials.

- Split ITAR/CN uploads into separate jobs
- Use GitHub artifacts to pass binaries between jobs
- Add Complete job to ensure all uploads finish before tests start
Comment on lines +166 to +180
    name: 'Complete'
    needs: [BuildTestBinaries, UploadTestBinariesITAR, UploadTestBinariesCN]
    if: ${{ always() }}
    runs-on: ubuntu-latest
    outputs:
      test_binaries_prefix: ${{ needs.BuildTestBinaries.outputs.test_binaries_prefix }}
    steps:
      - name: Check results
        run: |
          echo "BuildTestBinaries: ${{ needs.BuildTestBinaries.result }}"
          echo "UploadTestBinariesITAR: ${{ needs.UploadTestBinariesITAR.result }}"
          echo "UploadTestBinariesCN: ${{ needs.UploadTestBinariesCN.result }}"
          if [[ "${{ needs.BuildTestBinaries.result }}" != "success" ]]; then
            exit 1
          fi

Check warning: Code scanning / CodeQL

Workflow does not contain permissions (severity: Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix (AI), about 1 month ago

To fix the problem, we should explicitly define a permissions block for the Complete job so it no longer relies on repository/organization defaults. Since the Complete job only echoes results and evaluates conditions without interacting with GitHub APIs or repository contents, it does not need any token permissions at all, so the least‑privilege configuration is to disable the GITHUB_TOKEN for this job using permissions: {}.

Concretely, in .github/workflows/build-test-binaries.yml, in the Complete job definition starting at line 182, add a permissions: {} entry alongside runs-on and outputs. For example, immediately after runs-on: ubuntu-latest insert a line permissions: {} with the same indentation as runs-on. No imports or additional methods are needed, as this is purely a YAML configuration change.

Suggested changeset 1: .github/workflows/build-test-binaries.yml

Autofix patch. Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/.github/workflows/build-test-binaries.yml b/.github/workflows/build-test-binaries.yml
--- a/.github/workflows/build-test-binaries.yml
+++ b/.github/workflows/build-test-binaries.yml
@@ -184,6 +184,7 @@
     needs: [BuildTestBinaries, UploadTestBinariesITAR, UploadTestBinariesCN]
     if: ${{ always() }}
     runs-on: ubuntu-latest
+    permissions: {}
     outputs:
       test_binaries_prefix: ${{ needs.BuildTestBinaries.outputs.test_binaries_prefix }}
     steps:
EOF
Contributor comment:

Is this warning still valid, or did you resolve it?

- Split test binary builds into parallel amd64/arm64 jobs
- Split MakeMacPkg into parallel matrix jobs for each architecture
- Add single-arch Makefile targets for darwin builds

Expected improvements:
- Test binary build: ~50% faster (parallel arch builds)
- MakeMacPkg: ~50% faster (parallel arch builds)
Add if conditions to skip jobs when their matrix arrays are empty,
preventing 'Matrix vector does not contain any values' errors.
@the-mann force-pushed the feat/go-build-cache-s3 branch from b7dc8ae to 9c15a56 February 20, 2026 20:26

@the-mann (Contributor, Author) commented:

PR tests will fail until the new parameter is added; see the linked test run.

The splitByTestFunc feature increased ec2_linux_matrix from ~306 to 939
entries (729KB). Combined with other matrices, total job outputs exceeded
GitHub's 1MB limit.

Write ec2_linux_matrix to a temp file instead of GITHUB_OUTPUT, then
paginate from the file. Only the paginated pages are output as job outputs.
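Paginating from the file only requires page boundaries. The sketch below assumes the matrix is a JSON array split into a fixed number of pages. The page count of 5 in the example mirrors the "pages 0-4" mentioned earlier and is an assumption. page_bounds is a hypothetical helper that uses ceiling division, so the last entries are never dropped.

```sh
#!/bin/sh
# Hypothetical helper: print the [start, end) index range of page $3 when
# $1 matrix entries are split into $2 roughly equal pages.
page_bounds() {
  local total="$1" pages="$2" index="$3"
  local size=$(( (total + pages - 1) / pages ))   # ceiling division
  local start=$(( index * size ))
  local end=$(( start + size ))
  # Clamp both ends so trailing pages never run past the array.
  [ "$start" -gt "$total" ] && start=$total
  [ "$end" -gt "$total" ] && end=$total
  echo "$start $end"
}

# Example use with jq (assumed available; file names are hypothetical):
#   bounds=$(page_bounds "$(jq length matrix.json)" 5 "$i")
#   jq -c ".[${bounds% *}:${bounds#* }]" matrix.json > "page_$i.json"
```

With 939 entries and 5 pages, each page holds up to 188 entries. Page 4 covers indices 752 through 938.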
The splitByTestFunc feature increased matrix size significantly. Strip
empty/zero/false/null fields from matrix JSON, reducing total output
size from ~935KB to ~573KB (well under GitHub's 1MB limit).
Even with omitempty in the generator, total output size (~648KB) plus
overhead can exceed GitHub's 1MB job output limit. Add jq filtering
as additional safety to strip any remaining empty/zero/false fields.
Upload ec2_linux matrix pages as artifacts instead of job outputs.
Add loader jobs that download the artifact and output each page.
This completely bypasses the 1MB job output limit for GenerateTestMatrix.
github-actions bot commented Mar 7, 2026

This PR was marked stale due to lack of activity.

github-actions bot added the Stale label Mar 7, 2026

Labels: ready for testing, Stale


2 participants