@adilhusain-s
Collaborator

Overview
This PR adds tooling to automate partial manifest generation, restructures the CI/CD workflows to prevent race conditions during releases, and rotates the version manifest data to support Python 3.13 and 3.14.
It also addresses flaky builds caused by transient network errors and decouples the artifact build process from Git push operations.

Key Changes

  1. Infrastructure & Security
    • Reliability: Added configurable retry logic (8 attempts, 5s delay) to dotnet-install.py to handle transient network errors during dependency fetching.
    • Security: Upgraded Trivy to v0.68.2 and enabled strict build failures (FAIL_ON_HIGH=1, FAIL_ON_CRITICAL=1) to ensure security standards are met before release.
    • Cleanup: Simplified the Makefile by removing unnecessary sudo calls and streamlining the build commands.
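
For reviewers, the retry behaviour described above can be pictured with a minimal sketch. This is not the code in dotnet-install.py; the function name `fetch_with_retry` and its parameters are hypothetical, and only the defaults (8 attempts, 5s delay) come from this PR.

```python
import time
import urllib.error


def fetch_with_retry(fetch, attempts=8, delay=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    `fetch` is any zero-argument callable (hypothetical; for example a
    function that downloads and parses a JSON document).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except (urllib.error.URLError, TimeoutError, ConnectionError) as exc:
            # Treat these as transient network errors and try again.
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # fixed delay between attempts, no backoff
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error
```

Only network-style exceptions are retried in this sketch; anything else (for example a JSON parse error) propagates immediately, which keeps real bugs from being hidden behind retries.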

  2. New Tooling (Backfill & Manifests)
    Introduced a new Python-based toolchain to handle manifest operations programmatically:
    • generate_partial_manifest.py: Generates architecture-specific manifest JSONs using assets from GitHub Releases.
    • apply_partial_manifests.py: Merges partial manifests into the main version files.
    • backfill-manifests.yml: A new workflow that can be triggered manually to update manifests for existing tags without rebuilding binaries.
    • Testing: Added unit tests in tests/ to verify manifest generation logic.
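
As a rough illustration of the merge step, here is a minimal sketch of folding one partial manifest into a version manifest. The JSON shape (a `version` string plus a `files` list whose entries carry an `arch` key) and the name `merge_partial` are assumptions made for this example; the actual apply_partial_manifests.py may differ.

```python
def merge_partial(manifest, partial):
    """Return a copy of `manifest` with the files from `partial` merged in.

    Assumed shape (hypothetical): {"version": str, "files": [{"arch": str, ...}]}.
    """
    if manifest.get("version") not in (None, partial["version"]):
        raise ValueError("version mismatch between manifest and partial")

    # Index existing entries by architecture so a newer partial replaces
    # the entry for the same architecture instead of duplicating it.
    by_arch = {entry["arch"]: entry for entry in manifest.get("files", [])}
    for entry in partial["files"]:
        by_arch[entry["arch"]] = entry

    merged = dict(manifest)
    merged["version"] = partial["version"]
    # Sort by architecture so repeated merges produce a stable file order.
    merged["files"] = [by_arch[arch] for arch in sorted(by_arch)]
    return merged
```

Keying on architecture makes the merge idempotent: applying the same partial twice yields the same result, which matters when a backfill is re-run for an existing tag.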

  3. CI/CD Architecture Refactor
    Major refactoring of release-matching-python-tags.yml and reusable-release-python-tar.yml:
    • Atomic Updates: Removed "git push" logic from the individual build matrix jobs. Instead, jobs now upload partial manifest artifacts.
    • Aggregation: Added a new update-manifests job that runs after the build matrix completes. It downloads all partial artifacts, merges them, and commits them in a single atomic operation.
    • Concurrency: Implemented concurrency groups (release-matching-${{ github.ref }}) to prevent overlapping runs from corrupting the git history.
    • Resilience: Disabled fail-fast and tuned max-parallel to ensure temporary failures in one architecture (e.g., s390x) do not cancel builds for others.

  4. Data Rotation
    • Removed: Legacy manifest files for Python 3.9, 3.10, 3.11, and 3.12.
    • Added/Updated: Full manifest definitions for Python 3.13.x and 3.14.x for ppc64le and s390x.

Technical Context
Why the workflow change? Previously, the reusable release workflow attempted to push to main from within the matrix strategy. If multiple architectures finished simultaneously, they would trigger race conditions, causing git push failures or merge conflicts. By moving to an Artifact -> Aggregate -> Commit pattern, we eliminate these race conditions and ensure a clean git history.
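
The Aggregate step of this pattern can be sketched as follows: every matrix job leaves a partial JSON file in a download directory, and a single job folds them into one file per version before committing once. The directory layout, file naming, JSON shape, and the name `aggregate_partials` are all assumptions for illustration, and the sketch ignores any existing contents of the version files.

```python
import json
from collections import defaultdict
from pathlib import Path


def aggregate_partials(partials_dir, versions_dir):
    """Fold every partial JSON under partials_dir into one file per version."""
    grouped = defaultdict(dict)  # version -> {arch: file entry}
    for path in sorted(Path(partials_dir).rglob("*.json")):
        partial = json.loads(path.read_text(encoding="utf-8"))
        for entry in partial["files"]:
            grouped[partial["version"]][entry["arch"]] = entry

    out_dir = Path(versions_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for version, by_arch in sorted(grouped.items()):
        target = out_dir / f"{version}.json"
        doc = {"version": version, "files": [by_arch[a] for a in sorted(by_arch)]}
        target.write_text(json.dumps(doc, indent=2) + "\n", encoding="utf-8")
        written.append(target)
    # The caller would stage these paths and push them in a single commit.
    return written
```

Because only this one job writes to the repository, the order in which architectures finish no longer matters, and there is exactly one push per release run.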

Verification
• [x] Unit Tests: Verified tests/test_generate_partial_manifest.py and tests/test_apply_partial_manifests.py.
• [x] Infrastructure: Validated make commands with the new Trivy version.
• [x] Workflows: Verified the Backfill workflow parses tags and generates artifacts correctly.

- dotnet-install.py: Add retry logic (8 attempts) for JSON fetching to handle network flakes.
- Makefile: Upgrade Trivy to v0.68.2 and enforce build failure on High/Critical vulnerabilities.

Signed-off-by: Adilhusain Shaikh <[email protected]>
- Add 'generate_partial_manifest.py' and 'apply_partial_manifests.py' scripts.
- Add 'backfill-manifests.yml' workflow to process partial manifests.
- Add unit tests for manifest generation and application logic.

Signed-off-by: Adilhusain Shaikh <[email protected]>

fix(tests): update error message assertion for invalid JSON handling

Signed-off-by: Adilhusain Shaikh <[email protected]>
- release-matching-python-tags: Target Python 3.13.* and implement concurrency groups.
- reusable-release-python-tar: Remove direct Git push logic; generate partial manifest artifacts instead.
- release-matching-python-tags: Add 'update-manifests' job to aggregate partials and commit atomically.
- Optimize 'max-parallel' and disable 'fail-fast' for better resilience.

Signed-off-by: Adilhusain Shaikh <[email protected]>
- Drop legacy manifest files for Python 3.9, 3.10, 3.11, and 3.12.
- Add and update manifest definitions for Python 3.13.x and 3.14.x on ppc64le and s390x architectures.

Signed-off-by: Adilhusain Shaikh <[email protected]>
@adilhusain-s
Collaborator Author

@anup-kodlekere

please review.

@adilhusain-s adilhusain-s deleted the release-python-3.13.x branch December 24, 2025 15:42