new method for ingesting tarballs via a single staging PR #232

Draft: wants to merge 22 commits into base: main

@trz42 (Contributor) commented Jun 29, 2025

This PR is based on the proof-of-concept #213. It aims to keep only the necessary code from the proof-of-concept and to move the main loop into a separate script, thus leaving the existing ingestion code unchanged.

Summary of the ideas/changes/additions:

  • Adds CI to enforce code quality (flake8; existing code is not validated) and to run pytest.
  • We aim to use type hints for all function arguments and return values.
  • Improves logging by combining levels (as before), scopes (to limit logging to parts of the code), and a decorator that logs function entry and exit. We aim to apply the decorator to all functions to provide detailed debugging information.
  • Models the client that fetches files and ETags from a remote storage service.
  • Models an S3 bucket (e.g., hosted on AWS or MinIO).
  • Models a file and its signature, including functions to download them, to use ETags so that they are downloaded only if they have changed on the remote storage, and to verify the signature. A file can be the payload (tarball), a metadata (or task) file, or any other file of interest.
  • Models a task description (essentially the contents of the metadata or task file that was read in, plus some associated convenience functions, such as obtaining the architecture from the name of the metadata/task file).
  • Models a task payload (could be a list of directories/files to be removed from a CVMFS repo, a tarball containing software installations, or anything else that should be applied to a CVMFS repo).
  • Models a task (combines the task description and the task payload, provides most of the logic to process a task, ensures that a task for a single payload is bundled in a single staging PR, updates its information in the staging repo, ...).
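
To illustrate the ETag-based download mentioned in the list above, here is a minimal sketch, not the code in this PR. It assumes the remote client can be reduced to two callables (`fetch_etag` and `fetch_body`, both hypothetical) and that the last seen ETag is cached in a sidecar file next to the download (the `.etag` suffix is also an assumption).

```python
from pathlib import Path
from typing import Callable


def download_if_changed(
    local_path: Path,
    fetch_etag: Callable[[], str],
    fetch_body: Callable[[], bytes],
) -> bool:
    """Download a remote file only if its ETag differs from the cached one.

    Returns True if a download happened, False if the local copy was kept.
    """
    # Hypothetical convention: the cached ETag lives in "<name>.etag".
    etag_path = local_path.with_name(local_path.name + ".etag")
    remote_etag = fetch_etag()

    if local_path.exists() and etag_path.exists():
        if etag_path.read_text().strip() == remote_etag:
            # Unchanged on the remote storage: keep the local copy.
            return False

    # New or changed file: fetch the content and remember its ETag.
    local_path.write_bytes(fetch_body())
    etag_path.write_text(remote_etag)
    return True
```

Passing the remote access in as callables keeps the sketch independent of any particular S3 client and makes the caching logic easy to exercise in pytest with fake fetchers.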

States, repository directory structure ... in a picture

[Image: ingest_bundles_infographics]

High-level overview of state handler functions

_handle_add_undetermined

  1. Determine the sequence number (corresponds to an open or yet-to-be-opened pull request)
  2. Create files and directories with a single commit in the default branch (see picture above)

_handle_add_new_task

  1. Initialize the payload object (EESSITaskPayload) by downloading the payload
  2. Update the TaskState file

_handle_add_payload_staged

  1. Determine the feature branch name
  2. Create the feature branch if it doesn't exist (after creation, TaskState is still PAYLOAD_STAGED in both the default and the feature branch)
  3. Search for a PR for the feature branch
  4. None found: update states (default branch: PULL_REQUEST, feature branch: APPROVED) and create a pull request
  5. Found and closed: open an issue (TO BE IMPLEMENTED)
  6. Found and open: update states (default branch: PULL_REQUEST, feature branch: APPROVED) and update the pull request
    Creating or updating a pull request also creates or updates a TaskSummary.html file and the description of the pull request.
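
The branching in the steps above can be sketched as a pure function that maps the outcome of the PR search to the resulting states and actions. This is only an illustration under assumptions: `PRStatus`, `StagedOutcome`, `plan_payload_staged`, and the action strings are hypothetical names, and leaving both states at PAYLOAD_STAGED in the "found and closed" case is a guess, since that path is still marked TO BE IMPLEMENTED.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TaskState(Enum):
    PAYLOAD_STAGED = "PAYLOAD_STAGED"
    PULL_REQUEST = "PULL_REQUEST"
    APPROVED = "APPROVED"


class PRStatus(Enum):  # hypothetical helper, not from the PR
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class StagedOutcome:
    default_branch_state: TaskState
    feature_branch_state: TaskState
    actions: List[str]


def plan_payload_staged(
    branch_exists: bool, pr_status: Optional[PRStatus]
) -> StagedOutcome:
    """Plan what to do for a task in state PAYLOAD_STAGED.

    pr_status is None when no PR was found for the feature branch.
    """
    actions: List[str] = []
    if not branch_exists:
        actions.append("create_feature_branch")

    if pr_status is PRStatus.CLOSED:
        # Assumption: states stay untouched until this path is implemented.
        actions.append("open_issue")
        return StagedOutcome(
            TaskState.PAYLOAD_STAGED, TaskState.PAYLOAD_STAGED, actions
        )

    # No PR yet, or an open PR: both branches move on to their next state.
    actions.append("update_states")
    if pr_status is None:
        actions.append("create_pull_request")
    else:
        actions.append("update_pull_request")
    return StagedOutcome(TaskState.PULL_REQUEST, TaskState.APPROVED, actions)
```

Separating the decision from the GitHub calls would let the three cases be covered by plain unit tests without a staging repository.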

_handle_add_pull_request

  1. Determine the state of the PR
  2. If the PR was closed, change the state to REJECTED
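
As a rough sketch of the transition described above (not the handler in this PR), the function below takes the PR state as a plain string and returns the next task state. The function name and the string values `"open"` and `"closed"` are assumptions; only the closed case is described in this PR, so every other PR state is assumed to leave the task state unchanged.

```python
from enum import Enum


class TaskState(Enum):
    PULL_REQUEST = "PULL_REQUEST"
    REJECTED = "REJECTED"


def next_state_for_pull_request(pr_state: str) -> TaskState:
    """Return the task state that follows from the state of the staging PR."""
    if pr_state == "closed":
        # A closed PR means the task is rejected.
        return TaskState.REJECTED
    # Assumption: any other PR state keeps the task in PULL_REQUEST.
    return TaskState.PULL_REQUEST
```

How merged PRs are handled is not covered by the description, so the sketch assumes the caller has already dealt with that case before calling this function.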

@trz42 added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Jun 29, 2025