Issue 59: Transfer Flows between CFS and HPSS #62
Open
davramov wants to merge 128 commits into als-computing:main from davramov:issue_59
128 commits
01aad62
Initial commit for issue #59. This is still a WIP and requires thorou…
davramov 1d10213
Adding a generic BeamlineConfig(ABC) class in orchestration/config.py…
davramov 1a4e893
Added logic for HPSSToCFSTransferController() copy() method. Now it w…
davramov cfa5a0c
Moving endpoint definitions to orchestration/transfer_endpoint.py so …
davramov 92c2eb6
Added a get_prune_controller() method.
davramov 3a02357
Getting ready to test CFSToHPSSTransferController flow. Created a Pre…
davramov 7db34e9
Fixed syntax errors
davramov 65ab2e0
Cleaned up logging for the CFSToHPSSTransferController slurm job scri…
davramov 17a1a52
adding support for dynaconf for handling beamline configurations
davramov 343ef75
Adding logic for the prune_filesystem_files flow in prune_controller.py
davramov f2bc19b
Moving specific prune implementations as private/internal methods in …
davramov 7c686c1
Adding documentation outlining shared infrastructure between beamline…
davramov 3071709
Linting and adding a few TODO comments
davramov a68009c
Fixed get_prune_controller() method to accept an Enum for supported t…
davramov f5af676
Working on a SciCat Ingestor Controller ABC, and a BL832 implementati…
davramov 4f0170f
sourceFolder expects a string
davramov fa62fa7
generalizing relevant code from ingest_tomo832.py, refactoring into a…
davramov 40ae3df
Refactoring scicat ingestion to be more modular and structured
davramov 6941438
Adding a custom Prefect Block to keep track of pending project names …
davramov 49c5b79
Added three dispatchers for bl832 to handle archiving data to HPSS: 1…
davramov a7496b7
Updating documentation and comments
davramov 8e384da
Testing HPSS flow, verified that I can create new directories on HPSS…
davramov c1e652e
Updating documentation:
davramov 9cd6ddc
Verified that htar bundling and building on HPSS works
davramov 0e7ff47
Updating HPSS documentation typo
davramov c78dc96
Adding a _find_dataset() method to ingestor_controller base class, in…
davramov 3278a1f
Successfully extracted files from a .tar archive on HPSS back to CFS!…
davramov 6618d99
Moving all HPSS related transfer/prune implementations into orchestra…
davramov 4e9215c
Adjusting hpss imports in the transfer_controller pytest
davramov 58eeec5
Verified that the HPSSPruneController successfully pruned from HPSS
davramov 33fce11
Simplified file paths within the .tar archives on HPSS so it referenc…
davramov 09e3403
Updated the HPSS flows in bl832/dispatcher.py to include updating the…
davramov abe84fe
Updated docstrings, logging, error handling
davramov 0fb0f3e
Updated docstrings, logging, typing, error handling
davramov 8674291
Improved error logging and exception handling for tape flows
davramov f0b7651
Verified SciCat login to the latest Docker version of scicatlive. It …
davramov 274418f
Fixing pytest errors and failures
davramov df53e7c
Testing scicat ingestion locally on a small test h5 dataset from 832.…
davramov 1f60795
Testing scicat ingestion locally on a small test h5 dataset from 832.…
davramov 3504f5a
updated and tested add_new_dataset_location() in orchestration/flows/…
davramov 300f92e
Addedsupport for linking tomography reconstructions as derived datase…
davramov 0045461
Fixed thumbnails uploaded for derived tiff/zarr datasets in SciCat
davramov db6b4fc
Adding logic to the remove_dataset_location() function in ingestor_co…
davramov f65b800
Adding a test script (not pytest) for end-to-end validation of the co…
davramov 05ed15a
Moved test_controllers_end_to_end.py to the scripts/ folder does it d…
davramov ded7bec
Addressing Dylan and Raja's comments
davramov 99d64cf
Updating end-to-end tests
davramov edfe26f
For the HPSS->CFS controller, convert slashes to underscores for the …
davramov 60250b8
Added filter for files >65GB to be moved with HTAR. Verified that thi…
davramov 07663de
Updating .env.example and README with current environment requirements.
davramov 72c7449
Added comment that this will be moved to the scicat_beamline repo at …
davramov 6ec45bb
Updating documentation for HPSS
davramov de1702b
Updating documentation for HPSS
davramov e98c17b
Adding ability to hpss controllers for ls command to see what's on tape
davramov 44455ac
Fixed commenting
davramov 49d7d2a
fixed pytest after rebasing
davramov 9568274
bumping python from 3.11->3.12.5 to see if it fixes a TypeError where…
davramov af03920
Fix error message grammar
davramov 35dc149
Fixing which flow the globus prune controller calls
davramov 7e808cf
making the days_from_now parameter a float, which is converted into d…
davramov 220b3a4
adding try and except to build_thumbnail, in case of edge cases (like…
davramov 2121eb9
Updated documentation regarding hsi mkdir -p
davramov afdcda0
Addressing Garrett's comment about the add_new_dataset_location metho…
davramov 6286208
Removing redunant datafile.path = file_path lines, as it is now confi…
davramov 036ac15
updating requirements.txt
davramov 2137159
Adding a TODO comment to _find_dataset to support searching dataFileL…
davramov 0c26fb3
Adjusting sfapi pytest to match new expected value
davramov e11cc0f
Adjusting prune_controller.prune() calls to use a float for days_from…
davramov 36c40f7
Renamining dummy to mock
davramov 980e681
linting
davramov 4c699a3
Fixed type (tranfer_client -> transfer_client)
davramov 3bb47af
Fixed typo (tranfer_client -> transfer_client)
davramov b71ea42
Fixed typo (tranfer_client -> transfer_client)
davramov cc9d55f
Fixed typo (tranfer_client -> transfer_client)
davramov d030114
Adding a comment at the top
davramov 4105d3b
Updating test_prune_controller.py based on Dylan's comments. Ensuring…
davramov 70ff27e
Removing redundant logger.debug messages when initializing transfer c…
davramov 9178769
Updating 733 docs sequence diagram
davramov 6330d38
Using the tmp_path fixture for pytest
davramov 0f1008a
Adding checks in prune_controller (Globus and Filesystem) for days_fr…
davramov 70e7ab1
Adding a new pytest functino called test_globus_prune_schedules_when_…
davramov e898be0
updating test_prune_controller with a new function test_fs_prune_sche…
davramov 26f78fa
Fixing tests for when days_from_now == 0
davramov 30b2542
Using Pathlib instead of os.path
davramov c63ff16
removing redundat MockClient assertion
davramov 6109c92
Rewriting the filesystem transfercontroller to follow Kate's suggesti…
davramov 2b046dd
Rewriting the filesystem transfercontroller exception handling to fol…
davramov 67626ff
Linting
davramov e7125cc
Set self.config = None after assigning beamline-specific configs
davramov f04b0ae
Enforcing consistency for beamline_id to be a str with numbers delimi…
davramov 81915be
Adding env var check to the start of the main method
davramov a9db743
Making the check_required_envars() function more verbose with which v…
davramov 13509ca
Removing unused imports
davramov ec2afc6
Adding better checks to make sure env variables are set
davramov 2f89be9
Using pathlib to build the full path
davramov c365640
Updating docstring
davramov 45b56a9
Including comment about circular dependencies
davramov 3f69853
Removing main block (tests are handled in scripts/test_controllers_en…
davramov 4992204
Prefect/Globus env variable checks throw warnings rather than errors …
davramov 74bc670
Elevating pruning flows out of the controller classes.
davramov fbb1a18
Moving HPSS slurm jobs into a folder orchestration/slurm/
davramov fcb1cb2
Ensuring all variables are passed into hpss_to_cfs.slurm
davramov 2cf5e44
moving hpss main method to it's own check_hpss script
davramov 05712ad
Adding a note about encode_thumbnail being part of scicat
davramov ed35aeb
capitalizing severity enum options
davramov 30aa7c0
Adding comment about cput not overwriting
davramov 80a95bc
Deleting unused sfapi key variable
davramov 08a1dcc
Docstring
davramov 6ebe839
Docstring
davramov 21e8d82
renaming login_to_scicat to get_scicat_client
davramov 6a4fcf6
removing redundant error message
davramov b0a7d1e
Docstring
davramov 354e151
removing test code
davramov 4a7030f
Fixing project path endpoint to nersc cfs endpoint. Updating the log …
davramov a64745b
adding log paths to error message
davramov 7ce99bb
Addressing comments regarding hpss dispatcher flows.
davramov 0faae48
Adjusting comments to be clearer
davramov 2749500
Adding typing throughout
davramov fba4b5d
Docstrings
davramov 2c548f2
replacing old move.py with move_refactor.py
davramov d7c353b
Documentation
davramov 0a4e74b
Adding screenshots to SFAPI documentation
davramov 40a7b61
hyperlink to sfapi example
davramov 59c54bf
Reducing the memory from 20GB to 2GB due to sbatch: error: More resou…
davramov ae01fb2
Check if scicat's base url includes localhost, and pass in the backen…
davramov bcea0a2
removing main method used for testing
davramov e13b0a3
replacing run_specific_flow with run_deployment after rebase
davramov ca3db6d
making sure process_new_832_file_task is called by dispatcher rather …
davramov
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,15 @@ | ||
| -GLOBUS_CLIENT_ID=<globus_client_id> | ||
| -GLOBUS_CLIENT_SECRET=<globus_client_secret> | ||
| -PREFECT_API_URL=<url_of_prefect_server> | ||
| -PREFECT_API_KEY=<prefect_client_secret> | ||
| -PUSHGATEWAY_URL=<url_of_pushgateway_server> | ||
| -JOB_NAME=<jobname_for_pushgateway> | ||
| -INSTANCE_LABEL=<label_for_pushgateway> | ||
| +GLOBUS_CLIENT_ID=<globus_client_id> # For Globus Transfer | ||
| +GLOBUS_CLIENT_SECRET=<globus_client_secret> # For Globus Transfer | ||
| +GLOBUS_COMPUTE_CLIENT_ID=<globus_client_id> # For ALCF Jobs | ||
| +GLOBUS_COMPUTE_CLIENT_SECRET=<globus_client_secret> # For ALCF Jobs | ||
| +GLOBUS_COMPUTE_ENDPOINT=<globus_compute_endpoint> # For ALCF Jobs | ||
| +PREFECT_API_URL=<url_of_prefect_server> # For Prefect Flows | ||
| +PREFECT_API_KEY=<prefect_client_secret> # For Prefect Flows | ||
| +SCICAT_API_URL=<url_of_scicat_api> # For SciCat Ingest | ||
| +SCICAT_INGEST_USER=<scicat_ingest_user> # For SciCat Ingest | ||
| +SCICAT_INGEST_PASSWORD=<scicat_ingest_password> # For SciCat Ingest | ||
| +PATH_NERSC_CLIENT_ID=<path_nersc_client_id> # For NERSC SFAPI | ||
| +PATH_NERSC_PRI_KEY=<path_nersc_private_key> # For NERSC SFAPI | ||
| +PUSHGATEWAY_URL=<url_of_pushgateway_server> # For Grafana Pushgateway | ||
| +JOB_NAME=<jobname_for_pushgateway> # For Grafana Pushgateway | ||
| +INSTANCE_LABEL=<label_for_pushgateway> # For Grafana Pushgateway | ||
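The variables in this `.env.example` fall into per-service groups. As a rough sketch of how a startup check could report which groups are incomplete, the snippet below uses the variable names from the diff; the function name `find_missing_env_vars` and the choice of which groups count as required are assumptions for illustration, not the PR's actual `check_required_envars()` implementation.

```python
import os

# Variable names are copied from .env.example; treating these four groups
# as "required" is an assumption made for this sketch.
REQUIRED_ENV_VARS = {
    "Globus Transfer": ["GLOBUS_CLIENT_ID", "GLOBUS_CLIENT_SECRET"],
    "Prefect Flows": ["PREFECT_API_URL", "PREFECT_API_KEY"],
    "SciCat Ingest": ["SCICAT_API_URL", "SCICAT_INGEST_USER", "SCICAT_INGEST_PASSWORD"],
    "NERSC SFAPI": ["PATH_NERSC_CLIENT_ID", "PATH_NERSC_PRI_KEY"],
}


def find_missing_env_vars(env=None):
    """Return {service: [names]} for variables that are unset or empty."""
    env = os.environ if env is None else env
    missing = {}
    for service, names in REQUIRED_ENV_VARS.items():
        unset = [name for name in names if not env.get(name)]
        if unset:
            missing[service] = unset
    return missing


if __name__ == "__main__":
    # Report per service so the operator can see which integration is affected.
    for service, names in find_missing_env_vars().items():
        print(f"{service}: missing {', '.join(names)}")
```

Grouping by service keeps the message actionable: a missing SciCat password need not block a Globus-only run.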
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| export $(grep -v '^#' .env | xargs) | ||
| | ||
| | ||
| prefect work-pool create 'hpss_pool' | ||
| | ||
| prefect deployment build ./orchestration/flows/bl832/hpss.py:cfs_to_hpss_flow -n cfs_to_hpss_flow -q cfs_to_hpss_queue -p hpss_pool | ||
| prefect deployment apply cfs_to_hpss_flow-deployment.yaml |
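The `grep -v '^#'` filter in the script only drops lines that start with `#`. If the local `.env` keeps trailing comments in the style of `.env.example` (for example `# For Globus Transfer`), those extra words would likely be passed on to `export` as well. A minimal Python sketch of a parser that tolerates such trailing comments is shown below; `parse_env_lines` is a hypothetical helper, and it assumes values never contain the sequence space-hash.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines into a dict.

    Skips blank lines, full-line comments, and lines without '='.
    Anything after ' #' is treated as a trailing comment (assumption:
    real values never contain ' #').
    """
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


# Possible usage (not executed here):
#   import os
#   with open(".env") as handle:
#       os.environ.update(parse_env_lines(handle))
```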
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,88 @@ | ||
| # Beamline 7.3.3 | ||
| | ||
| | ||
| ## Flow Diagram | ||
| ```mermaid | ||
| sequenceDiagram | ||
| participant DET as Detector/<br/>File Watcher | ||
| participant DISP as Prefect<br/>Dispatcher | ||
| participant D733 as data733<br/>Storage | ||
| participant GLOB as Globus<br/>Transfer | ||
| participant CFS as NERSC<br/>CFS | ||
| participant CAT as SciCat<br/>Metadata | ||
| participant SFAPI as SFAPI | ||
| participant HPC as HPC<br/>Compute | ||
| participant HPSS as HPSS<br/>Tape | ||
| | ||
| %% Initial Trigger | ||
| DET->>DET: Monitor filesystem | ||
| DET->>DISP: Trigger on new file | ||
| DISP->>DISP: Coordinate flows | ||
| | ||
| %% Flow 1: new_file_733 | ||
| rect rgb(220, 230, 255) | ||
| note over DISP,CAT: FLOW 1: new_file_733 | ||
| DISP->>GLOB: Init transfer | ||
| activate GLOB | ||
| GLOB->>D733: Initiate copy | ||
| activate D733 | ||
| D733-->>GLOB: Copy initiated | ||
| deactivate D733 | ||
| %% note right of GLOB: Transfer in progress | ||
| GLOB-->>DISP: Transfer complete | ||
| deactivate GLOB | ||
| | ||
| DISP->>CAT: Register metadata | ||
| end | ||
| | ||
| %% Flow 2: HPSS Transfer | ||
| rect rgb(220, 255, 230) | ||
| note over DISP,CAT: FLOW 2: Scheduled HPSS Transfer | ||
| DISP->>SFAPI: Submit tape job | ||
| activate SFAPI | ||
| SFAPI->>HPSS: Initiate archive | ||
| activate HPSS | ||
| HPSS-->>SFAPI: Archive complete | ||
| deactivate HPSS | ||
| SFAPI-->>DISP: Job complete | ||
| deactivate SFAPI | ||
| | ||
| DISP->>CAT: Update metadata | ||
| end | ||
| | ||
| %% Flow 3: HPC Analysis | ||
| rect rgb(255, 230, 230) | ||
| note over DISP,HPC: FLOW 3: HPC Downstream Analysis | ||
| DISP->>SFAPI: Submit compute job | ||
| activate SFAPI | ||
| SFAPI->>HPC: Execute job | ||
| activate HPC | ||
| HPC->>HPC: Process data | ||
| HPC-->>SFAPI: Compute complete | ||
| deactivate HPC | ||
| SFAPI-->>DISP: Job complete | ||
| deactivate SFAPI | ||
| | ||
| DISP->>CAT: Update metadata | ||
| end | ||
| | ||
| %% Flow 4: Scheduled Pruning | ||
| rect rgb(255, 255, 220) | ||
| note over DISP,CAT: FLOW 4: Scheduled Pruning | ||
| DISP->>DISP: Scheduled pruning trigger | ||
| | ||
| DISP->>D733: Prune old files | ||
| activate D733 | ||
| D733->>D733: Delete expired data | ||
| D733-->>DISP: Pruning complete | ||
| deactivate D733 | ||
| | ||
| DISP->>CFS: Prune old files | ||
| activate CFS | ||
| CFS->>CFS: Delete expired data | ||
| CFS-->>DISP: Pruning complete | ||
| deactivate CFS | ||
| | ||
| DISP->>CAT: Update metadata | ||
| end | ||
| ``` | ||
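Flow 4 in the diagram deletes "expired data" from data733 and CFS. One way to picture the selection step is the dry-run sketch below, which only lists candidate files and deletes nothing. It assumes expiry is judged by file modification time against a retention window in days; the function name `find_expired_files` and that criterion are illustrative assumptions, not the PR's prune controller logic.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def find_expired_files(
    root: Path, retention_days: float, now: Optional[float] = None
) -> List[Path]:
    """List files under root whose mtime is older than the retention window.

    This is a dry run: nothing is removed. retention_days is a float so
    that fractional days can be expressed.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

Keeping selection separate from deletion makes it easy to log or review what a scheduled prune would remove before it runs.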
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| # Common Infrastructure | ||
| | ||
| ## Overview | ||
| The common infrastructure for this project includes: | ||
| - **Shared Code**: There are general functions and classes used across beamline workflows to reduce code duplication. | ||
| - **Beamline Specific Implementation Patterns**: We organize each beamline's implementation in a similar way, making it easier to understand and maintain. | ||
| | ||
| ## Shared Code | ||
| Shared code is organized into modules that can be imported in beamline specific implementations. Key modules include: | ||
| - **`orchestration/config.py`** | ||
| - Contains an Abstract Base Class (ABC) called `BeamlineConfig()`, which serves as the base for all beamline-specific configuration classes. It uses the `Dynaconf` package to load the configuration file, `config.yml`, which contains information about endpoints, containers, and more. | ||
| - **`orchestration/transfer_endpoints.py`** | ||
| - Contains an ABC called `TransferEndpoint()`, which is extended by `FileSystemEndpoint`, `HPSSEndpoint`, and `GlobusEndpoint`. These definitions are used to enforce typing and ensure the correct transfer and pruning implementations are used. | ||
| - **`orchestration/transfer_controller.py`**: | ||
| - Contains an ABC called `TransferController()` with specific implementations for Globus, Local File Systems, and NERSC HPSS. | ||
| - **`orchestration/prune_controller.py`** | ||
| - This module is responsible for managing the pruning of data off of storage systems. It uses a configurable retention policy to determine when to remove files. It contains an ABC called `PruneController()` that is extended by specific implementations for `FileSystemEndpoint`, `GlobusEndpoint`, and `HPSSEndpoint`. | ||
| - **`orchestration/sfapi.py`**: Create an SFAPI Client to launch remote jobs at NERSC. | ||
| - **`orchestration/flows/scicat/ingest.py`**: Ingests datasets into SciCat, our metadata management system. | ||
| - **`orchestration/hpss.py`**: Schedule a Prefect Flow to copy data between NERSC CFS and HPSS. These call the relevant TransferControllers for HPSS, which handle the underlying tape-safe logic. | ||
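The endpoint and controller ABCs described in this list can be sketched roughly as follows. The class names `TransferEndpoint`, `FileSystemEndpoint`, `HPSSEndpoint`, and `TransferController` come from the documentation; the constructor arguments, the `describe()` method, the `copy()` signature and return value, and the `get_transfer_controller()` factory are assumptions made for illustration. No data is moved: `copy()` only returns a description of what a real controller would do.

```python
from abc import ABC, abstractmethod


class TransferEndpoint(ABC):
    """Base type for a storage location (constructor arguments are assumed)."""

    def __init__(self, name: str, root_path: str) -> None:
        self.name = name
        self.root_path = root_path

    @abstractmethod
    def describe(self) -> str:
        """Return a short label for this endpoint."""


class FileSystemEndpoint(TransferEndpoint):
    def describe(self) -> str:
        return f"filesystem:{self.root_path}"


class HPSSEndpoint(TransferEndpoint):
    def describe(self) -> str:
        return f"hpss:{self.root_path}"


class TransferController(ABC):
    @abstractmethod
    def copy(self, file_path: str, source: TransferEndpoint,
             destination: TransferEndpoint) -> str:
        """Describe the planned copy (a real controller would perform it)."""


class FileSystemTransferController(TransferController):
    def copy(self, file_path, source, destination):
        return f"copy {file_path}: {source.describe()} -> {destination.describe()}"


class HPSSTransferController(TransferController):
    def copy(self, file_path, source, destination):
        return f"tape job {file_path}: {source.describe()} -> {destination.describe()}"


# Typing the endpoint lets a factory pick the matching controller.
_CONTROLLERS = {
    FileSystemEndpoint: FileSystemTransferController,
    HPSSEndpoint: HPSSTransferController,
}


def get_transfer_controller(destination: TransferEndpoint) -> TransferController:
    """Hypothetical factory: choose a controller from the endpoint type."""
    controller_cls = _CONTROLLERS.get(type(destination))
    if controller_cls is None:
        raise ValueError(f"No transfer controller for {type(destination).__name__}")
    return controller_cls()
```

Because the base classes are abstract, instantiating `TransferEndpoint` directly raises `TypeError`, which is one way typed endpoints help catch a mismatched transfer or prune implementation early.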
| | ||
| | ||
| ## Beamline Specific Implementation Patterns | ||
| To balance generalizability, maintainability, and scalability of this project across multiple beamlines, we try to organize specific implementations in a similar way. We keep specific implementations in the directory `orchestration/flows/bl{beamline_id}/`, which generally contains a few things: | ||
| - **`config.py`** | ||
| - Extend `BeamlineConfig()` from `orchestration/config.py` for specific implementations (e.g. `Config832`, `Config733`, etc.) This ensures only the relevant beamline specific configurations are used in each case. | ||
| - **`dispatcher.py`** | ||
| - This script is the starting point for each beamline's data transfer and analysis workflow. The Prefect Flow it contains is generally invoked by a File Watcher script on the beamline computer. The Dispatcher contains the logic for calling subflows, ensures that steps are completed in the correct order, and prevents subsequent steps from being called if there is a failure along the way. | ||
| - **`move.py`** | ||
| - This script is usually the first one the Dispatcher calls synchronously, and contains the logic for immediately moving data, scheduling pruning flows, and ingesting into SciCat. Downstream steps typically rely on this action completing first. | ||
| - **`job_controller.py`** | ||
| - For beamlines that trigger remote analysis workflows, the `JobController()` ABC allows us to define HPC or machine specific implementations, which may differ in how code can be deployed. For example, it can be extended to define how to run tomography reconstruction at ALCF and NERSC. | ||
| - **`{hpc}.py`** | ||
| - We separate HPC implementations for `JobController()` in their own files. | ||
| - **`ingest.py`** | ||
| - This is where we define SciCat implementations for each beamline, as each technique will have specific metadata fields that are important to capture. | ||
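The dispatcher behaviour described in this list (call subflows in order, and do not call later steps after a failure) can be illustrated with the plain-Python sketch below. It deliberately does not use Prefect; the `dispatch` function, the `Step` alias, and the step names are hypothetical and only show the ordering and stop-on-failure logic.

```python
from typing import Callable, Iterable, List, Tuple

# A step is a (name, callable) pair; the callable returns True on success.
Step = Tuple[str, Callable[[str], bool]]


def dispatch(file_path: str, steps: Iterable[Step]) -> List[str]:
    """Run steps in order and return the names of the steps that completed.

    Execution stops at the first step that returns False or raises, so
    downstream steps never run on top of a failed move.
    """
    completed: List[str] = []
    for name, step in steps:
        try:
            succeeded = step(file_path)
        except Exception:
            succeeded = False
        if not succeeded:
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    steps = [
        ("move", lambda path: True),
        ("ingest", lambda path: False),   # simulated failure
        ("analysis", lambda path: True),  # never reached
    ]
    print(dispatch("scan_0001.h5", steps))
```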
| | ||
| ## Testing | ||
| We write unit tests using [pytest](https://pytest.org/) for individual components, which can be found in `orchestration/_tests/`. We run these tests as part of our GitHub Actions. | ||
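As a small illustration of the style of test this implies, the sketch below uses pytest's built-in `tmp_path` fixture so that no real storage is touched. The helper `build_destination` and the file names are hypothetical and do not come from `orchestration/_tests/`.

```python
from pathlib import Path


def build_destination(root: Path, beamline_id: str, file_name: str) -> Path:
    """Hypothetical helper: place a file under a per-beamline folder."""
    return root / f"bl{beamline_id}" / file_name


def test_build_destination_stays_under_root(tmp_path: Path) -> None:
    # tmp_path is a per-test temporary directory provided by pytest.
    destination = build_destination(tmp_path, "832", "scan_0001.h5")
    destination.parent.mkdir(parents=True)
    destination.write_text("placeholder")

    assert destination.exists()
    assert destination.relative_to(tmp_path) == Path("bl832") / "scan_0001.h5"
```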
| | ||
| ## CI/CD | ||
| The project is integrated with [GitHub Actions](https://github.com/features/actions) for continuous integration and deployment. The specifics can be found in `.github/workflows/`. The features we support here include: | ||
| | ||
| - **Automated Test Execution**: All the unit tests are run automatically with every Git Push. | ||
| - **Linting**: `flake8` is used to check for syntax and styling errors. | ||
| - **MkDocs**: The documentation site is automatically updated whenever a Pull Request is merged into the main branch. | ||
| - **Docker**: A Docker image is automatically created and registered on the GitHub Container Registry (ghcr.io) when a new release is made. |