128 commits
01aad62
Initial commit for issue #59. This is still a WIP and requires thorou…
davramov Feb 11, 2025
1d10213
Adding a generic BeamlineConfig(ABC) class in orchestration/config.py…
davramov Feb 11, 2025
1a4e893
Added logic for HPSSToCFSTransferController() copy() method. Now it w…
davramov Feb 11, 2025
cfa5a0c
Moving endpoint definitions to orchestration/transfer_endpoint.py so …
davramov Feb 11, 2025
92c2eb6
Added a get_prune_controller() method.
davramov Feb 11, 2025
3a02357
Getting ready to test CFSToHPSSTransferController flow. Created a Pre…
davramov Feb 14, 2025
7db34e9
Fixed syntax errors
davramov Feb 14, 2025
65ab2e0
Cleaned up logging for the CFSToHPSSTransferController slurm job scri…
davramov Feb 15, 2025
17a1a52
adding support for dynaconf for handling beamline configurations
davramov Feb 18, 2025
343ef75
Adding logic for the prune_filesystem_files flow in prune_controller.py
davramov Feb 19, 2025
f2bc19b
Moving specific prune implementations as private/internal methods in …
davramov Feb 19, 2025
7c686c1
Adding documentation outlining shared infrastructure between beamline…
davramov Feb 19, 2025
3071709
Linting and adding a few TODO comments
davramov Feb 19, 2025
a68009c
Fixed get_prune_controller() method to accept an Enum for supported t…
davramov Feb 19, 2025
f5af676
Working on a SciCat Ingestor Controller ABC, and a BL832 implementati…
davramov Feb 22, 2025
4f0170f
sourceFolder expects a string
davramov Feb 22, 2025
fa62fa7
generalizing relevant code from ingest_tomo832.py, refactoring into a…
davramov Feb 24, 2025
40ae3df
Refactoring scicat ingestion to be more modular and structured
davramov Feb 25, 2025
6941438
Adding a custom Prefect Block to keep track of pending project names …
davramov Feb 25, 2025
49c5b79
Added three dispatchers for bl832 to handle archiving data to HPSS: 1…
davramov Feb 25, 2025
a7496b7
Updating documentation and comments
davramov Feb 26, 2025
8e384da
Testing HPSS flow, verified that I can create new directories on HPSS…
davramov Feb 26, 2025
c1e652e
Updating documentation:
davramov Feb 26, 2025
9cd6ddc
Verified that htar bundling and building on HPSS works
davramov Feb 26, 2025
0e7ff47
Updating HPSS documentation typo
davramov Feb 26, 2025
c78dc96
Adding a _find_dataset() method to ingestor_controller base class, in…
davramov Feb 27, 2025
3278a1f
Successfully extracted files from a .tar archive on HPSS back to CFS!…
davramov Feb 27, 2025
6618d99
Moving all HPSS related transfer/prune implementations into orchestra…
davramov Feb 27, 2025
4e9215c
Adjusting hpss imports in the transfer_controller pytest
davramov Feb 27, 2025
58eeec5
Verified that the HPSSPruneController successfully pruned from HPSS
davramov Feb 28, 2025
33fce11
Simplified file paths within the .tar archives on HPSS so it referenc…
davramov Feb 28, 2025
09e3403
Updated the HPSS flows in bl832/dispatcher.py to include updating the…
davramov Mar 4, 2025
abe84fe
Updated docstrings, logging, error handling
davramov Mar 5, 2025
0fb0f3e
Updated docstrings, logging, typing, error handling
davramov Mar 5, 2025
8674291
Improved error logging and exception handling for tape flows
davramov Mar 7, 2025
f0b7651
Verified SciCat login to the latest Docker version of scicatlive. It …
davramov Mar 7, 2025
274418f
Fixing pytest errors and failures
davramov Mar 7, 2025
df53e7c
Testing scicat ingestion locally on a small test h5 dataset from 832.…
davramov Mar 7, 2025
1f60795
Testing scicat ingestion locally on a small test h5 dataset from 832.…
davramov Mar 7, 2025
3504f5a
updated and tested add_new_dataset_location() in orchestration/flows/…
davramov Mar 10, 2025
300f92e
Added support for linking tomography reconstructions as derived datase…
davramov Mar 11, 2025
0045461
Fixed thumbnails uploaded for derived tiff/zarr datasets in SciCat
davramov Mar 11, 2025
db6b4fc
Adding logic to the remove_dataset_location() function in ingestor_co…
davramov Mar 12, 2025
f65b800
Adding a test script (not pytest) for end-to-end validation of the co…
davramov Mar 18, 2025
05ed15a
Moved test_controllers_end_to_end.py to the scripts/ folder does it d…
davramov Mar 18, 2025
ded7bec
Addressing Dylan and Raja's comments
davramov Apr 7, 2025
99d64cf
Updating end-to-end tests
davramov Apr 23, 2025
edfe26f
For the HPSS->CFS controller, convert slashes to underscores for the …
davramov Apr 23, 2025
60250b8
Added filter for files >65GB to be moved with HTAR. Verified that thi…
davramov Apr 23, 2025
07663de
Updating .env.example and README with current environment requirements.
davramov Apr 23, 2025
72c7449
Added comment that this will be moved to the scicat_beamline repo at …
davramov Apr 23, 2025
6ec45bb
Updating documentation for HPSS
davramov Apr 25, 2025
de1702b
Updating documentation for HPSS
davramov Apr 25, 2025
e98c17b
Adding ability to hpss controllers for ls command to see what's on tape
davramov Apr 29, 2025
44455ac
Fixed commenting
davramov May 5, 2025
49d7d2a
fixed pytest after rebasing
davramov May 5, 2025
9568274
bumping python from 3.11->3.12.5 to see if it fixes a TypeError where…
davramov May 5, 2025
af03920
Fix error message grammar
davramov May 15, 2025
35dc149
Fixing which flow the globus prune controller calls
davramov May 15, 2025
7e808cf
making the days_from_now parameter a float, which is converted into d…
davramov May 15, 2025
220b3a4
adding try and except to build_thumbnail, in case of edge cases (like…
davramov May 15, 2025
2121eb9
Updated documentation regarding hsi mkdir -p
davramov May 21, 2025
afdcda0
Addressing Garrett's comment about the add_new_dataset_location metho…
davramov May 21, 2025
6286208
Removing redundant datafile.path = file_path lines, as it is now confi…
davramov May 21, 2025
036ac15
updating requirements.txt
davramov May 21, 2025
2137159
Adding a TODO comment to _find_dataset to support searching dataFileL…
davramov May 21, 2025
0c26fb3
Adjusting sfapi pytest to match new expected value
davramov May 21, 2025
e11cc0f
Adjusting prune_controller.prune() calls to use a float for days_from…
davramov May 21, 2025
36c40f7
Renaming dummy to mock
davramov Jun 5, 2025
980e681
linting
davramov Jun 5, 2025
4c699a3
Fixed typo (tranfer_client -> transfer_client)
davramov Jun 11, 2025
3bb47af
Fixed typo (tranfer_client -> transfer_client)
davramov Jun 11, 2025
b71ea42
Fixed typo (tranfer_client -> transfer_client)
davramov Jun 11, 2025
cc9d55f
Fixed typo (tranfer_client -> transfer_client)
davramov Jun 11, 2025
d030114
Adding a comment at the top
davramov Jun 11, 2025
4105d3b
Updating test_prune_controller.py based on Dylan's comments. Ensuring…
davramov Jun 11, 2025
70ff27e
Removing redundant logger.debug messages when initializing transfer c…
davramov Jun 11, 2025
9178769
Updating 733 docs sequence diagram
davramov Oct 13, 2025
6330d38
Using the tmp_path fixture for pytest
davramov Oct 13, 2025
0f1008a
Adding checks in prune_controller (Globus and Filesystem) for days_fr…
davramov Oct 13, 2025
70e7ab1
Adding a new pytest function called test_globus_prune_schedules_when_…
davramov Oct 13, 2025
e898be0
updating test_prune_controller with a new function test_fs_prune_sche…
davramov Oct 13, 2025
26f78fa
Fixing tests for when days_from_now == 0
davramov Oct 13, 2025
30b2542
Using Pathlib instead of os.path
davramov Oct 13, 2025
c63ff16
removing redundant MockClient assertion
davramov Oct 13, 2025
6109c92
Rewriting the filesystem transfercontroller to follow Kate's suggesti…
davramov Oct 13, 2025
2b046dd
Rewriting the filesystem transfercontroller exception handling to fol…
davramov Oct 13, 2025
67626ff
Linting
davramov Oct 13, 2025
e7125cc
Set self.config = None after assigning beamline-specific configs
davramov Oct 14, 2025
f04b0ae
Enforcing consistency for beamline_id to be a str with numbers delimi…
davramov Oct 14, 2025
81915be
Adding env var check to the start of the main method
davramov Oct 14, 2025
a9db743
Making the check_required_envars() function more verbose with which v…
davramov Oct 14, 2025
13509ca
Removing unused imports
davramov Oct 14, 2025
ec2afc6
Adding better checks to make sure env variables are set
davramov Oct 14, 2025
2f89be9
Using pathlib to build the full path
davramov Oct 14, 2025
c365640
Updating docstring
davramov Oct 14, 2025
45b56a9
Including comment about circular dependencies
davramov Oct 14, 2025
3f69853
Removing main block (tests are handled in scripts/test_controllers_en…
davramov Oct 14, 2025
4992204
Prefect/Globus env variable checks throw warnings rather than errors …
davramov Oct 14, 2025
74bc670
Elevating pruning flows out of the controller classes.
davramov Oct 14, 2025
fbb1a18
Moving HPSS slurm jobs into a folder orchestration/slurm/
davramov Oct 15, 2025
fcb1cb2
Ensuring all variables are passed into hpss_to_cfs.slurm
davramov Oct 15, 2025
2cf5e44
moving hpss main method to its own check_hpss script
davramov Oct 15, 2025
05712ad
Adding a note about encode_thumbnail being part of scicat
davramov Oct 15, 2025
ed35aeb
capitalizing severity enum options
davramov Oct 15, 2025
30aa7c0
Adding comment about cput not overwriting
davramov Oct 15, 2025
80a95bc
Deleting unused sfapi key variable
davramov Oct 15, 2025
08a1dcc
Docstring
davramov Oct 15, 2025
6ebe839
Docstring
davramov Oct 15, 2025
21e8d82
renaming login_to_scicat to get_scicat_client
davramov Oct 15, 2025
6a4fcf6
removing redundant error message
davramov Oct 15, 2025
b0a7d1e
Docstring
davramov Oct 15, 2025
354e151
removing test code
davramov Oct 15, 2025
4a7030f
Fixing project path endpoint to nersc cfs endpoint. Updating the log …
davramov Oct 15, 2025
a64745b
adding log paths to error message
davramov Oct 15, 2025
7ce99bb
Addressing comments regarding hpss dispatcher flows.
davramov Oct 21, 2025
0faae48
Adjusting comments to be clearer
davramov Oct 21, 2025
2749500
Adding typing throughout
davramov Oct 21, 2025
fba4b5d
Docstrings
davramov Oct 21, 2025
2c548f2
replacing old move.py with move_refactor.py
davramov Oct 22, 2025
d7c353b
Documentation
davramov Oct 22, 2025
0a4e74b
Adding screenshots to SFAPI documentation
davramov Oct 23, 2025
40a7b61
hyperlink to sfapi example
davramov Oct 23, 2025
59c54bf
Reducing the memory from 20GB to 2GB due to sbatch: error: More resou…
davramov Oct 24, 2025
ae01fb2
Check if scicat's base url includes localhost, and pass in the backen…
davramov Oct 28, 2025
bcea0a2
removing main method used for testing
davramov Nov 4, 2025
e13b0a3
replacing run_specific_flow with run_deployment after rebase
davramov Nov 4, 2025
ca3db6d
making sure process_new_832_file_task is called by dispatcher rather …
davramov Nov 4, 2025
22 changes: 15 additions & 7 deletions .env.example
@@ -1,7 +1,15 @@
-GLOBUS_CLIENT_ID=<globus_client_id>
-GLOBUS_CLIENT_SECRET=<globus_client_secret>
-PREFECT_API_URL=<url_of_prefect_server>
-PREFECT_API_KEY=<prefect_client_secret>
-PUSHGATEWAY_URL=<url_of_pushgateway_server>
-JOB_NAME=<jobname_for_pushgateway>
-INSTANCE_LABEL=<label_for_pushgateway>
+GLOBUS_CLIENT_ID=<globus_client_id> # For Globus Transfer
+GLOBUS_CLIENT_SECRET=<globus_client_secret> # For Globus Transfer
+GLOBUS_COMPUTE_CLIENT_ID=<globus_client_id> # For ALCF Jobs
+GLOBUS_COMPUTE_CLIENT_SECRET=<globus_client_secret> # For ALCF Jobs
+GLOBUS_COMPUTE_ENDPOINT=<globus_compute_endpoint> # For ALCF Jobs
+PREFECT_API_URL=<url_of_prefect_server> # For Prefect Flows
+PREFECT_API_KEY=<prefect_client_secret> # For Prefect Flows
+SCICAT_API_URL=<url_of_scicat_api> # For SciCat Ingest
+SCICAT_INGEST_USER=<scicat_ingest_user> # For SciCat Ingest
+SCICAT_INGEST_PASSWORD=<scicat_ingest_password> # For SciCat Ingest
+PATH_NERSC_CLIENT_ID=<path_nersc_client_id> # For NERSC SFAPI
+PATH_NERSC_PRI_KEY=<path_nersc_private_key> # For NERSC SFAPI
+PUSHGATEWAY_URL=<url_of_pushgateway_server> # For Grafana Pushgateway
+JOB_NAME=<jobname_for_pushgateway> # For Grafana Pushgateway
+INSTANCE_LABEL=<label_for_pushgateway> # For Grafana Pushgateway
4 changes: 2 additions & 2 deletions .github/workflows/python-app.yml
@@ -12,10 +12,10 @@ jobs:

     steps:
     - uses: actions/checkout@v2
-    - name: Set up Python 3.11
+    - name: Set up Python 3.12.5
       uses: actions/setup-python@v5
       with:
-        python-version: 3.11
+        python-version: 3.12.5
         cache: 'pip'
     - name: Install dependencies
       run: |
16 changes: 12 additions & 4 deletions README.md
@@ -22,10 +22,18 @@ $ pip3 install -e .
Use `.env.example` as a template.

```
-GLOBUS_CLIENT_ID=<globus_client_id>
-GLOBUS_CLIENT_SECRET=<globus_client_secret>
-PREFECT_API_URL=<url_of_prefect_server>
-PREFECT_API_KEY=<prefect_client_secret>
+GLOBUS_CLIENT_ID=<globus_client_id> # For Globus Transfer
+GLOBUS_CLIENT_SECRET=<globus_client_secret> # For Globus Transfer
+GLOBUS_COMPUTE_CLIENT_ID=<globus_client_id> # For ALCF Jobs
+GLOBUS_COMPUTE_CLIENT_SECRET=<globus_client_secret> # For ALCF Jobs
+GLOBUS_COMPUTE_ENDPOINT=<globus_compute_endpoint> # For ALCF Jobs
+PREFECT_API_URL=<url_of_prefect_server> # For Prefect Flows
+PREFECT_API_KEY=<prefect_client_secret> # For Prefect Flows
+SCICAT_API_URL=<url_of_scicat_api> # For SciCat Ingest
+SCICAT_INGEST_USER=<scicat_ingest_user> # For SciCat Ingest
+SCICAT_INGEST_PASSWORD=<scicat_ingest_password> # For SciCat Ingest
+PATH_NERSC_CLIENT_ID=<path_nersc_client_id> # For NERSC SFAPI, generate on https://iris.nersc.gov/
+PATH_NERSC_PRI_KEY=<path_nersc_private_key> # For NERSC SFAPI
```
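
The variables in the template above could be validated at startup before any flow runs. The following is a minimal sketch under stated assumptions: the function name `find_missing_env_vars`, the grouping by feature, and the rule that an unedited `<placeholder>` counts as unset are all hypothetical and are not taken from this repository's own environment check.

```python
import os
from typing import Dict, List, Mapping, Optional

# Variable names come from .env.example; grouping them by feature is an assumption.
REQUIRED_ENV_VARS: Dict[str, List[str]] = {
    "Globus Transfer": ["GLOBUS_CLIENT_ID", "GLOBUS_CLIENT_SECRET"],
    "Prefect Flows": ["PREFECT_API_URL", "PREFECT_API_KEY"],
    "SciCat Ingest": ["SCICAT_API_URL", "SCICAT_INGEST_USER", "SCICAT_INGEST_PASSWORD"],
    "NERSC SFAPI": ["PATH_NERSC_CLIENT_ID", "PATH_NERSC_PRI_KEY"],
}


def _is_unset(value: Optional[str]) -> bool:
    """Treat missing, blank, or still-templated (<...>) values as unset."""
    if value is None or not value.strip():
        return True
    stripped = value.strip()
    return stripped.startswith("<") and stripped.endswith(">")


def find_missing_env_vars(
    required: Mapping[str, List[str]] = REQUIRED_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, List[str]]:
    """Return {feature: [unset variable names]} for every feature with gaps."""
    env = os.environ if env is None else env
    missing: Dict[str, List[str]] = {}
    for feature, names in required.items():
        unset = [name for name in names if _is_unset(env.get(name))]
        if unset:
            missing[feature] = unset
    return missing
```

A caller could log a warning for each feature in the returned mapping, or raise if a feature it depends on is listed. Passing `env` explicitly keeps the check easy to unit test.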

## Current workflow overview and status:
5 changes: 5 additions & 0 deletions config.yml
@@ -107,6 +107,11 @@ globus:
client_id: ${GLOBUS_CLIENT_ID}
client_secret: ${GLOBUS_CLIENT_SECRET}

hpss_alsdev:
root_path: /home/a/alsdev/data_mover
uri: nersc.gov
name: hpss_alsdev

harbor_images832:
recon_image: tomorecon_nersc_mpi_hdf5@sha256:cc098a2cfb6b1632ea872a202c66cb7566908da066fd8f8c123b92fa95c2a43c
multires_image: tomorecon_nersc_mpi_hdf5@sha256:cc098a2cfb6b1632ea872a202c66cb7566908da066fd8f8c123b92fa95c2a43c
12 changes: 11 additions & 1 deletion create_deployment_832_dispatcher.sh
@@ -3,4 +3,14 @@ export $(grep -v '^#' .env | xargs)

prefect work-pool create 'dispatcher_pool'
prefect deployment build ./orchestration/flows/bl832/dispatcher.py:dispatcher -n run_832_dispatcher -q bl832 -p dispatcher_pool
-prefect deployment apply dispatcher-deployment.yaml
+prefect deployment apply dispatcher-deployment.yaml
+
+prefect work-pool create 'hpss_pool'
+prefect deployment build ./orchestration/flows/bl832/dispatcher.py:archive_832_project_dispatcher -n run_archive_832_project_dispatcher -q hpss_dispatcher_queue -p hpss_pool
+prefect deployment apply archive_832_project_dispatcher-deployment.yaml
+
+prefect deployment build ./orchestration/flows/bl832/dispatcher.py:archive_832_projects_from_previous_cycle_dispatcher -n run_archive_832_projects_from_previous_cycle_dispatcher -q hpss_dispatcher_queue -p hpss_pool
+prefect deployment apply archive_832_projects_from_previous_cycle_dispatcher-deployment.yaml
+
+prefect deployment build ./orchestration/flows/bl832/dispatcher.py:archive_all_832_raw_projects_dispatcher -n run_archive_all_832_raw_projects_dispatcher -q hpss_dispatcher_queue -p hpss_pool
+prefect deployment apply archive_all_832_raw_projects_dispatcher-deployment.yaml
7 changes: 7 additions & 0 deletions create_deployment_832_hpss.sh
@@ -0,0 +1,7 @@
export $(grep -v '^#' .env | xargs)


prefect work-pool create 'hpss_pool'

prefect deployment build ./orchestration/flows/bl832/hpss.py:cfs_to_hpss_flow -n cfs_to_hpss_flow -q cfs_to_hpss_queue -p hpss_pool
prefect deployment apply cfs_to_hpss_flow-deployment.yaml
2 changes: 1 addition & 1 deletion docs/bl832_ALCF.md
@@ -428,7 +428,7 @@ def prune_alcf832_raw(relative_path: str):
prune_one_safe(
file=relative_path,
if_older_than_days=0,
-tranfer_client=tc,
+transfer_client=tc,
source_endpoint=config.alcf832_raw,
check_endpoint=config.nersc832_alsdev_raw,
logger=p_logger,
88 changes: 88 additions & 0 deletions docs/mkdocs/docs/733.md
@@ -0,0 +1,88 @@
# Beamline 7.3.3


## Flow Diagram
```mermaid
sequenceDiagram
participant DET as Detector/<br/>File Watcher
participant DISP as Prefect<br/>Dispatcher
participant D733 as data733<br/>Storage
participant GLOB as Globus<br/>Transfer
participant CFS as NERSC<br/>CFS
participant CAT as SciCat<br/>Metadata
participant SFAPI as SFAPI
participant HPC as HPC<br/>Compute
participant HPSS as HPSS<br/>Tape

%% Initial Trigger
DET->>DET: Monitor filesystem
DET->>DISP: Trigger on new file
DISP->>DISP: Coordinate flows

%% Flow 1: new_file_733
rect rgb(220, 230, 255)
note over DISP,CAT: FLOW 1: new_file_733
DISP->>GLOB: Init transfer
activate GLOB
GLOB->>D733: Initiate copy
activate D733
D733-->>GLOB: Copy initiated
deactivate D733
%% note right of GLOB: Transfer in progress
GLOB-->>DISP: Transfer complete
deactivate GLOB

DISP->>CAT: Register metadata
end

%% Flow 2: HPSS Transfer
rect rgb(220, 255, 230)
note over DISP,CAT: FLOW 2: Scheduled HPSS Transfer
DISP->>SFAPI: Submit tape job
activate SFAPI
SFAPI->>HPSS: Initiate archive
activate HPSS
HPSS-->>SFAPI: Archive complete
deactivate HPSS
SFAPI-->>DISP: Job complete
deactivate SFAPI

DISP->>CAT: Update metadata
end

%% Flow 3: HPC Analysis
rect rgb(255, 230, 230)
note over DISP,HPC: FLOW 3: HPC Downstream Analysis
DISP->>SFAPI: Submit compute job
activate SFAPI
SFAPI->>HPC: Execute job
activate HPC
HPC->>HPC: Process data
HPC-->>SFAPI: Compute complete
deactivate HPC
SFAPI-->>DISP: Job complete
deactivate SFAPI

DISP->>CAT: Update metadata
end

%% Flow 4: Scheduled Pruning
rect rgb(255, 255, 220)
note over DISP,CAT: FLOW 4: Scheduled Pruning
DISP->>DISP: Scheduled pruning trigger

DISP->>D733: Prune old files
activate D733
D733->>D733: Delete expired data
D733-->>DISP: Pruning complete
deactivate D733

DISP->>CFS: Prune old files
activate CFS
CFS->>CFS: Delete expired data
CFS-->>DISP: Pruning complete
deactivate CFS

DISP->>CAT: Update metadata
end
```
2 changes: 1 addition & 1 deletion docs/mkdocs/docs/alcf832.md
@@ -389,7 +389,7 @@ def prune_alcf832_raw(relative_path: str):
prune_one_safe(
file=relative_path,
if_older_than_days=0,
-tranfer_client=tc,
+transfer_client=tc,
source_endpoint=config.alcf832_raw,
check_endpoint=config.nersc832_alsdev_raw,
logger=p_logger,
Binary file added docs/mkdocs/docs/assets/images/sfapi_step1.png
Binary file added docs/mkdocs/docs/assets/images/sfapi_step2.png
Binary file added docs/mkdocs/docs/assets/images/sfapi_step3.png
Binary file added docs/mkdocs/docs/assets/images/sfapi_step4.png
47 changes: 47 additions & 0 deletions docs/mkdocs/docs/common_infrastructure.md
@@ -0,0 +1,47 @@
# Common Infrastructure

## Overview
The common infrastructure for this project includes:
- **Shared Code**: General functions and classes are shared across beamline workflows to reduce code duplication.
- **Beamline Specific Implementation Patterns**: We organize each beamline's implementation in a similar way, making it easier to understand and maintain.

## Shared Code
Shared code is organized into modules that can be imported into beamline-specific implementations. Key modules include:
- **`orchestration/config.py`**
    - Contains an Abstract Base Class (ABC) called `BeamlineConfig()`, which serves as the base for all beamline-specific configuration classes. It uses the `Dynaconf` package to load the configuration file, `config.yml`, which contains information about endpoints, containers, and more.
- **`orchestration/transfer_endpoints.py`**
    - Contains an ABC called `TransferEndpoint()`, which is extended by `FileSystemEndpoint`, `HPSSEndpoint`, and `GlobusEndpoint`. These definitions are used to enforce typing and ensure that the correct transfer and pruning implementations are used.
- **`orchestration/transfer_controller.py`**
    - Contains an ABC called `TransferController()` with specific implementations for Globus, local file systems, and NERSC HPSS.
- **`orchestration/prune_controller.py`**
    - Manages the pruning of data from storage systems. It uses a configurable retention policy to determine when to remove files. It contains an ABC called `PruneController()` that is extended by specific implementations for `FileSystemEndpoint`, `GlobusEndpoint`, and `HPSSEndpoint`.
- **`orchestration/sfapi.py`**: Creates an SFAPI client to launch remote jobs at NERSC.
- **`orchestration/flows/scicat/ingest.py`**: Ingests datasets into SciCat, our metadata management system.
- **`orchestration/hpss.py`**: Schedules Prefect Flows to copy data between NERSC CFS and HPSS. These flows call the relevant `TransferController` implementations for HPSS, which handle the underlying tape-safe logic.
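
To illustrate how typed endpoints can steer callers toward the right controller, here is a minimal sketch under stated assumptions. The class names `TransferEndpoint`, `FileSystemEndpoint`, `HPSSEndpoint`, and `TransferController` come from the modules listed above, and the `name`, `root_path`, and `uri` fields mirror the `hpss_alsdev` entry in `config.yml`. The `full_path()` helper, the `copy()` signature, and `DryRunFileSystemController` are hypothetical and do not reproduce the repository's implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Tuple


@dataclass(frozen=True)
class TransferEndpoint(ABC):
    """A named storage location with a root path."""
    name: str
    root_path: str

    def full_path(self, relative_path: str) -> str:
        # Hypothetical helper: join a relative path onto the endpoint root.
        return str(PurePosixPath(self.root_path) / relative_path.lstrip("/"))


@dataclass(frozen=True)
class FileSystemEndpoint(TransferEndpoint):
    """An endpoint reachable as a mounted file system."""


@dataclass(frozen=True)
class HPSSEndpoint(TransferEndpoint):
    """A tape archive endpoint; `uri` mirrors the config.yml entry."""
    uri: str = "nersc.gov"


class TransferController(ABC):
    """Base class; each subclass handles one kind of endpoint."""

    @abstractmethod
    def copy(self, file_path: str, source: TransferEndpoint,
             destination: TransferEndpoint) -> bool:
        """Copy file_path from source to destination; return True on success."""


class DryRunFileSystemController(TransferController):
    """Hypothetical controller that records planned copies instead of running them."""

    def __init__(self) -> None:
        self.planned: List[Tuple[str, str]] = []

    def copy(self, file_path: str, source: TransferEndpoint,
             destination: TransferEndpoint) -> bool:
        # Reject endpoint types this controller is not designed for.
        for endpoint in (source, destination):
            if not isinstance(endpoint, FileSystemEndpoint):
                raise TypeError(
                    f"{type(endpoint).__name__} is not a FileSystemEndpoint"
                )
        self.planned.append(
            (source.full_path(file_path), destination.full_path(file_path))
        )
        return True
```

Because the endpoint type is checked up front, passing an `HPSSEndpoint` to a file system controller fails immediately rather than partway through a transfer. A `PruneController` hierarchy could follow the same shape.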


## Beamline Specific Implementation Patterns
To balance the generalizability, maintainability, and scalability of this project across multiple beamlines, we organize each beamline's implementation in a similar way. We keep these implementations in the directory `orchestration/flows/bl{beamline_id}/`, which generally contains the following:
- **`config.py`**
    - Extends `BeamlineConfig()` from `orchestration/config.py` for a specific beamline (e.g. `Config832`, `Config733`). This ensures that only the relevant beamline-specific configuration is used in each case.
- **`dispatcher.py`**
    - This script is the starting point for each beamline's data transfer and analysis workflow. The Prefect Flow it contains is generally invoked by a File Watcher script on the beamline computer. The Dispatcher contains the logic for calling subflows, ensures that steps are completed in the correct order, and prevents subsequent steps from being called if there is a failure along the way.
- **`move.py`**
    - This script is usually the first one the Dispatcher calls synchronously. It contains the logic for immediately moving data, scheduling pruning flows, and ingesting into SciCat. Downstream steps typically rely on this action completing first.
- **`job_controller.py`**
    - For beamlines that trigger remote analysis workflows, the `JobController()` ABC allows us to define HPC- or machine-specific implementations, which may differ in how code can be deployed. For example, it can be extended to define how to run tomography reconstruction at ALCF and NERSC.
- **`{hpc}.py`**
    - We keep each HPC implementation of `JobController()` in its own file.
- **`ingest.py`**
    - This is where we define the SciCat implementation for each beamline, as each technique has specific metadata fields that are important to capture.

## Testing
We write unit tests using [pytest](https://pytest.org/) for individual components, which can be found in `orchestration/_tests/`. We run these tests as part of our GitHub Actions workflows.

## CI/CD
The project is integrated with [GitHub Actions](https://github.com/features/actions) for continuous integration and deployment. The workflow definitions can be found in `.github/workflows/`. The supported features include:

- **Automated Test Execution**: All unit tests run automatically on every push.
- **Linting**: `flake8` is used to check for syntax and styling errors.
- **MkDocs**: The documentation site is automatically updated whenever a Pull Request is merged into the main branch.
- **Docker**: A Docker image is automatically built and published to the GitHub Container Registry (ghcr.io) when a new release is made.