|
| 1 | +--- |
| 2 | +id: Data_Structure_and_Policies |
| 3 | +toplevel: true |
| 4 | +title: Data Structure and Policies |
| 5 | +--- |
| 6 | + |
| 7 | +## UCGD Directory Structure |
| 8 | + |
| 9 | +- $COMMON is a non-lustre directory for shared applications and data |
| 10 | + |
| 11 | + /uufs/chpc.utah.edu/common/PE/proj_UCGD/ |
| 12 | + or |
| 13 | + /uufs/chpc.utah.edu/common/PE/proj_UCGDCollab/ |
| 14 | + ├──apps |
| 15 | + ├──modulefiles |
| 16 | + ├──data |
| 17 | + ├──lustre_ACLs |
| 18 | + └──shell_scripts |
| 19 | + |
| 20 | + apps - Installed applications and modules.<br/> |
| 21 | + modulefiles - Configuration files for applications in apps (used by lmod).<br/> |
| 22 | + data - Shared datasets used by apps (e.g. GATK Bundles)<br/> |
| 23 | + lustre_ACLs - ACL scripts applied to lustre<br/> |
| 24 | + shell_scripts - Template scripts for jobs, prelaunch/postlaunch scripts, and environment scripts (bashrc, bash_profile, etc). |
| 25 | + |
| 26 | + |
| 27 | +- Recharge Center Lustre |
| 28 | + |
| 29 | + /scratch/ucgd/lustre/ |
| 30 | + ├──common |
| 31 | + │ └──data #synced with …/proj_UCGD/common/data |
| 32 | + ├──Purgatory |
| 33 | + ├──UCGD_Analysis |
| 34 | + ├──UCGD_Processing |
| 35 | + ├──UCGD_Datahub |
| 36 | + │ └──Repository |
| 37 | + └──work |
| 38 | + └──proj_UCGD |
| 39 | + └──u0123456 |
| 40 | + |
| 41 | + common/data - Fast IO sync of $COMMON/data<br/> |
| 42 | + Purgatory - Holding place for data waiting to be deleted (see if anyone complains before emptying the trash)<br/> |
| 43 | + UCGD_Analysis - Project directories for UCGD analysts<br/> |
| 44 | + UCGD_Processing - Working directory for data download and pipeline processing<br/> |
| 45 | + UCGD_Datahub - Final storage location of data. Served up by Gnomex. |
| 46 | + |
| 47 | + |
| 48 | +- UCGD legacy Lustre |
| 49 | + |
| 50 | + /scratch/ucgd/lustre-work/ |
| 51 | + ├──marth |
| 52 | + │ └──u0123456 |
| 53 | + ├──quinlan |
| 54 | + │ └──u0123456 |
| 55 | + └──yandell |
| 56 | + └──u0123456 |
| 57 | + |
| 58 | + marth, quinlan, yandell - Each lab in the UCGD gets a group work directory with a hard quota of 600TB and 200 million files (per-user limit of 200TB and 50 million files)<br/> |
| 59 | + temp - An unlimited work location for temporary files and data. Will be cleaned aggressively. |
| 60 | + |
| 61 | + |
| 62 | +- Isilon (general environment) |
| 63 | + |
| 64 | + /uufs/chpc.utah.edu/common/home/ |
| 65 | + ├── marth-ucgdstor |
| 66 | + ├── quinlan-ucgdstor |
| 67 | + └── yandell-ucgdstor |
| 68 | + |
| 69 | +marth, quinlan, yandell - Each lab in the UCGD gets a group work directory with 100TB of storage. |
| 70 | + |
| 71 | + |
| 72 | +- UCGD Serial (general environment) |
| 73 | + |
| 74 | + /scratch/ucgd/serial/ |
| 75 | + |
| 76 | +This space (175 TB) is currently used by the UCGD-SRC group as temporary storage while other storage options are finalized. |
| 77 | + |
| 78 | +Contact Carson Holt if you have questions about this space. |
| 79 | + |
| 80 | + |
| 81 | +## CEPH Storage |
| 82 | + |
| 83 | +The CEPH object storage is used to archive data in UCGD_Datahub under |
| 84 | +PolishedBams directories. As long as result files are lossless, they |
| 85 | +function as an archive of the original Primary_Data files. You can |
| 86 | +access CEPH archives using rclone. |
| 87 | + |
| 88 | +See the documentation on how to set up and use rclone: |
| 89 | +[CHPC Documentation on Rclone](https://www.chpc.utah.edu/documentation/software/rclone.php) |
| 90 | + |
| 91 | + |
| 92 | +## UCGD_Datahub Repository Policies |
| 93 | + |
| 94 | +1. No softlinks in Primary_Data or Project_Setup unless the link points |
| 95 | + to another Primary_Data or PolishedBams file or directory. |
| 96 | +2. No softlinks in PolishedBams unless the link points to another |
| 97 | + PolishedBams file. |
| 98 | +3. No Primary or Polished data should be inside ExternalData unless |
| 99 | + it's a softlink. |
| 100 | +4. PolishedBams, not Primary_Data, is what gets backed up |
| 101 | + (Primary_Data is considered a temporary directory). |
| 102 | +5. Three months after billing, Primary_Data and Project_Setup are deleted |
| 103 | + once the following criteria are met: |
| 104 | + 1. All files in PolishedBams have been validated as lossless |
| 105 | + 2. All PolishedBams files have been backed up to CEPH storage |
| 106 | + 3. Immutable bit is set on PolishedBams |
| 107 | +6. PolishedBams are always CRAM and not BAM. |
| 108 | +7. Three years after billing, the project is retired and all data is removed. |
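
The softlink rules (policies 1 and 2 above) could be checked with a small script. The following is only a minimal sketch, not an existing UCGD tool: the function name `find_softlink_violations` is hypothetical, and it assumes that `Primary_Data`, `Project_Setup`, and `PolishedBams` sit directly under a project directory and that a link counts as allowed when its resolved target path contains an allowed directory name. Policy 3 (ExternalData) is not covered.

```python
import os
from pathlib import Path

# Assumed mapping: directory holding the link -> directory names its target may live in.
ALLOWED_TARGETS = {
    "Primary_Data": {"Primary_Data", "PolishedBams"},   # policy 1
    "Project_Setup": {"Primary_Data", "PolishedBams"},  # policy 1
    "PolishedBams": {"PolishedBams"},                   # policy 2
}


def find_softlink_violations(project_dir):
    """Return a sorted list of softlinks under project_dir that break policy 1 or 2."""
    project = Path(project_dir)
    violations = []
    for area, allowed in ALLOWED_TARGETS.items():
        area_dir = project / area
        if not area_dir.is_dir():
            continue
        # os.walk does not descend into symlinked directories by default,
        # but it still lists them, so every link is inspected exactly once.
        for root, dirs, files in os.walk(area_dir):
            for name in dirs + files:
                path = Path(root) / name
                if not path.is_symlink():
                    continue
                target_parts = Path(os.path.realpath(path)).parts
                # A link is a violation if no allowed directory name
                # appears anywhere in its resolved target path.
                if not allowed.intersection(target_parts):
                    violations.append(str(path))
    return sorted(violations)
```

Calling `find_softlink_violations("/path/to/project")` returns the offending link paths, or an empty list when the project follows both rules. Matching on directory names is a deliberate simplification; a stricter check would compare against the actual Repository layout.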