diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml new file mode 100644 index 0000000..6b3a861 --- /dev/null +++ b/.github/workflows/deploy.yml @@ -0,0 +1,27 @@ +name: deploy + +on: + push: + branches: + - main + +jobs: + deploy-book: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - name: Set up Python 3.10.13 + uses: actions/setup-python@v2 + with: + python-version: 3.10.13 + - name: Install Dependencies + run: | + pip install -r requirements.txt + - name: Build Content + run: | + jupyter-book build source + - name: GitHub Pages action + uses: peaceiris/actions-gh-pages@v3.6.1 + with: + github_token: ${{ secrets.GITHUB_TOKEN }} + publish_dir: source/_build/html diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..69fa449 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +_build/ diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..deef5c8 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,3 @@ +python==3.10.13 +jupyter-book==0.15.1 +jupytext==1.15.2 diff --git a/source/_config.yml b/source/_config.yml new file mode 100644 index 0000000..b6d4d50 --- /dev/null +++ b/source/_config.yml @@ -0,0 +1,29 @@ +title: "Tufts University HPC Cluster User Guide" +author: Tufts University Research Technology +copyright: "2022" +logo: logo.png + +html: + favicon: favicon.ico + use_edit_page_button: true + use_repository_button: true + use_issues_button: true + home_page_in_navbar: false + +repository: + url: https://github.com/tuftsrt/hpc-documentation + path_to_book: source + branch: main + +sphinx: + recursive_update: true + config: + html_last_updated_fmt: "%b %d, %Y" + html_theme_options: + logo: + text: "HPC Cluster User Guide" + repository_provider: github + nb_custom_formats: + .Rmd: + - jupytext.reads + - fmt: Rmd diff --git a/source/_toc.yml b/source/_toc.yml new file mode 100644 index 0000000..9289b0b --- /dev/null +++ b/source/_toc.yml @@ -0,0 +1,15 @@ +format: jb-book +root: index +parts: + 
- caption: Migrated Materials + chapters: + - file: migrated-materials/what-is-the-cluster + - file: migrated-materials/navigate-to-cluster + - caption: Examples + chapters: + - file: examples/dynamic-command-example + - file: examples/jupyter-notebook-example + - file: examples/markdown-notebook-example + - file: examples/r-markdown-example + - file: examples/alphafold.rst + title: reStructuredText Example diff --git a/source/examples/alphafold.rst b/source/examples/alphafold.rst new file mode 100644 index 0000000..dc6fbfc --- /dev/null +++ b/source/examples/alphafold.rst @@ -0,0 +1,142 @@ +.. _backbone-label: + +Alphafold +========= + +Introduction +~~~~~~~~~~~~ +``Alphafold`` is a protein structure prediction tool developed by DeepMind (Google). It uses a novel machine learning approach to predict 3D protein structures from primary sequences alone. The source code is available on `Github`_. It has been deployed in all RCAC clusters, supporting both CPU and GPU. + +It also relies on a huge database. The full database (~2.2 TB) has been downloaded and set up for users. 
+ +Protein structure prediction by alphafold is performed in the following steps: + +* Search the amino acid sequence in uniref90 database by jackhmmer (using CPU) +* Search the amino acid sequence in mgnify database by jackhmmer (using CPU) +* Search the amino acid sequence in pdb70 database (for monomers) or pdb_seqres database (for multimers) by hhsearch (using CPU) +* Search the amino acid sequence in bfd database and uniclust30 (updated to uniref30 since v2.3.0) database by hhblits (using CPU) +* Search structure templates in pdb_mmcif database (using CPU) +* Search the amino acid sequence in uniprot database (for multimers) by jackhmmer (using CPU) +* Predict 3D structure by machine learning (using CPU or GPU) +* Structure optimisation with OpenMM (using CPU or GPU) + +| For more information, please check: +| Home page: https://github.com/deepmind/alphafold + +Versions +~~~~~~~~ +- 2.3.0 +- 2.3.1 + +Commands +~~~~~~~~ +- run_alphafold.sh + +Usage +~~~~~ +The usage of Alphafold on our cluster is very straightforward. Users can create a flagfile containing the database path information and pass it to ``run_alphafold.sh``:: + + run_alphafold.sh --flagfile=full_db.ff --fasta_paths=XX --output_dir=XX ... + +Users can find the detailed user guide on `Github`_. 
+ +full_db_20231031.ff (for alphafold v2.3) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example contents of full_db_20231031.ff for monomer:: + + --db_preset=full_dbs + --bfd_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt + --data_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/ + --uniref90_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref90/uniref90.fasta + --mgnify_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/mgnify/mgy_clusters_2022_05.fa + --uniref30_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref30/UniRef30_2021_03 + --pdb70_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb70/pdb70 + --template_mmcif_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/mmcif_files + --obsolete_pdbs_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/obsolete.dat + --hhblits_binary_path=/usr/bin/hhblits + --hhsearch_binary_path=/usr/bin/hhsearch + --jackhmmer_binary_path=/usr/bin/jackhmmer + --kalign_binary_path=/usr/bin/kalign + +Example contents of full_db_20231031.ff for multimer:: + + --db_preset=full_dbs + --bfd_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt + --data_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/ + --uniref90_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref90/uniref90.fasta + --mgnify_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/mgnify/mgy_clusters_2022_05.fa + --uniref30_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref30/UniRef30_2021_03 + --pdb_seqres_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_seqres/pdb_seqres.txt + 
--uniprot_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniprot/uniprot.fasta + --template_mmcif_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/mmcif_files + --obsolete_pdbs_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/obsolete.dat + --hhblits_binary_path=/usr/bin/hhblits + --hhsearch_binary_path=/usr/bin/hhsearch + --jackhmmer_binary_path=/usr/bin/jackhmmer + --kalign_binary_path=/usr/bin/kalign + + +Example job using CPU +~~~~~~~~~~~~~~~~~~~~~ +.. warning:: + Using ``#!/bin/sh -l`` as shebang in the slurm job script will cause the failure of some biocontainer modules. Please use ``#!/bin/bash`` instead. + +.. note:: + Notice that since version 2.2.0, the parameter ``--use_gpu_relax=False`` is required. + +To run alphafold using CPU:: + + #!/bin/bash + #SBATCH -p PartitionName # batch or your group's own partition + #SBATCH -t 24:00:00 + #SBATCH -N 1 + #SBATCH -n 1 + #SBATCH -c 10 + #SBATCH --mem=64G + #SBATCH --job-name=alphafold + #SBATCH --mail-type=FAIL,BEGIN,END + #SBATCH --error=%x-%J-%u.err + #SBATCH --output=%x-%J-%u.out + + module purge + module load alphafold/2.3.1 + + run_alphafold.sh --flagfile=full_db_20231031.ff \ + --fasta_paths=sample.fasta --max_template_date=2022-02-01 \ + --output_dir=af2_full_out --model_preset=monomer \ + --use_gpu_relax=False + +Example job using GPU +~~~~~~~~~~~~~~~~~~~~~ +.. warning:: + Using ``#!/bin/sh -l`` as shebang in the slurm job script will cause the failure of some biocontainer modules. Please use ``#!/bin/bash`` instead. 
+ +To run alphafold using GPU:: + + #!/bin/bash + #SBATCH -p PartitionName # gpu or preempt + #SBATCH -t 24:00:00 + #SBATCH -N 1 + #SBATCH -n 1 + #SBATCH -c 10 + #SBATCH --mem=64G + #SBATCH --gres=gpu:1 + #SBATCH --job-name=alphafold + #SBATCH --mail-type=FAIL,BEGIN,END + #SBATCH --error=%x-%J-%u.err + #SBATCH --output=%x-%J-%u.out + + module purge + module load alphafold/2.3.1 + + run_alphafold.sh --flagfile=full_db_20231031.ff \ + --fasta_paths=sample.fasta --max_template_date=2022-02-01 \ + --output_dir=af2_full_out --model_preset=monomer \ + --use_gpu_relax=True + + + + + + +.. _Github: https://github.com/deepmind/alphafold/ diff --git a/source/examples/dynamic-command-example.md b/source/examples/dynamic-command-example.md new file mode 100644 index 0000000..2852ef9 --- /dev/null +++ b/source/examples/dynamic-command-example.md @@ -0,0 +1,23 @@ +# Dynamic Command Example + +Tufts Username: + +```bash +ssh YOUR_UTLN@login.pax.tufts.edu +``` +The document itself was still originally written in Markdown with the textbox +and button added as HTML elements. The code performing the replacement can be +embedded into the Markdown document or loaded from a static JavaScript file. + +```{note} +This is just some hastily put-together JavaScript, so it will only work once. +``` + + + diff --git a/source/examples/jupyter-notebook-example.ipynb b/source/examples/jupyter-notebook-example.ipynb new file mode 100644 index 0000000..c8c2740 --- /dev/null +++ b/source/examples/jupyter-notebook-example.ipynb @@ -0,0 +1,31 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Jupyter Notebook Example\n", + "\n", + "This page was generated from a Jupyter Notebook. All code gets executed when the\n", + "page is generated and the outputs are included. There is also an option to make\n", + "pages containing code interactive by configuring a suitable Binder backend." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import this\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/source/examples/markdown-notebook-example.md b/source/examples/markdown-notebook-example.md new file mode 100644 index 0000000..3f8c86b --- /dev/null +++ b/source/examples/markdown-notebook-example.md @@ -0,0 +1,20 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Markdown Notebook Example + +This page was created from a MyST Markdown document with a special YAML metadata +header and `{code-cell}` blocks. Any code in the aforementioned blocks gets +executed when the page is generated and the outputs are displayed. + +```{code-cell} +import this +``` diff --git a/source/examples/r-markdown-example.Rmd b/source/examples/r-markdown-example.Rmd new file mode 100644 index 0000000..c27c8cc --- /dev/null +++ b/source/examples/r-markdown-example.Rmd @@ -0,0 +1,17 @@ +--- +jupyter: + kernelspec: + display_name: Python + language: python + name: python3 +--- + +# R Markdown Example + +This page is generated from an R Markdown document. Currently, only execution +of Python code is supported, as the Rmd document simply gets converted into a +Jupyter Notebook. However, an R backend could potentially be configured if desired. + +```{python} +import this +``` diff --git a/source/favicon.ico b/source/favicon.ico new file mode 100644 index 0000000..17f55d7 Binary files /dev/null and b/source/favicon.ico differ diff --git a/source/index.md b/source/index.md new file mode 100644 index 0000000..1e6080e --- /dev/null +++ b/source/index.md @@ -0,0 +1,7 @@ +# Tufts University HPC Cluster User Guide + +```{caution} +This website is for demonstration purposes only and does not contain any usable +documentation. 
Please see the current user guide hosted on Box for Tufts HPC +cluster documentation: https://tufts.box.com/HPC-New-User +``` diff --git a/source/logo.png b/source/logo.png new file mode 100644 index 0000000..e15a89b Binary files /dev/null and b/source/logo.png differ diff --git a/source/migrated-materials/images/coreNode.png b/source/migrated-materials/images/coreNode.png new file mode 100644 index 0000000..5ef4792 Binary files /dev/null and b/source/migrated-materials/images/coreNode.png differ diff --git a/source/migrated-materials/images/cpuGpu.png b/source/migrated-materials/images/cpuGpu.png new file mode 100644 index 0000000..c510199 Binary files /dev/null and b/source/migrated-materials/images/cpuGpu.png differ diff --git a/source/migrated-materials/images/hpcImage.png b/source/migrated-materials/images/hpcImage.png new file mode 100644 index 0000000..d918cc0 Binary files /dev/null and b/source/migrated-materials/images/hpcImage.png differ diff --git a/source/migrated-materials/images/memStore.png b/source/migrated-materials/images/memStore.png new file mode 100644 index 0000000..d4cd315 Binary files /dev/null and b/source/migrated-materials/images/memStore.png differ diff --git a/source/migrated-materials/navigate-to-cluster.md b/source/migrated-materials/navigate-to-cluster.md new file mode 100644 index 0000000..ebdd42b --- /dev/null +++ b/source/migrated-materials/navigate-to-cluster.md @@ -0,0 +1,11 @@ +# Accessing the Cluster + +## Command Line: + +You can access the Tufts HPC Cluster via the command line with: +- The Terminal app on a Mac or Linux machine +- PuTTY, Cygwin SSH, SecureCRT, or another SSH client on a Windows machine + +```sh +ssh YOUR_UTLN@login.pax.tufts.edu +``` diff --git a/source/migrated-materials/what-is-the-cluster.md b/source/migrated-materials/what-is-the-cluster.md new file mode 100644 index 0000000..50dbfac --- /dev/null +++ b/source/migrated-materials/what-is-the-cluster.md @@ -0,0 +1,25 @@ +# What is the Cluster? 
+ +Before getting to the cluster, it is worth discussing what a cluster is and some of the terminology. First, let's discuss the difference between a CPU and a GPU. + +## CPU -- Central Processing Unit + - A CPU can never be fully replaced by a GPU + - Can be thought of as the taskmaster of the entire system, coordinating a wide range of general-purpose computing tasks + +## GPU -- Graphics Processing Unit + - GPUs were originally designed to create images for computer graphics and video game consoles + - Performs a narrower range of more specialized tasks + +![](images/cpuGpu.png) + +You'll notice that in the picture above the CPU is composed of smaller units, each called a **core**. A core is the computing unit in a CPU. You'll also note that the whole system (including CPUs, GPUs, and storage) is a single computer in the cluster called a **node**. + +![](images/coreNode.png) + +When a CPU performs a computation, it uses a storage hierarchy. This hierarchy places small, fast storage options close to the CPU and larger, slower options farther away. The small, fast options are called **memory/RAM**, while the larger, slower options are simply called **storage**. + +![](images/memStore.png) + +Now that we know the components, we can put together an image of what a computer cluster is. A **computer cluster** is a group of loosely or tightly connected computers that work together as a single system. An **HPC (High Performance Compute) cluster** is a computer cluster capable of performing computations at high speeds. + +![](images/hpcImage.png)
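The core and node terminology introduced in `what-is-the-cluster.md` can be made concrete with a minimal Python sketch. It is not part of this change set: the helper name `describe_node` is hypothetical, and the sketch assumes only the Python standard library. It reports the hostname of the node it runs on and the number of logical cores that node exposes.

```python
import os
import platform


def describe_node() -> dict:
    """Summarize the computer (node) this script is running on.

    `describe_node` is a hypothetical helper used for illustration only.
    """
    # os.cpu_count() reports logical cores and may return None when the
    # count cannot be determined, so fall back to 1 in that case.
    logical_cores = os.cpu_count() or 1
    return {
        "node_name": platform.node(),  # hostname of this node
        "logical_cores": logical_cores,
    }


if __name__ == "__main__":
    info = describe_node()
    print(f"node {info['node_name']!r} exposes {info['logical_cores']} logical core(s)")
```

On a shared cluster the scheduler may grant a job fewer cores than the node physically has, so the number printed here describes the node rather than a job's allocation.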