
Commit

upload demo
ukukas committed Nov 28, 2023
1 parent f31bf89 commit c8894e3
Showing 19 changed files with 350 additions and 0 deletions.
26 changes: 26 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,26 @@
name: deploy

on:
  push:
    branches:
      - main

jobs:
  deploy-book:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.10.13
        uses: actions/setup-python@v2
        with:
          python-version: 3.10.13
      - name: Install Dependencies
        run: |
          pip install -r requirements.txt
      - name: Build Content
        run: |
          jupyter-book build source
      - name: GitHub Pages action
        uses: peaceiris/[email protected]
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: source/_build/html
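
The same build can be reproduced locally before pushing (a minimal sketch; assumes a Python 3.10 environment with the pinned requirements from requirements.txt):

```sh
# Install the pinned dependencies and build the book locally
pip install -r requirements.txt
jupyter-book build source

# The generated site ends up in the directory the workflow publishes
ls source/_build/html/index.html
```
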
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
_build/
3 changes: 3 additions & 0 deletions requirements.txt
@@ -0,0 +1,3 @@
python==3.10.13
jupyter-book==0.15.1
jupytext==1.15.2
29 changes: 29 additions & 0 deletions source/_config.yml
@@ -0,0 +1,29 @@
title: "Tufts University HPC Cluster User Guide"
author: Tufts University Research Technology
copyright: "2022"
logo: logo.png

html:
  favicon: favicon.ico
  use_edit_page_button: true
  use_repository_button: true
  use_issues_button: true
  home_page_in_navbar: false

repository:
  url: https://github.com/tuftsrt/hpc-documentation
  path_to_book: source
  branch: main

sphinx:
  recursive_update: true
  config:
    html_last_updated_fmt: "%b %d, %Y"
    html_theme_options:
      logo:
        text: "HPC Cluster User Guide"
      repository_provider: github
    nb_custom_formats:
      .Rmd:
        - jupytext.reads
        - fmt: Rmd
15 changes: 15 additions & 0 deletions source/_toc.yml
@@ -0,0 +1,15 @@
format: jb-book
root: index
parts:
  - caption: Migrated Materials
    chapters:
      - file: migrated-materials/what-is-the-cluster
      - file: migrated-materials/navigate-to-cluster
  - caption: Examples
    chapters:
      - file: examples/dynamic-command-example
      - file: examples/jupyter-notebook-example
      - file: examples/markdown-notebook-example
      - file: examples/r-markdown-example
      - file: examples/alphafold.rst
        title: reStructuredText Example
142 changes: 142 additions & 0 deletions source/examples/alphafold.rst
@@ -0,0 +1,142 @@
.. _backbone-label:

AlphaFold
=========

Introduction
~~~~~~~~~~~~
``Alphafold`` is a protein structure prediction tool developed by DeepMind (Google). It uses a novel machine learning approach to predict 3D protein structures from primary sequences alone. The source code is available on `Github`_. It has been deployed in all RCAC clusters, supporting both CPU and GPU.

It also relies on a huge database. The full database (~2.2 TB) has been downloaded and set up for users.
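
The database location used in the flagfile examples below can be listed directly on the cluster (a minimal sketch; assumes read access to the shared dataset directory)::

    ls /cluster/tufts/biocontainers/datasets/alphafold/db_20231031/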

Protein structure prediction by AlphaFold is performed in the following steps:

* Search the amino acid sequence in uniref90 database by jackhmmer (using CPU)
* Search the amino acid sequence in mgnify database by jackhmmer (using CPU)
* Search the amino acid sequence in pdb70 database (for monomers) or pdb_seqres database (for multimers) by hhsearch (using CPU)
* Search the amino acid sequence in bfd database and uniclust30 (updated to uniref30 since v2.3.0) database by hhblits (using CPU)
* Search structure templates in pdb_mmcif database (using CPU)
* Search the amino acid sequence in uniprot database (for multimers) by jackhmmer (using CPU)
* Predict 3D structure by machine learning (using CPU or GPU)
* Structure optimisation with OpenMM (using CPU or GPU)

| For more information, please check:
| Home page: https://github.com/deepmind/alphafold

Versions
~~~~~~~~
- 2.3.0
- 2.3.1

Commands
~~~~~~~~
- run_alphafold.sh

Usage
~~~~~
The usage of AlphaFold on our cluster is very straightforward: users can create a flagfile containing the database path information and pass it to ``run_alphafold.sh``::

    run_alphafold.sh --flagfile=full_db.ff --fasta_paths=XX --output_dir=XX ...

Users can find a detailed user guide on its `Github`_ page.

full_db_20231031.ff (for AlphaFold v2.3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Example contents of full_db_20231031.ff for monomer::

    --db_preset=full_dbs
    --bfd_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
    --data_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/
    --uniref90_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref90/uniref90.fasta
    --mgnify_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/mgnify/mgy_clusters_2022_05.fa
    --uniref30_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref30/UniRef30_2021_03
    --pdb70_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb70/pdb70
    --template_mmcif_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/mmcif_files
    --obsolete_pdbs_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/obsolete.dat
    --hhblits_binary_path=/usr/bin/hhblits
    --hhsearch_binary_path=/usr/bin/hhsearch
    --jackhmmer_binary_path=/usr/bin/jackhmmer
    --kalign_binary_path=/usr/bin/kalign

Example contents of full_db_20231031.ff for multimer::

    --db_preset=full_dbs
    --bfd_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
    --data_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/
    --uniref90_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref90/uniref90.fasta
    --mgnify_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/mgnify/mgy_clusters_2022_05.fa
    --uniref30_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref30/UniRef30_2021_03
    --pdb_seqres_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_seqres/pdb_seqres.txt
    --uniprot_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniprot/uniprot.fasta
    --template_mmcif_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/mmcif_files
    --obsolete_pdbs_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/obsolete.dat
    --hhblits_binary_path=/usr/bin/hhblits
    --hhsearch_binary_path=/usr/bin/hhsearch
    --jackhmmer_binary_path=/usr/bin/jackhmmer
    --kalign_binary_path=/usr/bin/kalign


Example job using CPU
~~~~~~~~~~~~~~~~~~~~~
.. warning::
   Using ``#!/bin/sh -l`` as the shebang in a Slurm job script will cause some biocontainer modules to fail. Please use ``#!/bin/bash`` instead.

.. note::
   Notice that since version 2.2.0, the parameter ``--use_gpu_relax=False`` is required.

To run AlphaFold using CPU::

    #!/bin/bash
    #SBATCH -p PartitionName  # batch or your group's own partition
    #SBATCH -t 24:00:00
    #SBATCH -N 1
    #SBATCH -n 1
    #SBATCH -c 10
    #SBATCH --mem=64G
    #SBATCH --job-name=alphafold
    #SBATCH --mail-type=FAIL,BEGIN,END
    #SBATCH --error=%x-%J-%u.err
    #SBATCH --output=%x-%J-%u.out

    module purge
    module load alphafold/2.3.1

    run_alphafold.sh --flagfile=full_db_20231031.ff \
        --fasta_paths=sample.fasta --max_template_date=2022-02-01 \
        --output_dir=af2_full_out --model_preset=monomer \
        --use_gpu_relax=False
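
Assuming the script above is saved as ``alphafold_cpu.sh`` (a hypothetical filename used here for illustration), it can be submitted and monitored with standard Slurm commands::

    sbatch alphafold_cpu.sh     # submit the job
    squeue -u $USER             # check its state in the queue
    tail -f alphafold-*.out     # follow the job's standard output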

Example job using GPU
~~~~~~~~~~~~~~~~~~~~~
.. warning::
   Using ``#!/bin/sh -l`` as the shebang in a Slurm job script will cause some biocontainer modules to fail. Please use ``#!/bin/bash`` instead.

To run AlphaFold using GPU::

    #!/bin/bash
    #SBATCH -p PartitionName  # gpu or preempt
    #SBATCH -t 24:00:00
    #SBATCH -N 1
    #SBATCH -n 1
    #SBATCH -c 10
    #SBATCH --mem=64G
    #SBATCH --gres=gpu:1
    #SBATCH --job-name=alphafold
    #SBATCH --mail-type=FAIL,BEGIN,END
    #SBATCH --error=%x-%J-%u.err
    #SBATCH --output=%x-%J-%u.out

    module purge
    module load alphafold/2.3.1

    run_alphafold.sh --flagfile=full_db_20231031.ff \
        --fasta_paths=sample.fasta --max_template_date=2022-02-01 \
        --output_dir=af2_full_out --model_preset=monomer \
        --use_gpu_relax=True
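
To confirm that a GPU has actually been allocated before launching a long prediction, a quick interactive check can be run first (a minimal sketch; assumes the ``gpu`` partition shown above)::

    srun -p gpu --gres=gpu:1 -t 00:10:00 --pty nvidia-smi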






.. _Github: https://github.com/deepmind/alphafold/
23 changes: 23 additions & 0 deletions source/examples/dynamic-command-example.md
@@ -0,0 +1,23 @@
# Dynamic Command Example

Tufts Username: <input type="text" id="utln" value="" size="10"> <button onclick="replaceUsername()">OK</button>

```bash
ssh [email protected]
```

The document itself is still written in Markdown, with the text box and button added as inline HTML elements. The code that performs the replacement can either be embedded in the Markdown document or loaded from a static JavaScript file.

```{note}
This is just some hastily put together JavaScript so it will only work once.
```


<script>
function replaceUsername()
{
    // Replace the placeholder username in the page with the value entered above
    const new_username = document.getElementById("utln").value;
    document.body.innerHTML = document.body.innerHTML.replace("YOUR_UTLN", new_username);
}
</script>
31 changes: 31 additions & 0 deletions source/examples/jupyter-notebook-example.ipynb
@@ -0,0 +1,31 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Jupyter Notebook Example\n",
        "\n",
        "This page was generated from a Jupyter Notebook. All code gets executed when the\n",
        "page is generated and the outputs are included. There is also an option to make\n",
        "pages containing code interactive by configuring a suitable Binder backend."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import this\n"
      ]
    }
  ],
  "metadata": {
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
20 changes: 20 additions & 0 deletions source/examples/markdown-notebook-example.md
@@ -0,0 +1,20 @@
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Markdown Notebook Example

This page was created from a MyST Markdown document with a special YAML metadata
header and `{code-cell}` blocks. Any code in those blocks is executed when the
page is generated, and the outputs are displayed.

```{code-cell}
import this
```
17 changes: 17 additions & 0 deletions source/examples/r-markdown-example.Rmd
@@ -0,0 +1,17 @@
---
jupyter:
  kernelspec:
    display_name: Python
    language: python
    name: python3
---

# R Markdown Example

This page is generated from an R Markdown document. Currently only execution
of Python code is supported, as the Rmd document is simply converted into a
Jupyter Notebook, but an R backend could potentially be configured if desired.

```{python}
import this
```
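
The conversion described above can also be performed manually with the `jupytext` command-line tool (a minimal sketch; Jupyter Book runs this step automatically via the `nb_custom_formats` setting in `_config.yml`):

```sh
# Convert the R Markdown source into a Jupyter Notebook
jupytext --to notebook r-markdown-example.Rmd
```
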
Binary file added source/favicon.ico
7 changes: 7 additions & 0 deletions source/index.md
@@ -0,0 +1,7 @@
# Tufts University HPC Cluster User Guide

```{caution}
This website is for demonstration purposes only and does not contain any usable
documentation. Please see the current user guide hosted on Box for Tufts HPC
cluster documentation: https://tufts.box.com/HPC-New-User
```
Binary file added source/logo.png
Binary file added source/migrated-materials/images/coreNode.png
Binary file added source/migrated-materials/images/cpuGpu.png
Binary file added source/migrated-materials/images/hpcImage.png
Binary file added source/migrated-materials/images/memStore.png
11 changes: 11 additions & 0 deletions source/migrated-materials/navigate-to-cluster.md
@@ -0,0 +1,11 @@
# Accessing the Cluster

## Command Line

You can access the Tufts HPC Cluster from the command line with:
- The Terminal app on a Mac or Linux machine
- PuTTY, Cygwin SSH, SecureCRT, or another SSH client on a Windows machine

```sh
ssh [email protected]
```
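
For convenience, a host alias can be added to your SSH configuration (a minimal sketch; `<cluster-hostname>` is a placeholder for the cluster login hostname and `your_utln` for your Tufts username):

```sh
# Append a host alias to your local SSH configuration
cat >> ~/.ssh/config <<'EOF'
Host tufts-hpc
    HostName <cluster-hostname>
    User your_utln
EOF

# The cluster can then be reached with the short alias
ssh tufts-hpc
```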
25 changes: 25 additions & 0 deletions source/migrated-materials/what-is-the-cluster.md
@@ -0,0 +1,25 @@
# What is the Cluster?

Before getting to the cluster, it is worth discussing what a cluster is and some of the related terminology. First, let's look at the difference between a CPU and a GPU.

## CPU -- Central Processing Unit
- A CPU can never be fully replaced by a GPU
- Can be thought of as the taskmaster of the entire system, coordinating a wide range of general-purpose computing tasks

## GPU -- Graphics Processing Unit
- GPUs were originally designed to create images for computer graphics and video game consoles
- They perform a narrower range of more specialized tasks

![](images/cpuGpu.png)

You'll notice in the picture above that the CPU is composed of smaller units called **cores**; a core is the basic computing unit of a CPU. You'll also note that the whole system (including CPUs, GPUs, and storage) forms a single computer called a **node**.

![](images/coreNode.png)

When a CPU performs a computation, it uses a storage hierarchy. This hierarchy places small, fast storage close to the CPU and larger, slower storage farther away. The small, fast storage is called **memory/RAM**, while the larger, slower storage is simply called **storage**.

![](images/memStore.png)

Now that we know the components, we can put together a picture of what a computer cluster is. A **computer cluster** is a group of loosely or tightly connected computers that work together as a single system. An **HPC (High Performance Computing) cluster** is a computer cluster capable of performing computations at high speed.

![](images/hpcImage.png)
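
Once you have a shell on a cluster node, the components described above can be inspected directly (a minimal sketch; `nvidia-smi` only reports GPUs on nodes that actually have them):

```sh
lscpu        # CPU model and number of cores on the node
free -h      # memory (RAM) available on the node
df -h ~      # storage available in your home directory
nvidia-smi   # GPUs attached to the node, if any
```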
