
Commit

upload demo
ukukas committed Nov 28, 2023
1 parent f31bf89 commit c8894e3
Showing 19 changed files with 350 additions and 0 deletions.
26 changes: 26 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,26 @@
name: deploy

on:
  push:
    branches:
      - main

jobs:
  deploy-book:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.10.13
        uses: actions/setup-python@v2
        with:
          python-version: 3.10.13
      - name: Install Dependencies
        run: |
          pip install -r requirements.txt
      - name: Build Content
        run: |
          jupyter-book build source
      - name: GitHub Pages action
        uses: peaceiris/[email protected]
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: source/_build/html
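
The same build can be reproduced locally before pushing (a minimal sketch; assumes a Python 3.10 environment with the pinned requirements from requirements.txt):

```sh
# Install the pinned dependencies and build the book locally
pip install -r requirements.txt
jupyter-book build source

# The generated site ends up in the directory the workflow publishes
ls source/_build/html/index.html
```
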
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
_build/
3 changes: 3 additions & 0 deletions requirements.txt
@@ -0,0 +1,3 @@
python==3.10.13
jupyter-book==0.15.1
jupytext==1.15.2
29 changes: 29 additions & 0 deletions source/_config.yml
@@ -0,0 +1,29 @@
title: "Tufts University HPC Cluster User Guide"
author: Tufts University Research Technology
copyright: "2022"
logo: logo.png

html:
  favicon: favicon.ico
  use_edit_page_button: true
  use_repository_button: true
  use_issues_button: true
  home_page_in_navbar: false

repository:
  url: https://github.com/tuftsrt/hpc-documentation
  path_to_book: source
  branch: main

sphinx:
  recursive_update: true
  config:
    html_last_updated_fmt: "%b %d, %Y"
    html_theme_options:
      logo:
        text: "HPC Cluster User Guide"
      repository_provider: github
    nb_custom_formats:
      .Rmd:
        - jupytext.reads
        - fmt: Rmd
15 changes: 15 additions & 0 deletions source/_toc.yml
@@ -0,0 +1,15 @@
format: jb-book
root: index
parts:
  - caption: Migrated Materials
    chapters:
      - file: migrated-materials/what-is-the-cluster
      - file: migrated-materials/navigate-to-cluster
  - caption: Examples
    chapters:
      - file: examples/dynamic-command-example
      - file: examples/jupyter-notebook-example
      - file: examples/markdown-notebook-example
      - file: examples/r-markdown-example
      - file: examples/alphafold.rst
        title: reStructuredText Example
142 changes: 142 additions & 0 deletions source/examples/alphafold.rst
@@ -0,0 +1,142 @@
.. _backbone-label:

AlphaFold
=========

Introduction
~~~~~~~~~~~~
``Alphafold`` is a protein structure prediction tool developed by DeepMind (Google). It uses a novel machine learning approach to predict 3D protein structures from primary sequences alone. The source code is available on `Github`_. It has been deployed in all RCAC clusters, supporting both CPU and GPU.

It also relies on a huge database. The full database (~2.2 TB) has been downloaded and set up for users.
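
The database location used in the flagfile examples below can be listed directly on the cluster (a minimal sketch; assumes read access to the shared dataset directory)::

    ls /cluster/tufts/biocontainers/datasets/alphafold/db_20231031/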

Protein structure prediction by AlphaFold is performed in the following steps:

* Search the amino acid sequence in uniref90 database by jackhmmer (using CPU)
* Search the amino acid sequence in mgnify database by jackhmmer (using CPU)
* Search the amino acid sequence in pdb70 database (for monomers) or pdb_seqres database (for multimers) by hhsearch (using CPU)
* Search the amino acid sequence in bfd database and uniclust30 (updated to uniref30 since v2.3.0) database by hhblits (using CPU)
* Search structure templates in pdb_mmcif database (using CPU)
* Search the amino acid sequence in uniprot database (for multimers) by jackhmmer (using CPU)
* Predict 3D structure by machine learning (using CPU or GPU)
* Structure optimisation with OpenMM (using CPU or GPU)

| For more information, please check:
| Home page: https://github.com/deepmind/alphafold

Versions
~~~~~~~~
- 2.3.0
- 2.3.1

Commands
~~~~~~~~
- run_alphafold.sh

Usage
~~~~~
The usage of AlphaFold on our cluster is very straightforward: users can create a flagfile containing the database path information and pass it to ``run_alphafold.sh``::

    run_alphafold.sh --flagfile=full_db.ff --fasta_paths=XX --output_dir=XX ...

Users can find a detailed user guide on its `Github`_ page.

full_db_20231031.ff (for AlphaFold v2.3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Example contents of full_db_20231031.ff for monomer::

    --db_preset=full_dbs
    --bfd_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
    --data_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/
    --uniref90_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref90/uniref90.fasta
    --mgnify_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/mgnify/mgy_clusters_2022_05.fa
    --uniref30_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref30/UniRef30_2021_03
    --pdb70_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb70/pdb70
    --template_mmcif_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/mmcif_files
    --obsolete_pdbs_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/obsolete.dat
    --hhblits_binary_path=/usr/bin/hhblits
    --hhsearch_binary_path=/usr/bin/hhsearch
    --jackhmmer_binary_path=/usr/bin/jackhmmer
    --kalign_binary_path=/usr/bin/kalign

Example contents of full_db_20231031.ff for multimer::

    --db_preset=full_dbs
    --bfd_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
    --data_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/
    --uniref90_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref90/uniref90.fasta
    --mgnify_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/mgnify/mgy_clusters_2022_05.fa
    --uniref30_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniref30/UniRef30_2021_03
    --pdb_seqres_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_seqres/pdb_seqres.txt
    --uniprot_database_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/uniprot/uniprot.fasta
    --template_mmcif_dir=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/mmcif_files
    --obsolete_pdbs_path=/cluster/tufts/biocontainers/datasets/alphafold/db_20231031/pdb_mmcif/obsolete.dat
    --hhblits_binary_path=/usr/bin/hhblits
    --hhsearch_binary_path=/usr/bin/hhsearch
    --jackhmmer_binary_path=/usr/bin/jackhmmer
    --kalign_binary_path=/usr/bin/kalign


Example job using CPU
~~~~~~~~~~~~~~~~~~~~~
.. warning::
   Using ``#!/bin/sh -l`` as the shebang in a Slurm job script will cause some biocontainer modules to fail. Please use ``#!/bin/bash`` instead.

.. note::
   Notice that since version 2.2.0, the parameter ``--use_gpu_relax=False`` is required.

To run AlphaFold using CPU::

    #!/bin/bash
    #SBATCH -p PartitionName  # batch or your group's own partition
    #SBATCH -t 24:00:00
    #SBATCH -N 1
    #SBATCH -n 1
    #SBATCH -c 10
    #SBATCH --mem=64G
    #SBATCH --job-name=alphafold
    #SBATCH --mail-type=FAIL,BEGIN,END
    #SBATCH --error=%x-%J-%u.err
    #SBATCH --output=%x-%J-%u.out

    module purge
    module load alphafold/2.3.1

    run_alphafold.sh --flagfile=full_db_20231031.ff \
        --fasta_paths=sample.fasta --max_template_date=2022-02-01 \
        --output_dir=af2_full_out --model_preset=monomer \
        --use_gpu_relax=False
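
Assuming the script above is saved as ``alphafold_cpu.sh`` (a hypothetical filename used here for illustration), it can be submitted and monitored with standard Slurm commands::

    sbatch alphafold_cpu.sh     # submit the job
    squeue -u $USER             # check its state in the queue
    tail -f alphafold-*.out     # follow the job's standard output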

Example job using GPU
~~~~~~~~~~~~~~~~~~~~~
.. warning::
   Using ``#!/bin/sh -l`` as the shebang in a Slurm job script will cause some biocontainer modules to fail. Please use ``#!/bin/bash`` instead.

To run AlphaFold using GPU::

    #!/bin/bash
    #SBATCH -p PartitionName  # gpu or preempt
    #SBATCH -t 24:00:00
    #SBATCH -N 1
    #SBATCH -n 1
    #SBATCH -c 10
    #SBATCH --mem=64G
    #SBATCH --gres=gpu:1
    #SBATCH --job-name=alphafold
    #SBATCH --mail-type=FAIL,BEGIN,END
    #SBATCH --error=%x-%J-%u.err
    #SBATCH --output=%x-%J-%u.out

    module purge
    module load alphafold/2.3.1

    run_alphafold.sh --flagfile=full_db_20231031.ff \
        --fasta_paths=sample.fasta --max_template_date=2022-02-01 \
        --output_dir=af2_full_out --model_preset=monomer \
        --use_gpu_relax=True
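
To confirm that a GPU has actually been allocated before launching a long prediction, a quick interactive check can be run first (a minimal sketch; assumes the ``gpu`` partition shown above)::

    srun -p gpu --gres=gpu:1 -t 00:10:00 --pty nvidia-smi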






.. _Github: https://github.com/deepmind/alphafold/
23 changes: 23 additions & 0 deletions source/examples/dynamic-command-example.md
@@ -0,0 +1,23 @@
# Dynamic Command Example

Tufts Username: <input type="text" id="utln" value="" size="10"> <button onclick="replaceUsername()">OK</button>

```bash
ssh [email protected]
```

The document itself is still written in Markdown, with the text box and button added as inline HTML elements. The code that performs the replacement can either be embedded in the Markdown document or loaded from a static JavaScript file.

```{note}
This is just some hastily put together JavaScript so it will only work once.
```


<script>
function replaceUsername()
{
    // Replace the placeholder username in the page with the value entered above
    const new_username = document.getElementById("utln").value;
    document.body.innerHTML = document.body.innerHTML.replace("YOUR_UTLN", new_username);
}
</script>
31 changes: 31 additions & 0 deletions source/examples/jupyter-notebook-example.ipynb
@@ -0,0 +1,31 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Jupyter Notebook Example\n",
        "\n",
        "This page was generated from a Jupyter Notebook. All code gets executed when the\n",
        "page is generated and the outputs are included. There is also an option to make\n",
        "pages containing code interactive by configuring a suitable Binder backend."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import this\n"
      ]
    }
  ],
  "metadata": {
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
20 changes: 20 additions & 0 deletions source/examples/markdown-notebook-example.md
@@ -0,0 +1,20 @@
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Markdown Notebook Example

This page was created from a MyST Markdown document with a special YAML metadata
header and `{code-cell}` blocks. Any code in those blocks is executed when the
page is generated, and the outputs are displayed.

```{code-cell}
import this
```
17 changes: 17 additions & 0 deletions source/examples/r-markdown-example.Rmd
@@ -0,0 +1,17 @@
---
jupyter:
  kernelspec:
    display_name: Python
    language: python
    name: python3
---

# R Markdown Example

This page is generated from an R Markdown document. Currently only execution
of Python code is supported, as the Rmd document is simply converted into a
Jupyter Notebook, but an R backend could potentially be configured if desired.

```{python}
import this
```
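
The conversion described above can also be performed manually with the `jupytext` command-line tool (a minimal sketch; Jupyter Book runs this step automatically via the `nb_custom_formats` setting in `_config.yml`):

```sh
# Convert the R Markdown source into a Jupyter Notebook
jupytext --to notebook r-markdown-example.Rmd
```
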
Binary file added source/favicon.ico
7 changes: 7 additions & 0 deletions source/index.md
@@ -0,0 +1,7 @@
# Tufts University HPC Cluster User Guide

```{caution}
This website is for demonstration purposes only and does not contain any usable
documentation. Please see the current user guide hosted on Box for Tufts HPC
cluster documentation: https://tufts.box.com/HPC-New-User
```
Binary file added source/logo.png
Binary file added source/migrated-materials/images/coreNode.png
Binary file added source/migrated-materials/images/cpuGpu.png
Binary file added source/migrated-materials/images/hpcImage.png
Binary file added source/migrated-materials/images/memStore.png
11 changes: 11 additions & 0 deletions source/migrated-materials/navigate-to-cluster.md
@@ -0,0 +1,11 @@
# Accessing the Cluster

## Command Line

You can access the Tufts HPC Cluster from the command line with:
- The Terminal app on a Mac or Linux machine
- PuTTY, Cygwin SSH, SecureCRT, or another SSH client on a Windows machine

```sh
ssh [email protected]
```
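
For convenience, a host alias can be added to your SSH configuration (a minimal sketch; `<cluster-hostname>` is a placeholder for the cluster login hostname and `your_utln` for your Tufts username):

```sh
# Append a host alias to your local SSH configuration
cat >> ~/.ssh/config <<'EOF'
Host tufts-hpc
    HostName <cluster-hostname>
    User your_utln
EOF

# The cluster can then be reached with the short alias
ssh tufts-hpc
```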
25 changes: 25 additions & 0 deletions source/migrated-materials/what-is-the-cluster.md
@@ -0,0 +1,25 @@
# What is the Cluster?

Before getting to the cluster, it is worth discussing what a cluster is and some of the related terminology. First, let's look at the difference between a CPU and a GPU.

## CPU -- Central Processing Unit
- A CPU can never be fully replaced by a GPU
- Can be thought of as the taskmaster of the entire system, coordinating a wide range of general-purpose computing tasks

## GPU -- Graphics Processing Unit
- GPUs were originally designed to create images for computer graphics and video game consoles
- They perform a narrower range of more specialized tasks

![](images/cpuGpu.png)

You'll notice in the picture above that the CPU is composed of smaller units called **cores**; a core is the basic computing unit of a CPU. You'll also note that the whole system (including CPUs, GPUs, and storage) forms a single computer called a **node**.

![](images/coreNode.png)

When a CPU performs a computation, it uses a storage hierarchy. This hierarchy places small, fast storage close to the CPU and larger, slower storage farther away. The small, fast storage is called **memory/RAM**, while the larger, slower storage is simply called **storage**.

![](images/memStore.png)

Now that we know the components, we can put together a picture of what a computer cluster is. A **computer cluster** is a group of loosely or tightly connected computers that work together as a single system. An **HPC (High Performance Computing) cluster** is a computer cluster capable of performing computations at high speed.

![](images/hpcImage.png)
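
Once you have a shell on a cluster node, the components described above can be inspected directly (a minimal sketch; `nvidia-smi` only reports GPUs on nodes that actually have them):

```sh
lscpu        # CPU model and number of cores on the node
free -h      # memory (RAM) available on the node
df -h ~      # storage available in your home directory
nvidia-smi   # GPUs attached to the node, if any
```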
