bm-tk

Simple pipeline for predicting bacterial base modification in bulk from PacBio HiFi sequencing data with kinetics tags.

Output BAMs are currently stored alongside the input files, named with the prefix jasmine_predict.{input_bam}. The pipeline implements a custom check for existing output and filters out any inputs which already have an output file in the expected location; these skipped inputs are logged. This behaviour can be disabled by setting --clobber true, which forces prediction to be rerun for all inputs.
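
As an illustration (the paths here are hypothetical), an output BAM sits next to its input with the prefix added:

# Hypothetical layout: the output BAM is written next to the input,
# with the jasmine_predict. prefix added to the filename
ls /data/runA/
# movie1.hifi_reads.bam
# jasmine_predict.movie1.hifi_reads.bam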

Currently, the pipeline will

  1. Filter out BAMs which appear irrelevant by name (containing fail, unassigned, subread, scrap, or fibertools_predict), or which already have existing output files.
  2. Filter out any BAMs which do not contain the required kinetics tags (CHECK_KINETICS); a manual check for these tags is sketched after this list.
  3. Predict 6mA, 5mC, and 5hmC base modification using jasmine (PREDICT_JASMINE)
  4. Extract modifications to a table using a custom Perl script (EXTRACT_CALLS). This currently only extracts modifications with a probability score > 240 (~0.94); the threshold is fixed and cannot be changed. Extraction is performed by default but is optional; disable it by setting --extract_calls false.
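
If you want to check a BAM by hand before running the pipeline, PacBio HiFi kinetics are typically carried in the fi/fp/ri/rp tags. A quick, illustrative check with samtools (input.bam is a placeholder path) is:

# Print any kinetics tags present on the first read of the BAM.
# fi/fp = forward IPD and pulse width, ri/rp = reverse IPD and pulse width.
samtools view input.bam | head -n 1 | tr '\t' '\n' | grep -E '^(fi|fp|ri|rp):'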

Installation notes

samtools should be available in the environment you launch the pipeline from.

Running using slurm and either apptainer/singularity or micromamba

This will show how to run the pipeline using either micromamba environments or singularity/apptainer. In both cases, we create a micromamba environment with nextflow installed. You could install nextflow in a different way; the important element is that samtools and nextflow are both available in the environment that will run nextflow.

If the machines you run the pipeline on do not have internet access, see the later section on running without internet access.

Install nextflow

Run

micromamba create -n nextflow nextflow conda samtools

We install conda within the environment because nextflow needs the conda binary to activate and deactivate environments. samtools is installed because each file's kinetics-tag check runs locally in the nextflow environment rather than being submitted as a job.
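
As an optional sanity check, you can confirm that the tools the pipeline relies on are visible in the new environment:

# Both commands should print version information if the environment was created correctly
micromamba activate nextflow
nextflow -version
samtools --version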

Pull the pipeline

(Optional). Nextflow can take a local copy of the pipeline to run. If your compute nodes have internet access, this step isn't strictly necessary.

nextflow pull apduncan/bm-tk -r v0.1

This pulls the v0.1 release tag. You could instead specify a different tag, a commit hash (e.g. 7097a95), or omit -r to pull the most recent commit on the main branch.

Move to directory where you will run the pipeline

Move to whichever directory you want pipeline logs and configuration to be kept in. Unlike many nextflow pipelines, output files will not be placed in this directory; output goes to the same location as the input BAMs.

Customise nextflow.config profile

This step isn't necessary if you are in our group; the default should work.

nextflow.config specifies profiles which give details for the submission system. The defaults work for our group; if you are using this elsewhere you will need to customise them. Take a copy of the default config

curl https://raw.githubusercontent.com/apduncan/bm-tk/refs/heads/main/nextflow.config > nextflow.config

You can either customise the nbi_slurm profile or copy it under a new name. If you are also using slurm, it should be enough to set your partition names in the queue fields.

Run pipeline

Activate your nextflow environment

micromamba activate nextflow

Then run the pipeline

nextflow run apduncan/bm-tk \
-profile nbi_slurm \
-work-dir /path/to/scratch \
-with-report \
-r main \
--bams "/glob/to/**/find*.bam"

Do this on a node where it is okay to start long-running jobs interactively, or put the above in a batch submission script.
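
For example, a minimal SLURM batch script wrapping the command above might look like the following; the partition, resource requests, paths, and glob are placeholders to adapt:

#!/bin/bash
#SBATCH --partition=ei-medium
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G

# Make micromamba available in the non-interactive shell, then activate
# the environment that provides nextflow and samtools
eval "$(micromamba shell hook --shell bash)"
micromamba activate nextflow

nextflow run apduncan/bm-tk \
-profile nbi_slurm \
-work-dir /path/to/scratch \
-with-report \
-r main \
--bams "/glob/to/**/find*.bam"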

The pipeline should then run and produce your BAMs with predicted methylation.

This defaults to using singularity for execution. It will attempt to fetch the container image from the GitHub container registry automatically. If there is no internet access on the machines running these processes, see the later section. Similarly, if you want to use micromamba or an equivalent, see the section below.

Using micromamba/mamba/conda

Environments can be managed using micromamba or equivalents instead of containers.

To use micromamba, you can edit the profile in the nextflow.config file to:

  • Remove singularity.enabled = true from the profile scope
  • Add conda.enabled = true to the profile scope
  • Add conda.useMicromamba = true to the profile scope

It will look as follows:

profiles {
    ...
    nbi_slurm {
        conda.enabled = true
        conda.useMicromamba = true
        process {
...

Running without internet access

The main obstacle to running without internet access is that nextflow will not be able to pull the container or create the conda environment. However, we can do that on a node with internet access, then provide the path to the image or environment.

singularity or apptainer

All steps in the pipeline run using a single image, so the simplest method is to download this and provide a path to it at the command line. To use apptainer, simply substitute apptainer for singularity in the commands below.

singularity pull bmtk-latest.sif docker://ghcr.io/apduncan/bm-tk:latest

The pipeline can then be run with

nextflow run apduncan/bm-tk \
-with-singularity bmtk-latest.sif \
-profile nbi_slurm \
-work-dir /path/to/scratch \
-with-report \
-r main \
--bams "/glob/to/**/find*.bam"

The container path on the command line takes priority over the setting in nextflow.config, so it will use the image you pulled.

micromamba or equivalent

To create the environment, run

curl https://raw.githubusercontent.com/apduncan/bm-tk/refs/heads/main/env.yaml > env.yaml && \
micromamba env create -n bmtk --file env.yaml

Find the environment path

> micromamba env list | grep bmtk
bmtk                      /home/user/micromamba/envs/bmtk

Copy that path into the conda = setting of the profile in nextflow.config, e.g. for the nbi_slurm profile:

profiles {
    conda {
        conda.enabled = true
        process.conda = "/home/kam24goz/miniforge3/envs/pbbm"
    }
    nbi_slurm {
        conda.useMicromamba = true
        process {
            conda = "/home/user/micromamba/envs/bmtk"
            executor = 'slurm'
            queue = 'ei-medium'
            memory = '2GB'
            cpus = 2
...

When you submit the nextflow pipeline it should use this environment. Be sure to also put export NXF_OFFLINE='true' in your submission scripts, otherwise nextflow will waste time trying to phone home for updates.
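
For instance (assuming you pulled the pipeline to the local cache earlier, and adapting the placeholder paths), the relevant lines of an offline submission script would be:

# Stop nextflow checking the remote repository for updates
export NXF_OFFLINE='true'

nextflow run apduncan/bm-tk \
-profile nbi_slurm \
-work-dir /path/to/scratch \
--bams "/glob/to/**/find*.bam"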
