Genome assembly exercise
During the lecture we discussed the difference between short- and long-read DNA/RNA sequencing approaches, and how they are applied to solve different assembly problems. In this practical exercise you will assemble a bacterial genome and explore the effect of different k-mer sizes on the assembly result.
- Read all instructions carefully before you start the exercise.
- Make sure you have an account on the computer cluster Albiorix. Usernames and passwords will be handed out before we begin. In the examples below I use the username studentX, but you should use the one that has been assigned to you, so please make sure you update all commands with the correct username.
- Run the exercise in two terminal tabs, one where you log in to Albiorix and one for accessing the files on your local computer.
- Install the program Bandage on your local computer.
- Start by logging in to Albiorix.
- When you first log in you will be located in your home directory. However, for this exercise you should move to a different part of the filesystem.
cd /nobackup/data18/Assembly_exercise/studentX
- This directory contains a script that can be sent to the queue system in order to run the assembly analysis. Open this file in your favorite text editor (vim, nano, ...) and look at the content. Make sure you understand what each line of code does, and ask a teacher if you need help interpreting the file. Also read the comments in the file carefully.
nano runMegahit.sge
- Set a value for the KMERE and SAMPLE variables. The KMERE variable sets the k-mer size used by the assembly algorithm. In this exercise we will explore the effect of the k-mer size on the assembly result. The SAMPLE variable is used to give the output files unique names that also reflect which k-mer size was used for the analysis.
- Run the analysis by submitting your script to the computer cluster queue system.
qsub runMegahit.sge
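As a rough illustration of the variable step above, the sketch below shows one way the two assignments could look. It is not the content of runMegahit.sge: the value 31, the `assembly_k` prefix and the helper `check_kmer` are assumptions made for this example. The check reflects MEGAHIT's documented requirement that k-mer sizes are odd and lie between 15 and 255.

```shell
# Hypothetical sketch; the real runMegahit.sge may set these differently.

KMERE=31    # k-mer size to try (an example value, not a recommendation)

# Succeeds only if the argument is an odd whole number between 15 and 255.
check_kmer() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric values
    esac
    [ "$1" -ge 15 ] && [ "$1" -le 255 ] && [ $(( $1 % 2 )) -eq 1 ]
}

if check_kmer "$KMERE"; then
    # Put the k-mer size in the name so that output files from different
    # runs do not overwrite each other (the prefix is an assumed convention).
    SAMPLE="assembly_k${KMERE}"
    echo "KMERE=${KMERE} SAMPLE=${SAMPLE}"
else
    echo "KMERE=${KMERE} is not an odd number between 15 and 255" >&2
fi
```

Encoding the k-mer size in SAMPLE makes it easy to tell the output files apart when you later compare runs.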
- You can monitor the state of your analysis using the command qstat, and by looking at the content of the output files. Your analysis will disappear from the list once it has finished.
- After the analysis has finished you will find a new file ending with .fastg in the directory where you ran your analysis from. Copy this file to your local computer using this command (remember to run this command from the second tab in your terminal, the one you can access your local filesystem from):
rsync -hav [email protected]:/nobackup/data18/Assembly_exercise/studentX/*.fastg .
- Open the file in Bandage and press Draw Graph to look at the assembly graph from your analysis. We know that the data originates from a single circular chromosome, so we would expect to see one circular edge in our graph. Try running the analysis again using a different k-mer size to see if you can improve the result.
- What effect does the k-mer size have on the number of contigs produced?
- What effect does it have on the overall size of the assembly?
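To answer the two questions above with numbers rather than by eye, you could count the contigs and sum their lengths for each run. The sketch below assumes you have the contigs from a run as a FASTA file; the function name `assembly_stats` and the small demo file are made up for this example, and where your contig file ends up depends on how the script sets its output.

```shell
# Print the number of contigs and the total assembly length of a FASTA file.
assembly_stats() {
    awk '
        /^>/ { contigs++; next }        # each header line starts a new contig
        { bases += length($0) }         # every other line is sequence
        END { printf "contigs=%d total_bp=%d\n", contigs, bases }
    ' "$1"
}

# Tiny made-up example so the function can be tried without real data.
demo=$(mktemp)
printf '>contig_1\nACGTACGT\nACG\n>contig_2\nGGCC\n' > "$demo"
assembly_stats "$demo"
rm -f "$demo"
```

For the demo file this prints `contigs=2 total_bp=15`. Running the function on the contigs from each k-mer size gives you one line per run to compare.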