Command Line Configuration
Note: This page is concerned with defining data sets in a benchmark configuration file. In most cases, we suggest using the graphical web interface instead. The web interface generates benchmark configuration files automatically and stores them in the `setups_generated` directory, from which they can be shared and used on other machines.
This page shows examples of benchmark configuration files that define the different types of data sets supported by Teaser. Configuration files should be placed in the `setups` directory and have a `.yaml` file extension. For detailed information on data set types and evaluation, see Data Sets.
For more examples, we suggest browsing some of the default configuration files in the `setups` directory.
After creating your configuration file, benchmark mappers on it using `./teaser.py <my_filename>.yaml`.
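If you generate many configurations, a small script can write them for you. The following is a minimal Python sketch, not part of Teaser: the helper `write_config` and its parameters are hypothetical. It only emits text in the layout used by the examples on this page (`include`, `teaser`, `tests`, one entry per data set) and returns the command line shown above. It handles flat fields only, so nested blocks such as `sampling` or `evaluation` would need to be added by hand.

```python
import os


def write_config(setups_dir: str, name: str, tests: dict) -> list:
    """Write a benchmark configuration to <setups_dir>/<name>.yaml (hypothetical helper).

    `tests` maps each data set name to a dict of flat key/value fields.
    Returns the command used to run Teaser on the new file.
    """
    lines = ["include:", "- base_teaser", "", "teaser:", "  tests:"]
    for test_name, fields in tests.items():
        lines.append(f"    {test_name}:")
        for key, value in fields.items():
            if isinstance(value, bool):
                # Use the Yes/No spelling from the examples on this page.
                value = "Yes" if value else "No"
            elif isinstance(value, list):
                # Flow-style list; assumes paths contain no commas or brackets.
                value = "[" + ", ".join(value) + "]"
            lines.append(f"      {key}: {value}")

    os.makedirs(setups_dir, exist_ok=True)
    filename = f"{name}.yaml"
    with open(os.path.join(setups_dir, filename), "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return ["./teaser.py", filename]
```

For example, `write_config("setups", "my_real", {"real_se": {"type": "real", "reference": "E_coli.fasta", "paired": False, "import_read_files": ["/path/to/my/reads.fastq"]}})` writes `setups/my_real.yaml` and returns `["./teaser.py", "my_real.yaml"]`.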
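The following example defines a data set that Teaser simulates itself (`type: simulated_teaser`), here using the Mason simulator on the E. coli reference. For a list of available simulation parameters, see Table of Simulation Parameters.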
```yaml
include:
- base_teaser

teaser:
  tests:
    my_customized_ecoli_dataset: #Name of your data set
      type: simulated_teaser
      reference: E_coli.fasta
      platform: illumina
      simulator: mason
      paired: No
      read_length: 150
      mutation_rate: 0.05
      mutation_indel_frac: 0.02

      sampling:
        enable: Yes
        ratio: 0.15

      #(Optional) Title of the data set to be shown in reports
      title: Dros Test

      evaluation:
        threshold: 75
```
Running Teaser with this benchmark configuration generates the data set and then evaluates it.
To import a custom simulation, Teaser requires the read file(s) in FASTQ format and the gold standard file in SAM format. The gold standard file should contain an entry for each read; most importantly, its RNAME and POS fields must be set to the simulated source position. See Data Sets for more information on how we evaluate simulated data sets.
Teaser does not apply subsampling to custom simulations. Read files will be imported directly.
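Before importing, it can help to confirm that every read actually has a gold standard entry. The following Python sketch is a hypothetical pre-import check, not Teaser's own validation. It assumes unwrapped 4-line FASTQ records and that SAM QNAMEs equal the FASTQ read names with any trailing `/1` or `/2` mate suffix removed. It works at the read-name level, so for paired-end data it does not check each mate separately.

```python
def missing_gold_entries(fastq_paths, sam_path):
    """Return read names that have no usable gold standard entry (hypothetical helper).

    A SAM entry counts as usable when RNAME is not '*' and POS is greater than 0.
    """
    read_names = set()
    for path in fastq_paths:
        with open(path) as handle:
            for i, line in enumerate(handle):
                if i % 4 == 0 and line.strip():  # header line of each 4-line record
                    name = line[1:].split()[0]
                    if name.endswith(("/1", "/2")):  # drop conventional mate suffix
                        name = name[:-2]
                    read_names.add(name)

    covered = set()
    with open(sam_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("@"):
                continue  # skip blank lines and SAM header lines
            fields = line.rstrip("\n").split("\t")
            qname, rname, pos = fields[0], fields[2], int(fields[3])
            if rname != "*" and pos > 0:
                covered.add(qname)

    return sorted(read_names - covered)
```

An empty result means every read name in the FASTQ input has a gold standard position.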
```yaml
include:
- base_teaser

teaser:
  tests:
    #Custom simulation import example 1
    custom_se:
      type: simulated_custom
      reference: E_coli.fasta
      paired: No
      import_read_files: [/path/to/my/reads.fastq]
      import_gold_standard_file: /path/to/my/alignments.sam

    #Custom simulation import example 2
    custom_pe:
      type: simulated_custom
      reference: E_coli.fasta
      paired: Yes
      import_read_files: [/path/to/my/reads1.fastq, /path/to/my/reads2.fastq]
      import_gold_standard_file: /path/to/my/alignments.sam
```
Running Teaser with this benchmark configuration imports and evaluates the data sets. The field `import_read_files` must be a list of absolute paths to either one or two FASTQ files, depending on the value of the `paired` field, which may be either `Yes` (two files) or `No` (one file). The field `import_gold_standard_file` must be set to the absolute path of the SAM file containing the simulated source position of each read. Teaser creates copies of these files during the import process.
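The rules above can be checked mechanically before running a benchmark. The following is a minimal Python sketch of such a check; the function `check_import_fields` is hypothetical and not part of Teaser. It assumes the data set entry has already been loaded from YAML with a YAML 1.1 loader such as PyYAML, which reads unquoted `Yes`/`No` as booleans, so `paired` arrives as `True` or `False`. The absolute-path check uses POSIX path rules.

```python
import os


def check_import_fields(test: dict) -> list:
    """Return a list of problems with one data set's import fields (hypothetical helper)."""
    problems = []

    paired = test.get("paired")
    if not isinstance(paired, bool):  # YAML 1.1 Yes/No load as True/False
        problems.append("paired must be Yes or No")
        return problems

    reads = test.get("import_read_files", [])
    expected = 2 if paired else 1
    if not isinstance(reads, list) or len(reads) != expected:
        problems.append(f"import_read_files must list {expected} FASTQ file(s)")
    else:
        for path in reads:
            if not os.path.isabs(path):
                problems.append(f"not an absolute path: {path}")

    # Only custom simulations carry a gold standard file.
    gold = test.get("import_gold_standard_file")
    if test.get("type") == "simulated_custom" and not (gold and os.path.isabs(gold)):
        problems.append("import_gold_standard_file must be an absolute path")

    return problems
```

For the `custom_pe` entry above, the check returns an empty list; dropping one of its two read files yields `["import_read_files must list 2 FASTQ file(s)"]`.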
For real read data, Teaser cannot evaluate correctness of alignments. However, the mapped percentage of reads and performance metrics are available and may be useful for a comparison with results from a simulation. See Data Sets for more information on how we evaluate real data sets.
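To illustrate what a mapped percentage measures, here is a generic Python sketch that computes it from a mapper's SAM output using the standard FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary). This is a common definition, not necessarily Teaser's exact calculation; the function name is hypothetical. Each read is counted once through its primary record, and for paired-end data each mate counts as a separate read.

```python
def mapped_percentage(sam_path: str) -> float:
    """Percentage of primary SAM records whose FLAG lacks the unmapped bit (0x4)."""
    UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800
    total = mapped = 0
    with open(sam_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("@"):
                continue  # skip blank lines and header lines
            flag = int(line.split("\t")[1])
            if flag & (SECONDARY | SUPPLEMENTARY):
                continue  # count each read once, via its primary record
            total += 1
            if not flag & UNMAPPED:
                mapped += 1
    return 100.0 * mapped / total if total else 0.0
```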
```yaml
include:
- base_teaser

teaser:
  tests:
    #Real data import example 1
    #Default. This will calculate the read count to sample
    #based on estimated average read length and size of the reference.
    real_se:
      type: real
      reference: E_coli.fasta
      paired: No
      import_read_files: [/path/to/my/reads.fastq]

    #Real data import example 2
    #Sample 10000 reads
    real_se_use_custom:
      type: real
      reference: E_coli.fasta
      paired: No
      import_read_files: [/path/to/my/reads.fastq]
      read_count: 10000

    #Real data import example 3
    #Sampling disabled - all reads will be imported
    real_se_use_all:
      type: real
      reference: E_coli.fasta
      paired: No
      import_read_files: [/path/to/my/reads.fastq]
      sampling: {enable: No}

    #Real data import example 4
    #Paired-end example
    real_pe:
      type: real
      reference: E_coli.fasta
      paired: Yes
      import_read_files: [/path/to/my/reads_1.fastq, /path/to/my/reads_2.fastq]
```
Running Teaser with this benchmark configuration imports and evaluates the real read data sets. Set the `type` field of the data set to `real` and `import_read_files` to a list containing the absolute paths to either one or two FASTQ files (set the `paired` field to `Yes` or `No` accordingly). By default, Teaser automatically samples a number of reads from the input, calculated from the estimated average read length and the size of the reference. Set `read_count` to sample a fixed number of reads instead, or set `sampling: {enable: No}` to import all reads.
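This page does not describe how Teaser draws its sample internally. As a generic illustration of uniformly sampling a fixed `read_count` (such as 10000 in example 2), the following Python sketch uses reservoir sampling (Algorithm R), which needs only one pass over the input and keeps at most `read_count` records in memory. The function names are hypothetical, and the sketch assumes unwrapped 4-line FASTQ records. For paired-end input, both files are read in lockstep so that mates are kept or dropped together.

```python
import random
from itertools import islice


def read_fastq(path):
    """Yield 4-line FASTQ records as tuples of lines (assumes unwrapped sequences)."""
    with open(path) as handle:
        while True:
            record = tuple(islice(handle, 4))
            if len(record) < 4:
                return
            yield record


def sample_reads(paths, read_count, seed=0):
    """Uniformly sample `read_count` reads (or mate pairs) with reservoir sampling.

    `paths` holds one FASTQ path (single-end) or two (paired-end).
    Each sampled item is a tuple with one record per input file.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(zip(*(read_fastq(p) for p in paths))):
        if i < read_count:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # uniform over the i + 1 items seen so far
            if j < read_count:
                reservoir[j] = item
    return reservoir
```

If the input holds fewer reads than `read_count`, all of them are returned, which matches the effect of disabling sampling.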