contamination checks (with or post PAQman) #43

@SAMtoBAM

One of the primary issues is the database required for the search, and whether a target taxid must be supplied for the run.

If this runs after PAQman, it might be enough to wrap one of the following tools so that it downloads the database, runs the searches, etc.:

blobtools
FCS-GX

e.g. blobtools using the Swiss-Prot protein dataset (ideally with the ability to choose between UniRef90/100 and Swiss-Prot, depending on how much disk space the user has for downloading the database)
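
The database choice could be exposed as a single option in a wrapper. The sketch below is one hypothetical way to do that: `db_url` is an illustrative helper name, not part of PAQman or blobtools, and the UniRef paths are assumptions based on the UniProt FTP layout that should be checked against the current release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a database name to its UniProt FTP download URL.
# The swissprot URL is the one used in this issue; the UniRef URLs are assumptions.
db_url() {
  local base="ftp://ftp.uniprot.org/pub/databases/uniprot"
  case "$1" in
    swissprot) echo "${base}/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz" ;;
    uniref90)  echo "${base}/uniref/uniref90/uniref90.fasta.gz" ;;
    uniref100) echo "${base}/uniref/uniref100/uniref100.fasta.gz" ;;
    *)
      # Unknown name: report on stderr and signal failure to the caller.
      echo "unknown database: $1 (expected swissprot, uniref90 or uniref100)" >&2
      return 1
      ;;
  esac
}
```

With this in place, the download step would become something like `wget "$(db_url swissprot)"`, with the database name taken from a command-line option.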

##download and unzip
wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
gunzip uniprot_sprot.fasta.gz

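##make diamond database
##note: the staxids column requested in the search needs taxonomy info in the database (see diamond makedb --taxonmap)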
diamond makedb \
  --in uniprot_sprot.fasta \
  -d uniprot_sprot

##search the assembly against the protein database (translated search)
diamond blastx \
  -d uniprot_sprot.dmnd \
  -q assembly.fasta \
  -o diamond.out \
  -f 6 qseqid staxids bitscore \
  --max-target-seqs 1 \
  --evalue 1e-5 \
  --threads 8

##combine the read alignment (BAM) and the diamond hits into a blobtools database
blobtools create \
  -i assembly.fasta \
  -b assembly.bam \
  -t diamond.out \
  -o blobtools

##generate plots and summaries
blobtools view -i blobtools.blobDB.json
blobtools plot -i blobtools.blobDB.json
blobtools summary -i blobtools.blobDB.json
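
Because `blobtools create` relies on the taxid column of the hits file, it may be worth checking `diamond.out` before building the blobDB. The following is a minimal sketch, assuming the three-column tab-separated layout requested above (`qseqid staxids bitscore`); `count_missing_taxids` is a hypothetical helper name, and treating any non-numeric value as "missing" is an assumption about how absent taxids appear in the output.

```shell
# Hypothetical helper: count hits whose second column (staxids) is not a
# numeric taxid or a semicolon-separated list of numeric taxids.
count_missing_taxids() {
  awk -F'\t' '$2 !~ /^[0-9]+(;[0-9]+)*$/ { n++ } END { print n + 0 }' "$1"
}
```

Running `count_missing_taxids diamond.out` prints the number of such hits; a non-zero count would suggest the diamond database was built without taxonomy mapping.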

Labels: enhancement (New feature or request)
