Thanks for developing all these really helpful tools for the microbial ecology community!
I would like to annotate Streptococcus pneumoniae genomes for bacteriocin-related genes (antimicrobial/immunity/regulation). I have downloaded the amino acid sequences of the genes I am interested in from UniProt, and ideally I would like to add annotations from this specific database to the gbff or gff files I downloaded from NCBI.
I am exploring the --proteins flag of Prokka to see whether it can help me achieve this.
I formatted my database according to the instructions:
>A0A384ZZZ3 ~~~sliC~~~~~~
MDENKVIIDLSEKVFAKFDEQLKRYAEQPNYDLLTLSSGLPGLILLSSELTSLTSERKYS
ARTGKYVNFMVKQMRNYGVLSDSLFSGVSGIGISILHLVEEHPEYHNLLISFNEYIKYYT
LSKIENIDIKKISPTDYDIIEGVSGVLVYLLSQEQDENDYIINRIINFLSEFSLKNSTLT
GFYVESKNQMSKTESKLYPLGCLNFGLAHGLAGVGAMLSYSKLKGYSNEKSIAAIKKIIM
LYEKHELKNYMWKEGLSDIELKKTEKSNLQYEFIRDAWCYGSPGISLLYLYSSLALEDKK
LKSKACNILKASIRRSNGLEQSILCHGFSGAIEICLFFKKIYKTTDFDDCIKSLKEKLIS
DFREDMTYGFNTTAEFENIKTKDNLGYLDGIIGILLTMIELNNLKVTTNWQRALLLFDDV
IKEVK
>A0A0H2UNX0 ~~~blpB~~~~~~
MNPNLFRSVEFYQRRYHNYATVLIIPLSLLFTFILIFSLVATKEITVTSQGEIAPTSVIA
SIQSTSDNPILANHLVANQVVEKGDLLIKYSETMEESQKTALATQLQRLEKQKEGLGILK
QSLEKATDLFSGEDEFGYHNTFMNFTKQSHDIELGITKTNTEVSNQANLSNSSSSAIEQE
ITKVQQQIGEYQELRDAIINNRARLPTGNPHQSILNRYLVASQGQTQGTAEEPFLSQINQ
SIAGLESSIASLKIQQAGIGSVATYDNSLATKIEVLRTQFLQTASQQQLTVENQLTELKV
QLDQATQRLENNTLTSPSKGIVHLNSEFEGKNRIPTGTEIAQIFPVITDTREVLITYYVS
SDYLPLLDKGQTVRLKLEKIGNHGTTIIGQLQTIDQTPTRTEQGNLFKLTALAKLSNEDS
KLIQYGLQGRVTSVTTKKTYFDYFKDKILTHSD
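As a sanity check on the headers above, here is a minimal Python sketch that splits each header into its fields, assuming the `>ID EC_number~~~gene~~~product~~~COG` layout described in the Prokka README. The function names `parse_prokka_header` and `empty_fields` are hypothetical helpers, not part of Prokka. The sketch only reports which fields are empty; it does not establish whether that matters to Prokka.

```python
def parse_prokka_header(header: str) -> dict:
    """Split a '>ID EC_number~~~gene~~~product~~~COG' header into fields.

    Assumption: the description follows the '~~~'-separated layout
    from the Prokka README; this helper is illustrative only.
    """
    seq_id, _, description = header.lstrip(">").strip().partition(" ")
    parts = description.split("~~~")
    # Pad or trim to exactly four fields: EC, gene, product, COG.
    parts = (parts + [""] * 4)[:4]
    ec, gene, product, cog = (p.strip() for p in parts)
    return {"id": seq_id, "ec": ec, "gene": gene, "product": product, "cog": cog}


def empty_fields(header: str) -> list:
    """Return the names of the description fields that are empty."""
    fields = parse_prokka_header(header)
    return [name for name in ("ec", "gene", "product", "cog") if not fields[name]]


if __name__ == "__main__":
    # The two headers shown in this post.
    for h in (">A0A384ZZZ3 ~~~sliC~~~~~~", ">A0A0H2UNX0 ~~~blpB~~~~~~"):
        print(parse_prokka_header(h)["id"], "empty fields:", empty_fields(h))
```

For both headers in my file this reports that only the gene field is filled in, while the EC number, product and COG fields are empty.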
I was surprised to see that only 1 gene was annotated using this custom database:
[13:45:55] Running: prodigal -i prokka\/PROKKA_02052025\.fna -c -m -g 11 -p single -f sco -q
[13:45:56] Found 1984 CDS
[13:45:56] Connecting features back to sequences
[13:45:56] Not using genus-specific database. Try --usegenus to enable it.
[13:45:56] Preparing user-supplied primary BLAST annotation source: protein_sequences_prokka_ready.faa
[13:45:56] Guessed source was in fasta format.
[13:45:56] Running: makeblastdb -dbtype prot -in protein_sequences_prokka_ready\.faa -out prokka\/proteins -logfile /dev/null
[13:45:56] Using /inference source as 'protein_sequences_prokka_ready.faa'
[13:45:56] Annotating CDS, please be patient.
[13:45:56] Will use 3 CPUs for similarity searching.
[13:45:57] There are still 1984 unannotated CDS left (started with 1984)
Example from the proteins.tmp.blast file:
Query= 259
Length=61

                                                   Score     E
Sequences producing significant alignments:       (Bits)  Value
A0A062WQJ3 ~~~cibA                                  118    2e-39

A0A062WQJ3 ~~~cibA~~~~~~
Length=61

 Score = 118 bits (295), Expect = 2e-39, Method: Compositional matrix adjust.
 Identities = 61/61 (100%), Positives = 61/61 (100%), Gaps = 0/61 (0%)

Query  1   MTNFDILDNQFLSLSENELSDIDGGLAPLVIFGVAVSWKAIAGGTALIGSGLAAGYFLGG  60
           MTNFDILDNQFLSLSENELSDIDGGLAPLVIFGVAVSWKAIAGGTALIGSGLAAGYFLGG
Sbjct  1   MTNFDILDNQFLSLSENELSDIDGGLAPLVIFGVAVSWKAIAGGTALIGSGLAAGYFLGG  60

Query  61  D  61
           D
Sbjct  61  D  61
When I blasted the genomic.faa of that particular genome against a local BLAST database built from the same .faa file, I got quite a few significant hits.
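To make "significant hits" concrete, here is a rough Python sketch of how such hits could be filtered, assuming the search was written in BLAST tabular format (`-outfmt 6`, where column 11 is the e-value). The function name `significant_hits` and the `1e-9` cutoff are my own illustrative choices. The first demo row reuses the cibA hit shown above; the second row is a made-up weak hit included only to show the filter.

```python
import csv
import io


def significant_hits(handle, max_evalue=1e-9):
    """Return (query, subject, percent identity, e-value) for strong hits.

    Assumes standard 12-column BLAST tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore. The cutoff is an arbitrary assumption.
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            hits.append((row[0], row[1], float(row[2]), evalue))
    return hits


if __name__ == "__main__":
    demo = io.StringIO(
        # cibA hit taken from the proteins.tmp.blast excerpt above
        "259\tA0A062WQJ3\t100.000\t61\t0\t0\t1\t61\t1\t61\t2e-39\t118\n"
        # hypothetical weak hit, for illustration only
        "300\tA0A0H2UNX0\t28.000\t50\t36\t0\t1\t50\t1\t50\t0.5\t20.0\n"
    )
    for query, subject, pident, evalue in significant_hits(demo):
        print(query, subject, pident, evalue)
```

Only the cibA row passes the cutoff in this demo; on real output the same function would list every query with at least one hit at or below the chosen e-value.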
Do you see what I am missing here?
Many thanks for your input and suggestions!