Indexing issue with Biom files? #7

@andrewjmc

I am importing Biom files made from Kraken reports (using https://github.com/smdabdoub/kraken-biom).

I have noticed that in one case counts are not assigned to the correct OTU and a species is missing.

In my Kraken report, I have the following lines:

25.56  211687  **3**       G       10509           Mastadenovirus
25.56  **211678**  124056  S       129951            Human mastadenovirus C

This indicates three reads assigned to the Mastadenovirus genus and 211,678 (including subspecies level) to Human mastadenovirus C. kraken-biom correctly makes the .biom file, with data:

...[2095,0,3.0],[2096,0,211678.0]...

And I confirm that the 2095th and 2096th (0-offset) elements of rows are:

...
{"id": "10509", "metadata": {"taxonomy": ["k__Viruses", "p__", "c__", "o__", "f__Adenoviridae", "g__Mastadenovirus", "s__"]}},
{"id": "129951", "metadata": {"taxonomy": ["k__Viruses", "p__", "c__", "o__", "f__Adenoviridae", "g__Mastadenovirus", "s__Human mastadenovirus C"]}}
...
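To make the check above reproducible, here is a minimal sketch of how the sparse `data` triples can be resolved to row ids in a BIOM 1.0 JSON file. It is a toy document, not my real file: the two rows are placed at indices 0 and 1 instead of 2095 and 2096, and the column id `sample1` and the helper name `counts_by_row_id` are made up for illustration.

```python
import json

# Toy BIOM 1.0 document (sparse matrix). Row indices are shortened to 0 and 1;
# in the real file the same two rows sit at indices 2095 and 2096.
# "sample1" is a hypothetical column id.
biom_text = """
{
  "matrix_type": "sparse",
  "shape": [2, 1],
  "rows": [
    {"id": "10509", "metadata": {"taxonomy": ["k__Viruses", "p__", "c__", "o__", "f__Adenoviridae", "g__Mastadenovirus", "s__"]}},
    {"id": "129951", "metadata": {"taxonomy": ["k__Viruses", "p__", "c__", "o__", "f__Adenoviridae", "g__Mastadenovirus", "s__Human mastadenovirus C"]}}
  ],
  "columns": [{"id": "sample1", "metadata": null}],
  "data": [[0, 0, 3.0], [1, 0, 211678.0]]
}
"""


def counts_by_row_id(biom):
    """Map each row id to {column id: count}, following the sparse triples."""
    rows = biom["rows"]
    cols = biom["columns"]
    out = {}
    # Each sparse entry is [row index, column index, value], all 0-offset.
    for row_idx, col_idx, value in biom["data"]:
        out.setdefault(rows[row_idx]["id"], {})[cols[col_idx]["id"]] = value
    return out


counts = counts_by_row_id(json.loads(biom_text))
for taxid, per_sample in counts.items():
    print(taxid, per_sample)
```

Read this way, taxid 10509 (the genus) carries 3 reads and taxid 129951 (Human mastadenovirus C) carries 211,678, which is what I expect from the report.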

However, MEGAN6 6.12.5 assigns 211,687 reads to Mastadenovirus and, intriguingly, I cannot even uncollapse Mastadenovirus to reveal Human mastadenovirus C.

Nonetheless, Neisseria sicca comes out fine:

30.54  252935  6038    G       482                   Neisseria
 22.06  182723  182723  S       490                     Neisseria sicca

182,723 reads to the species and 6,038 to the genus. This is again correctly recorded in the Biom:

[2,0,6038.0],[3,0,182723.0]

Where elements 2 and 3 (0-offset) are indeed the pair we want:

{"id": "482", "metadata": {"taxonomy": ["k__Bacteria", "p__Proteobacteria", "c__Betaproteobacteria", "o__Neisseriales", "f__Neisseriaceae", "g__Neisseria", "s__"]}},
{"id": "490", "metadata": {"taxonomy": ["k__Bacteria", "p__Proteobacteria", "c__Betaproteobacteria", "o__Neisseriales", "f__Neisseriaceae", "g__Neisseria", "s__sicca"]}}
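For clarity, this is a small sketch of how I am reading the Kraken report columns: percentage, reads in the clade rooted at the taxon, reads assigned directly to the taxon, rank code, NCBI taxid, and name. It assumes tab-separated fields (the lines quoted above lost their tabs and had bold markers added), and the names `ReportRow` and `parse_report_line` are only illustrative.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float      # % of reads in the clade rooted at this taxon
    clade_reads: int    # reads in the clade (taxon plus descendants)
    direct_reads: int   # reads assigned directly to this taxon
    rank: str           # rank code, e.g. G (genus) or S (species)
    taxid: str          # NCBI taxonomy id
    name: str           # scientific name, indentation stripped


def parse_report_line(line):
    """Parse one tab-separated Kraken report line into a ReportRow."""
    pct, clade, direct, rank, taxid, name = line.rstrip("\n").split("\t")
    return ReportRow(float(pct), int(clade), int(direct),
                     rank.strip(), taxid.strip(), name.strip())


# The four report lines quoted in this issue, re-typed with tabs.
report_lines = [
    "25.56\t211687\t3\tG\t10509\tMastadenovirus",
    "25.56\t211678\t124056\tS\t129951\t  Human mastadenovirus C",
    "30.54\t252935\t6038\tG\t482\tNeisseria",
    " 22.06\t182723\t182723\tS\t490\t  Neisseria sicca",
]

rows = [parse_report_line(line) for line in report_lines]
for r in rows:
    print(r.taxid, r.rank, r.name, "clade:", r.clade_reads, "direct:", r.direct_reads)
```

In my file the genus rows carry the direct count (3 and 6,038) and the species rows carry the clade count (211,678 and 182,723), matching the Biom entries shown above.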

I have attached the file in case this helps understand the problem. I have also confirmed the assignments are correct when I read the biom file into R with the biomformat package.

exemplar_biom.txt

Many thanks,

Andrew
