JBrowse issue - Visualization of whole-contig alignment to a reference genome #4877

Closed
mrossi1-ilmn opened this issue Mar 10, 2025 · 6 comments

@mrossi1-ilmn

Hi, I am trying to visualize a whole-contig alignment to a reference genome, e.g., hg38. One example BAM file is https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_HG002_DraftBenchmark_defrabbV0.019-20241113/dipcall_output/GRCh38_HG2-T2TQ100-V1.1_dipcall-z2k.hap1.bam

I got this error from JBrowse. Here is the stack trace:

RangeError: Invalid array length
RangeError: Invalid array length
    at Array.push (<anonymous>)
    at get seq (file:///Applications/JBrowse%202.app/Contents/Resources/app.asar/build/static/js/8309.ea4e7ad7.chunk.js:1:12448)
    at k.get (file:///Applications/JBrowse%202.app/Contents/Resources/app.asar/build/static/js/8309.ea4e7ad7.chunk.js:1:13638)
    at get fields (file:///Applications/JBrowse%202.app/Contents/Resources/app.asar/build/static/js/4770.1c2f1ae1.chunk.js:1:915)
    at f.get (file:///Applications/JBrowse%202.app/Contents/Resources/app.asar/build/static/js/3890.499c7480.chunk.js:14:155497)
    at f.get (file:///Applications/JBrowse%202.app/Contents/Resources/app.asar/build/static/js/4770.1c2f1ae1.chunk.js:1:635)
    at s (file:///Applications/JBrowse%202.app/Contents/Resources/app.asar/build/static/js/3890.499c7480.chunk.js:14:141647)
    at Be.execute (file:///Applications/JBrowse%202.app/Contents/Resources/app.asar/build/static/js/3890.499c7480.chunk.js:1:314255)
@cmdcolin
Collaborator

Thanks for posting this. The answer has several parts.

Part 0. A workaround

This error does not happen in Firefox, so you could try that if really needed.

However, our desktop app uses Electron, which is Chrome-based, and so it has the issue.

Part 1. The error you ran into

The error you are seeing comes from trying to allocate an array in order to build a string bigger than 512 MB. I have written about these limits in some funny blog posts, as we often run into this type of thing when trying to do big data work in the browser:

https://cmdcolin.github.io/posts/2021-08-15-map-limit
https://cmdcolin.github.io/posts/2021-10-30-spooky

It happens because essentially the entirety of chromosome 1 gets loaded into memory as a single string (which is UTF-16, by the way, so two bytes per letter).

So it's basically hitting that limit.

If we just used a Uint8Array, then it could be 1 byte per character, and it could be longer than 512 MB. We could possibly even make seq a zero-copy view over the BAM backing data.
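
To illustrate the Uint8Array idea, here is a minimal sketch (not the bam-js implementation) that unpacks BAM's 4-bit packed SEQ field straight into a byte array of ASCII codes, so no intermediate array of characters and no giant UTF-16 string is created. The names `decodeSeqToBytes` and `bytesToString` and their parameters are hypothetical; the nibble lookup table `=ACMGRSVTWYHKDBN` is the one defined in the SAM/BAM specification.

```typescript
// Hypothetical helpers for illustration only; not part of bam-js.
// BAM packs two bases per byte (high nibble first); each nibble indexes this table.
const SEQ_CODES = '=ACMGRSVTWYHKDBN'
const SEQ_LOOKUP = new Uint8Array(16)
for (let i = 0; i < 16; i++) {
  SEQ_LOOKUP[i] = SEQ_CODES.charCodeAt(i)
}

// Decode `seqLength` bases from the packed bytes into one ASCII byte per base.
function decodeSeqToBytes(packed: Uint8Array, seqLength: number): Uint8Array {
  const out = new Uint8Array(seqLength)
  for (let i = 0; i < seqLength; i++) {
    const byte = packed[i >> 1]
    // Even positions live in the high nibble, odd positions in the low nibble.
    const nibble = (i & 1) === 0 ? byte >> 4 : byte & 0x0f
    out[i] = SEQ_LOOKUP[nibble]
  }
  return out
}

// Turn a small window of bases into a string, e.g. only the visible region.
function bytesToString(bytes: Uint8Array): string {
  let s = ''
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i])
  }
  return s
}

// Usage: the bytes 0x12 0x48 encode the four bases A, C, G, T.
const decoded = decodeSeqToBytes(new Uint8Array([0x12, 0x48]), 4)
console.log(bytesToString(decoded))
```

With this layout, only the slice that is actually being rendered ever needs to become a JavaScript string, for example `bytesToString(decoded.subarray(start, end))`.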

Part 2. Other references

These assembly-to-assembly alignment BAMs are tricky.

Here is another thread, from the IGV forum, about the same file:
igvteam/igv#1520

Part 3. What JBrowse 2 likes to use for assembly-to-assembly alignment

For assembly-to-assembly alignment, JBrowse actually likes PAF. PAF is a good output format for assembly-to-assembly alignment, and it can be loaded as a "SyntenyTrack" in JBrowse, which allows you to use it in the dotplot and 'synteny' views.

It does not encode the whole SEQ field, so it is much smaller. Note that we currently do not have a way to show sequence differences with PAF, but if we add "cs" tag support, then this would work (see #3378).
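
To make the size difference concrete, here is a minimal sketch of parsing one PAF line. It is not JBrowse's PAFAdapter code: the `PafRecord` type, the `parsePafLine` function, and the example line are all made up for illustration. The 12 mandatory columns carry only names, lengths, coordinates, and match counts; sequence differences, if present at all, live in optional `TAG:TYPE:VALUE` fields such as `cs`.

```typescript
// Hypothetical type and parser for illustration; field names are my own.
interface PafRecord {
  queryName: string
  queryLength: number
  queryStart: number
  queryEnd: number
  strand: string
  targetName: string
  targetLength: number
  targetStart: number
  targetEnd: number
  numMatches: number
  blockLength: number
  mappingQuality: number
  tags: Record<string, string>
}

function parsePafLine(line: string): PafRecord {
  const cols = line.split('\t')
  if (cols.length < 12) {
    throw new Error('expected at least 12 columns, got ' + cols.length)
  }
  const tags: Record<string, string> = {}
  for (const col of cols.slice(12)) {
    // Optional fields look like TAG:TYPE:VALUE; the value may itself contain ':'.
    const [tag, , ...rest] = col.split(':')
    tags[tag] = rest.join(':')
  }
  return {
    queryName: cols[0],
    queryLength: Number(cols[1]),
    queryStart: Number(cols[2]),
    queryEnd: Number(cols[3]),
    strand: cols[4],
    targetName: cols[5],
    targetLength: Number(cols[6]),
    targetStart: Number(cols[7]),
    targetEnd: Number(cols[8]),
    numMatches: Number(cols[9]),
    blockLength: Number(cols[10]),
    mappingQuality: Number(cols[11]),
    tags,
  }
}

// Made-up example record: a 1000 bp contig aligned to chr1 with one mismatch.
const exampleLine = [
  'contig1', '1000', '0', '1000', '+',
  'chr1', '5000', '200', '1200', '999', '1000', '60',
  'cs:Z::500*ag:499',
].join('\t')
const rec = parsePafLine(exampleLine)
console.log(rec.queryName, rec.targetName, rec.targetStart, rec.tags.cs)
```

A whole-chromosome alignment is still a single short line in this format, which is why it avoids the string-size problem described above.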

@cmdcolin
Collaborator

Also note: you can convert this file via a BAM -> SAM -> PAF chain using paftools :) https://www.biostars.org/p/479287/#9474845

I'll keep this open, as there could be something we can do to help, but in general I would say that BAM/CRAM is not the optimal assembly-to-assembly alignment format; PAF is probably preferred. Note that PAF tracks can also be loaded in a normal linear genome view, just like a BAM file, so there shouldn't be too much loss of functionality, and once cs tag support is added, SNPs can be rendered on them too.

@mrossi1-ilmn
Author

Hi, @cmdcolin,

Thanks for the quick reply. I am looking into PAF files. Good to know that they can be loaded in the linear genome viewer as well.

I tried to convert the BAM to a PAF, but I probably need to double-check that I did all the steps correctly, as I am having some issues visualizing it with the Synteny viewer.

@cmdcolin
Collaborator

Feel free to let me know what issue you run into. The most common reason that a PAF file "doesn't show up" is that the assembly names were entered backwards (query vs. target).
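
One way to check for the backwards case before loading the file is to compare the sequence names in the PAF against the sequence names of each assembly. The sketch below is a hypothetical helper, not a JBrowse feature: `checkAssemblyOrder`, its parameters, and the example names are assumptions. It relies only on the PAF layout, where column 1 is the query sequence name and column 6 is the target sequence name.

```typescript
// Hypothetical helper: guesses whether the query and target assemblies
// were entered backwards, based on which names appear in which PAF column.
type AssemblyOrder = 'ok' | 'swapped' | 'unknown'

function checkAssemblyOrder(
  pafLines: string[],
  queryRefNames: Set<string>,
  targetRefNames: Set<string>,
): AssemblyOrder {
  let asGiven = 0
  let swapped = 0
  for (const line of pafLines) {
    const cols = line.split('\t')
    if (cols.length < 12) {
      continue // skip blank or malformed lines
    }
    const queryName = cols[0]
    const targetName = cols[5]
    if (queryRefNames.has(queryName) && targetRefNames.has(targetName)) {
      asGiven++
    }
    if (queryRefNames.has(targetName) && targetRefNames.has(queryName)) {
      swapped++
    }
  }
  if (asGiven > 0 && asGiven >= swapped) {
    return 'ok'
  }
  if (swapped > 0) {
    return 'swapped'
  }
  return 'unknown'
}

// Made-up example: a contig from the sample assembly aligned to chr1.
const pafLines = [
  ['contig1', '1000', '0', '1000', '+', 'chr1', '5000', '200', '1200', '999', '1000', '60'].join('\t'),
]
const sampleNames = new Set(['contig1'])
const referenceNames = new Set(['chr1'])
console.log(checkAssemblyOrder(pafLines, sampleNames, referenceNames))
console.log(checkAssemblyOrder(pafLines, referenceNames, sampleNames))
```

A result of 'unknown' means neither ordering matched any record, which usually points at mismatched sequence names rather than swapped assemblies.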

@mrossi1-ilmn
Author

Yes, the issue was more that, if the sample assembly is not in the assembly list, then it selects the reference assembly twice, for both query and target. You need to manually change the query assembly in the Settings.

Thanks for the support. Feel free to close when the BAM issue is resolved.

@cmdcolin
Collaborator

This will be tracked in GMOD/bam-js#106.
