jcast is extremely slow / multi-threading #28

@dtenenba

Hello,

I am trying to help out a user here who is having trouble with jcast. They are running it on our HPC cluster and after 15 days, the job is still not complete. It does not seem to use a lot of memory or CPU but it does make a lot of requests to Uniprot.

Looking at #14 it looks like multithreading was removed because it caused problems with the display of the progress bar. Since we don't care about the progress bar as much as having jcast run faster, I attempted to restore the multithreading functionality in a couple of different ways, but I quickly ran into problems that only the package developers would probably know how to solve.
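A minimal sketch of what threaded lookups could look like, assuming the per-gene sequence fetch can be isolated into a single function. `fetch_one` and `fetch_all` are hypothetical names used for illustration, not part of jcast's API; because the work is dominated by waiting on HTTP responses, a thread pool is one plausible fit.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(gene_ids, fetch_one, max_workers=8):
    """Call fetch_one(gene_id) for each unique ID on a thread pool.

    fetch_one is a hypothetical stand-in for whatever function performs
    the network lookup; it is NOT a jcast function.
    Returns a dict mapping gene_id -> result.
    """
    # Drop duplicate IDs while preserving order, so each ID is fetched once.
    unique_ids = list(dict.fromkeys(gene_ids))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as unique_ids.
        results = list(pool.map(fetch_one, unique_ids))
    return dict(zip(unique_ids, results))
```

Keeping `max_workers` small would be the polite default here, since every worker sends requests to the same remote service.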

We also saw output like this:

2026-01-12 13:23:46,833 - jcast.seq - INFO - Sequence not cached locally. Attempting to get from Uniprot: https://www.ebi.ac.uk/proteins/api/proteins/Ensembl:ENSG00000289707?offset=0&size=1&reviewed=false&isoform=0
2026-01-12 13:23:47,427 - jcast.seq - INFO - Retrieved empty fasta from Ensembl for ENSG00000289707.1

And indeed, pasting that URL into a browser returns an essentially empty XML document, a root element with no entries:

<?xml version='1.0' encoding='UTF-8'?><uniprot xmlns="http://uniprot.org/uniprot" xsi:schemaLocation="http://uniprot.org/uniprot http://www.uniprot.org/support/docs/uniprot.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
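The emptiness check can also be done programmatically. The sketch below uses only the standard library and treats a response as empty when the root element has no child elements; `has_entries` is a hypothetical helper, and `EMPTY_RESPONSE` is the document shown above, copied verbatim.

```python
import xml.etree.ElementTree as ET

# The response shown above, copied verbatim.
EMPTY_RESPONSE = (
    "<?xml version='1.0' encoding='UTF-8'?>"
    '<uniprot xmlns="http://uniprot.org/uniprot" '
    'xsi:schemaLocation="http://uniprot.org/uniprot '
    'http://www.uniprot.org/support/docs/uniprot.xsd" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>'
)


def has_entries(xml_text):
    """Return True if the root element contains at least one child element."""
    if isinstance(xml_text, str):
        # Parse from bytes so the encoding declaration is honoured.
        xml_text = xml_text.encode("utf-8")
    root = ET.fromstring(xml_text)
    return len(root) > 0


print(has_entries(EMPTY_RESPONSE))
```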

Could this indicate a problem with the input data?
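One way to gauge how widespread this is would be to tally the "Retrieved empty fasta" messages in the log. This is a rough sketch that assumes every such line follows the format of the sample above; `count_empty_lookups` is a hypothetical helper, and the log file name in the usage note is a placeholder.

```python
import re
from collections import Counter

# Matches the message format seen in the sample log line above.
EMPTY_RE = re.compile(r"Retrieved empty fasta from Ensembl for (\S+)")


def count_empty_lookups(log_lines):
    """Count how many times each gene ID produced an empty lookup."""
    counts = Counter()
    for line in log_lines:
        match = EMPTY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, `count_empty_lookups(open("jcast.log"))` (placeholder file name) would return a `Counter` keyed by gene ID. A large number of distinct IDs would point toward the input annotation, while the same few IDs repeated many times would suggest that empty results are not being cached.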

Would welcome any advice on speeding up jcast and addressing the problematic Uniprot requests. Happy to provide more information if needed.

Thanks.
