ERROR: invalid file header with NCBI genomic data, file works in python. #89

Open
pcjentsch opened this issue Jun 3, 2022 · 1 comment


pcjentsch commented Jun 3, 2022

This archive opens fine with Python's zipfile module.

It is 39 GB, so I cannot easily include a working example, but if you are inclined to try it:

conda create -n ncbi_datasets
conda activate ncbi_datasets
conda install -c conda-forge ncbi-datasets-cli
datasets download virus genome taxon sars-cov-2 --host human

The output from infolist() in Python's zipfile module is

zipf.infolist()
[<ZipInfo filename='README.md' compress_type=deflate filemode='?rw-------' file_size=1604 compress_size=769>,
 <ZipInfo filename='ncbi_dataset/data/data_report.jsonl' compress_type=deflate filemode='?rw-------' file_size=81889507642 compress_size=4292597995>,
 <ZipInfo filename='ncbi_dataset/data/biosample.jsonl' compress_type=deflate filemode='?rw-------' file_size=7826671566 compress_size=205379661>,
 <ZipInfo filename='ncbi_dataset/data/cds.fna' compress_type=deflate filemode='?rw-------' file_size=177621771946 compress_size=11180195822>,
 <ZipInfo filename='ncbi_dataset/data/genomic.fna' compress_type=deflate filemode='?rw-------' file_size=167811715365 compress_size=13523743233>,
 <ZipInfo filename='ncbi_dataset/data/protein.faa' compress_type=deflate filemode='?rw-------' file_size=82837067420 compress_size=3110887927>,
 <ZipInfo filename='ncbi_dataset/data/virus_dataset.md' compress_type=deflate filemode='?rw-------' file_size=2431 compress_size=1057>,
 <ZipInfo filename='ncbi_dataset/data/dataset_catalog.json' compress_type=deflate filemode='?rw-------' file_size=845 compress_size=321>]

if that is helpful.

Owner

fhs commented Jul 2, 2022

You may want to try version 0.10.0, which I just released; it adds support for reading zip64 files.
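
For anyone checking whether an archive depends on zip64 before upgrading, here is a rough sketch using Python's standard zipfile module. Classic ZIP headers store sizes in unsigned 32-bit fields, and the value 0xFFFFFFFF is reserved to mean "look in the zip64 extra field", so any entry at or above that size needs zip64. The helper name `entries_needing_zip64` is made up for illustration and is not part of any library.

```python
import io
import zipfile

# Classic (non-zip64) ZIP headers hold sizes in unsigned 32-bit fields;
# 0xFFFFFFFF itself is the sentinel that points at the zip64 extra field.
ZIP32_SENTINEL = 0xFFFFFFFF


def entries_needing_zip64(archive):
    """Return the names of entries whose sizes cannot fit in 32-bit fields.

    `archive` may be a path or a binary file-like object.
    Hypothetical helper written for this thread, not a library function.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.file_size >= ZIP32_SENTINEL
            or info.compress_size >= ZIP32_SENTINEL
        ]


if __name__ == "__main__":
    # Tiny in-memory archive as a stand-in; the real download is far too
    # large to embed in an example.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.md", "hello")
    buf.seek(0)
    print(entries_needing_zip64(buf))  # prints []
```

Going by the sizes in the listing above, five of the eight entries (data_report.jsonl, biosample.jsonl, cds.fna, genomic.fna and protein.faa) would be flagged. This only looks at per-entry sizes; an archive can also need zip64 because of large offsets or more than 65535 entries, which this sketch does not check.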
