504 Gateway Timeout when uploading large dataset to Hugging Face Hub #7400
Comments
Another solution, @hotchpotch, if you want to get your dataset pushed to the Hub in a robust way, is to save it to a local folder first and then use upload-large-folder.
There is no retry mechanism there; see src/datasets/arrow_dataset.py (line 5372, commit de062f0) in the datasets repo.
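Until a retry lands in the library, a caller-side workaround is possible. Below is a minimal sketch, assuming the 504 surfaces as an HfHubHTTPError; push_with_retries is a hypothetical helper (not part of datasets), and the retry count and delays are illustrative:

```python
import time

from huggingface_hub.utils import HfHubHTTPError


def push_with_retries(dataset, repo_id, max_retries=5, base_delay=30):
    """Call dataset.push_to_hub(repo_id), retrying with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return dataset.push_to_hub(repo_id)
        except HfHubHTTPError as err:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            delay = base_delay * 2**attempt
            print(f"Upload failed ({err}); retrying in {delay}s...")
            time.sleep(delay)
```

Note that each retry re-invokes push_to_hub from the start, so this only helps with transient failures.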
Thank you! I believe that to use load_dataset() to read data from Hugging Face, we need to first save the markdown metadata and parquet files in our local filesystem, then upload them using upload-large-folder. If you know how to do this, could you please let me know?
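For what it's worth, here is a minimal sketch of that flow. It assumes load_dataset() can read sharded parquet files from the data/ directory of a dataset repo; the shard count, local path, and repo id are illustrative, and the upload-large-folder logic is exposed in Python as HfApi.upload_large_folder:

```python
import os

from datasets import Dataset
from huggingface_hub import HfApi

# Stand-in for the real ~500GB dataset.
dataset = Dataset.from_dict({"text": ["a", "b"]})

# Export sharded parquet files in a layout load_dataset() understands.
num_shards = 2  # illustrative; use far more shards for a 500GB dataset
os.makedirs("local_dataset/data", exist_ok=True)
for i in range(num_shards):
    shard = dataset.shard(num_shards=num_shards, index=i)
    shard.to_parquet(f"local_dataset/data/train-{i:05d}-of-{num_shards:05d}.parquet")

# upload_large_folder uploads files in batches, retries transient failures,
# and can resume an interrupted upload.
api = HfApi()
api.upload_large_folder(
    repo_id="username/my-dataset",  # hypothetical repo id
    repo_type="dataset",
    folder_path="local_dataset",
)
```

A README.md with YAML metadata can also be placed in local_dataset/ before uploading; load_dataset("username/my-dataset") should then read the parquet shards directly.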
I see, so adding a retry mechanism there would solve it. If I continue to have issues, I'll consider implementing that kind of solution.
Description
I encountered consistent 504 Gateway Timeout errors while attempting to upload a large dataset (approximately 500GB) to the Hugging Face Hub; the upload fails partway through the process.
I will continue trying to upload. While it might succeed in future attempts, I wanted to report this issue in the meantime.
Reproduction
Upload a large dataset (approximately 500GB) using the dataset.push_to_hub() method, as sketched below.
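A minimal sketch of the failing call; the data source and repo id are illustrative stand-ins, not the exact ones from this report:

```python
from datasets import load_dataset

# Build a large (~500GB in this report) dataset; the source files are illustrative.
dataset = load_dataset("json", data_files="large_corpus/*.jsonl", split="train")

# On a dataset this size, the upload intermittently fails with a 504 Gateway Timeout.
dataset.push_to_hub("username/my-large-dataset")
```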
Environment Information
Full Error Traceback