Feature description
Add a `.with_huggingface_data_dir()` method on the HuggingfaceDataLoader to allow setting the `data_dir` passed to `load_dataset`, which some datasets require.
Feature motivation
When trying to download `facebook/covost2` using the HuggingfaceDataLoader, I received this error message:
```
raise ManualDownloadError(
datasets.exceptions.ManualDownloadError: The dataset covost2 with config fr_en requires manual data.
Please follow the manual download instructions:
Please download the Common Voice Corpus 4 in fr from https://commonvoice.mozilla.org/en/datasets and unpack it with `tar xvzf fr.tar`. Make sure to pass the path to the directory in which you unpacked the downloaded file as `data_dir`: `datasets.load_dataset('covost2', data_dir="path/to/dir")`
Manual data can be loaded with:
datasets.load_dataset("facebook/covost2", data_dir="<path/to/manual/data>")
```
Unfortunately, there seems to be no way to pass `data_dir` through the HuggingfaceDataLoader to `load_dataset` right now. I imagine more datasets will require a similar manual step.
(Optional) Suggest a Solution
Add a `.with_huggingface_data_dir()` method on the HuggingfaceDataLoader, plus Python code that forwards the value to `load_dataset`, analogous to how `cache_dir` is handled.
EDIT: After some more digging, manual downloads appear to be required by a number of HF datasets, so this could cover a lot of ground.
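A minimal sketch of the Python half of this proposal, assuming the loader invokes a Python import script and passes its options as command-line flags. The script layout, the `parse_args`/`load_kwargs`/`load` helpers and the `--name`, `--subset`, `--cache_dir` and `--data_dir` flag names are hypothetical; only `datasets.load_dataset` and its `cache_dir`/`data_dir` parameters are the real Hugging Face API. The new flag simply mirrors the cache-dir handling:

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the importer's CLI arguments (hypothetical flag names)."""
    parser = argparse.ArgumentParser(description="Hypothetical HF dataset importer")
    parser.add_argument("--name", required=True, help="Hub dataset name, e.g. facebook/covost2")
    parser.add_argument("--subset", default=None, help="Dataset config, e.g. fr_en")
    # Assumed to already exist: where the HF cache lives.
    parser.add_argument("--cache_dir", default=None, help="HF cache directory")
    # Proposed option, mirroring --cache_dir: where the manual download was unpacked.
    parser.add_argument("--data_dir", default=None, help="Directory holding manually downloaded data")
    return parser.parse_args(argv)


def load_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Build keyword arguments for datasets.load_dataset, omitting unset options."""
    kwargs: Dict[str, Any] = {}
    if args.cache_dir is not None:
        kwargs["cache_dir"] = args.cache_dir
    if args.data_dir is not None:
        kwargs["data_dir"] = args.data_dir
    return kwargs


def load(args: argparse.Namespace):
    """Load the dataset, forwarding cache_dir and data_dir when they were given."""
    # Imported lazily so the argument handling above works without `datasets` installed.
    from datasets import load_dataset

    # The second positional parameter of load_dataset is the config name.
    return load_dataset(args.name, args.subset, **load_kwargs(args))
```

With something like this in place, a builder call such as `.with_huggingface_data_dir("path/to/dir")` would only need to append `--data_dir path/to/dir` to the script's arguments, presumably the same way the cache directory is forwarded.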