Commit 686b523
fix: datasets broken import due to HF package and folder name collision (#1730)
This PR resolves the `datasets` import error that was introduced in
#1712. The import fails with the following traceback:
```
Traceback (most recent call last):
  File "./torchtitan/train.py", line 16, in <module>
    import torchtitan.protocols.train_spec as train_spec_module
  File "./torchtitan/__init__.py", line 12, in <module>
    import torchtitan.experiments # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./torchtitan/experiments/__init__.py", line 7, in <module>
    import torchtitan.experiments.llama4 # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./torchtitan/experiments/llama4/__init__.py", line 11, in <module>
    from torchtitan.datasets.hf_datasets import build_hf_dataloader
  File "./torchtitan/datasets/hf_datasets.py", line 12, in <module>
    from datasets import Dataset, load_dataset
ImportError: cannot import name 'Dataset' from 'datasets' (./torchtitan/datasets/__init__.py)
```
Why is this happening? #1712 added an `__init__.py` file to the
`datasets` folder. On the surface that PR looks fine, but it creates a
name collision with the Hugging Face (HF) Python package, which is also
called `datasets`. With the `__init__.py` in place, the local folder
becomes a regular package, and, as the last line of the traceback
shows, `datasets` now resolves to `./torchtitan/datasets/__init__.py`.
When `hf_datasets.py` imports the `Dataset` class it wants the HF
package, gets the local folder instead, and fails with the import error
above. The solution is simple: rename `__init__.py` to something else.
I changed it to `common.py`, as I found this name most fitting.

1 parent 476a965 commit 686b523
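
The collision described in the commit message can be checked in isolation. The following is a minimal sketch, not code from this repository: `demo_pkg` is a hypothetical name standing in for `datasets`, and the `site` and `project` directories are throwaway stand-ins for site-packages and `./torchtitan`. It relies on standard Python import behaviour, where a folder without `__init__.py` is only a namespace-package candidate and loses to a regular package found later on `sys.path`, while a folder with `__init__.py` is a regular package and wins as soon as it is found.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "demo_pkg"  # hypothetical name; plays the role of `datasets`


def origin_of(name, extra_paths):
    """Import `name` with `extra_paths` at the front of sys.path; return the file it resolved to."""
    saved = list(sys.path)
    sys.modules.pop(name, None)    # forget any earlier import of the same name
    importlib.invalidate_caches()  # make the finders notice newly created files
    sys.path[:0] = [str(p) for p in extra_paths]
    try:
        module = importlib.import_module(name)
        return Path(module.__file__).resolve()
    finally:
        sys.path[:] = saved
        sys.modules.pop(name, None)


root = Path(tempfile.mkdtemp()).resolve()
site = root / "site"        # assumption: stands in for site-packages
project = root / "project"  # assumption: stands in for ./torchtitan

# "Installed" package: a regular package that defines Dataset.
(site / PKG).mkdir(parents=True)
(site / PKG / "__init__.py").write_text("class Dataset: ...\n")

# Local folder of the same name, initially without __init__.py.
(project / PKG).mkdir(parents=True)
(project / PKG / "hf_datasets.py").write_text("")

# The project directory comes first on the path, as a script's directory does.
before = origin_of(PKG, [project, site])

# Adding __init__.py turns the local folder into a regular package.
(project / PKG / "__init__.py").write_text("")
after = origin_of(PKG, [project, site])

print("without local __init__.py:", before)
print("with local __init__.py:   ", after)
```

In this sketch the first lookup resolves to the package under `site` and the second to the folder under `project`, which mirrors why removing the `__init__.py` name from the local folder lets the installed package be found again.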
1 file changed, +1 -1 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 59 | 59 | |
| 60 | 60 | |
| 61 | 61 | |
| 62 | | - |
| | 62 | + |
| 63 | 63 | |