Latest dill release raises exception #4379
Comments
Fixed by: |
Just an additional insight: the latest dill (either 0.3.5 or 0.3.5.1) also broke the hashing/fingerprinting of any mapping function. For example:
Returns the standard non-dillable error:
|
@albertvillanova which file is ExamplesTests.test_run_speech_recognition_seq2seq in? |
Thanks a lot @gugarosa for the insight: we will incorporate it in our CI as regression testing for future dill releases. |
I did a deep dive into @gugarosa's problem and found the issue, which might be related to the one @sgugger discovered. In dill 0.3.5(.1), I created a new
datasets/src/datasets/utils/py_utils.py Lines 607 to 678 in 95193ae
Ah. I see what is happening. I guess a different copy of the function code is needed that sorts the global variables by name.
if dill.__version__.split('.') < ['0', '3', '5']:
    # current save_function code inside here
else:
    # new save_function code inside here with the following line inserted after creating the globals
    globs = {k: globs[k] for k in sorted(globs.keys())}
Will look into the test case @sgugger pointed out after that and verify if this is causing the problem. I am actually looking into rewriting the global variables code in uqfoundation/dill#466 and will keep this in mind and will try to create an easy way to modify the global variables in dill 0.3.6 (for example, sort them by key like datasets does). |
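One caveat on the version check quoted above: comparing lists of strings is lexicographic, so a hypothetical "0.3.10" would sort before "0.3.5". A minimal sketch of an integer-based check follows. The helper names `version_tuple`, `needs_new_save_function`, and `sort_globals` are illustrative assumptions, not part of dill or datasets, and the version is passed in as a string so the sketch runs without dill installed.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric components of a version string into ints.

    Parsing stops at the first component that does not start with a digit,
    so "0.3.6.dev0" becomes (0, 3, 6).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_new_save_function(dill_version: str) -> bool:
    """Hypothetical gate: True for dill 0.3.5 and later."""
    return version_tuple(dill_version) >= (0, 3, 5)


def sort_globals(globs: dict) -> dict:
    """Return a copy of the globals dict with keys in sorted order."""
    return {k: globs[k] for k in sorted(globs)}
```

In real code the argument would be `dill.__version__`; the `packaging` library's version parser is another option if that dependency is available.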
Thanks a lot for your investigation @anivegesana. Yes, we copy-pasted the old
However, this function has changed a lot since version 0.3.5, after your PR (thank you for the recursion fix, indeed):
We have to address this change. If your PR to sort global variables is eventually merged into dill 0.3.6, that will make our life easier, as the tweak will no longer be necessary. ;) I have included a regression test so that we are sure future releases of dill do not break |
I should note that because Python 3.6 and older are now deprecated and Python 3.7 has insertion-ordered dictionaries, the globals in dill will have a deterministic order, just not a sorted one. I would still keep them sorted like you have it to help with stability (for example, if someone reorders variables in a file, sorting the globals means the cache is not invalidated). It also seems that the order is not quite deterministic in IPython (see uqfoundation/dill#19). Huggingface datasets seems to do well in Jupyter regardless, so it is not a good idea to remove the sorting. |
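The stability argument can be illustrated with a small sketch. It uses the standard `pickle` and `hashlib` modules as stand-ins, which is an assumption made for illustration: it is not the hasher that datasets actually uses, and `fingerprint` is a hypothetical helper.

```python
import hashlib
import pickle


def fingerprint(globs: dict, sort_keys: bool) -> str:
    """Hash a dict of globals, optionally sorting the keys first."""
    if sort_keys:
        globs = {k: globs[k] for k in sorted(globs)}
    return hashlib.sha256(pickle.dumps(globs)).hexdigest()


# The same globals, defined in a different order (as if a file was reordered).
first = {"alpha": 1, "beta": 2}
second = {"beta": 2, "alpha": 1}

# Insertion order leaks into the serialized bytes, so the hashes differ.
unsorted_match = fingerprint(first, sort_keys=False) == fingerprint(second, sort_keys=False)

# Sorting the keys first removes the dependence on definition order.
sorted_match = fingerprint(first, sort_keys=True) == fingerprint(second, sort_keys=True)
```

Here `unsorted_match` is `False` and `sorted_match` is `True`: the unsorted fingerprints change when the variables are reordered, while the sorted ones do not.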
Describe the bug
As reported by @sgugger, latest dill release is breaking things with Datasets.