-
-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regression from 0.3.4 to 0.3.5.x - class decorators #483
Comments
I don't have a Windows computer and I am a little bit hesitant about running a potential fork bomb on a computer that isn't mine. Looking into VMs. In the meantime, can you send the output of the dumps line in dill.detect.trace(True)
c = dill.dumps(ClassTest1.f)
dill.loads(c) with both dill 0.3.4 and 0.3.5.1 and whether dumps or loads is causing the infinite loop. If dumps succeeds, can you share |
Dear anivegesana |
Dear anivegesana,
thanks for Your effort |
@anivegesana: my build and test boxes are (docker instances inside) VMs. FYI. |
Dear anivegesana,
under dill 0.3.4 - it works. under dill 0.3.5.1, the same output, but looping until timeout (of course with different object adresses in each loop)
thanks for Your effort |
That output is very strange. It is matching, which means that dill is not going through infinite regress, multiprocess is just executing the same code over and over. It is even more bizarre that it only effects dill. |
Dear anivegesana, I pinned down the Dill Version to 0.3.4 in the Yours sincerely |
I am still having trouble reproducing this bug. I am getting the error you described when you pickled the function. I'll try other Windows computers, but it might be dependent on the environment and I might need to try it on a CI which will be incredibly painful to debug. Python 3.7.11 [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> from wrapt_timeout_decorator import timeout
>>> class ClassTest1(object):
... def __init__(self) -> None:
... self.x = 3.0
... @timeout("instance.x/10", dec_allow_eval=True) # type: ignore
... def f(self) -> None:
... time.sleep(0.5)
>>> ClassTest1().f()
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
ClassTest1().f()
File "wrapt_timeout_decorator-1.3.12-py3-none-any.whl\wrapt_timeout_decorator\wrapt_timeout_decorator.py", line 139, in wrapper
return wrapped_with_timeout(wrap_helper)
File "wrapt_timeout_decorator-1.3.12-py3-none-any.whl\wrapt_timeout_decorator\wrapt_timeout_decorator.py", line 148, in wrapped_with_timeout
return wrapped_with_timeout_process(wrap_helper)
File "wrapt_timeout_decorator-1.3.12-py3-none-any.whl\wrapt_timeout_decorator\wrapt_timeout_decorator.py", line 162, in wrapped_with_timeout_process
return timeout_wrapper()
File "wrapt_timeout_decorator-1.3.12-py3-none-any.whl\wrapt_timeout_decorator\wrap_function_multiprocess.py", line 41, in __call__
self.__process.start()
File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\process.py", line 112, in start
self._popen = self._Popen(self)
File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\context.py", line 322, in _Popen
return Popen(process_obj)
File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\reduction.py", line 63, in dump
ForkingPickler(file, protocol, *args, **kwds).dump(obj)
File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 620, in dump
StockPickler.dump(self, obj)
File "C:\Program Files\Python\lib\pickle.py", line 437, in dump
self.save(obj)
File "C:\Program Files\Python\lib\pickle.py", line 549, in save
self.save_reduce(obj=obj, *rv)
File "C:\Program Files\Python\lib\pickle.py", line 662, in save_reduce
save(state)
File "C:\Program Files\Python\lib\pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1251, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "C:\Program Files\Python\lib\pickle.py", line 859, in save_dict
self._batch_setitems(obj.items())
File "C:\Program Files\Python\lib\pickle.py", line 885, in _batch_setitems
save(v)
File "C:\Program Files\Python\lib\pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Program Files\Python\lib\pickle.py", line 774, in save_tuple
save(element)
File "C:\Program Files\Python\lib\pickle.py", line 549, in save
self.save_reduce(obj=obj, *rv)
File "C:\Program Files\Python\lib\pickle.py", line 662, in save_reduce
save(state)
File "C:\Program Files\Python\lib\pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1251, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "C:\Program Files\Python\lib\pickle.py", line 859, in save_dict
self._batch_setitems(obj.items())
File "C:\Program Files\Python\lib\pickle.py", line 885, in _batch_setitems
save(v)
File "C:\Program Files\Python\lib\pickle.py", line 549, in save
self.save_reduce(obj=obj, *rv)
File "C:\Program Files\Python\lib\pickle.py", line 638, in save_reduce
save(args)
File "C:\Program Files\Python\lib\pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Program Files\Python\lib\pickle.py", line 774, in save_tuple
save(element)
File "C:\Program Files\Python\lib\pickle.py", line 549, in save
self.save_reduce(obj=obj, *rv)
File "C:\Program Files\Python\lib\pickle.py", line 633, in save_reduce
save(cls)
File "C:\Program Files\Python\lib\pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1840, in save_type
)), obj=obj, postproc_list=postproc_list)
File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1140, in _save_with_postproc
pickler.save_reduce(*reduction, obj=obj)
File "C:\Program Files\Python\lib\pickle.py", line 638, in save_reduce
save(args)
File "C:\Program Files\Python\lib\pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Program Files\Python\lib\pickle.py", line 789, in save_tuple
save(element)
File "C:\Program Files\Python\lib\pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1251, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "C:\Program Files\Python\lib\pickle.py", line 859, in save_dict
self._batch_setitems(obj.items())
File "C:\Program Files\Python\lib\pickle.py", line 885, in _batch_setitems
save(v)
File "C:\Program Files\Python\lib\pickle.py", line 524, in save
rv = reduce(self.proto)
NotImplementedError: object proxy must define __reduce_ex__() |
Based on GrahamDumpleton/wrapt#158, I believe this is likely mostly a conscious choice by the wrapt author as opposed to a problem in dill or multiprocess. multiprocess should not spin into an infinite loop and silently fail in very strange corner cases, so I believe that this issue should still be left up or moved over to multiprocess in case we can find out what is causing that strange behavior. They created the change to wrapt that disabled pickling of ObjectProxy objects in this commit: GrahamDumpleton/wrapt/13c55faf7d3e83e184b8a150f0df3de294698b25 So it is still a mystery how the new version of dill changed behavior for this error to bubble up now and why it was ignored earlier. Also, what was the monkey patch you referenced in bitranox/wrapt_timeout_decorator#13 (comment)? |
Dear anivegesana,
which trouble ? Its kind of hard to debug in a subprocesses, but I think You should concentrate to debug multiprocess & pickle together, instead of only pickle the function itself. This also did not work in the past.
#1 Yes, this pickle error happens also with #2 it works under linux, also with dill '0.3.5' - so You might concentrate on the differences between linux and win32 inplementation ? There should be some #3 You migh also check which change between #4 the error
I dont remember - I did not tinker around with I really dont like mysteries ;-), it means there is a hidden bug in the code .... which should be found, even if it is "hard to debug" You have all the tools on hand to find that bug - just by replay the changes between 0.3.4 and 0.3.5 - Your argument "I dont have a windows machine" seems somehow a little bit odd in times of virtual machines nowadays ... please be so kind and consider to help here, because a number of astronomers rely on that package. yours sincerely Robert |
Well, the mystery is indeed what is causing the strange behavior. The root cause is unknown, and it could be a bug in So step (1) is make sure whatever issue is being reported can be reproduced. Then step (2) is try to use diagnostic tools to find what the root cause is. |
I did not intend to make it seem that your problem is not a problem. In fact, I wanted to imply that there are likely multiple problems in multiple libraries that makes it very hard to diagnose and we should try to break it up into smaller pieces. All I meant by not being able to reproduce it was that on some Windows computers it has one behavior and on other Windows computers it has another, which is very bad and is the first problem that should be tackled. It is impossible to get anywhere else without consistency. I do have access to VMs and am trying it on different VM platforms, hoping to find a difference that is causing the behavior.
The most likely reason was that a change in dill broke a monkey patch that multiprocess made for Windows, but it breaks it differently based on other properties of the environment depending on what that patch does. I think it is in multiprocess because the other libraries are platform agnostic and there shouldn't be a reason that it performs differently on different machines with the same OS. |
One thing I noticed was that you ran your test that passed on Windows with Python 3.9 and the one that failed on Windows with Python 3.10. I tried it out on Python 3.7. That might be a confounding factor here. When I find time, I'll try it out on different Python versions and see if that is the cause of the varying results. |
Dear,mmckerns,
Everything absolutely True, but
I disagree here - tests on windows are absolutely necessary, because many things work different on windows - especially forking for
I just updated the tests from
Sure - if I came across rude, please accept my sincere apologies - please consider the German/Austrian mentality, which is abolutely straight forward. (well Austrians a bit less the Germans ...)
Sounds good. You can let all the components be the same, just change
greetings from Vienna, Austria, yours sincerely Robert |
If the only difference is the dill version and it works on some Windows computers and not others, I would assume that the wait has something to do with it. I tried it out on a big VM with 10 CPU cores and GitHub actions probably has just 2. I was to throw a wild guess, I would assume that it is related to #482 where a bug fix to dill's handling of recursive functions (#443) caused deeper searches for objects to pickle. Does adding dill.settings['recurse'] = True to the beginning of the test help? I am at a loss though, and the only way that I could hope to help is by recreating the entire setup on GitHub Actions (as opposed to Azure/others like I am trying now) and run your tests there. |
What specifically are you referring to? I know the |
I haven't look into the |
Dear anivegesana, adding |
The mods to @bitranox: Can you specify a matrix of versions of python, |
Dear,mmckerns, I guess to roll back the changes gradually from 3.0.5 to 3.0.4 should be the easiest way ! yours sincerely |
@bitranox: can you generate a matrix, even if it's easy. I haven't seen a quote on the versions of |
Dear anivegesana, Dear,mmckerns, I really try to make it easy for You folks, but lets concentrate on something that looks more beneficial:
Yours sincerely |
Dear anivegesana, |
Ok. Not expected. Most other issues from this week were concerning that PR. That narrows it down significantly. How about e2831d0? |
d2f916c NOT working |
Ok. It is somewhere between. 23c4745? |
23c4745 NOT working |
5bd56a8 NOT working |
Last one: 914d47f? Then we know for sure. |
914d47f NOT Working |
The parent of that is e2831d0. So I know what to look for now. Will get back to you with a fix. Thank you for your patience and help with debugging this! |
Yes, double checked, e2831d0 IS working. |
Ok. I found out that running the test suite with PyTest instead of just running the test manually lets me reproduce your error. This is not an issue with your library or multiprocess, it is an issue between dill and PyTest like #482. The error message I was getting back up here was because the console was implicitly Now that I understand multiprocess and this issue, I can be more useful. |
Dear anivegesana, |
Yes, we should expect |
And thank you both for being patient. This was a very frustrating bug that kept slipping out from underneath my nose due to all of the moving parts and because I kept searching in the wrong places. |
I don't think I have permissions to do that. If it would be more convenient for me to do that because of regular contributions, I believe that the role is triage: It is hopefully broad enough to allow contributors to help with managing the repo while not being expressive enough to cause a dominictarr/event-stream situation because there is no way to change the code on master without going through you. |
dear @anivegesana ,
I am the author of the wrapt_timeout_decorator
since version 0.3.5 of dill, two of our tests are failing under windows.
In those tests we use our decorator to decorate some Class Methods, which are then called in a process via
multiprocess
(which is using implicitely dill).(see this test )
Those tests are just fine with version
0.3.4
on windows/linux/macos, or0.3.5.x
on linux/macos, but the fail with dill >0.3.4
on windows.What seems to happen is, that the decorator recursively seem to call itself, creating a chain of subprocesses.
In order to debug that behaviour You might set the timeout to 3600 seconds in that tests and print some debug information to
stderr
.I really struggle to make a minimal version to show that regression, therefore I would please ask You for help,
in order to be able to use dill 0.3.5 upwards for my community.
you can find the failed jobs with dill 0.3.5.1 here
and the same tests passing with dill 0.3.4.0 here
yours sincerely
Robert, Vienna
The text was updated successfully, but these errors were encountered: