It is well known that the `multiprocessing` module has some severe issues in JupyterLab on Windows. Unfortunately, `multiprocess` currently handles only a limited set of these cases.

Below I provide several cases for discussion. Each snippet is run in a single JupyterLab cell.

1st case:
```python
def foo(x):
    return x

def bar(z):
    return [foo(z)]

from multiprocess import Pool
with Pool(2) as p:
    print(p.map(bar, [1, 2]))
```
Running it produces the following error:
```
---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\qq\anaconda3\lib\site-packages\multiprocess\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\qq\anaconda3\lib\site-packages\multiprocess\pool.py", line 48, in mapstar
return list(map(*args))
File "<ipython-input-14-39de361cd185>", line 5, in bar
NameError: name 'foo' is not defined
"""
The above exception was the direct cause of the following exception:
NameError Traceback (most recent call last)
<ipython-input-14-39de361cd185> in <module>
7 from multiprocess import Pool
8 with Pool(2) as p:
----> 9 print(p.map(bar,[1,2]))
~\anaconda3\lib\site-packages\multiprocess\pool.py in map(self, func, iterable, chunksize)
362 in a list that is returned.
363 '''
--> 364 return self._map_async(func, iterable, mapstar, chunksize).get()
365
366 def starmap(self, func, iterable, chunksize=None):
~\anaconda3\lib\site-packages\multiprocess\pool.py in get(self, timeout)
769 return self._value
770 else:
--> 771 raise self._value
772
773 def _set(self, i, obj):
NameError: name 'foo' is not defined
```
Apparently, `multiprocess` cannot resolve `foo`, which is called inside `bar`.
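To make this failure concrete, here is a minimal stdlib-only sketch. It assumes the worker receives the compiled code of `bar` but not the notebook's global namespace; it only imitates that situation and is not how `multiprocess` or dill actually ship functions. The point it shows: the code of `bar` stores only the *name* `foo`, which is looked up in the function's globals at call time. `detached_bar` and `patched_bar` are illustrative names.

```python
import types

def foo(x):
    return x

def bar(z):
    return [foo(z)]

# The compiled code of `bar` records only the global *name* "foo";
# the actual function is looked up in bar.__globals__ each time bar runs.
print(bar.__code__.co_names)  # ('foo',)

# Imitate a worker that gets bar's code but an empty global namespace.
detached_bar = types.FunctionType(bar.__code__, {}, "bar")
try:
    detached_bar(1)
except NameError as exc:
    print(exc)  # name 'foo' is not defined

# Supplying `foo` in the new globals dict is enough for the call to succeed.
patched_bar = types.FunctionType(bar.__code__, {"foo": foo}, "bar")
print(patched_bar(1))  # [1]
```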
As noted in https://stackoverflow.com/a/16891169/1911722, cloudpickle is "able to pickle a function, method, class, or even a lambda, as well as any dependencies." Let us try it.
2nd case:
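```python
import cloudpickle

def foo(x):
    return x

def bar(z):
    return [foo(z)]

x = cloudpickle.dumps(bar)
# Delete the originals so a successful call cannot rely on them still being defined
del foo
del bar

import pickle
f = pickle.loads(x)  # cloudpickle output can be loaded with the standard pickle module
print(f(3))

from multiprocess import Pool
with Pool(2) as p:
    print(p.map(f, [1, 2]))
```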
It outputs:

```
[3]
[[1], [2]]
```
First, `print(f(3))` prints the correct result, so cloudpickle seems to pickle those dependencies quite well. Second, `p.map` also prints the correct result.

At this point I almost thought that cloudpickle was a perfect workaround for the limitations of `multiprocess`. But let us go on.
3rd case:
```python
import cloudpickle

def h(x):
    return [x]

def foo(x):
    return h(x)

def bar(z):
    return [foo(z)]

x = cloudpickle.dumps(bar)
# Delete the originals so a successful call cannot rely on them still being defined
del foo
del bar
del h

import pickle
f = pickle.loads(x)
print(f(3))

from multiprocess import Pool
with Pool(2) as p:
    print(p.map(f, [1, 2]))
```
Now `bar` calls `foo`, and `foo` calls `h`, so this is a chain of three functions.

`print(f(3))` still gives the correct result, which suggests that cloudpickle is still pickling well. But `p.map` fails with this error:
```
---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\qq\anaconda3\lib\site-packages\multiprocess\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\qq\anaconda3\lib\site-packages\multiprocess\pool.py", line 48, in mapstar
return list(map(*args))
File "<ipython-input-2-4643d366747a>", line 10, in bar
File "<ipython-input-2-4643d366747a>", line 7, in foo
NameError: name 'h' is not defined
"""
The above exception was the direct cause of the following exception:
NameError Traceback (most recent call last)
<ipython-input-2-4643d366747a> in <module>
22 from multiprocess import Pool
23 with Pool(2) as p:
---> 24 print(p.map(f,[1,2]))
~\anaconda3\lib\site-packages\multiprocess\pool.py in map(self, func, iterable, chunksize)
362 in a list that is returned.
363 '''
--> 364 return self._map_async(func, iterable, mapstar, chunksize).get()
365
366 def starmap(self, func, iterable, chunksize=None):
~\anaconda3\lib\site-packages\multiprocess\pool.py in get(self, timeout)
769 return self._value
770 else:
--> 771 raise self._value
772
773 def _set(self, i, obj):
NameError: name 'h' is not defined
```
`p.map` cannot find the definition of `h`.
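The dependence on chain length can be imitated in plain Python. In this sketch, the hypothetical helper `rebuild` copies a function and gives it a globals dict holding only the functions its code references by name, recursing at most `depth` levels. Nothing here is taken from cloudpickle or `multiprocess` internals; it only reproduces the symptom. Capturing one level matches the case above: `foo` is found, but `h` is not.

```python
import types

def h(x):
    return [x]

def foo(x):
    return h(x)

def bar(z):
    return [foo(z)]

def rebuild(func, depth):
    """Hypothetical helper: copy `func` with a fresh globals dict containing only
    the functions it references by name, each rebuilt with depth - 1.
    At depth 0 the copy gets an empty globals dict."""
    new_globals = {}
    if depth > 0:
        for name in func.__code__.co_names:
            value = func.__globals__.get(name)
            if isinstance(value, types.FunctionType):
                new_globals[name] = rebuild(value, depth - 1)
    return types.FunctionType(func.__code__, new_globals, func.__name__)

for depth in range(3):
    try:
        print(depth, rebuild(bar, depth)(1))
    except NameError as exc:
        print(depth, exc)
# 0 name 'foo' is not defined
# 1 name 'h' is not defined
# 2 [[1]]
```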
Conclusion
From the cases above, it seems that cloudpickle does pickle a function and its dependencies well, but `multiprocess` has some problems:

- without cloudpickle, `multiprocess` does not support a chain of two functions;
- with cloudpickle, `multiprocess` does not support chains of three or more functions.

Still, it seems promising to me that if `multiprocess` were properly combined with cloudpickle, it would solve all of these problems in JupyterLab on Windows.
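One possible way to try that combination, sketched under assumptions: the parent serializes the function with cloudpickle, and each worker unpickles it itself, so `multiprocess` only has to transfer bytes plus a small helper. `call_pickled` is a hypothetical helper, not part of either library, and this has not been verified on JupyterLab/Windows.

```python
def call_pickled(args):
    """Hypothetical worker helper: `args` is a (payload, item) pair, where
    `payload` is bytes serialized in the parent (e.g. with cloudpickle.dumps)."""
    # Import inside the function so the worker does not rely on names
    # defined in the notebook's global namespace.
    import pickle

    payload, item = args
    func = pickle.loads(payload)  # cloudpickle output loads with plain pickle
    return func(item)

# Intended usage in the notebook (untested on JupyterLab/Windows):
# import cloudpickle
# from multiprocess import Pool
# payload = cloudpickle.dumps(bar)
# with Pool(2) as p:
#     print(p.map(call_pickled, [(payload, x) for x in [1, 2]]))
```

Sending the payload with every item keeps the sketch short; with large functions or many items, a `Pool` initializer that unpickles the function once per worker would avoid the repeated deserialization.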
Versions of the relevant packages:

- jupyterlab 3.0.14
- multiprocess 0.70.12.2
- cloudpickle 1.6.0