Regex & multiprocessing #8

Open
parisni opened this issue Mar 23, 2017 · 5 comments

Comments

@parisni

parisni commented Mar 23, 2017

Hi,

I ran into a problem when combining multiprocessing with the sdgen Regex (while rstr.xeger works fine).
In the example below, main() does not work, while main2() and main3() both work fine:
http://pastebin.com/zmMrHaEQ

Maybe some insight here (about the concept of picklability):
http://stackoverflow.com/questions/8804830/python-multiprocessing-pickling-error
https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled

Thanks !

@Dubrzr
Member

Dubrzr commented Mar 24, 2017

This isn't a problem in Dsfaker: lambdas cannot be pickled, and Xeger uses some lambdas, see: https://bitbucket.org/leapfrogdevelopment/rstr/src/a814f13cca0ebc2f5a82435760bd486d05551528/rstr/xeger.py?at=default&fileviewer=file-view-default#xeger.py-26
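To make the pickling limitation concrete, here is a minimal sketch using only the standard library. `HoldsLambda` and `try_pickle` are hypothetical names invented for this illustration; `HoldsLambda` only imitates the pattern of storing a lambda created inside `__init__` and is not rstr code.

```python
import pickle


class HoldsLambda:
    """Hypothetical stand-in for a class that stores a lambda created in __init__."""

    def __init__(self):
        # A lambda has no importable name, so pickle cannot serialize it by reference.
        self.transform = lambda x: 2 * x


def try_pickle(obj):
    """Return None if obj pickles, otherwise a short description of the error."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # Plain data pickles without trouble.
    print("list of strings:", try_pickle(["a", "a"]))
    # The instance drags its lambda attribute along, so pickling fails.
    print("object holding a lambda:", try_pickle(HoldsLambda()))
```

The exact exception type and message depend on the Python version, which is why the sketch catches several of them.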

@parisni
Author

parisni commented Mar 24, 2017

This version of the code, based on xeger, works well:
http://pastebin.com/mAC08qic
That is why this looks like a dsfaker responsibility.

@Dubrzr
Member

Dubrzr commented Mar 24, 2017

Nope, it doesn't:

REGEX = [
    rstr.xeger("a"),
    rstr.xeger("a"),
    rstr.xeger("a"),
    rstr.xeger("a")
    ]

-> Here you are creating a list of strings, not a list of functions; it's the same as REGEX = ["a", "a", "a", "a"]

But now let's try it the right way:

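import concurrent.futures
from random import Random

import rstr

# Each element is now a generator object, not a pre-generated string.
REGEX = [
    rstr.Rstr(Random(1)),
    rstr.Rstr(Random(1)),
    rstr.Rstr(Random(1)),
    rstr.Rstr(Random(1)),
]

def run_regex(r):
    return r.xeger('a')

# Sending the Rstr objects to worker processes requires pickling them.
with concurrent.futures.ProcessPoolExecutor() as executor:
    print(list(executor.map(run_regex, REGEX)))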

Traceback (most recent call last):
  File "/home/jdu/apps/anaconda3-4.3.0/lib/python3.6/multiprocessing/queues.py", line 241, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/jdu/apps/anaconda3-4.3.0/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'Xeger.__init__.<locals>.<lambda>'

So we have the same error and no Dsfaker code is used...
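One possible way around this error, sketched here under assumptions and not something tried in this thread, is to send only picklable data (a seed and the generation parameters) to the workers and build the generator object inside each worker. `LambdaBackedGenerator` and `run_job` are hypothetical names; the class is a standard-library stand-in for an object that keeps a lambda internally. With rstr, the worker would presumably build `rstr.Rstr(Random(seed))` and call `.xeger(pattern)` instead.

```python
import concurrent.futures
import random


class LambdaBackedGenerator:
    """Hypothetical stand-in for a generator object that keeps a lambda internally."""

    def __init__(self, rng):
        self._rng = rng
        # This lambda makes instances unpicklable, like the traceback above.
        self._pick = lambda chars: self._rng.choice(chars)

    def generate(self, alphabet, length):
        """Return a random string of the given length drawn from alphabet."""
        return "".join(self._pick(alphabet) for _ in range(length))


def run_job(job):
    """Build the generator inside the worker; only a plain tuple crosses processes."""
    seed, alphabet, length = job
    generator = LambdaBackedGenerator(random.Random(seed))
    return generator.generate(alphabet, length)


def main():
    # Jobs are tuples of ints and strings, which pickle without trouble.
    jobs = [(seed, "ab", 4) for seed in range(4)]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        print(list(executor.map(run_job, jobs)))


if __name__ == "__main__":
    main()
```

The trade-off is that each job rebuilds its generator, so any expensive setup is repeated per job rather than done once in the parent.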

@Dubrzr
Member

Dubrzr commented Mar 24, 2017

I've found a similar issue here: https://bugs.python.org/issue29517
Maybe try with Python 3.5

@parisni
Author

parisni commented Mar 24, 2017

Using 3.5 didn't help.

ThreadPool works, but I guess ProcessPool is best for this use case.
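A minimal sketch of why a thread pool avoids the error: threads share the parent's memory, so the objects are passed by reference and never pickled. `HoldsLambda` and `run` are hypothetical names standing in for the real generator object and worker function.

```python
import concurrent.futures


class HoldsLambda:
    """Hypothetical stand-in for an object that stores a lambda and cannot be pickled."""

    def __init__(self, factor):
        self.transform = lambda x: factor * x


def run(obj):
    """Worker function: call the lambda-backed method on the shared object."""
    return obj.transform(21)


def main():
    objects = [HoldsLambda(1), HoldsLambda(2)]
    # No pickling happens here: worker threads see the same objects as the caller.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        return list(executor.map(run, objects))


if __name__ == "__main__":
    print(main())
```

In standard CPython builds the global interpreter lock keeps pure-Python, CPU-bound work from running in parallel across threads, which is consistent with preferring a process pool for this workload.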

So, while waiting for Xeger to be improved (opening an issue there might help), I will distinguish in my package between jobs that are process-friendly and those that are not, and run the latter as single processes.
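A rough sketch of that split, assuming "process-friendly" can be approximated by "survives `pickle.dumps`". The names `is_process_friendly` and `partition_jobs` are hypothetical and do not come from any of the packages discussed here.

```python
import pickle


def is_process_friendly(job):
    """Approximate check: a job that pickles can be sent to a worker process."""
    try:
        pickle.dumps(job)
    except Exception:
        # Pickling failures surface as different exception types across versions.
        return False
    return True


def partition_jobs(jobs):
    """Split jobs into (process_friendly, others), preserving their order."""
    friendly, others = [], []
    for job in jobs:
        (friendly if is_process_friendly(job) else others).append(job)
    return friendly, others


if __name__ == "__main__":
    # Plain data pickles; the lambda does not.
    jobs = ["a+", 42, lambda: "x"]
    friendly, others = partition_jobs(jobs)
    print(len(friendly), "process-friendly job(s),", len(others), "other job(s)")
```

The first list could then go to a `ProcessPoolExecutor` while the second runs serially in the parent. The check costs one extra serialization per job, and a successful `pickle.dumps` does not by itself guarantee that the worker can unpickle and run the job.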
