Description
Hey guys, very cool job so far.
I'm not sure whether this is a huge issue, but I wasn't able to find a solution by myself.
Goal: I want to set up a Python cluster on a Windows HPC cluster.
Installed software: Windows Server 2012 on the head node, HPC Pack 2016 for cluster management, and Anaconda for managing Python.
What I have done so far: Installed all ipcluster dependencies and got a cluster (ipcluster start -n 2) working without issues. I have not established connections to any engines yet; I thought that would minimize potential sources of error.
However, when I try to use the WindowsHPC controller, the cluster does not start up and instead fails with:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\ipclusterapp.py", line 543, in start_controller
self.controller_launcher.start()
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\launcher.py", line 973, in start
return super(WindowsHPCControllerLauncher, self).start(1)
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\launcher.py", line 914, in start
output = check_output([self.job_cmd] + args,
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['C:\Program Files\Microsoft HPC Pack 2016\Bin\job.EXE', 'submit', '/jobfile:C:\Users\xxx\.ipython\profile_default\ipcontroller_job.xml',
'/scheduler:']' returned non-zero exit status 1.
ERROR:tornado.application:Exception in callback functools.partial(<function IPClusterStart.start.<locals>.start at 0x000000B31324A670>)
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\tornado\ioloop.py", line 743, in _run_callback
ret = callback()
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\ipclusterapp.py", line 588, in start
self.start_controller()
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\ipclusterapp.py", line 543, in start_controller
self.controller_launcher.start()
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\launcher.py", line 973, in start
return super(WindowsHPCControllerLauncher, self).start(1)
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\launcher.py", line 914, in start
output = check_output([self.job_cmd] + args,
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['C:\Program Files\Microsoft HPC Pack 2016\Bin\job.EXE', 'submit', '/jobfile:C:\Users\xxx\.ipython\profile_default\ipcontroller_job.xml',
'/scheduler:']' returned non-zero exit status 1.
I suspected wrong paths, but unfortunately that wasn't the problem. I guess the problem isn't that big, but I couldn't trace it to its source. I tried to highlight the most interesting part of the message.
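The traceback only reports that job.EXE returned exit status 1, not what it printed. One way to get more detail might be to re-run the same submit command by hand and capture its output. Below is a minimal sketch of that idea, assuming Python 3 from the same Anaconda environment; `run_and_capture` is a hypothetical helper name of my own, not part of ipyparallel, and the demo command is a stand-in so the snippet runs anywhere.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (returncode, stdout, stderr) without raising.

    Hypothetical helper for diagnosis only; not part of ipyparallel.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in command that fails with exit status 1 and writes to stderr.
# For the real check, replace it with the argument list shown in the
# traceback (job.EXE, 'submit', '/jobfile:...', '/scheduler:'),
# copied exactly as it appears there.
demo_cmd = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]

code, out, err = run_and_capture(demo_cmd)
print("exit status:", code)
print("stdout:", out)
print("stderr:", err)
```

With the real command substituted, whatever job.EXE writes to stdout or stderr should show up here instead of being hidden behind the CalledProcessError.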