HPC Cluster Problems #428

Open
@Pas0691

Description

Hey guys, very cool job so far.

I'm not sure whether this is a huge issue, but I wasn't able to find a solution myself.

Goal: I want to set up a Python cluster on a Windows HPC cluster.

Installed software: Windows Server 2012 on the head node, HPC Pack 2016 for cluster management, and Anaconda for managing Python.

What I have done so far: installed all ipcluster dependencies and got a cluster (ipcluster start -n 2) working without issues. I have not established connections to any engines yet; I thought that would minimize the potential for faults.

Anyway, when I try to use the WindowsHPC controller, the cluster does not start up but fails with:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\ipclusterapp.py", line 543, in start_controller
    self.controller_launcher.start()
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\launcher.py", line 973, in start
    return super(WindowsHPCControllerLauncher, self).start(1)
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\launcher.py", line 914, in start
    output = check_output([self.job_cmd] + args,
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['C:\Program Files\Microsoft HPC Pack 2016\Bin\job.EXE', 'submit', '/jobfile:C:\Users\xxx\.ipython\profile_default\ipcontroller_job.xml', '/scheduler:']' returned non-zero exit status 1.
ERROR:tornado.application:Exception in callback functools.partial(<function IPClusterStart.start.<locals>.start at 0x000000B31324A670>)
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\tornado\ioloop.py", line 743, in _run_callback
    ret = callback()
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\ipclusterapp.py", line 588, in start
    self.start_controller()
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\ipclusterapp.py", line 543, in start_controller
    self.controller_launcher.start()
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\launcher.py", line 973, in start
    return super(WindowsHPCControllerLauncher, self).start(1)
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\site-packages\ipyparallel\apps\launcher.py", line 914, in start
    output = check_output([self.job_cmd] + args,
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\ProgramData\Anaconda3\envs\pythoncluster\lib\subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['C:\Program Files\Microsoft HPC Pack 2016\Bin\job.EXE', 'submit', '/jobfile:C:\Users\xxx\.ipython\profile_default\ipcontroller_job.xml', '/scheduler:']' returned non-zero exit status 1.
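
One detail that stands out in the failing command is that the /scheduler: switch has nothing after the colon. Whether that is what makes job.EXE exit with status 1 is only a guess, not something the traceback confirms. A minimal sketch for spotting such empty switches in an argument list (the helper name empty_switches is made up for illustration and is not part of ipyparallel):

```python
def empty_switches(args):
    """Return the names of '/name:value' switches whose value is empty."""
    empty = []
    for arg in args:
        if arg.startswith("/") and ":" in arg:
            # Split on the first colon only, so Windows paths like C:\... stay intact.
            name, _, value = arg.partition(":")
            if value == "":
                empty.append(name)
    return empty


# Arguments copied from the CalledProcessError message in the traceback.
submit_args = [
    "submit",
    r"/jobfile:C:\Users\xxx\.ipython\profile_default\ipcontroller_job.xml",
    "/scheduler:",
]

print(empty_switches(submit_args))
```

If the empty switch does turn out to matter, the scheduler host name would presumably have to come from the launcher configuration in the profile, but that is an assumption to verify against the ipyparallel documentation.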

I suspected wrong paths, but unfortunately that wasn't the problem. I guess the problem isn't that big, but I couldn't track down the source. I tried to highlight the most interesting part of the message.
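
The traceback only reports the exit status, not what job.EXE actually printed. A rough sketch for getting at that message is to run the same command by hand and capture both output streams instead of raising. The helper run_and_capture is a hypothetical name, and the demo command below is only a stand-in so the snippet runs on any machine with Python 3.7 or newer:

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (exit status, stdout, stderr) without raising."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Stand-in for the real command: a child process that writes to stderr and exits with 1.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('no scheduler'); sys.exit(1)",
]

status, out, err = run_and_capture(demo_cmd)
print("exit status:", status)
print("stdout:", repr(out))
print("stderr:", repr(err))
```

On the head node, demo_cmd would be replaced by the argument list shown in the CalledProcessError message, so that whatever job.EXE prints becomes visible.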
