Use ansible-init user for cluster share #866
Conversation
Image build: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/20135647750
Linting passed, cancelled other CI.
Weirdly it got stuck on the system users task, and the ansible-init user appears to have a gid of 100!
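For context, gid 100 is the default "users" group on many distros, which usually means the user was created without a matching primary group. A minimal sketch of creating the user with an explicit system group (names are illustrative, not necessarily the appliance's actual tasks):

```yaml
# Sketch only: create a dedicated group first so the user does not fall
# back to the default "users" group (gid 100 on many distros).
- name: Create ansible-init group
  ansible.builtin.group:
    name: ansible-init
    system: true

- name: Create ansible-init user
  ansible.builtin.user:
    name: ansible-init
    group: ansible-init   # explicit primary group, avoids gid 100
    system: true
```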
Confirmed the problem was during the fatimage build: on boot, the above image still had the ansible-init user in gid 100. Did some local testing to arrive at the above change. Image build: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/20171889414
For some reason the image build linked above didn't even include the above change. Trying the build again in case there was some GitHub weirdness: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/20173380585. Ok, checked during the build: presumably the runner didn't manage to pull the correct commit for some reason?!
Looks like it's got stuck again at the "Add system users" task, although logging into a CI VM shows that user at least has the right gid now. So maybe somehow there were two problems?? The last ansible entry in syslog is the user creation itself, which is interesting: I'm not sure I expected to see a homedir there. Maybe it is the combination of using create_home: false, with /home being on local disk in the build but an NFS share in the cluster, that causes problems?
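For reference, a minimal sketch of the two user-creation variants in play here (illustrative only, not the appliance's actual task):

```yaml
# Illustrative only. With the module default (create_home: true) the user
# module touches /home, which is local disk during the image build but an
# NFS share on cluster nodes; with create_home: false it never touches /home.
- name: Add system user (module default creates a homedir under /home)
  ansible.builtin.user:
    name: ansible-init
    system: true

- name: Add system user without a homedir
  ansible.builtin.user:
    name: ansible-init
    system: true
    create_home: false
```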
The above "fix" really needs a new image building too, else it'll behave differently on the next image build, but I'm going to let CI run to see if that also hangs ... Image build: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/20178697087
Ok, the above CI did get past the point where it had hung, so going to push a new image ...
Fixes an issue where the sssd and sshd roles fail for compute-init, as their export.yml tasks write files to the cluster share readable only by root. As the share is root-squashed, these files cannot be retrieved by ansible-init on compute node boot. Other files were written by the slurm user, but this is not considered appropriate for sensitive files. The fix is to create a new "ansible-init" user and use it for all writes to and reads from the cluster share.
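As a rough illustration of the resulting pattern (paths and variable names here are hypothetical, not the appliance's actual ones), an export.yml task would leave the file on the share owned by ansible-init rather than root, so a root-squashed client can still read it back:

```yaml
# Hypothetical export task: the file lands on the cluster share owned by
# ansible-init and readable only by that user, so compute-init can fetch it
# on boot despite root squash. cluster_share_path is an assumed variable.
- name: Export sssd config to the cluster share
  ansible.builtin.copy:
    src: /etc/sssd/sssd.conf
    dest: "{{ cluster_share_path }}/sssd/sssd.conf"
    remote_src: true
    owner: ansible-init
    group: ansible-init
    mode: "0600"
```

On the compute side, the corresponding read would similarly be performed as the ansible-init user (e.g. via become_user), so the root squash on the share never comes into play.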