Skip to content

AWS ParallelCluster v2.6.1

Choose a tag to compare

@tilne tilne released this 09 Apr 23:04
· 1 commit to release-2.6 since this release

We're excited to announce the release of AWS ParallelCluster 2.6.1.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Improved the management of SQS messages and retries to speed-up recovery times when failures occur.

CHANGES

  • Do not launch a replacement for an unhealthy or unresponsive node until this is terminated. This makes cluster slower at provisioning new nodes when failures occur but prevents any temporary over-scaling with respect to the expected capacity.
  • Increase parallelism when starting slurmd on compute nodes that join the cluster from 10 to 30.
  • Reduce the verbosity of messages logged by the node daemons.
  • Do not dump logs to /home/logs when nodewatcher encounters a failure and terminates the node. CloudWatch can be used to debug such failures.
  • Reduce the number of retries for failed REMOVE events in sqswatcher.

BUG FIXES

  • Fixed a bug in the ordering and retrying of SQS messages that was causing, under certain circumstances of heavy load, the scheduler configuration to be left in an inconsistent state.
  • Delete from queue the REMOVE events that are discarded due to hostname collision with another event fetched as part of the same sqswatcher iteration.

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192