{2023.06}[2023a,2023b] rebuild CUDA/* modules to update module files #918
bot: build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90
Force rebuilding CUDA/12.1.1 for |
Latest jobs will fail rebuilding CUDA/12.1.1 with a known error (example below for
Force rebuilding CUDA/12.1.1 for zen2 and zen3 and try previous workaround for permission denied issue (reverting #907)...
One more...
Should be better...
Looks better. If rebuilding works, roll back the changes related to the alternative removal of packages, but keep grep for
@@ -101,7 +101,7 @@ fi
 pr_diff=$(ls [0-9]*.diff | head -1)

 # if this script is run as root, use PR patch file to determine if software needs to be removed first
-if [ $EUID -eq 0 ]; then
+if [ $EUID -ne 0 ]; then
This change doesn't look right.
If it is, the comments will need updating
Yeah, just trying to get it working by using an alternative approach that doesn't use fakeroot ;)
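To make the concern about this hunk concrete: `$EUID` is `0` only for root, so switching `-eq` to `-ne` inverts which users enter the removal branch, and the code comment above the condition no longer matches. A minimal sketch, assuming a hypothetical helper `runs_removal_branch` (not part of the script in this PR) that takes the UID as an argument so both cases can be exercised without being root:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the script in this PR): mirrors the original
# condition `[ $EUID -eq 0 ]`, but takes the UID as an argument for testing.
runs_removal_branch() {
  local uid="$1"
  [ "$uid" -eq 0 ]
}

# Report which branch the current user would take under the original condition.
if runs_removal_branch "$EUID"; then
  echo "root: removal branch would run"
else
  echo "non-root (EUID=$EUID): removal branch would be skipped"
fi
```

With `-ne` instead of `-eq` the two outcomes swap, which is why the comment would need updating if the change were kept.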
That didn't work for removing existing installations. However, CUDA/12.1.1 was identified as an installation to be removed (in addition to CUDA/12.4.0), so that part works now. Opting to open another PR based on the standard way to remove/rebuild a package, which also includes the changes to consider CUDA/12.1.1.
Superseded by #919
PR merged! Moved |
After easybuilders/easybuild-easyblocks#3516 got merged, we need to update the module files for CUDA/12.{1.1,4.0}.
We need to do that for the following architecture combinations:
- zen2 + cc80
- zen3 + cc80
- zen4 + cc90
For the first two we use the build cluster on AWS. For the third we use the build cluster on Azure. Because CUDA is just a binary installation, this should be fine.
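The target list above can be written out as a small loop that prints one bot command per combination. This is only a sketch: the command format is copied from the zen4 build comment in this PR, while the AWS instance name used below (`eessi-bot-mc-aws`) is an assumption that does not appear in this conversation.

```shell
#!/usr/bin/env bash
# Sketch: print one bot build command per CPU/GPU target listed above.
# ASSUMPTION: "eessi-bot-mc-aws" is a hypothetical name for the AWS bot instance;
# only "eessi-bot-mc-azure" appears in this PR.
repo="eessi.io-2023.06-software"

# Each entry is "<bot instance> <CPU target> <compute capability>"
targets=(
  "eessi-bot-mc-aws zen2 cc80"
  "eessi-bot-mc-aws zen3 cc80"
  "eessi-bot-mc-azure zen4 cc90"
)

bot_commands() {
  local entry instance cpu cc
  for entry in "${targets[@]}"; do
    # Split the entry into its three whitespace-separated fields
    read -r instance cpu cc <<< "$entry"
    echo "bot: build instance:${instance} repository:${repo} architecture:x86_64/amd/${cpu} accelerator:nvidia/${cc}"
  done
}

bot_commands
```

Keeping the instance next to each target makes the AWS/Azure split explicit in one place.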
Note: while we only need to rebuild the module files, we cannot use `--module-only` as an EasyBuild argument, because the rebuild procedure removes the whole installation.