trz42 (Collaborator) commented Feb 17, 2025

Renewed version for #918

After easybuilders/easybuild-easyblocks#3516 was merged, we need to update the module files for CUDA/12.1.1 and CUDA/12.4.0.

This is needed for the following architecture combinations:

  • zen2 + cc80
  • zen3 + cc80
  • zen4 + cc90

For the first two, we use the build cluster on AWS; for the third, we use the build cluster on Azure. Because CUDA is just a binary installation, this should be fine.

Note: although we only need to regenerate the module files, we cannot pass --module-only to EasyBuild, because the rebuild procedure removes the whole installation.

trz42 added the 2023.06-software.eessi.io and accel:nvidia labels Feb 17, 2025

eessi-bot bot commented Feb 17, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphire_rapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@riscv-eessi-io-bot

Instance eessi-bot-riscv is configured to build for:

  • architectures: riscv64/generic
  • repositories: riscv.eessi.io-20240402


eessi-bot bot commented Feb 17, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

gpu-bot-ugent bot commented Feb 17, 2025

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

trz42 (Collaborator, Author) commented Feb 17, 2025

Just give it a try...

bot: build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
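The three commands above follow a single pattern, so they can be derived from the architecture/accelerator combinations listed in the PR description (zen2 and zen3 with cc80 on AWS, zen4 with cc90 on Azure). The following is a minimal Python sketch, assuming the command grammar is exactly what appears in this PR; the `Target` type and the `build_command` helper are hypothetical and not part of the EESSI bot itself.

```python
from typing import NamedTuple

# Assumption: the repository name used for all three builds in this PR.
REPOSITORY = "eessi.io-2023.06-software"


class Target(NamedTuple):
    instance: str      # bot instance (build cluster) that should handle the build
    architecture: str  # CPU micro-architecture
    accelerator: str   # NVIDIA compute capability


# The three combinations from the PR description.
TARGETS = [
    Target("eessi-bot-mc-azure", "x86_64/amd/zen4", "nvidia/cc90"),
    Target("eessi-bot-mc-aws", "x86_64/amd/zen3", "nvidia/cc80"),
    Target("eessi-bot-mc-aws", "x86_64/amd/zen2", "nvidia/cc80"),
]


def build_command(t: Target, repository: str = REPOSITORY) -> str:
    """Format one bot build command (hypothetical helper) for a single target."""
    return (
        f"bot: build instance:{t.instance} repository:{repository} "
        f"architecture:{t.architecture} accelerator:{t.accelerator}"
    )


if __name__ == "__main__":
    # Print one command per line, in the same order as the comment above.
    for target in TARGETS:
        print(build_command(target))
```

Passing the `instance:` filter explicitly keeps each command from being picked up by bot instances that are also configured for the same architecture (for example, zen3 is also served by eessi-bot-vsc-ugent).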


eessi-bot bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 from trz42

    • expanded format: build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

@riscv-eessi-io-bot

Updates by the bot instance eessi-bot-riscv
  • account trz42 has NO permission to send commands to the bot


eessi-bot bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 from trz42

    • expanded format: build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 resulted in:

  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot-surf

Updates by the bot instance eessi-bot-surf
  • account trz42 has NO permission to send commands to the bot

gpu-bot-ugent bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-vsc-ugent
  • received bot command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 from trz42

    • expanded format: build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr
  • account trz42 has NO permission to send commands to the bot


eessi-bot bot commented Feb 17, 2025

New job on instance eessi-bot-mc-azure for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_919/1085

date job status comment
Feb 17 12:50:41 UTC 2025 submitted job id 1085 awaits release by job manager
Feb 17 12:50:51 UTC 2025 released job awaits launch by Slurm scheduler
Feb 17 12:56:54 UTC 2025 running job 1085 is running
Feb 17 13:56:16 UTC 2025 finished
😁 SUCCESS
Details
✅ job output file slurm-1085.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-1739798555.tar.gz
size: 4373 MiB (4585648527 bytes)
entries: 11757
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.1.1
CUDA/12.4.0
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
Feb 17 13:56:16 UTC 2025 test result
😢 FAILURE
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-1085.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Feb 18 09:18:47 UTC 2025 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen4-1739798555.tar.gz to S3 bucket succeeded


eessi-bot bot commented Feb 17, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_919/46531

date job status comment
Feb 17 12:50:42 UTC 2025 submitted job id 46531 awaits release by job manager
Feb 17 12:51:39 UTC 2025 released job awaits launch by Slurm scheduler
Feb 17 12:59:44 UTC 2025 running job 46531 is running
Feb 17 14:01:56 UTC 2025 finished
😁 SUCCESS
Details
✅ job output file slurm-46531.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1739798896.tar.gz
size: 4373 MiB (4585650573 bytes)
entries: 11757
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
no other files in tarball
Feb 17 14:01:56 UTC 2025 test result
😢 FAILURE
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-46531.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Feb 18 09:18:35 UTC 2025 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen3-1739798896.tar.gz to S3 bucket succeeded


eessi-bot bot commented Feb 17, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_919/46532

  • test step failed with
    ERROR: failed to load configuration: could not find a configuration entry for the requested system/partition combination: 'BotBuildTests:x86_64_amd_zen2_nvidia_cc80'
    Log file(s) saved in '/tmp/tmp.skuL75P5X3/rfm-0an7_lvk.log'
    ERROR: Failed to list ReFrame tests with command: reframe --tag CI --tag 1_node  --nocolor -n EESSI_OSU -n EESSI_LAMMPS --list
    
date job status comment
Feb 17 12:50:46 UTC 2025 submitted job id 46532 awaits release by job manager
Feb 17 12:51:37 UTC 2025 released job awaits launch by Slurm scheduler
Feb 17 12:59:42 UTC 2025 running job 46532 is running
Feb 17 14:11:08 UTC 2025 finished
😁 SUCCESS
Details
✅ job output file slurm-46532.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1739799110.tar.gz
size: 4373 MiB (4585663906 bytes)
entries: 11757
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
no other files in tarball
Feb 17 14:11:08 UTC 2025 test result
😢 FAILURE
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-46532.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Feb 18 09:19:37 UTC 2025 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1739799110.tar.gz to S3 bucket succeeded

trz42 (Collaborator, Author) commented Feb 17, 2025

All rebuilt successfully.

trz42 added the ready-to-deploy label Feb 17, 2025
bedroge added the bot:deploy label and removed the ready-to-deploy label Feb 18, 2025
@eessi-bot-surf

Label bot:deploy has been set by user bedroge, but this person does not have permission to trigger deployments

@eessi-bot-toprichard

Label bot:deploy has been set by user bedroge, but this person does not have permission to trigger deployments

bedroge (Collaborator) commented Feb 18, 2025

Tarballs have been ingested (and I've removed the log files of the old installations).

bedroge merged commit 464dcba into EESSI:2023.06-software.eessi.io Feb 18, 2025
49 checks passed

eessi-bot bot commented Feb 18, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.02/pr_919/46531', '/project/def-users/SHARED/jobs/2025.02/pr_919/46532'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.02.18

@riscv-eessi-io-bot

PR merged! Moved [] to /home/eessibot/shared/trash_bin/EESSI/software-layer/2025.02.18


eessi-bot bot commented Feb 18, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.02/pr_919/1085'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.02.18

gpu-bot-ugent bot commented Feb 18, 2025

PR merged! Moved [] to /scratch/gent/vo/002/gvo00211/SHARED/trash_bin/EESSI/software-layer/2025.02.18

Labels: 2023.06-software.eessi.io, accel:nvidia, bot:deploy