@trz42
Collaborator

trz42 commented Feb 15, 2025

After easybuilders/easybuild-easyblocks#3516 got merged, we need to update the module files for CUDA/12.{1.1,4.0}.

We need to do that for the architecture combinations:

  • zen2 + cc80
  • zen3 + cc80
  • zen4 + cc90

For the first two we use the build cluster on AWS. For the third we use the build cluster on Azure. Because CUDA is just a binary installation, this should be fine.

Note: although only the module files need to be regenerated, we cannot pass --module-only to EasyBuild, because the rebuild procedure removes the whole installation.

@trz42 trz42 added the labels 2023.06-software.eessi.io (2023.06 version of software.eessi.io) and accel:nvidia Feb 15, 2025

eessi-bot bot commented Feb 15, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphire_rapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@riscv-eessi-io-bot

Instance eessi-bot-riscv is configured to build for:

  • architectures: riscv64/generic
  • repositories: riscv.eessi.io-20240402

eessi-bot bot commented Feb 15, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@gpu-bot-ugent

gpu-bot-ugent bot commented Feb 15, 2025

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat

@trz42
Collaborator Author

trz42 commented Feb 15, 2025

bot: build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
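
Each command above is the word build followed by space-separated key:value filters. As a minimal sketch of how such a line could be split into its filters, the hypothetical helper `parse_bot_command` below prints one key=value pair per line. It is an illustration only and not the bot's actual parser.

```shell
# Hypothetical helper (not the bot's real parser): split a "bot: build ..."
# line into key=value pairs, one per line.
parse_bot_command() {
    local line="$1"
    # drop the leading "bot: build " prefix
    local args="${line#bot: build }"
    local arg
    # rely on word splitting: the filters contain no spaces or glob characters
    for arg in $args; do
        # key is everything before the first colon, value everything after it
        printf '%s=%s\n' "${arg%%:*}" "${arg#*:}"
    done
}

parse_bot_command "bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80"
```

For the command shown, this prints the four pairs instance, repository, architecture and accelerator with their respective values.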

eessi-bot bot commented Feb 15, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 from trz42

    • expanded format: build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

@riscv-eessi-io-bot

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Feb 15, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 from trz42

    • expanded format: build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 resulted in:

  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@gpu-bot-ugent

gpu-bot-ugent bot commented Feb 15, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 from trz42

    • expanded format: build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-azure repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot-surf

Updates by the bot instance eessi-bot-surf (click for details)
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Feb 15, 2025

New job on instance eessi-bot-mc-azure for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_918/1014

date job status comment
Feb 15 17:17:57 UTC 2025 submitted job id 1014 awaits release by job manager
Feb 15 17:18:58 UTC 2025 released job awaits launch by Slurm scheduler
Feb 15 17:26:01 UTC 2025 running job 1014 is running
Feb 15 18:26:20 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-1014.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-1739641985.tar.gz
size: 4373 MiB (4585654376 bytes)
entries: 11757
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.1.1
CUDA/12.4.0
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
Feb 15 18:26:20 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-1014.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
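
The Details list in the job report above reads as a set of pattern checks against the Slurm output file. A rough sketch of such a check with grep follows. The function name `check_job_log` is hypothetical, the patterns are the ones quoted in the report, and the bot's actual implementation may differ.

```shell
# Hypothetical sketch: apply the pattern checks listed in the bot comment to a
# job log file and print SUCCESS or FAILURE.
check_job_log() {
    local log="$1"
    local status=SUCCESS
    local pattern
    # patterns that must NOT appear in the log
    for pattern in 'FATAL:' 'ERROR:' 'FAILED:' 'required modules missing:'; do
        if grep -qF -e "$pattern" "$log"; then
            status=FAILURE
        fi
    done
    # patterns that MUST appear in the log
    for pattern in 'No missing installations' '.tar.gz created!'; do
        if ! grep -qF -e "$pattern" "$log"; then
            status=FAILURE
        fi
    done
    echo "$status"
}
```

A log that contains both required messages and none of the forbidden ones yields SUCCESS. Any other combination yields FAILURE, which matches reports where the tarball was created but error messages were also found.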

eessi-bot bot commented Feb 15, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_918/46229

date job status comment
Feb 15 17:17:58 UTC 2025 submitted job id 46229 awaits release by job manager
Feb 15 17:18:08 UTC 2025 released job awaits launch by Slurm scheduler
Feb 15 17:24:12 UTC 2025 running job 46229 is running
Feb 15 18:07:35 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-46229.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1739641470.tar.gz
size: 2305 MiB (2417981809 bytes)
entries: 6239
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
CUDA/12.4.0.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
CUDA/12.4.0
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
no other files in tarball
Feb 15 18:07:35 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-46229.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
  • didn't rebuild CUDA/12.1.1 → need to debug
  • probably found the CUDA/12.1.1 module under the CPU-only installation path (CUDA/12.4.0 was never installed there)
  • need to force the rebuilding

eessi-bot bot commented Feb 15, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_918/46230

date job status comment
Feb 15 17:18:03 UTC 2025 submitted job id 46230 awaits release by job manager
Feb 15 17:18:06 UTC 2025 released job awaits launch by Slurm scheduler
Feb 15 17:24:10 UTC 2025 running job 46230 is running
Feb 15 18:17:46 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-46230.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1739641816.tar.gz
size: 2305 MiB (2417989425 bytes)
entries: 6239
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
CUDA/12.4.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
CUDA/12.4.0
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
no other files in tarball
Feb 15 18:17:46 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-46230.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
  • didn't rebuild CUDA/12.1.1 → need to debug
  • probably found the CUDA/12.1.1 module under the CPU-only installation path (CUDA/12.4.0 was never installed there)
  • need to force the rebuilding

@trz42 trz42 changed the title from "{2023.06}[2023a] rebuild CUDA/* modules to update module files" to "{2023.06}[2023a,2023b] rebuild CUDA/* modules to update module files" Feb 15, 2025
@trz42
Collaborator Author

trz42 commented Feb 16, 2025

Force rebuilding CUDA/12.1.1 for zen2 and zen3...
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80

eessi-bot bot commented Feb 16, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

eessi-bot bot commented Feb 16, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@riscv-eessi-io-bot

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-surf

Updates by the bot instance eessi-bot-surf (click for details)
  • account trz42 has NO permission to send commands to the bot

@gpu-bot-ugent

gpu-bot-ugent bot commented Feb 16, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot bot commented Feb 16, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_918/46375

date job status comment
Feb 16 19:33:16 UTC 2025 submitted job id 46375 awaits release by job manager
Feb 16 19:33:51 UTC 2025 released job awaits launch by Slurm scheduler
Feb 16 19:39:57 UTC 2025 running job 46375 is running
Feb 16 20:04:49 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-46375.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1739735469.tar.gz
size: 0 MiB (45 bytes)
entries: 0
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
no other files in tarball
Feb 16 20:04:49 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-46375.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Feb 16, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_918/46376

date job status comment
Feb 16 19:33:20 UTC 2025 submitted job id 46376 awaits release by job manager
Feb 16 19:33:49 UTC 2025 released job awaits launch by Slurm scheduler
Feb 16 19:38:54 UTC 2025 running job 46376 is running
Feb 16 20:08:58 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-46376.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1739735561.tar.gz
size: 0 MiB (45 bytes)
entries: 0
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
no other files in tarball
Feb 16 20:08:58 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-46376.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@trz42
Collaborator Author

trz42 commented Feb 16, 2025

Latest jobs will fail rebuilding CUDA/12.1.1 with a known error (example below for zen3)

Failed to chmod/chown several paths ... (about 630 paths) beginning with
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/CUDA/12.1.1

@trz42
Collaborator Author

trz42 commented Feb 16, 2025

Force rebuilding CUDA/12.1.1 for zen2 and zen3 and try previous workaround for permission denied issue (reverting #907)...
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80

@gpu-bot-ugent

gpu-bot-ugent bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@riscv-eessi-io-bot

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-surf

Updates by the bot instance eessi-bot-surf (click for details)
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Feb 17, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_918/46525

date job status comment
Feb 17 05:52:24 UTC 2025 submitted job id 46525 awaits release by job manager
Feb 17 05:53:11 UTC 2025 released job awaits launch by Slurm scheduler
Feb 17 05:54:14 UTC 2025 running job 46525 is running
Feb 17 06:04:33 UTC 2025 finished
🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job46525.result does not exist in job directory or reading it failed.
  • No artefacts were found/reported.
Feb 17 06:04:33 UTC 2025 test result
🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job46525.test does not exist in job directory or reading it failed.

@trz42
Collaborator Author

trz42 commented Feb 17, 2025

One more...
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80

eessi-bot bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

@riscv-eessi-io-bot

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot-surf

Updates by the bot instance eessi-bot-surf (click for details)
  • account trz42 has NO permission to send commands to the bot

@gpu-bot-ugent

gpu-bot-ugent bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot bot commented Feb 17, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_918/46526

date job status comment
Feb 17 06:04:12 UTC 2025 submitted job id 46526 awaits release by job manager
Feb 17 06:04:33 UTC 2025 released job awaits launch by Slurm scheduler
Feb 17 06:05:37 UTC 2025 running job 46526 is running
Feb 17 06:12:49 UTC 2025 finished
🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job46526.result does not exist in job directory or reading it failed.
  • No artefacts were found/reported.
Feb 17 06:12:49 UTC 2025 test result
🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job46526.test does not exist in job directory or reading it failed.

@trz42
Collaborator Author

trz42 commented Feb 17, 2025

Should be better...
bot: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80

eessi-bot bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

eessi-bot bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@riscv-eessi-io-bot

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

@gpu-bot-ugent

gpu-bot-ugent bot commented Feb 17, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 from trz42

    • expanded format: build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-mc-aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot-surf

Updates by the bot instance eessi-bot-surf (click for details)
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Feb 17, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_918/46529

date job status comment
Feb 17 06:13:07 UTC 2025 submitted job id 46529 awaits release by job manager
Feb 17 06:13:52 UTC 2025 released job awaits launch by Slurm scheduler
Feb 17 06:14:55 UTC 2025 running job 46529 is running
Feb 17 06:53:04 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-46529.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1739774348.tar.gz
size: 0 MiB (45 bytes)
entries: 0
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
no other files in tarball
Feb 17 06:53:04 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-46529.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@trz42
Collaborator Author

trz42 commented Feb 17, 2025

Looks better. If rebuilding works, roll back the changes related to the alternative removal of packages, but keep the grep for [F] when determining packages to be rebuilt. Then retry.
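
The note about keeping the grep for [F] can be illustrated with a small sketch. It assumes the list of packages to rebuild is derived from EasyBuild dry-run style output, where each line looks like ` * [F] <easyconfig> (module: <name>)` and [R] and [F] mark rebuilt and forced installations. The function name `modules_to_rebuild` and the sample lines are hypothetical, and the real script may do this differently.

```shell
# Hypothetical sketch: extract module names from dry-run style lines marked
# [R] (rebuild) or [F] (forced). The line format is an assumption.
modules_to_rebuild() {
    # '\[[RF]]' matches a literal "[", then R or F, then a literal "]"
    grep -e '^ \* \[[RF]]' | sed -e 's/.*(module: \(.*\))$/\1/'
}

# Sample input, made up for illustration
printf '%s\n' \
    ' * [F] CUDA-12.1.1.eb (module: CUDA/12.1.1)' \
    ' * [R] CUDA-12.4.0.eb (module: CUDA/12.4.0)' \
    ' * [x] GCCcore-12.3.0.eb (module: GCCcore/12.3.0)' \
    | modules_to_rebuild
```

With the sample input, only the two CUDA modules are listed. Matching only [R] would miss CUDA/12.1.1 in this example, which is the kind of omission the [F] grep is meant to avoid.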

@@ -101,7 +101,7 @@ fi
 pr_diff=$(ls [0-9]*.diff | head -1)

 # if this script is run as root, use PR patch file to determine if software needs to be removed first
-if [ $EUID -eq 0 ]; then
+if [ $EUID -ne 0 ]; then

Member

This change doesn't look right.

Member

If it is, the comments will need updating

Collaborator Author

Yeah, just trying to get it working by using an alternative approach that doesn't use fakeroot ;)

@trz42
Collaborator Author

trz42 commented Feb 17, 2025

Removing the existing installations didn't work. However, CUDA/12.1.1 was identified as an installation to be removed (in addition to CUDA/12.4.0), so that part works now. Opting to open another PR based on the standard way to remove/rebuild a package, which also includes the changes needed to consider CUDA/12.1.1.

@trz42
Collaborator Author

trz42 commented Feb 17, 2025

Superseded by #919

@trz42 trz42 closed this Feb 17, 2025

eessi-bot bot commented Feb 17, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.02/pr_918/46229', '/project/def-users/SHARED/jobs/2025.02/pr_918/46230', '/project/def-users/SHARED/jobs/2025.02/pr_918/46375', '/project/def-users/SHARED/jobs/2025.02/pr_918/46376', '/project/def-users/SHARED/jobs/2025.02/pr_918/46377', '/project/def-users/SHARED/jobs/2025.02/pr_918/46378', '/project/def-users/SHARED/jobs/2025.02/pr_918/46379', '/project/def-users/SHARED/jobs/2025.02/pr_918/46502', '/project/def-users/SHARED/jobs/2025.02/pr_918/46511', '/project/def-users/SHARED/jobs/2025.02/pr_918/46524', '/project/def-users/SHARED/jobs/2025.02/pr_918/46525', '/project/def-users/SHARED/jobs/2025.02/pr_918/46526', '/project/def-users/SHARED/jobs/2025.02/pr_918/46529'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.02.17

eessi-bot bot commented Feb 17, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.02/pr_918/1014'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.02.17

@riscv-eessi-io-bot

PR merged! Moved [] to /home/eessibot/shared/trash_bin/EESSI/software-layer/2025.02.17

@gpu-bot-ugent

gpu-bot-ugent bot commented Feb 17, 2025

PR merged! Moved [] to /scratch/gent/vo/002/gvo00211/SHARED/trash_bin/EESSI/software-layer/2025.02.17
