
incorrect parallelism on A64FX with EasyBuild 5.x #70

@boegel


From the log for job 519833 on A64FX (for EESSI/software-layer#1146):

== limiting parallelism to 4 (was 16) for GCCcore on aarch64/a64fx to avoid out-of-memory failures during building/testing
== limiting parallelism to 1 (was 4) for GCCcore on aarch64/a64fx to avoid out-of-memory failures during building/testing
== limiting parallelism to 4 (was 16) for GCC on aarch64/a64fx to avoid out-of-memory failures during building/testing

That doesn't look right, and it definitely explains why the build is taking so long (close to 10 hours for GCCcore/13.3.0 😱).

The bot is configured to request a full A64FX node (48 cores), and we then configure EasyBuild to only use a quarter of the available cores (because of the limited amount of HBM memory: 32 GB).

Since EasyBuild v5.0.0, the default EasyBuild configuration is to use at most 16 cores (see easybuilders/easybuild-framework#4816).
Our hooks pick up on that value and then take a quarter of it (rather than a quarter of the full 48 cores), resulting in 4 cores rather than the intended 12.
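The arithmetic behind the 4-versus-12 discrepancy, as a minimal Python sketch. The helper name quarter_of and the constants are illustrative only; they are not taken from the actual EESSI hooks.

```python
def quarter_of(cores: int) -> int:
    """Return a quarter of the given core count, but never less than 1."""
    return max(1, cores // 4)

# Numbers from this issue (illustrative constants, not real hook code):
# a full A64FX node has 48 cores, EasyBuild 5.x defaults to at most 16.
NODE_CORES = 48
EB5_DEFAULT_MAX_PARALLEL = 16

# What happens now: a quarter of the already-capped EasyBuild default.
current = quarter_of(EB5_DEFAULT_MAX_PARALLEL)

# What was intended: a quarter of all cores available on the node.
intended = quarter_of(NODE_CORES)

print(f"current: {current}, intended: {intended}")  # prints: current: 4, intended: 12
```

The point is simply that the divisor is applied to the wrong base value: once EasyBuild itself caps parallelism at 16, dividing that cap by 4 double-counts the reduction.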

The make -j 4 bootstrap command in the build step (of the 1st iteration) of the GCCcore installation takes 5h46min; that would probably be a lot faster when using the intended 12 cores...

So maybe we should just disable the hook that limits parallelism to a quarter of the default parallelism when using EasyBuild 5.x, and only deviate from the default when really needed (when the build fails due to lack of memory)...
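The proposed policy could look roughly like the sketch below. It is a hypothetical decision function, not existing hook code: the function name, the integer major version argument, and the out_of_memory_seen flag (standing in for "the build is known to fail due to lack of memory") are all assumptions for illustration.

```python
def limited_parallelism(eb_major_version: int, parallel: int,
                        out_of_memory_seen: bool = False) -> int:
    """Hypothetical policy sketch.

    On EasyBuild 5.x, keep the default parallelism unless a build is known to
    run out of memory; otherwise (EasyBuild 4.x, or a known memory problem),
    fall back to a quarter of the given parallelism, with a minimum of 1.
    """
    if eb_major_version >= 5 and not out_of_memory_seen:
        return parallel
    return max(1, parallel // 4)

# EasyBuild 5.x default (16 cores): left untouched.
print(limited_parallelism(5, 16))
# EasyBuild 5.x, build known to run out of memory: reduced to a quarter.
print(limited_parallelism(5, 16, out_of_memory_seen=True))
# EasyBuild 4.x default (all 48 cores of the node): reduced to a quarter.
print(limited_parallelism(4, 48))
```

Keeping the memory-driven reduction as an explicit opt-in makes the default path a no-op, so the hook no longer silently compounds with EasyBuild's own cap.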

What's also bad is that the 2nd "limiting parallelism to 1 (was 4) for GCCcore" line shows that we're not careful when limiting parallelism through our hook: GCCcore is an iterated build, and in the 2nd iteration we're again reducing the level of parallelism to a quarter of its current value (so 4 -> 1).
That's mainly painful in the 3rd iteration, where the build step uses make (so on a single core...) and takes 3.5 hours.
We should only tweak the parallelism during the 1st iteration, and then not touch it again.
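A minimal sketch of the difference between re-applying the limit on every iteration and applying it only once. FakeEasyBlock is a hypothetical stand-in for an easyblock; the attribute names parallel and iter_idx are assumptions made for this illustration, and the real hook would need to check how the current iteration is exposed in EasyBuild.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FakeEasyBlock:
    """Hypothetical stand-in for an easyblock; only models what this sketch needs."""
    parallel: int
    iter_idx: int = 0  # index of the current iteration of an iterated build (assumed)

def limit_parallelism_buggy(eb: FakeEasyBlock) -> None:
    # Re-applied on every iteration, so parallelism keeps shrinking.
    eb.parallel = max(1, eb.parallel // 4)

def limit_parallelism_once(eb: FakeEasyBlock) -> None:
    # Only tweak parallelism in the 1st iteration, leave it alone afterwards.
    if eb.iter_idx == 0:
        eb.parallel = max(1, eb.parallel // 4)

def simulate(hook: Callable[[FakeEasyBlock], None],
             iterations: int = 3, start: int = 16) -> List[int]:
    """Run the hook once per iteration and record the resulting parallelism."""
    eb = FakeEasyBlock(parallel=start)
    levels = []
    for idx in range(iterations):
        eb.iter_idx = idx
        hook(eb)
        levels.append(eb.parallel)
    return levels

print(simulate(limit_parallelism_buggy))  # [4, 1, 1]
print(simulate(limit_parallelism_once))   # [4, 4, 4]
```

An alternative with the same effect would be to remember the original value once and always derive the limit from it, which also makes the hook safe to run multiple times.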
