@bobpaw bobpaw commented Jun 23, 2025

Add ccache to GitHub CI

  • .github/workflows/cmake.yml: Add ccache to main GitHub CI workflow.
  • Limit workflow runs to source file and CMake script changes.
  • Reflow multiline strings.
  • Remove CMake multi-configuration options (--config) because Ninja is a single-config generator.
  • Remove -j from cmake --build because Ninja builds in parallel by default.
  • Prune release LLVM and release C++20 from build matrix.
  • .github/workflows/cleanup-caches.yml: Add cleanup workflow for closed PRs.

Unfortunately, caches created with actions/cache are scoped by branch, so the first commit on a branch may still take a long time, but subsequent commits will be much faster.

Reduced time from ~2h to ~50m (total runtime for all jobs in the matrix).
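
For illustration, limiting workflow runs to source file and CMake script changes could look roughly like the fragment below. The path globs are assumptions made for this sketch, not the patterns actually used in cmake.yml.

```yaml
# Sketch only: the globs below are hypothetical examples.
on:
  push:
    paths:
      - '**.c'
      - '**.cc'
      - '**.h'
      - '**.cmake'
      - '**/CMakeLists.txt'
      - '.github/workflows/cmake.yml'
  pull_request:
    paths:
      - '**.c'
      - '**.cc'
      - '**.h'
      - '**.cmake'
      - '**/CMakeLists.txt'
      - '.github/workflows/cmake.yml'
```

Including the workflow file itself in the list means that edits to the CI configuration still trigger a run.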

- add ccache config in jobs.build.env.
    - limit cache size to 200mb for now.
    - compression can also be recomputed at the end with -X
- add ccache setup and ccache action (not sure if I will need to change
  to explicit restore/save).
- add CMake ccache argument to config for normal build and examples.
- remove --config from the build step because ninja is a single config
  builder.
- reflow strings for the example projects
- for mpi-nompi example set c_compiler to be safe.

Signed-off-by: Aiden Woodruff <[email protected]>
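
A minimal sketch of the setup this commit describes, assuming a single Ubuntu job. The cache location, key names, action versions, and package list are placeholders chosen for the example rather than the values in cmake.yml.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # Hypothetical cache location inside the workspace.
      CCACHE_DIR: ${{ github.workspace }}/.ccache
      # Cap the cache size (200 MB, as in the commit message).
      CCACHE_MAXSIZE: 200M
    steps:
      - uses: actions/checkout@v4
      - name: Install ccache and Ninja
        run: sudo apt-get update && sudo apt-get install -y ccache ninja-build
      - name: Restore and save ccache
        uses: actions/cache@v4
        with:
          path: ${{ env.CCACHE_DIR }}
          # A unique key per commit forces a save; the prefix restores the newest match.
          key: ccache-${{ github.sha }}
          restore-keys: ccache-
      - name: Configure
        run: cmake -S . -B build -GNinja -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
      - name: Build
        # No --config and no -j: Ninja is single-config and parallel by default.
        run: cmake --build build
```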
- new $GITHUB_OUTPUT is preferred and ::set-output is deprecated
- reflow jobs.build.steps.build.run to one line now that extra options
  are removed.

Signed-off-by: Aiden Woodruff <[email protected]>
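
As a small illustration of the `$GITHUB_OUTPUT` form, the step below writes a `name=value` line to the file that variable points to. The step id `vars` and the output name `short_sha` are hypothetical names invented for this sketch.

```yaml
steps:
  - name: Compute a short commit hash
    id: vars
    # Deprecated form: echo "::set-output name=short_sha::${GITHUB_SHA:0:7}"
    run: echo "short_sha=${GITHUB_SHA:0:7}" >> "$GITHUB_OUTPUT"
  - name: Use the output in a later step
    run: echo "Building ${{ steps.vars.outputs.short_sha }}"
```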
bobpaw added 6 commits June 23, 2025 14:07
- ccache_dir should be the actual location of the cache, while basedir
  should be the root path for all files.
- Not sure if the github actions yaml parser is non-standard, but extra
  indentation seemed to be screwing up the multiline strings. try to fix
  and join some related lines

Signed-off-by: Aiden Woodruff <[email protected]>
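
One plausible explanation for the indentation problem, assuming the commands were written as YAML folded block scalars (`>` or `>-`): standard YAML does not fold line breaks around lines that are indented more than the first line, so extra indentation splits one command into several. The sketch below also shows the two ccache variables from this commit; the directory layout is an assumption.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # Where the cache itself is stored (hypothetical location).
      CCACHE_DIR: ${{ github.workspace }}/.ccache
      # Root of the source tree; ccache rewrites absolute paths under it to
      # relative ones before hashing, so different checkout paths can still hit.
      CCACHE_BASEDIR: ${{ github.workspace }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure
        # All continuation lines share one indentation level, so they are
        # folded into a single command line.
        run: >-
          cmake -S . -B build -GNinja
          -DCMAKE_BUILD_TYPE=Release
          -DCMAKE_C_COMPILER_LAUNCHER=ccache
      # By contrast, more-indented continuation lines keep their line breaks,
      # so the shell would see three separate commands:
      #
      #   run: >-
      #     cmake -S . -B build -GNinja
      #       -DCMAKE_BUILD_TYPE=Release
      #       -DCMAKE_C_COMPILER_LAUNCHER=ccache
```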
- use apt-get (which has a more stable CLI interface) instead of apt.

Signed-off-by: Aiden Woodruff <[email protected]>
- add c++ version and build type to cache key parameters.
    - a higher cache hit rate decreases build time significantly, so
      always try to get the most relevant cache.
- reduce cache maxsize since the number of caches will increase

Signed-off-by: Aiden Woodruff <[email protected]>
Signed-off-by: Aiden Woodruff <[email protected]>
- whether or not nompi is enabled significantly affects compilation,
  since it's exposed in SCOREC_config.h as well as changing what happens
  in pcu/pcu_defines.h
- add cxx: prefix to the cxx_standard to make the key more readable.
- add new key as a second option to ensure that old caches can be used.
    - next commit will remove old form.

Signed-off-by: Aiden Woodruff <[email protected]>
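
A sketch of what such a key might look like, with the older key form kept as a fallback under `restore-keys`. The matrix field names (`compiler`, `build_type`, `cxx_standard`, `no_mpi`) and the exact key layout are assumptions for illustration only.

```yaml
steps:
  - name: Restore and save ccache
    uses: actions/cache@v4
    with:
      path: ${{ env.CCACHE_DIR }}
      # New key form: every parameter that changes compilation is part of the key.
      key: ccache-${{ matrix.compiler }}-${{ matrix.build_type }}-cxx:${{ matrix.cxx_standard }}-nompi:${{ matrix.no_mpi }}-${{ github.sha }}
      # Prefixes are tried in order: first the newest cache for the same
      # configuration, then the hypothetical older key form as a fallback.
      restore-keys: |
        ccache-${{ matrix.compiler }}-${{ matrix.build_type }}-cxx:${{ matrix.cxx_standard }}-nompi:${{ matrix.no_mpi }}-
        ccache-${{ matrix.compiler }}-${{ matrix.build_type }}-${{ matrix.cxx_standard }}-
```

Once caches with the new key exist on the branch, the second `restore-keys` line can be dropped.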
- remove old cache key and only use new one.
- make ctest run in parallel. the nprocs script should be available as
  it is included with coreutils.

Signed-off-by: Aiden Woodruff <[email protected]>
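
For reference, a parallel test step could be sketched as follows. Note that the coreutils utility is named `nproc`; the `build` directory name is an assumption, and `--test-dir` needs CMake 3.20 or newer.

```yaml
steps:
  - name: Test
    # nproc prints the number of available processing units.
    run: ctest --test-dir build --output-on-failure --parallel "$(nproc)"
```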

bobpaw commented Jun 23, 2025

This excerpt from a failing run shows that the ctest DEPENDS property is simply not good enough: the dependent test started after its requirement had started, but before that first test had completed! Fixtures are therefore required for parallel testing.

      Start 31: gmshV4AirFoil
28/55 Test #30: gmshv2TwoQuads ...................   Passed    0.21 sec
      Start 32: gmshV4AirFoil_dmgDiff
29/55 Test #32: gmshV4AirFoil_dmgDiff ............***Failed    0.00 sec
/usr/bin/diff: /home/runner/work/core/core/pumi-meshes/gmsh/v4/AirfoilDemo.dmg: No such file or directory

      Start 33: verify_serial
30/55 Test #33: verify_serial ....................   Passed    0.22 sec
      Start 34: verify_2nd_order_shape_quads
31/55 Test #27: create_misCube ...................   Passed    0.57 sec
      Start 35: verify_2nd_order_shape_tris
32/55 Test #34: verify_2nd_order_shape_quads .....   Passed    0.21 sec
      Start 36: uniform_serial
33/55 Test #35: verify_2nd_order_shape_tris ......   Passed    0.20 sec
      Start 37: classifyThenAdapt
34/55 Test #37: classifyThenAdapt ................   Passed    0.21 sec
      Start 38: tet_serial
35/55 Test #38: tet_serial .......................   Passed    0.29 sec
      Start 39: applyMatrixFunc
36/55 Test #36: uniform_serial ...................   Passed    0.53 sec
      Start 40: outputcontrol
37/55 Test #39: applyMatrixFunc ..................   Passed    0.21 sec
      Start 41: parmaSerial
38/55 Test #31: gmshV4AirFoil ....................   Passed    1.28 sec

- not only is nprocs missing, but the DEPENDS property isn't good
  enough.

Signed-off-by: Aiden Woodruff <[email protected]>
bobpaw added 2 commits June 23, 2025 15:45
- .github/workflows/cmake.yml: recompress cache before sending.
    - this may or may not help since zstd is already used by the cache
      action.
- .github/workflows/cleanup-caches.yml: add workflow from the GitHub
  documentation to clean up up to 100 caches on PR close.

Signed-off-by: Aiden Woodruff <[email protected]>
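
The cleanup workflow closely follows the example in GitHub's caching documentation. The version below is a sketch written from that example and should be checked against the current docs before reuse.

```yaml
name: Cleanup caches for closed PRs
on:
  pull_request:
    types:
      - closed

jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      # Required to delete caches.
      actions: write
    steps:
      - name: Delete caches created for this PR
        run: |
          # List up to 100 cache ids scoped to the PR merge ref.
          cache_ids=$(gh cache list --ref "$BRANCH" --limit 100 --json id --jq '.[].id')

          # Do not fail the workflow if a cache was already deleted.
          set +e
          for cache_id in $cache_ids; do
            gh cache delete "$cache_id"
          done
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
          BRANCH: refs/pull/${{ github.event.pull_request.number }}/merge
```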
- .github/workflows/cleanup-caches.yml: replace tabs with spaces.
- .github/workflows/cmake.yml: add paths that will trigger runs.
- fix ccache recompression arguments.

Signed-off-by: Aiden Woodruff <[email protected]>

bobpaw commented Jun 23, 2025

@cwsmith should I also add ccache for @Sichao25's python-api-test.yml workflow?


bobpaw commented Jun 23, 2025

Also, I'm investigating the low cache hit ratio. Right now most jobs get ~50% for the exact same code.

bobpaw added 4 commits June 23, 2025 17:21
- could not find local differences to explain cache misses, but the
  version on ubuntu 22.04 is 4.5.1 and the latest is 4.11. 3 years might
  make a difference.

Signed-off-by: Aiden Woodruff <[email protected]>
- with is only for reusable workflows.

Signed-off-by: Aiden Woodruff <[email protected]>
- ccache 4.7 changed the cache format so the old caches are
  incompatible.
- also, clear the statistics after loading the cache. I think clearing
  them earlier may have been causing incorrect statistics.

Signed-off-by: Aiden Woodruff <[email protected]>
Signed-off-by: Aiden Woodruff <[email protected]>

bobpaw commented Jun 23, 2025

> Also, I'm investigating the low cache hit ratio. Right now most jobs get ~50% for the exact same code.

I figured out that this was due to zeroing statistics before restoring the cache. Now cache hits are >95%.
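
The ordering matters because ccache keeps its statistics counters inside the cache directory, so restoring the cache brings the old counters back. A sketch of the corrected step order, with hypothetical step names and key:

```yaml
steps:
  - name: Restore ccache
    uses: actions/cache@v4
    with:
      path: ${{ env.CCACHE_DIR }}
      key: ccache-${{ github.sha }}
      restore-keys: ccache-
  - name: Reset ccache statistics
    # Must run after the restore; otherwise the restored counters overwrite the zeros.
    run: ccache --zero-stats
  - name: Build
    run: cmake --build build
  - name: Show ccache statistics
    # Now reports hits and misses for this run only.
    run: ccache --show-stats
```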

bobpaw added 5 commits June 23, 2025 17:48
- .github/workflows/python-api-test.yml: add ccache settings.
- add ccache to the Zoltan and PUMI builds (typically the longest
  steps).
- might need to use ccache-swig instead which is built by the swig
  library.

Signed-off-by: Aiden Woodruff <[email protected]>
- also do shallow clone

Signed-off-by: Aiden Woodruff <[email protected]>
- remove build-essential from list. probably not causing any of the
  long install time problems (mpich and dependencies are at fault) but
  it should not be necessary as a meta-package since all its
  dependencies are installed by default.
- remove cmake from the install list as it's also available by default.
- fix cmake configure typo.

Signed-off-by: Aiden Woodruff <[email protected]>
- the install dependencies step took a long time because of pip!
- add cache for pip.
- add step to get single timestamp for both caches.

Signed-off-by: Aiden Woodruff <[email protected]>
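
A sketch of how one timestamp can feed both cache keys, so the pip and ccache caches saved by a run share the same suffix. The step id, key prefixes, and timestamp format are assumptions; `~/.cache/pip` is pip's default cache directory on Linux.

```yaml
steps:
  - name: Get a single timestamp for both caches
    id: stamp
    run: echo "timestamp=$(date -u +%Y%m%dT%H%M%SZ)" >> "$GITHUB_OUTPUT"
  - name: Cache pip downloads
    uses: actions/cache@v4
    with:
      path: ~/.cache/pip
      key: pip-${{ steps.stamp.outputs.timestamp }}
      restore-keys: pip-
  - name: Cache ccache
    uses: actions/cache@v4
    with:
      path: ${{ env.CCACHE_DIR }}
      key: ccache-${{ steps.stamp.outputs.timestamp }}
      restore-keys: ccache-
```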
- allow ccache to masquerade as the compiler for swig build.
- same method used by the action hendrikmuhs/ccache-action in
  https://github.com/swig/swig/blob/1f8684eb540d5d7c63fe7d5c3853990dd2316999/.github/workflows/linux.yml

Signed-off-by: Aiden Woodruff <[email protected]>
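
The masquerade approach can be sketched as follows: symlinks named after the compilers point at ccache, and their directory is placed first on PATH so that builds which ignore launcher settings still go through ccache. The directory `$HOME/ccache-bin` and the list of tool names are hypothetical choices for this example.

```yaml
steps:
  - name: Let ccache masquerade as the compilers
    run: |
      mkdir -p "$HOME/ccache-bin"
      # When invoked through one of these names, ccache looks up the real
      # compiler of the same name further along PATH.
      for tool in gcc g++ cc c++; do
        ln -sf "$(command -v ccache)" "$HOME/ccache-bin/$tool"
      done
      # Prepend the directory to PATH for all subsequent steps.
      echo "$HOME/ccache-bin" >> "$GITHUB_PATH"
```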
- use the top level env instead of adding to each config.
- gklib/metis/parmetis are secretly cmake but they have a make config.
  using the environment variable should still reach cmake.

Signed-off-by: Aiden Woodruff <[email protected]>
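
As a sketch of the top-level form: CMake 3.17 and newer initialize `CMAKE_<LANG>_COMPILER_LAUNCHER` from an environment variable of the same name on the first configure, which is why a workflow-wide `env` can reach CMake even when it is invoked indirectly through make. The C++ entry is an assumption added for completeness.

```yaml
# Workflow-level env applies to every job and step.
env:
  CMAKE_C_COMPILER_LAUNCHER: ccache
  CMAKE_CXX_COMPILER_LAUNCHER: ccache
```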

bobpaw commented Jun 23, 2025

> @cwsmith should I also add ccache for @Sichao25's python-api-test.yml workflow?

Did this

@bobpaw bobpaw marked this pull request as ready for review June 23, 2025 23:02
@bobpaw bobpaw requested a review from cwsmith June 23, 2025 23:02
@cwsmith cwsmith added the CI label Jun 24, 2025
- .github/workflows/cmake.yml: exclude release llvm and c++20. we really
  only care about the extra warnings from llvm and release should have
  less code.
- set CMAKE_C_COMPILER_LAUNCHER=ccache in top-level workflow env instead
  of on each config.

Signed-off-by: Aiden Woodruff <[email protected]>
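
One way to read the matrix pruning, sketched with hypothetical matrix field names and values; the actual matrix in cmake.yml may be organized differently.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        compiler: [gcc, llvm]
        build_type: [Debug, Release]
        cxx_standard: [11, 20]
        exclude:
          # Drop release builds with LLVM.
          - compiler: llvm
            build_type: Release
          # Drop release builds with C++20.
          - build_type: Release
            cxx_standard: 20
    steps:
      - uses: actions/checkout@v4
      - name: Show configuration
        run: echo "${{ matrix.compiler }} ${{ matrix.build_type }} C++${{ matrix.cxx_standard }}"
```

An `exclude` entry removes every combination that matches all of the fields it lists, so the second entry removes Release with C++20 for both compilers.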

@cwsmith cwsmith left a comment

Looks good. Thank you.

- .github/workflows/python-api-test.yml: avoid sudo when installing
  ccache compiler masquerades by installing into /opt instead of
  /usr/local/opt.

Signed-off-by: Aiden Woodruff <[email protected]>
@bobpaw bobpaw requested a review from cwsmith June 24, 2025 16:10
@cwsmith cwsmith merged commit 1049872 into develop Jun 24, 2025
21 checks passed
@cwsmith cwsmith deleted the apw/ci-ccache branch June 24, 2025 16:11
@bobpaw bobpaw restored the apw/ci-ccache branch June 24, 2025 16:18
@bobpaw bobpaw deleted the apw/ci-ccache branch June 24, 2025 16:19
@cwsmith cwsmith added the v4.1.0 changes included in the 4.1.0 release label Aug 26, 2025