Phase 1.2 of incremental upstream merge.

This commit integrates the source-flag-redesign changes from upstream, which replace USE_MASS_DEPENDENT_ZETA with the SOURCE_MODEL enum.

Conflict resolutions:
- HaloBox.h: Added extern "C" block, kept convert_halo_props function
- SpinTemperatureBox.c: Updated flag checks to SOURCE_MODEL == 1
- map_mass.h: Added extern "C" block, kept MapMass_gpu function
- rng.h: Kept extern "C" block
- HaloCatalog.h: Preserved updateGlobalParams CUDA utility function
- PerturbedField.h: Added extern "C" block for CUDA compatibility
- PerturbedHaloCatalog.h: Added extern "C" block for CUDA compatibility

File renames by upstream (accepted):
- HaloField.c -> HaloCatalog.c
- HaloField.h -> HaloCatalog.h
- PerturbField.c -> PerturbedField.c
- PerturbField.h -> PerturbedField.h
- PerturbHaloField.c -> PerturbedHaloCatalog.c
- PerturbHaloField.h -> PerturbedHaloCatalog.h

Removed CFFI wrapper files (we use nanobind):
- _inputparams_wrapper.h
- _outputstructs_wrapper.h
This field was added to the C struct in the source-flag-redesign merge (PR 21cmfast#572) but was missing from the Python wrapper class, causing AttributeError when creating AstroOptions objects. Signed-off-by: Angus Gray-Weale <gusgw@gusgw.net>
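A mismatch of this kind can be caught before runtime by comparing field names on both sides. The sketch below is illustrative only: `C_STRUCT_FIELDS`, `AstroOptionsWrapper`, and `missing_wrapper_fields` are hypothetical stand-ins rather than names from 21cmFAST, and the field names are placeholders.

```python
from dataclasses import dataclass, fields

# Hypothetical stand-in for the field names declared in the C struct.
C_STRUCT_FIELDS = ("EXISTING_FIELD", "NEW_FIELD")


@dataclass
class AstroOptionsWrapper:
    """Hypothetical Python-side wrapper that lacks one struct field."""

    EXISTING_FIELD: bool = False


def missing_wrapper_fields(c_fields, wrapper_cls):
    """Return C struct field names that the wrapper class does not define."""
    wrapper_names = {f.name for f in fields(wrapper_cls)}
    return [name for name in c_fields if name not in wrapper_names]


missing = missing_wrapper_fields(C_STRUCT_FIELDS, AstroOptionsWrapper)
print(missing)
```

A check like this could run as a unit test so that a struct field added by an upstream merge is flagged before it surfaces as an AttributeError.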
Merged commits:
- 21cmfast#570 A_s_branch (primordial amplitude parameter)
- 21cmfast#576 compiler-detection
- 21cmfast#578 CosmoTables (cosmology table improvements)

Key changes:
- CosmoTables now passed from Python instead of reading from file in C
- Removed CFFI-specific code (we use nanobind)
- Kept nanobind import style in test files
- Add Table1D and CosmoTables struct definitions to InputParameters.h
- Add Table1D, CosmoTables bindings and Free_cosmo_tables_global to wrapper
- Fix cosmology.c to use size_density instead of CLASS_LENGTH
- Update inputs.py to use nanobind instead of CFFI for Table1D/CosmoTables

Signed-off-by: Angus Gray-Weale <gusgw@gusgw.net>
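The switch from a fixed CLASS_LENGTH to size_density suggests that each table carries its own length. A minimal Python-side sketch of that idea follows, assuming a table is a pair of equal-length arrays; the class `Table1DSketch` and its field names are hypothetical and do not reflect the actual struct layout.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Table1DSketch:
    """Hypothetical 1D lookup table; field names are illustrative only."""

    x_values: Sequence[float]
    y_values: Sequence[float]

    def __post_init__(self):
        # Reject tables whose two columns disagree in length.
        if len(self.x_values) != len(self.y_values):
            raise ValueError("x_values and y_values must have the same length")

    @property
    def size(self) -> int:
        # The length comes from the data itself, not a compile-time constant.
        return len(self.x_values)


table = Table1DSketch(x_values=[0.0, 1.0, 2.0], y_values=[1.0, 0.5, 0.25])
print(table.size)
```

Deriving the size from the data means a consumer never has to assume a fixed table length.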
Merged commits:
- 21cmfast#582 X_RAY_HEATING (conditional X-ray heating feature)
- 21cmfast#575 readme-updates (documentation improvements)
- Various CI updates (21cmfast#579-585)

Key changes:
- Added USE_X_RAY_HEATING flag to AstroOptions
- Conditional memory allocation for X-ray heating arrays
- Removed CFFI files (we use nanobind)
Hi @steven-murray, I quite agree! I have merged in up to 4.0.0, I think, and all my tests are passing. I should be up to 4.2 shortly. I'm working through testing my CPU-only and GPU versions against the relevant point on main.
Signed-off-by: Angus Gray-Weale <gusgw@gusgw.net>
for more information, see https://pre-commit.ci
…_global_evolution
…amine the linear growth factor
…rror if it is not in the right dimensions
Bumps [actions/github-script](https://github.com/actions/github-script) from 6 to 8.
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](actions/github-script@v6...v8)

---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact) from 14 to 15.
- [Release notes](https://github.com/dawidd6/action-download-artifact/releases)
- [Commits](dawidd6/action-download-artifact@v14...v15)

---
updated-dependencies:
- dependency-name: dawidd6/action-download-artifact
  dependency-version: '15'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v6...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact) from 15 to 16.
- [Release notes](https://github.com/dawidd6/action-download-artifact/releases)
- [Commits](dawidd6/action-download-artifact@v15...v16)

---
updated-dependencies:
- dependency-name: dawidd6/action-download-artifact
  dependency-version: '16'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Remove 16 files that should not be in the repository:
- PR-adacs-gpu-base.md (draft PR document)
- check_gpu_usage.{md,py} (local development files)
- gpu_test_*.py, test_gpu*.py, simple_gpu_test.py (local test scripts)
- import_21cmfast.py (local development script)
- install_custom.py (redundant install wrapper)
Restore bump script that was accidentally removed during merges.
… USE_SIGMA_8
- Add missing power_in_vcb function binding to _wrapper.cpp (fixes test_ps_runs)
- Cast USE_SIGMA_8 to bool in inputs.py to satisfy nanobind's strict type checking (fixes test_coeval_against_direct)
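The bool cast matters because a strictly typed binding rejects values that are merely truthy. The sketch below imitates that strictness in pure Python; `set_flag_strict` is a hypothetical stand-in for a bound setter and is not part of the nanobind API.

```python
def set_flag_strict(value):
    """Hypothetical setter that accepts only a genuine Python bool."""
    if type(value) is not bool:
        raise TypeError(f"expected bool, got {type(value).__name__}")
    return value


use_sigma_8 = 1  # truthy, but an int rather than a bool

try:
    set_flag_strict(use_sigma_8)
    rejected = False
except TypeError:
    rejected = True

# An explicit cast at the call site passes the strict check.
accepted = set_flag_strict(bool(use_sigma_8))
print(rejected, accepted)
```

Casting once, where the value crosses into the binding, keeps the rest of the Python code free to use any truthy representation.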
Summary
This PR provides the foundation for GPU acceleration in 21cmFAST, developed as part of the ADACS optimization project. It includes:
Key Changes
GPU Initial Conditions Implementation
Build System Improvements
PY21C_PGO_PHASE and PY21C_PGO_DIR environment variables
Testing Infrastructure
Validation
Extensive three-way comparison testing was performed on both Skylake/P100 and Milan/A100 architectures:
Numerical Parity (CPU vs GPU)
Discrete halo sampling scripts show expected divergence due to different random number sequences on CPU vs GPU.
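Because the CPU and GPU paths draw different random sequences, sampled outputs cannot be compared element by element; they can only be compared statistically. The sketch below illustrates that distinction, using two differently seeded Python generators as stand-ins for the CPU and GPU streams. The seeds, sample size, and tolerance are arbitrary assumptions and are not taken from the test suite.

```python
import random
import statistics


def draw_samples(seed, n=20000):
    """Draw n standard-normal samples from a generator with the given seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


cpu_like = draw_samples(seed=1)  # stand-in for the CPU random stream
gpu_like = draw_samples(seed=2)  # stand-in for the GPU random stream

# Element-wise comparison fails because the sequences differ.
elementwise_equal = cpu_like == gpu_like

# Summary statistics agree within sampling noise.
mean_gap = abs(statistics.fmean(cpu_like) - statistics.fmean(gpu_like))
std_gap = abs(statistics.pstdev(cpu_like) - statistics.pstdev(gpu_like))
statistically_close = mean_gap < 0.05 and std_gap < 0.05

print(elementwise_equal, statistically_close)
```

Deterministic outputs can still be held to a tight element-wise tolerance; only the stochastic ones need a distribution-level check like this.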
Performance (Average over 46 test scripts)
Note: GPU is slower on A100 for these small test workloads due to transfer overhead. Larger production runs are expected to benefit more from GPU acceleration.
Selected Recent Bug Fixes
and phi_2 needs VOLUME pre-multiplication to match velocity kernel expectations
Test Configurations
Testing covered all major physics configurations:
Commits (75 total)
Key commits:
- 7c3a5060 Fix GPU 2LPT implementation
- 278aa749 Re-enable GPU InitialConditions path
- 0c01b8f7 Implement 2LPT support in GPU MapMass kernel
- 4433dea1 Fix cuFFT first-call failure on P100/Pascal GPUs
- 4ca51dfc Cherry-pick InitialConditions GPU implementation
- 107c6f9f Fix discrete halo correlation failures in GPU stochasticity sampling
- 32568104 Fix GPU stochasticity: position randomization and type mismatch
- 29a9b3d3 Fix GPU velocity displacement calculation in MapMass_gpu

Future Work
This branch serves as the base for continued GPU optimization work:
Test Plan