ADACS GPU Work -- Base by gusgw · Pull Request #622 · 21cmfast/21cmFAST

gusgw · 2026-02-23T02:39:50Z

Summary

This PR provides the foundation for GPU acceleration in 21cmFAST, developed as part of the ADACS optimization project. It includes:

Complete GPU implementation of the InitialConditions path
Support for both ZELDOVICH and 2LPT perturbation algorithms on GPU
Verified CPU/GPU numerical parity across all physics configurations
Build system improvements for profiling and optimization workflows

Key Changes

GPU Initial Conditions Implementation

GPU InitialConditions path: Full CUDA implementation of initial condition generation, including density field computation, velocity field generation, and high-resolution perturbation support
2LPT support on GPU: Second-order Lagrangian Perturbation Theory implementation with correct cuFFT handling and volume scaling
MapMass GPU kernel: Rewritten to match CPU algorithm exactly, with proper velocity index calculation and bounds checking
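As a rough illustration of the index arithmetic and bounds handling a mass-mapping step involves, the sketch below moves each cell's mass by a displacement along one axis and wraps the target index periodically. It is a one-dimensional, nearest-grid-point toy and not the MapMass_gpu kernel from this PR; `map_mass_1d` and its arguments are hypothetical names.

```python
import math


def map_mass_1d(density, displacement):
    """Move each cell's mass by its displacement (in cell units) on a periodic grid.

    Hypothetical 1D toy: the real kernel works on 3D fields.
    """
    n = len(density)
    if len(displacement) != n:
        raise ValueError("density and displacement must have the same length")
    out = [0.0] * n
    for i in range(n):
        # Nearest-grid-point target cell; the modulo keeps the index in [0, n),
        # including for negative displacements.
        target = math.floor(i + displacement[i] + 0.5) % n
        out[target] += density[i]
    return out
```

Because every source cell deposits its full mass into exactly one target cell, the total mass is conserved, which is a convenient invariant to check when comparing a CPU and a GPU implementation.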

Build System Improvements

Profile-Guided Optimization (PGO) workflow support via PY21C_PGO_PHASE and PY21C_PGO_DIR environment variables
Debug symbols and symbol visibility controls for profiling
Build control environment variables for flexible optimization levels
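A minimal sketch of how a build script might translate the two PGO environment variables into GCC flags. The variable names come from this PR; the accepted phase values ("generate" and "use") and the helper name `pgo_flags` are assumptions made for illustration, and the actual build logic may differ.

```python
import os


def pgo_flags(environ=None):
    """Return GCC profile-guided-optimization flags for the current phase.

    Assumed phase values: "generate" (instrumented build) and "use"
    (optimized rebuild from collected profiles).
    """
    env = os.environ if environ is None else environ
    phase = env.get("PY21C_PGO_PHASE", "").strip().lower()
    profile_dir = env.get("PY21C_PGO_DIR", "")

    if not phase:
        return []  # PGO disabled: add no extra flags
    if phase == "generate":
        flags = ["-fprofile-generate"]
    elif phase == "use":
        # -fprofile-correction tolerates slightly inconsistent profile counts
        # collected from multi-threaded runs.
        flags = ["-fprofile-use", "-fprofile-correction"]
    else:
        raise ValueError(f"unknown PY21C_PGO_PHASE: {phase!r}")

    if profile_dir:
        flags.append(f"-fprofile-dir={profile_dir}")
    return flags
```

Under these assumptions, a workflow would build once with the "generate" phase, run representative workloads to collect profiles, and then rebuild with the "use" phase pointing at the same directory.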

Testing Infrastructure

GPU-CPU parity test framework with reference data
Comprehensive field diagnostics for CPU/GPU comparison
Three-way comparison infrastructure (main vs cpu-optimized vs gpu)

Validation

Extensive three-way comparison testing was performed on both Skylake/P100 and Milan/A100 architectures:

Numerical Parity (CPU vs GPU)

Architecture	Min Correlation	Notes
Skylake/P100	0.999999	All non-discrete scripts
Milan/A100	0.9995	All non-discrete scripts

Discrete halo sampling scripts show expected divergence due to different random number sequences on CPU vs GPU.
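The PR does not say which correlation measure produced the numbers above. Assuming a Pearson correlation over flattened field values, a minimum-correlation check could be sketched as follows; `pearson_correlation` and `min_correlation` are hypothetical helpers, not part of the test framework in this branch.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / math.sqrt(var_a * var_b)


def min_correlation(fields_cpu, fields_gpu):
    """Smallest per-field correlation between two dicts of flattened fields."""
    return min(
        pearson_correlation(fields_cpu[name], fields_gpu[name])
        for name in fields_cpu
    )
```

Reporting the minimum over all fields, rather than the mean, means a single badly diverging field cannot be hidden by many well-matched ones.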

Performance (Average over 46 test scripts)

Architecture	CPU vs Main	GPU vs CPU
Skylake/P100	+13.0%	+8.9%
Milan/A100	+10.5%	-7.2%

Note: Positive values indicate a faster run. The GPU path is slower than the CPU path on A100 (-7.2%) for these small test workloads due to transfer overhead. Larger production runs are expected to benefit more from GPU acceleration.
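The PR does not define how the percentages in the table are computed. One convention consistent with the note (a negative value when the GPU is slower) is the relative reduction in wall time, averaged over scripts. The sketch below assumes that convention; the function names and the timings in the usage note are illustrative, not measurements from this PR.

```python
def percent_speedup(t_reference, t_candidate):
    """Relative wall-time reduction in percent; negative means slower."""
    if t_reference <= 0:
        raise ValueError("reference time must be positive")
    return 100.0 * (t_reference - t_candidate) / t_reference


def mean_percent_speedup(timings):
    """Average percent_speedup over (t_reference, t_candidate) pairs."""
    if not timings:
        raise ValueError("no timings given")
    return sum(percent_speedup(ref, cand) for ref, cand in timings) / len(timings)
```

For example, a script that takes 10.0 s on the reference and 8.0 s on the candidate scores +20.0, while one that takes 12.5 s scores -25.0.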

Selected Recent Bug Fixes

Fix cuFFT first-call failure on P100/Pascal GPUs
Fix GPU velocity displacement calculation in MapMass_gpu
Fix GPU stochasticity: position randomization and type mismatch for discrete halo sampling
Fix 2LPT implementation: cuFFT R2C requires tightly-packed input (not FFT-padded), and phi_2 needs VOLUME pre-multiplication to match velocity kernel expectations
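To illustrate the layout difference behind the 2LPT fix: in the FFTW in-place real-to-complex convention, the last axis of an N×N×N real field is padded to 2*(N/2+1) values, whereas a tightly packed array stores exactly N values per row. The sketch below copies a padded field into packed form. The helper names and the optional `scale` argument (standing in for a pre-multiplication such as the VOLUME factor) are hypothetical, and the padded layout of the CPU arrays is an assumption rather than something this PR states.

```python
def padded_index(i, j, k, n):
    """Flat index into an FFTW-style padded real array of shape (n, n, 2*(n//2 + 1))."""
    return k + 2 * (n // 2 + 1) * (j + n * i)


def packed_index(i, j, k, n):
    """Flat index into a tightly packed real array of shape (n, n, n)."""
    return k + n * (j + n * i)


def pack_real_field(padded, n, scale=1.0):
    """Copy the n*n*n real values out of a padded array, optionally scaling them."""
    expected = n * n * 2 * (n // 2 + 1)
    if len(padded) != expected:
        raise ValueError(f"expected {expected} padded values, got {len(padded)}")
    packed = [0.0] * (n * n * n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Padding slots (k >= n) are never read.
                packed[packed_index(i, j, k, n)] = scale * padded[padded_index(i, j, k, n)]
    return packed
```

Feeding a padded buffer to a routine that expects packed input would silently shift every row after the first, which is the kind of error that yields plausible-looking but wrong fields.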

Test Configurations

Testing covered all major physics configurations:

park19, Munoz21, Qin20 physics models
Coeval and lightcone calculations
With 2LPT, and without it (the ZELDOVICH algorithm)
Minihalo and discrete halo sampling modes
Multiple random seeds for reproducibility

Commits (75 total)

Key commits:

7c3a5060 Fix GPU 2LPT implementation
278aa749 Re-enable GPU InitialConditions path
0c01b8f7 Implement 2LPT support in GPU MapMass kernel
4433dea1 Fix cuFFT first-call failure on P100/Pascal GPUs
4ca51dfc Cherry-pick InitialConditions GPU implementation
107c6f9f Fix discrete halo correlation failures in GPU stochasticity sampling
32568104 Fix GPU stochasticity: position randomization and type mismatch
29a9b3d3 Fix GPU velocity displacement calculation in MapMass_gpu

Future Work

This branch serves as the base for continued GPU optimization work:

GPU profiling and kernel optimization
Extended GPU coverage for additional computation stages
Performance optimization for production workloads

Test Plan

CI tests pass
GPU parity tests pass on P100 and A100
Coeval calculations produce matching results between CPU and GPU
Lightcone calculations produce matching results between CPU and GPU
Both ZELDOVICH and 2LPT algorithms work correctly on GPU

Phase 1.2 of incremental upstream merge. This commit integrates the source-flag-redesign changes from upstream which replaces USE_MASS_DEPENDENT_ZETA with SOURCE_MODEL enum.

Conflict resolutions:
- HaloBox.h: Added extern "C" block, kept convert_halo_props function
- SpinTemperatureBox.c: Updated flag checks to SOURCE_MODEL == 1
- map_mass.h: Added extern "C" block, kept MapMass_gpu function
- rng.h: Kept extern "C" block
- HaloCatalog.h: Preserved updateGlobalParams CUDA utility function
- PerturbedField.h: Added extern "C" block for CUDA compatibility
- PerturbedHaloCatalog.h: Added extern "C" block for CUDA compatibility

File renames by upstream (accepted):
- HaloField.c -> HaloCatalog.c
- HaloField.h -> HaloCatalog.h
- PerturbField.c -> PerturbedField.c
- PerturbField.h -> PerturbedField.h
- PerturbHaloField.c -> PerturbedHaloCatalog.c
- PerturbHaloField.h -> PerturbedHaloCatalog.h

Removed CFFI wrapper files (we use nanobind):
- _inputparams_wrapper.h
- _outputstructs_wrapper.h

This field was added to the C struct in the source-flag-redesign merge (PR 21cmfast#572) but was missing from the Python wrapper class, causing AttributeError when creating AstroOptions objects.

Signed-off-by: Angus Gray-Weale <gusgw@gusgw.net>

Merged commits:
- 21cmfast#570 A_s_branch (primordial amplitude parameter)
- 21cmfast#576 compiler-detection
- 21cmfast#578 CosmoTables (cosmology table improvements)

Key changes:
- CosmoTables now passed from Python instead of reading from file in C
- Removed CFFI-specific code (we use nanobind)
- Kept nanobind import style in test files

- Add Table1D and CosmoTables struct definitions to InputParameters.h
- Add Table1D, CosmoTables bindings and Free_cosmo_tables_global to wrapper
- Fix cosmology.c to use size_density instead of CLASS_LENGTH
- Update inputs.py to use nanobind instead of CFFI for Table1D/CosmoTables

Signed-off-by: Angus Gray-Weale <gusgw@gusgw.net>

Merged commits:
- 21cmfast#582 X_RAY_HEATING (conditional X-ray heating feature)
- 21cmfast#575 readme-updates (documentation improvements)
- Various CI updates (21cmfast#579-585)

Key changes:
- Added USE_X_RAY_HEATING flag to AstroOptions
- Conditional memory allocation for X-ray heating arrays
- Removed CFFI files (we use nanobind)

gusgw · 2026-02-25T01:46:48Z

@qyx268 Yes, adding the label sounds like a good idea. We manage labels within a yaml file in the repo, so I will add that in a separate PR.

@gusgw is this meant to replace #541? If this is going to be merged ~soon, I think we should rather aim to merge into the branch release-v4.2. We can then manage the release properly.

Also @gusgw if you need any pointers on handling the conflicts, let me know. Of course we've done a fair bit of work and bugfixing etc on the main branch since you branched off.

Hi @steven-murray, I quite agree! I have merged in up to 4.0.0, I think, and all my tests are passing. I should be up to 4.2 shortly. I'm working through testing my CPU-only and GPU versions against the relevant point on main.


Bumps [actions/github-script](https://github.com/actions/github-script) from 6 to 8.
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](actions/github-script@v6...v8)
---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>

Bumps [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact) from 14 to 15.
- [Release notes](https://github.com/dawidd6/action-download-artifact/releases)
- [Commits](dawidd6/action-download-artifact@v14...v15)
---
updated-dependencies:
- dependency-name: dawidd6/action-download-artifact
  dependency-version: '15'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v6...v7)
---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>

Bumps [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact) from 15 to 16.
- [Release notes](https://github.com/dawidd6/action-download-artifact/releases)
- [Commits](dawidd6/action-download-artifact@v15...v16)
---
updated-dependencies:
- dependency-name: dawidd6/action-download-artifact
  dependency-version: '16'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>

Remove 16 files that should not be in the repository:
- PR-adacs-gpu-base.md (draft PR document)
- check_gpu_usage.{md,py} (local development files)
- gpu_test_*.py, test_gpu*.py, simple_gpu_test.py (local test scripts)
- import_21cmfast.py (local development script)
- install_custom.py (redundant install wrapper)

Restore bump script that was accidentally removed during merges.

… USE_SIGMA_8
- Add missing power_in_vcb function binding to _wrapper.cpp (fixes test_ps_runs)
- Cast USE_SIGMA_8 to bool in inputs.py to satisfy nanobind's strict type checking (fixes test_coeval_against_direct)
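The cast mentioned in this commit can be illustrated with a small sketch. A binding layer with strict type checking may reject an integer such as 1 where a bool is expected, so flag values are converted to built-in bool before being handed over. `coerce_bool_flags` is a hypothetical helper written for this illustration; the commit only states that the cast happens in inputs.py.

```python
def coerce_bool_flags(params, bool_fields):
    """Return a copy of params with the named fields cast to built-in bool.

    Hypothetical helper: fields not listed in bool_fields are left untouched.
    """
    coerced = dict(params)
    for name in bool_fields:
        if name in coerced:
            # bool() maps 0/1 and other truthy or falsy values to False/True.
            coerced[name] = bool(coerced[name])
    return coerced
```

Note that `bool()` treats any non-empty string as True, so a helper like this is only appropriate for values that are already numeric or boolean.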

review-notebook-app · 2026-03-03T07:19:28Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

alserene and others added 30 commits October 21, 2024 19:10

Add TODO comments.

a1384c8

Add array handling.

6416201

Remove comment.

6fdfdd2

Tidy types.

24d5509

Remove outdated comment.

f171e23

Tidy code.

3d53662

Minor changes to resolve error messages.

c444018

Add PerturbField.o to build file.

b939e36

Misc changes during debugging.

3943d2d

Debugging attempts

51eab94

Reorganise C and CUDA code so only CUDA related code in cu file.

af06767

Trim trailing whitespace.

9162148

Add CUDAError to list of exitcodes.

c5f6450

Add CUDAError to list of error codes.

b9d1fd3

Clean up kernel.

6fdb4e8

Remove cudaDeviceSynchronize.

0b36ebc

Capital letter in comment

477d076

reset profiling libraries

1d9ff1c

Merge branch 'v4-prep' into adacs_tiger_dev_v2

16d3cdf

fix purging of lowres density when needed

b32ebce

First draft of SpinTemp kernel.

a374af3

Make corrections to fix compilation errors.

8e62698

Add SpinTemp CUDA object to build file.

7045288

Add -lstdc++ flag for thrust.

f1d5c6c

Add workaround for table struct corruption.

3689bfa

Add accessor function for SFRD_conditional_table.

4413c33

Fix struct corruption bug.

b25cefe

Reuse memory for unchanged arrays.

456446d

Fix pointer passing issue.

4c3800c

Add accessor function for nbins.

1bd2f88

gusgw added 5 commits February 24, 2026 06:51

gusgw and others added 23 commits February 25, 2026 17:31

Merge global_signal (21cmfast#588) into adacs-gpu-base

717f9e3

Merge v4.1.0 into adacs-gpu-base

db4f79c

Add ps_norm and USE_SIGMA_8 to CosmoTables for v4.1.0 compatibility

cc10e1d

Add Q_HI to TsBox for v4.1.0 compatibility

772ddce

Add MIN_XE_FOR_FCOLL_IN_TAUX to SimulationOptions for v4.1.0

a0e935a

Fix global_evolution.py import for nanobind

3b332f2

Signed-off-by: Angus Gray-Weale <gusgw@gusgw.net>

[pre-commit.ci] auto fixes from pre-commit.com hooks

45e7707

for more information, see https://pre-commit.ci

added a feature to run initial conditions with an input hires density…

e6d35b9

… box

added option to do linear perturbation theory with 21cmFAST, with run…

9175be1

…_global_evolution

added density as an output of run_global_evolution, this allows to ex…

37d8c6c

…amine the linear growth factor

increased coverage

e3475b5

renamed hires_density_array to initial_density and introduced a new e…

0cd8560

…rror if it is not in the right dimensions

renamed delta_z0 to overdensity_z0

4cd2540

placed underscore, to stress that users should not use this method

890e84b

edited tutorial and faq to give information on the new features

a344e1f

ci: only create sdists

420100e

ci: better name for building source dist

28631f1

gusgw wants to merge 254 commits into 21cmfast:main from gusgw:adacs-gpu-base

gusgw commented Feb 23, 2026 • edited by qyx268


8 participants
