Domain decomposition and halo construction by halungge · Pull Request #540 · C2SM/icon4py

halungge · 2024-09-06T08:55:34Z

Decompose (global) grid file:

Uses pymetis to decompose the global grid (cells) into n patches.
After decomposition, halos for all dimensions (cell, edge, vertex) are constructed. Halo construction is done in an ICON-like fashion: the halos consist of 2 cell levels (one upward-pointing and one downward-pointing line) and the corresponding vertices and edges on these lines.
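
As a rough illustration of the idea (not the code in this PR), the sketch below marks the cells owned by one rank and then adds two halo lines by walking a cell-to-cell neighbor table. The names `build_cell_halo`, `CELL_NEIGHBORS` and `OWNER` are hypothetical, the owner list is written by hand instead of coming from pymetis, and expanding via edge neighbors is an assumption: the actual construction in halo.py may select halo cells differently.

```python
# Toy cell-to-cell table: a strip of 6 cells, each connected to its left/right neighbor.
# (Hypothetical data; a real ICON grid has 3 edge neighbors per triangle.)
CELL_NEIGHBORS = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

# OWNER[c] is the rank owning cell c, e.g. what a graph partitioner would return.
OWNER = [0, 0, 0, 1, 1, 1]


def build_cell_halo(cell_neighbors, owner, rank, n_lines=2):
    """Return {cell: level} for `rank`: level 0 = owned, 1..n_lines = halo line."""
    levels = {c: 0 for c, r in enumerate(owner) if r == rank}
    frontier = set(levels)
    for line in range(1, n_lines + 1):
        # Cells adjacent to the current frontier that have not been assigned a level yet.
        next_frontier = {
            n for c in frontier for n in cell_neighbors[c] if n not in levels
        }
        for c in next_frontier:
            levels[c] = line
        frontier = next_frontier
    return levels


halo_rank0 = build_cell_halo(CELL_NEIGHBORS, OWNER, rank=0)
halo_rank1 = build_cell_halo(CELL_NEIGHBORS, OWNER, rank=1)
```

With `n_lines` exposed as an argument, the number of halo lines is a parameter of the sketch rather than a fixed value; edges and vertices on the halo lines would be derived from the selected cells in a further step that is not shown here.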

Omissions:

LAM grids need to be investigated further:
- Tests comparing decomposed vs. single-node computation are only run on the global grid.
- For LAM grids, ICON reorders arrays so that halo points on the first boundary layers are arranged together with the boundary layers; it should be investigated whether that is essential in the model.
- This PR takes this into account only in the computation of start_index and end_index, not in the halo construction.

The number of halo lines (in terms of cells) is hardcoded to 2; that could be made a parameter.

Not sure it all runs on GPU correctly; most probably there are some numpy/cupy issues to fix.

model/common/src/icon4py/model/common/grid/grid_manager.py

frozen Domain

# Conflicts: # model/common/src/icon4py/model/common/grid/grid_manager.py # model/common/src/icon4py/model/common/grid/horizontal.py

# Conflicts: # model/common/src/icon4py/model/common/decomposition/definitions.py # model/common/src/icon4py/model/common/grid/base.py # model/common/src/icon4py/model/common/grid/grid_manager.py # model/common/src/icon4py/model/common/grid/gridfile.py # model/common/src/icon4py/model/common/grid/icon.py # model/common/tests/common/grid/fixtures.py # model/testing/src/icon4py/model/testing/grid_utils.py # model/testing/src/icon4py/model/testing/parallel_helpers.py

msimberg · 2026-02-24T15:27:30Z

cscs-ci run default

msimberg · 2026-02-24T15:51:08Z

cscs-ci run default

msimberg · 2026-02-24T15:54:13Z

cscs-ci run distributed

msimberg · 2026-02-24T15:56:40Z

model/common/src/icon4py/model/common/grid/geometry.py

-                            self._edge_domain(h_grid.Zone.LOCAL),
+                            self._edge_domain(
+                                h_grid.Zone.HALO
+                            ),  # TODO(msimberg): END too much, invalid neighbor access. LOCAL too little?


This is weird. Why does this need to compute all the way to the first halo line to get correct results in the halo when a halo exchange is done later? Bug in indices or expected?

Same further down for other fields that need connectivities for computation.

jcanton · 2026-02-26T16:12:52Z

model/common/src/icon4py/model/common/utils/data_allocation.py

+def array_ns_from_array(array: NDArray) -> ModuleType:
+    if isinstance(array, np.ndarray):
+        import numpy as xp
+    else:
+        import cupy as xp  # type: ignore[no-redef]
+
+    return xp


sync with #1052 where we also implemented the same:
https://github.com/C2SM/icon4py/pull/1052/changes#diff-11b7ff7e81877fb9c7781a0c2d429d43d49b5640e264607ba1df910eea1e1adfR202
and whoever merges first wins?
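
For whichever version ends up being merged, one possible variant (a sketch, not the code of either PR) resolves the namespace from the package that defines the array's type, so that an object that is neither a NumPy nor a CuPy array raises an error instead of triggering a cupy import. The name `array_ns_sketch` and the `supported` parameter are hypothetical, and the sketch assumes that array types report their defining package through `__module__`.

```python
import importlib
from types import ModuleType


def array_ns_sketch(array: object, supported: tuple = ("numpy", "cupy")) -> ModuleType:
    """Return the supported top-level package that defines the type of `array`.

    The MRO is walked so that subclasses defined in other packages still
    resolve to the package of their array base class.
    """
    for cls in type(array).__mro__:
        top_level = cls.__module__.split(".")[0]
        if top_level in supported:
            # Import lazily: only the package that is actually needed gets loaded.
            return importlib.import_module(top_level)
    raise TypeError(f"unsupported array type: {type(array)!r}")
```

Compared with an `isinstance` check followed by an unconditional `else`, this keeps the cupy import out of the path taken by unrelated objects, at the cost of relying on `__module__` naming.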

jcanton · 2026-02-26T17:00:06Z

model/common/src/icon4py/model/common/decomposition/definitions.py

    return SingleNodeReductions()


+class DecompositionFlag(int, Enum):


should we move here the sketch from halo.py?

Indifferent. I think it can be in either place equally well, but won't oppose moving it here if you prefer.

jcanton · 2026-02-26T17:19:37Z

model/common/src/icon4py/model/common/decomposition/halo.py

+
+    def __init__(
+        self,
+        run_properties: defs.ProcessProperties,


this is called like so (mostly) everywhere else (it's actually mis-spelled processor_procs here and there)

Suggested change

-        run_properties: defs.ProcessProperties,
+        processor_props: defs.ProcessProperties,

jcanton · 2026-02-26T17:20:14Z

model/common/src/icon4py/model/common/decomposition/halo.py

+            allocator: GT4Py buffer allocator
+        """
+        self._xp = data_alloc.import_array_ns(allocator)
+        self._props = run_properties


Suggested change

-        self._props = run_properties
+        self._processor_props = processor_props

jcanton · 2026-02-26T17:25:39Z

model/common/src/icon4py/model/common/decomposition/halo.py

+        """
+        self._xp = data_alloc.import_array_ns(allocator)
+        self._props = run_properties
+        self._connectivities = {self._value(k): v for k, v in connectivities.items()}


why do we re-define this here instead of using it as is? can we force it to be dict[gtx.FieldOffset, data_alloc.NDArray] and avoid the str conversion? (it's not passed in from many places as far as I can see)

Good question. I think it'd be nice if we can avoid the str. I can try removing that.

jcanton · 2026-02-26T17:27:53Z

model/common/src/icon4py/model/common/decomposition/halo.py

+                f"The distribution assumes more nodes than the current run is scheduled on  {self._props} ",
+            )
+
+    def _assert_all_neighbor_tables(self) -> None:


should we choose either neighbor_table or connectivity, but not both?

I think we use connectivity in most places, so I would be in favour of that.

…s.py Co-authored-by: Jacopo Canton <jacopo.canton@gmail.com>

Co-authored-by: Jacopo Canton <jacopo.canton@gmail.com>

…s.py Co-authored-by: Jacopo Canton <jacopo.canton@gmail.com>

github-actions · 2026-02-27T21:51:05Z

Mandatory Tests

Please make sure you run these tests via comment before you merge!

cscs-ci run default
cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

halungge commented Jul 25, 2025

model/common/src/icon4py/model/common/grid/grid_manager.py

halungge force-pushed the halo_construction branch from df9c2ef to 72d4d4b on July 25, 2025 16:30

Magdalena Luz added 28 commits August 8, 2025 09:16

move test files

2df11a7

remove grid size form decomposition info

361b582

duplicate _construct_grid to make single node tests run

ffe7023

test fix

514b14d

test fix (3)

6c7d35e

test fix (3) offset provider for KDim

f6c68ca

remove duplicated test

ed2336d

apply global to local

fc65d6d

fix global to local transformation in grid_manager

2bec8d8

compute start end indices WIP (i)

7b0d4d8

Merge branch 'main' into halo_construction

21bf6c5

change start_indices and end_indices to map from domain to index

b9472a2

fix setup of simple grid

04c359f

remove _index from Domain

bfbbf3e

simple map Domain -> index

2c7a9c6

frozen Domain

fix grid wrapper

01dfddb

revert unnecessary changes

1c990eb

do not use function under test in test assertion

f507594

Merge branch 'main' into refactor_start_end_indices

8e7b266

fix typing in horizontal.py

7045a4f

Merge branch 'main' into halo_construction

5d50b61

Merge branch 'refactor_start_end_indices' into halo_construction

2525fb5

# Conflicts: # model/common/src/icon4py/model/common/grid/grid_manager.py # model/common/src/icon4py/model/common/grid/horizontal.py

some typing fixes

4c95d92

move transformation to gridfile.py

324a2c0

pass decomposer to GridManager.__call__

dde8759

move fixture import to conftests.py in model/common/tests/common/grid

1f61a40

fix import

a9d9d33

msimberg added 5 commits February 24, 2026 15:53

Sort geometry fields

789a7ff

Add more interpolation fields to parallel grid manager test

b40ff92

Test more metrics fields

67e054b

Remove unused fixture

f864ec6

Fix figure reference

0fbfb62

msimberg added 2 commits February 24, 2026 16:34

Merge remote-tracking branch 'origin/main' into halo_construction

2bc143e

Remove deleted field from test

4fb0936

msimberg reviewed Feb 24, 2026