Support Metal as device #108

Open
wants to merge 3 commits into
base: he/fix/add-FT-to-gpu-tests
3 changes: 3 additions & 0 deletions Project.toml
Original file line number Diff line number Diff line change
@@ -12,10 +12,12 @@ StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
Metal = "dde4c033-4e86-420c-a63e-0dd931031962"

[extensions]
ClimaCommsCUDAExt = "CUDA"
ClimaCommsMPIExt = "MPI"
ClimaCommsMetalExt = "Metal"

[compat]
Adapt = "3, 4"
@@ -24,4 +26,5 @@ Logging = "1.9.4"
LoggingExtras = "1.1.0"
MPI = "0.20.18"
StaticArrays = "1.9"
Metal = "1"
julia = "1.9"
1 change: 1 addition & 0 deletions README.md
@@ -15,6 +15,7 @@ CPU, or on several GPUs using with MPI).
- `CPUSingleThreaded`
- `CPUMultiThreaded` (not actively used)
- `CUDADevice`
- `MetalDevice`
and `Context`es (i.e., environments for distributed computing):
- `SingletonCommsContext`
- `MPICommsContext`
2 changes: 2 additions & 0 deletions docs/src/apis.md
@@ -13,6 +13,7 @@ ClimaComms
```@docs
ClimaComms.@import_required_backends
ClimaComms.cuda_is_required
ClimaComms.metal_is_required
ClimaComms.mpi_is_required
```

@@ -24,6 +25,7 @@ ClimaComms.AbstractCPUDevice
ClimaComms.CPUSingleThreaded
ClimaComms.CPUMultiThreaded
ClimaComms.CUDADevice
ClimaComms.MetalDevice
ClimaComms.device
ClimaComms.device_functional
ClimaComms.array_type
14 changes: 13 additions & 1 deletion docs/src/faqs.md
@@ -13,6 +13,16 @@ export CLIMACOMMS_DEVICE="CUDA"
```
in your shell (outside of Julia, no spaces).

If you want to run on a Metal device, set `CLIMACOMMS_DEVICE` to `Metal`.

```julia
ENV["CLIMACOMMS_DEVICE"] = "Metal"
```
or by running
```bash
export CLIMACOMMS_DEVICE="Metal"
```
in your shell (outside of Julia, no spaces).
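
As a rough illustration of how an environment variable can drive device selection, here is a minimal, self-contained sketch. It does not use `ClimaComms`: the types below are hypothetical stand-ins that only reuse the device names from this PR, `select_device` is an invented name, and the fallback to the CPU when the variable is unset is a simplifying assumption.

```julia
# Hypothetical stand-ins for the device types; NOT the ClimaComms definitions.
abstract type AbstractDevice end
struct CPUSingleThreaded <: AbstractDevice end
struct CUDADevice <: AbstractDevice end
struct MetalDevice <: AbstractDevice end

# Map the value of CLIMACOMMS_DEVICE to a device singleton.
# `select_device` is an illustrative name, not a ClimaComms function.
# Assumption: fall back to the CPU when the variable is unset.
function select_device(env = ENV)
    name = get(env, "CLIMACOMMS_DEVICE", "CPU")
    name == "CPU" && return CPUSingleThreaded()
    name == "CUDA" && return CUDADevice()
    name == "Metal" && return MetalDevice()
    error("Invalid CLIMACOMMS_DEVICE: $name")
end

# Pass a Dict instead of ENV to try the logic without touching the process environment.
device = select_device(Dict("CLIMACOMMS_DEVICE" => "Metal"))
```

Taking the environment as an argument keeps the sketch easy to test; the package itself may resolve the device differently.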

## My simulation does not start and crashes with a `MPI` error. I don't want to run with `MPI`. What should I do?

`ClimaComms` tries to be smart and select the best configuration for your run.
@@ -65,7 +75,7 @@ but do not import `CUDA.jl` in your code.
`ClimaComms` provides a macro, [`ClimaComms.@import_required_backends`](@ref),
that you can add at the top of your scripts to automatically load the required
packages when needed. Note, the packages have to be in your Julia environment,
so you might install packages like ` MPI.jl` and `CUDA.jl`.
so you might install packages like `MPI.jl` and `CUDA.jl` (or `Metal.jl`).
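
To make the idea of "loading the required packages when needed" concrete, the following self-contained sketch works out which optional packages a given configuration would call for. It is not the macro's implementation: `required_packages` is a hypothetical helper, and the `CLIMACOMMS_CONTEXT` variable with the value `"MPI"` is an assumption that does not appear in this diff.

```julia
# Hypothetical helper, not part of ClimaComms: list the optional packages
# that a given environment configuration would need to have loaded.
function required_packages(env = ENV)
    pkgs = Symbol[]
    device = get(env, "CLIMACOMMS_DEVICE", "CPU")
    device == "CUDA" && push!(pkgs, :CUDA)
    device == "Metal" && push!(pkgs, :Metal)
    # Assumption: an MPI context is requested through CLIMACOMMS_CONTEXT.
    get(env, "CLIMACOMMS_CONTEXT", "SINGLETON") == "MPI" && push!(pkgs, :MPI)
    return pkgs
end

pkgs = required_packages(Dict("CLIMACOMMS_DEVICE" => "Metal"))
```

Each returned name still has to be installed in the active Julia environment before it can be imported.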

## How can I see the MPI state and verify that MPI is set up correctly?

@@ -78,4 +88,6 @@ The output varies depending on your communication context type:

When using GPU acceleration with `CUDADevice`, the summary additionally includes the device type and UUID.

When using Metal acceleration with `MetalDevice`, the summary additionally includes the device type.

To test that MPI and CUDA are set up correctly, see [this guide](https://github.com/CliMA/slurm-buildkite?tab=readme-ov-file#testing-cuda-and-mpi-modules).
1 change: 1 addition & 0 deletions docs/src/index.md
@@ -20,6 +20,7 @@ executing some code. The `Device`s currently implemented are
- [`CPUSingleThreaded`](@ref ClimaComms.CPUSingleThreaded) for a CPU core with a single thread,
- [`CPUMultiThreaded`](@ref ClimaComms.CPUMultiThreaded) for a CPU core with multiple threads,
- [`CUDADevice`](@ref ClimaComms.CUDADevice) for a single CUDA-enabled GPU.
- [`MetalDevice`](@ref ClimaComms.MetalDevice), for a Metal-enabled GPU.

`Device`s are part of [`Context`](@ref ClimaComms.AbstractCommsContext)s,
objects that contain information required for multiple `Device`s to communicate.
5 changes: 4 additions & 1 deletion docs/src/internals.md
@@ -10,6 +10,7 @@ First, we will describe what `Device`s and `Context`s are.
GPU, et cetera). The `Device`s implemented are
- [`CPUSingleThreaded`](@ref ClimaComms.CPUSingleThreaded), for a CPU core with a single thread;
- [`CUDADevice`](@ref ClimaComms.CUDADevice), for a single CUDA GPU.
- [`MetalDevice`](@ref ClimaComms.MetalDevice), for a Metal GPU.

`Device`s in `ClimaComms` are
[singletons](https://docs.julialang.org/en/v1/manual/types/#man-singleton-types),
@@ -117,7 +118,9 @@ devices supported by `ClimaComms`.
Except the most basic ones, `ClimaComms` computing devices and contexts are
implemented as independent backends. For instance, `ClimaComms` provides an
`AbstractDevice` interface for which `CUDADevice` is an implementation that
depends on [`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl). Scripts that use
depends on [`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl). Similarly,
`MetalDevice` is an implementation of `AbstractDevice` that depends on
[`Metal.jl`](https://github.com/JuliaGPU/Metal.jl). Scripts that use
`ClimaComms` have to load the packages that power the desired backend (e.g.,
`CUDA.jl` has to be explicitly loaded if one wants to use `CUDADevice`s).
