🐛[BUG]: GPU Memory Limit Exceeded for cBottle SR Example #649

@dphow

Version

0.11.0

On which installation method(s) does this occur?

uv

Describe the issue

With the latest install of version 0.11.0 using torch 2.10.0+cu128, the example at examples/16_cbottle_super_resolution.py fails with an out-of-memory (OOM) error. This was attempted on an A100 40GB GPU on Derecho at NSF NCAR. The same example works on an H100 80GB.

I am not sure whether the latest updates to PyTorch or E2S broke this. Can anyone else reproduce it? I've tried inserting torch.cuda.empty_cache() calls and setting PYTORCH_ALLOC_CONF=expandable_segments:True, along with some other PYTORCH_ALLOC_CONF options, to no avail on the A100s with the latest versions installed. I can test against 0.10.0 and torch 2.9.1+cu129 when I have some time.
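For anyone trying to reproduce this, here is a rough sketch of how the allocator setting and a memory readout could be wired in around the failing step. It is not taken from the example script: `build_alloc_conf` and `report_gpu_memory` are hypothetical helper names, and the 128 MB split size in the usage note is only an illustrative value. The sketch assumes the environment variable has to be set before the first CUDA allocation, so it is set before torch is imported.

```python
import os


def build_alloc_conf(**options):
    """Join allocator options into the 'key:value,key:value' form (hypothetical helper)."""
    return ",".join(f"{key}:{value}" for key, value in options.items())


# Set before torch is imported so the CUDA caching allocator sees it.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault(
    "PYTORCH_ALLOC_CONF", build_alloc_conf(expandable_segments=True)
)


def report_gpu_memory(tag):
    """Print and return GPU memory figures in GiB, or None without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    gib = 2**30
    stats = {
        "free_gib": free_bytes / gib,
        "total_gib": total_bytes / gib,
        "allocated_gib": torch.cuda.memory_allocated() / gib,
        "reserved_gib": torch.cuda.memory_reserved() / gib,
        "peak_allocated_gib": torch.cuda.max_memory_allocated() / gib,
    }
    print(tag, stats)
    return stats
```

Calling `report_gpu_memory("before SR")` and `report_gpu_memory("after SR")` around the super-resolution step should show how close the peak allocation gets to the 40GB limit. A large gap between reserved and allocated memory would point toward fragmentation, in which case something like `build_alloc_conf(expandable_segments=True, max_split_size_mb=128)` may be worth a try. If the peak allocation itself exceeds the card, allocator options alone are unlikely to help.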

My uv environment is listed below for reference:

dhoward@deg0073:/glade/work/dhoward/E2S> uv pip list
Package                      Version
---------------------------- -----------------
absl-py                      2.3.1
affine                       2.4.0
aiobotocore                  3.1.1
aiofiles                     25.1.0
aiohappyeyeballs             2.6.1
aiohttp                      3.13.3
aioitertools                 0.13.0
aiosignal                    1.4.0
anemoi-inference             0.4.9
anemoi-models                0.3.1
anemoi-transform             0.1.20
anemoi-utils                 0.4.43
aniso8601                    10.0.1
annotated-types              0.7.0
antlr4-python3-runtime       4.9.3
anyio                        4.12.1
anytree                      2.13.0
array-api-compat             1.13.0
asttokens                    3.0.1
attrs                        25.4.0
azure-core                   1.38.0
azure-storage-blob           12.28.0
bokeh                        3.8.2
boto3                        1.42.30
botocore                     1.42.30
bracex                       2.6
cartopy                      0.25.0
cattrs                       25.3.0
cbottle                      2025.5.1
cdsapi                       0.7.7
certifi                      2026.1.4
cffi                         2.0.0
cfgrib                       0.9.15.1
cftime                       1.6.5
cfunits                      3.3.7
charset-normalizer           3.4.4
chex                         0.1.91
click                        8.3.1
cligj                        0.7.2
cloudpickle                  3.1.2
colabtools                   0.0.1
coloredlogs                  15.0.1
comm                         0.2.3
contourpy                    1.3.3
cryptography                 46.0.3
cucim-cu12                   25.12.0
cuda-bindings                12.9.4
cuda-pathfinder              1.3.3
cuda-python                  12.9.4
cuda-toolkit                 12.8.1
cupy-cuda12x                 13.6.0
cuvs-cu12                    25.12.0
cycler                       0.12.1
dacite                       1.9.2
dask                         2026.1.1
debugpy                      1.8.19
decorator                    5.2.1
deprecation                  2.1.0
dinosaur-dycore              1.2.1
distributed                  2026.1.1
dm-haiku                     0.0.16
dm-tree                      0.1.9
donfig                       0.8.1.post1
earth2grid                   2025.7.1+torch210
earth2studio                 0.11.0rc0
earthkit-data                0.18.4
earthkit-geo                 0.4.0
earthkit-meteo               0.5.0
earthkit-regrid              0.4.0
earthkit-utils               0.1.2
eccodes                      2.44.0
eccodeslib                   2.44.1.8
eckitlib                     1.32.4.8
ecmwf-datastores-client      0.4.2
ecmwf-opendata               0.3.26
editorconfig                 0.17.1
einops                       0.8.1
entrypoints                  0.4
etils                        1.13.0
executing                    2.2.1
fastrlock                    0.8.3
fckitlib                     0.14.1.8
filelock                     3.20.3
findlibs                     0.1.2
flash-attn                   2.8.3
flatbuffers                  25.12.19
flax                         0.12.0
flexcache                    0.3
flexparser                   0.4
fme                          2025.10.0
fonttools                    4.61.1
frozenlist                   1.8.0
fsspec                       2026.1.0
gcsfs                        2026.1.0
gitdb                        4.0.12
gitpython                    3.1.46
globus-sdk                   3.65.0
google-api-core              2.29.0
google-auth                  2.47.0
google-auth-oauthlib         1.2.4
google-cloud-core            2.5.0
google-cloud-storage         3.8.0
google-cloud-storage-control 1.9.0
google-crc32c                1.8.0
google-resumable-media       2.8.0
googleapis-common-protos     1.72.0
graphcast                    0.2.0.dev0
grpc-google-iam-v1           0.14.3
grpcio                       1.76.0
grpcio-status                1.76.0
h11                          0.16.0
h5netcdf                     1.8.0
h5py                         3.14.0
hf-xet                       1.2.0
httpcore                     1.0.9
httpx                        0.28.1
huggingface-hub              1.3.3
humanfriendly                10.0
humanize                     4.15.0
hydra-core                   1.3.2
idna                         3.11
imageio                      2.37.2
imageio-ffmpeg               0.6.0
importlib-metadata           8.7.1
importlib-resources          6.5.2
intake-esgf                  2026.1.8
iprogress                    0.4
ipykernel                    7.1.0
ipython                      9.9.0
ipython-pygments-lexers      1.1.1
ipywidgets                   8.1.8
isodate                      0.7.2
jax                          0.7.1
jax-cuda12-pjrt              0.7.1
jax-cuda12-plugin            0.7.1
jaxlib                       0.7.1
jedi                         0.19.2
jinja2                       3.1.6
jmespath                     1.1.0
jmp                          0.0.4
joblib                       1.5.3
jraph                        0.0.6.dev0
jsbeautifier                 1.15.4
jsonschema                   4.26.0
jsonschema-specifications    2025.9.1
jupyter-client               8.8.0
jupyter-core                 5.9.1
jupyterlab-widgets           3.0.16
kiwisolver                   1.4.9
lark                         1.3.1
lazy-loader                  0.4
libcuvs-cu12                 25.12.0
libraft-cu12                 25.12.0
librmm-cu12                  25.12.0
lightning-utilities          0.15.2
llvmlite                     0.46.0
locket                       1.0.0
loguru                       0.7.3
lru-dict                     1.4.1
lz4                          4.4.5
makani                       0.2.0
markdown                     3.10.1
markdown-it-py               4.0.0
markupsafe                   3.0.3
matplotlib                   3.10.8
matplotlib-inline            0.2.1
mdurl                        0.1.2
microsoft-aurora             1.8.0
ml-dtypes                    0.5.4
more-itertools               10.8.0
moviepy                      2.2.1
mpmath                       1.3.0
msgpack                      1.1.2
multi-storage-client         0.40.0
multidict                    6.7.0
multiurl                     0.3.7
narwhals                     2.15.0
nest-asyncio                 1.6.0
netcdf4                      1.7.2
networkx                     3.6.1
ninja                        1.13.0
numba                        0.63.1
numcodecs                    0.14.1
numpy                        2.3.5
nvidia-cublas-cu12           12.8.4.1
nvidia-cuda-cupti-cu12       12.8.90
nvidia-cuda-nvcc-cu12        12.9.86
nvidia-cuda-nvrtc-cu12       12.8.93
nvidia-cuda-runtime-cu12     12.8.90
nvidia-cudnn-cu12            9.10.2.21
nvidia-cufft-cu12            11.3.3.83
nvidia-cufile-cu12           1.13.1.3
nvidia-curand-cu12           10.3.9.90
nvidia-cusolver-cu12         11.7.3.90
nvidia-cusparse-cu12         12.5.8.93
nvidia-cusparselt-cu12       0.7.1
nvidia-ml-py                 13.590.48
nvidia-nccl-cu12             2.27.5
nvidia-nvimgcodec-cu12       0.6.1.37
nvidia-nvjitlink-cu12        12.8.93
nvidia-nvshmem-cu12          3.4.5
nvidia-nvtx-cu12             12.8.90
nvidia-physicsnemo           1.3.0
nvsmi                        0.4.2
nvtx                         0.2.14
oauthlib                     3.3.1
omegaconf                    2.3.0
onnx                         1.20.1
onnxruntime-gpu              1.23.2
opentelemetry-api            1.39.1
opt-einsum                   3.4.0
optax                        0.2.6
orbax-checkpoint             0.11.32
packaging                    26.0
pandas                       3.0.0
parso                        0.8.5
partd                        1.4.2
pdbufr                       0.14.1
pexpect                      4.9.0
pillow                       11.3.0
pint                         0.25.2
planetary-computer           1.0.0
platformdirs                 4.5.1
plotly                       6.5.2
prettytable                  3.17.0
proglog                      0.1.12
prompt-toolkit               3.0.52
propcache                    0.4.1
proto-plus                   1.27.0
protobuf                     6.33.4
psutil                       7.2.1
ptyprocess                   0.7.0
pure-eval                    0.2.3
pyarrow                      23.0.0
pyasn1                       0.6.2
pyasn1-modules               0.4.2
pycparser                    3.0
pydantic                     2.12.5
pydantic-core                2.41.5
pygments                     2.19.2
pyjwt                        2.10.1
pylibraft-cu12               25.12.0
pynvml                       13.0.1
pyparsing                    3.3.2
pyproj                       3.7.2
pyshp                        3.0.3
pystac                       1.14.3
pystac-client                0.9.0
python-dateutil              2.9.0.post0
python-dotenv                1.2.1
pytz                         2025.2
pyyaml                       6.0.3
pyzmq                        27.1.0
rapids-logger                0.2.3
rasterio                     1.5.0
rdkit                        2025.9.3
referencing                  0.37.0
requests                     2.32.5
requests-cache               1.2.1
requests-oauthlib            2.0.0
rich                         14.2.0
rioxarray                    0.20.0
rmm-cu12                     25.12.0
rpds-py                      0.30.0
rsa                          4.9.1
rtree                        1.4.1
ruamel-yaml                  0.19.1
s3fs                         2026.1.0
s3transfer                   0.16.0
safetensors                  0.7.0
scikit-image                 0.25.2
scikit-learn                 1.8.0
scipy                        1.17.0
semantic-version             2.10.0
sentry-sdk                   2.50.0
setuptools                   80.10.1
shapely                      2.1.2
shellingham                  1.5.4
simplejson                   3.20.2
six                          1.17.0
smmap                        5.0.2
sortedcontainers             2.4.0
soundfile                    0.13.1
stack-data                   0.6.3
sympy                        1.14.0
tabulate                     0.9.0
tblib                        3.2.2
tensorboard                  2.20.0
tensorboard-data-server      0.7.2
tensorly                     0.9.0
tensorly-torch               0.5.0
tensorstore                  0.1.80
termcolor                    3.3.0
threadpoolctl                3.6.0
tifffile                     2026.1.14
timm                         1.0.24
toolz                        1.1.0
torch                        2.10.0+cu128
torch-geometric              2.4.0
torch-harmonics              0.8.0
torchmetrics                 1.8.2
torchvision                  0.25.0+cu128
tornado                      6.5.4
tqdm                         4.67.1
traitlets                    5.14.3
tree-math                    0.2.1
treelib                      1.8.0
treescope                    0.1.10
trimesh                      4.11.1
triton                       3.6.0
typer-slim                   0.21.1
typing-extensions            4.15.0
typing-inspection            0.4.2
tzdata                       2025.3
url-normalize                2.2.1
urllib3                      2.6.3
wandb                        0.24.0
wcmatch                      10.1
wcwidth                      0.3.1
werkzeug                     3.1.5
widgetsnbextension           4.0.15
wrapt                        2.0.1
xarray                       2025.12.0
xarray-tensorstore           0.3.0
xattr                        1.3.0
xyzservices                  2025.11.0
yarl                         1.22.0
zarr                         3.1.5
zict                         3.0.0
zipp                         3.23.0
