Description
Version
Latest main
On which installation method(s) does this occur?
source
Describe the issue
There seems to be a subtle issue with NetCDF4 files not being properly flushed to disk. This causes problems when the file is moved or renamed immediately after writing.
On my system, this works as expected (output_file is on a Lustre file system):
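```python
import xarray as xr
import earth2studio.run as run
from earth2studio.io import NetCDF4Backend

results = NetCDF4Backend(output_file)
run.deterministic([start_time], num_steps, model, source, results, device=device)
ds = xr.open_dataset(output_file)
print(float(ds["t2m"].max()))
```
This outputs 304.91934.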
Meanwhile, this produces an output file that contains only fill values:
```python
import shutil

results = NetCDF4Backend("/tmp/tmp_file.nc")
run.deterministic([start_time], num_steps, model, source, results, device=device)
shutil.move("/tmp/tmp_file.nc", output_file)
ds = xr.open_dataset(output_file)
print(float(ds["t2m"].max()))
```
This outputs 9.969209968386869e+36, the default NetCDF fill value for floats. Note that this involves a move from the local /tmp file system to Lustre.
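A likely explanation, stated as an assumption rather than a confirmed diagnosis: when source and destination are on different file systems, shutil.move cannot rename the file, so it copies the bytes and then deletes the source. Anything the netCDF4/HDF5 library is still holding in memory at that point never reaches the copy. The same effect can be reproduced with Python's own buffered file objects. This is an analogy, not netCDF4 code, and the file names are made up.

```python
import os
import shutil
import tempfile


def copy_while_open(flush_first: bool) -> str:
    """Copy a file while its writer is still open; return the copy's contents.

    Analogy only: Python's own write buffer stands in for the data that
    netCDF4/HDF5 holds in memory until sync() or close().
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "tmp_file.txt")
        dst = os.path.join(tmp, "output_file.txt")
        with open(src, "w") as f:
            f.write("t2m=304.9\n")  # small write: still in Python's buffer
            if flush_first:
                f.flush()             # hand buffered bytes to the OS
                os.fsync(f.fileno())  # ask the OS to commit them to disk
            # A cross-file-system shutil.move copies, then deletes the source;
            # shutil.copy2 plays the role of that copy here.
            shutil.copy2(src, dst)
        with open(dst) as g:
            return g.read()


print(repr(copy_while_open(flush_first=False)))  # ''
print(repr(copy_while_open(flush_first=True)))   # 't2m=304.9\n'
```

In the issue's terms, results.root.sync() plays the role of f.flush() here.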
Calling .sync() on the underlying NetCDF4 Dataset fixes the issue:
```python
results = NetCDF4Backend("/tmp/tmp_file.nc")
run.deterministic([start_time], num_steps, model, source, results, device=device)
results.root.sync()  # <-- ADDED
shutil.move("/tmp/tmp_file.nc", output_file)
ds = xr.open_dataset(output_file)
print(float(ds["t2m"].max()))
```
This outputs 305.44061 (the model is probabilistic, so a different value is expected).
This is a quick fix, but I don't think it's a great solution because it breaks the interchangeability of IO backends: results.root.sync() will fail if results is another backend that has no root attribute. Would it be reasonable to add self.root.sync() at the end of the write method of NetCDF4Backend? I'm not sure whether syncing on every write would hurt performance.
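Until something like that is in place, the workaround can be kept backend-agnostic with duck typing. This is a minimal sketch: sync_if_supported is a hypothetical helper, not part of earth2studio, and it relies only on what the snippets above show (NetCDF4Backend exposes its netCDF4 Dataset as root) plus the fact that netCDF4.Dataset.sync() writes buffered data to the file.

```python
from typing import Any


def sync_if_supported(io: Any) -> bool:
    """Hypothetical helper: flush an IO backend if it wraps a netCDF4-style dataset.

    Backends without a ``root`` attribute exposing a callable ``sync`` are
    left untouched, so calling code works with any backend.
    Returns True if a sync was performed.
    """
    root = getattr(io, "root", None)
    sync = getattr(root, "sync", None)  # getattr(None, ...) safely yields None
    if callable(sync):
        sync()
        return True
    return False
```

With this, the added line in the workaround becomes sync_if_supported(results), which does nothing when results is swapped for a backend without root instead of raising AttributeError.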