perf: gdalwarp 3.4.3 takes seconds, gdalwarp 3.5.0 and later takes 30+ minutes #10809
I might have a try if I knew where and how to get the data "S3B_OL_2_WFR____20230309T115352_20230309T115652_20230310T233800_0180_077_066_1800_MAR_O_NT_003.SEN3" or something similar.
The data comes from EUMETSAT. A lot of things on their web pages require a login, but I'm pretty sure you can register for free at their login portal: https://eoportal.eumetsat.int/cas/login

If you have a EUMETSAT user already, you can download the data with their Python client (EUMDAC): https://user.eumetsat.int/resources/user-guides/eumetsat-data-access-client-eumdac-guide

The product is also available from Copernicus, but I think that also requires a user (but you can register for free). If you have a user, go to their browser, click on the half-invisible tab called "Search" in the upper left corner, enter the product name in the box and press the green search button at the bottom left side (you might need to scroll down). You should get a result like this:
I succeeded in downloading and running your command. Next time you could test all the steps yourself, to make it as easy as possible for others to reproduce. I had to edit away some fixed paths from the VRT files before I could run the command, from lines like these:
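That fixed-path cleanup can be scripted. A minimal sketch, assuming the absolute paths live in plain `<SourceFilename>` elements (which is where gdal_translate writes them); NETCDF:-style subdataset strings would need slightly different handling:

```python
import xml.etree.ElementTree as ET

def make_sources_relative(vrt_path: str) -> None:
    """Rewrite absolute SourceFilename entries in a VRT to bare file names."""
    tree = ET.parse(vrt_path)
    for elem in tree.iter("SourceFilename"):
        name = elem.text or ""
        if name.startswith("/"):
            # Keep only the file name and mark it relative to the VRT's location.
            elem.text = name.rsplit("/", 1)[-1]
            elem.set("relativeToVRT", "1")
    tree.write(vrt_path)
```

This assumes the referenced files are moved next to the VRT; otherwise the rewritten names will not resolve.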
When I run your command I can confirm that it is slow. Worse, the oa02.tif file seems to be empty, with no valid pixels. That's why it is so small. I used
The whole debug log is attached. Knowing hardly anything about NetCDF, I believe that the slowness is not the main problem, but that the process does not actually work at all.
If "does not actually work at all" is taken at face value, I would have expected an error code to be returned from one of the GDAL commands. Am I misinterpreting what "does not actually work at all" means? Something I just realized is that I'm using a local security-patched ubuntu-small image, and that has the same netCDF support that is in the ubuntu-full image. I think using the ubuntu-full image will be sufficient for reproducing this issue.
Sorry, I meant that the command seems to work but the result is useless. It is an image with reasonable spatial bounds, but it is totally filled with nodata (STATISTICS_VALID_PERCENT=0). Obviously it used to return meaningful data with old GDAL versions, so something must have changed. It feels like GDAL now tries hard to find something that it never finds.
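For readers unfamiliar with that statistic: STATISTICS_VALID_PERCENT is simply the share of pixels that differ from the nodata value, so 0 means every pixel is nodata. A tiny NumPy illustration with made-up data:

```python
import numpy as np

# Made-up 4x4 band where 0 is the nodata value and every pixel is nodata,
# mimicking the useless output described above.
nodata = 0.0
band = np.zeros((4, 4), dtype=np.float32)

valid_percent = 100.0 * np.count_nonzero(band != nodata) / band.size
print(valid_percent)  # 0.0, i.e. STATISTICS_VALID_PERCENT=0
```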
Let's check that I have modified your files correctly. Do you get the same output from gdalinfo as I do? Is it the same with 3.5.0 and newer versions?
I installed GDAL 3.4.3 and it creates a TIFF with the same statistics as the source VRT. Debug log from this successful run attached.
I get identical values to you with both 3.4.3 and 3.9.2. Compared to 3.4.3 and your output, the only difference is that my 3.9.2 has an additional "Min=.. Max" in its output:
I fear I have done all that I can. The issue is real; it is not about speed but something more fundamental. I hope that someone else is capable of finding what goes wrong (and edits the title to reflect that).
Thank you for your help. Based on your analysis, I agree that it's not a performance issue, but I have no idea what is going wrong, so I will leave the title as is for now. I don't know if it matters, but this is the output of
and this is with gdalinfo 3.4.3 on the result processed with 3.9.2:
… working datasets to 24 megapixels Fixes OSGeo#10809
Fix(es) in #10812
Are you saying that if I upgrade to GDAL 3.10 when that is released, I can mask the issue of the misconfigured Int32 type for latitude/longitude by keeping things in memory and not using temp datasets? I can update the generation of latitude/longitude to use Float32 going forward. I still need to understand the ramifications of the past, though: the flag you mention was introduced in GDAL 3.5; did 3.4 always keep things in memory and therefore mask the issue?
No, the wrongly declared VRT data type needs to be fixed for all GDAL versions. Past versions could be more tolerant of a wrong declared VRT band data type when, for example, the chain was: Int32 (pixel acquisition) -> Float64 (intermediate working type in the VRT when applying the scaling) -> Int32 (VRT band data type) -> Float64 (data type used by GDAL's geolocation mechanism). Past versions would skip the downcast to Int32, but this was wrong, and it has been fixed in later versions. I don't expect any bad effects from declaring the correct data type in past GDAL versions.
Yes, 3.4 and below always ingested the whole geolocation array and its generated inverse backmap in memory. This was an issue for very large datasets when they didn't fit in RAM, and it prevented use of geolocation arrays in those cases.
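The downcast in the middle of that chain is destructive for geolocation arrays: latitude/longitude values are fractional degrees, so forcing them through Int32 collapses them to whole degrees. A minimal NumPy sketch (the longitude values are made up for illustration):

```python
import numpy as np

# Hypothetical longitudes as read from the product (Float64 after unscaling).
lon = np.array([10.37, 10.41, 10.45, 10.49], dtype=np.float64)

# Declaring the VRT band as Int32 forces a downcast before the geolocation
# mechanism converts back to Float64 -- the fractional degrees are lost.
lon_via_int32 = lon.astype(np.int32).astype(np.float64)

print(lon_via_int32)  # [10. 10. 10. 10.]
```

With every geolocation sample rounded to the same whole degree, the warper cannot build a usable mapping, which is consistent with an output raster full of nodata.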
I noticed the 3.9-backport tag on the PR, so I'm looking forward to leaving GDAL 3.4.3 behind when GDAL 3.9.3 is released. Thank you for the incredibly quick fix!
What is the bug?
The command
took seconds with GDAL 3.4.3, but takes 38+ CPU minutes with GDAL 3.9.2 when using the GDAL ubuntu-small Docker image. I don't have exact numbers for GDAL 3.5.x, but that was also measured in minutes rather than seconds. Around 15 months ago I bisected GDAL, and landed on #5520 where the performance changed. The PR mentions
"goes from 1.26 s to 3.52 s" when testing on a Sentinel 5P product, so I understand some slowdown was expected.
Performance numbers are on a Ryzen 5900x boosting to 4.55-4.65 GHz. System is not overheating, the fans are not audibly louder than when the system is idle. IO is to a fast SSD, but there is no IO on the system when the CPU is going at full speed.
The commands have the environment variable GDAL_CACHEMAX=256, and that value is tuned for the workload using GDAL 3.4.3. The example uses files from the product S3B_OL_2_WFR____20230309T115352_20230309T115652_20230310T233800_0180_077_066_1800_MAR_O_NT_003.SEN3, but from what I have observed there is nothing unique about that product; GDAL performance is similar for other OL_2_WFR___ products.
Steps to reproduce the issue
The VRT is created with
gdal_translate -q -ot Float32 -unscale NETCDF:Oa02_reflectance.nc:Oa02_reflectance -co NUM_THREADS=1 --config GDAL_NUM_THREADS 1 /tmp/tmp_xnrpustp/oa02.vrt
, and then modified based on https://gis.stackexchange.com/questions/103116/map-project-a-raster-having-separate-latitude-and-longitude-raster-bands. The people who initially wrote the code that constructed the GDAL commands are gone, so I don't have any more information than that. Attaching the modified VRT, and the latitude/longitude VRTs that it uses.
vrts.zip
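For readers without the attachment: the StackExchange approach amounts to adding a GEOLOCATION metadata domain to the VRT that points at the separate latitude/longitude rasters. A hedged sketch of what such a fragment typically looks like (raster sizes, file names, and step values are placeholders, not taken from the attached files):

```xml
<VRTDataset rasterXSize="4865" rasterYSize="4091">
  <Metadata domain="GEOLOCATION">
    <MDI key="SRS">EPSG:4326</MDI>
    <MDI key="X_DATASET">longitude.vrt</MDI>
    <MDI key="X_BAND">1</MDI>
    <MDI key="Y_DATASET">latitude.vrt</MDI>
    <MDI key="Y_BAND">1</MDI>
    <MDI key="PIXEL_OFFSET">0</MDI>
    <MDI key="LINE_OFFSET">0</MDI>
    <MDI key="PIXEL_STEP">1</MDI>
    <MDI key="LINE_STEP">1</MDI>
  </Metadata>
  <!-- band definitions as produced by gdal_translate go here -->
</VRTDataset>
```

gdalwarp then picks up this domain (with -geoloc) to warp the unprojected swath to a regular grid, which is why a wrong data type in the referenced latitude/longitude VRTs poisons the whole chain.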
Versions and provenance
GDAL ubuntu-small Docker image. 3.4.3 is quick, 3.5, 3.6, 3.8 and 3.9 are slow.
Additional context
As a sidenote, the resulting GeoTIFFs with 3.4.3 for the product mentioned are 3.5-5.5 MB, and with 3.9.2 they are 128 KB.
If there is a faster way to get GeoTIFFs projected as EPSG:4326 for OL_2_WFR___ products, I'm happy to use that instead.