-
-
Notifications
You must be signed in to change notification settings - Fork 712
Performance comparison of iterators and ranges to copy vector images #5669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Performance comparison of iterators and ranges to copy vector images #5669
Conversation
Comparing perfomance with using the cast image filter to convert between different vector pixel type. There are two option to compare: - Using ImageRegionRange appears to have performnace benifits, compared to the ImageScanelineIterators. - Using NumericTraits::GetLength is nearly a constant /compile time expression which is also performant. The challenge is choosing input vs output pixel type when only one has a compile time size.
|
Here is sample timing from my system in release mode: |
f530e9c to
236cb22
Compare
|
Thanks @blowekamp, interesting, I'll have a look! In general, I wonder, should a performance benchmark be part of the regular unit test set? Or should it have its own project? Specifically, should your benchmark be a part of https://github.com/InsightSoftwareConsortium/ITKPerformanceBenchmarking ? If your benchmark would stay here in InsightSoftwareConsortium/ITK I would suggest to make it a GoogleTest! (Although maybe a benchmark suite like https://github.com/google/benchmark would even be nicer.) |
| const unsigned int componentsPerPixel = NumericTraits<OutputPixelType>::GetLength(outputPixel); | ||
| while (inputIt != inputEnd) | ||
| { | ||
| InputPixelType inputPixel = *inputIt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not const InputPixelType & inputPixel? (The original code also has const InputPixelType & inputPixel.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. You reference code times better on my system.
Modules/Filtering/ImageFilterBase/test/itkImageIterationPerformanceTest.cxx
Outdated
Show resolved
Hide resolved
Modules/Filtering/ImageFilterBase/test/itkImageIterationPerformanceTest.cxx
Show resolved
Hide resolved
| while (inputIt != inputEnd) | ||
| { | ||
| InputPixelType inputPixel = *inputIt; | ||
| OutputPixelType outputPixel(componentsPerPixel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OutputPixelType outputPixel(componentsPerPixel) is expensive for VariableLengthVector. Please consider moving this declaration outside the while (inputIt != inputEnd) loop (just like you proposed for CastImageFilter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just ran the last part of your benchmark:
--- Test Suite 4: VectorImage<float> to VectorImage<double> ---
=== Large Image (512^3) ===
Image size: 512x512x512
Scanline Iterator: 243.729 Mpixels/sec (0.550685 seconds)
ImageRegionRange: 35.8598 Mpixels/sec (3.74284 seconds)
Scanline NumericTraits:245.443 Mpixels/sec (0.546839 seconds)
Range NumericTraits: 290.972 Mpixels/sec (0.461273 seconds)
Results match: YES
That looks very bad for ImageRegionRange. However, after moving the declaration of outputPixel outside the while (inputIt != inputEnd) loop, it said:
=== Large Image (512^3) ===
Image size: 512x512x512
Scanline Iterator: 246.817 Mpixels/sec (0.543794 seconds)
ImageRegionRange: 175.812 Mpixels/sec (0.763417 seconds)
Scanline NumericTraits:246.84 Mpixels/sec (0.543744 seconds)
Range NumericTraits: 293.25 Mpixels/sec (0.45769 seconds)
Results match: YES
So now ImageRegionRange still doesn't seem as fast as Scanline Iterator, but it's only 0.76 versus 0.54 seconds now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is expensive. For this case in my posted results it's the reason why it's 10x slower when then the output is a VectorImage.
Interestingly, when OutputPixelType outputPixel(componentsPerPixel) is outside the loop it's ~2x slower. AND if it's OutputPixelType outputPixel{ *outputIt}; the same issue of unexpected modified output as reported in #5668 occurs.
| std::cout << "CTEST_FULL_OUTPUT" << std::endl; | ||
| std::cout << "ITK Image Iteration Performance Test" << std::endl; | ||
| std::cout << "=====================================" << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want this to become a regular unit test, please limit the amount of output! The ITK tests already produce an excessive amount of output, in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional full output is intentional so that the performance results from the CI can be seen.
I don't know what the long term utility of this performance comparison will be. Currently some systems ran out of memory. So a couple tweaks are needed for that... likely command line the size of the image and reduce number of images in memory.
| // Test different image sizes and types | ||
| constexpr unsigned int Dimension = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider testing 2D as well. Maybe do one small 2D and 1 large 3D.
Modules/Filtering/ImageFilterBase/test/itkImageIterationPerformanceTest.cxx
Outdated
Show resolved
Hide resolved
Modules/Filtering/ImageFilterBase/test/itkImageIterationPerformanceTest.cxx
Outdated
Show resolved
Hide resolved
Modules/Filtering/ImageFilterBase/test/itkImageIterationPerformanceTest.cxx
Outdated
Show resolved
Hide resolved
| const auto inputEnd = inputRange.end(); | ||
|
|
||
|
|
||
| OutputPixelType outputPixel{ *outputIt }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, the value of *outputIt may not yet be initialized, right? The output pixel values may still be uninitialized at this point. If so, better just declare OutputPixelType outputPixel; without retrieving the value from *outputIt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct that the values in outputPixel are uninitialized. However, for the VectorImage cast, the outputPixel will point to the data in the output Image's buffer. So this removed the need to allocation memory, reduced the memory locations used ( atleast for variable length vectors. )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also the "BUG" cast where the reference to the first pixel is maintained and will be assigned all values in the range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you implying that #5661 didn't fix a bug? To be honest, the time it took me to understand the origin of this bug is a clear indication to me that this is not the expected behavior of the Get function of an iterator. And I guess this is not consistent with how it behaves with scalar pixel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merged patch fixed the issue in the case filter. But this new code block with the ImageRangeIterator exhibits the same behavior as the original CastImageFilter.
Some tricks have been needed to get operations on the VectorImageFilter to perform reasonably. But they have been this way for over 10 years, and crippling filters by making them run 10x slower may not be a reasonable option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's reasonable too and I don't have the solution. But it does not mean it's not a bug. BTW, if you implement the (slow) version, a recursive Gaussian test fails which probably indicates a bug there too.
236cb22 to
a720403
Compare
|
I think the Benchmark repo may be a better place for this. I updated the benchmark with many of your changes. Here are my current results:
|
dzenanz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As image region range seems more performant than scanline iterator, do you mind issuing a separate PR which updates the cast image filter with the refactoring?
We have determined which is the most performant, however it relies on the behavior of the unexpected assigned bug to gain that performance. The suggestion (in that issue) of making the return value "const PixelType" will would change the performance characteristics in these examples, since it would cause memory allocation for the vector pixel types. How should we classify the referenced returned of VectorImages for the Iterators and Range iterators? |
|
The |
|
Correspondingly, |
Unfortunately, The Range style iterators have similar behavior with the I'd like to think of this more as documentation/optimization issue than clearly a bug as I'm not sure how big the performance impact on numerous filters will be. These iterators have been this way for over 10 years, it may be best to just leave them as is, and document it. It may be better, less disruptive, and more productive to move on to the Range iterators if their behavior is more defined and consistent. |
| auto inputRange = ImageRegionRange<const TInputImage>(*inputPtr, inputRegionForThread); | ||
| auto outputRange = ImageRegionRange<TOutputImage>(*outputPtr, outputRegionForThread); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: technically this is OK, of course, but in ITK we don't really do AAA ("auto almost always") everywhere. (We just follow clang-tidy modernize-use-auto, but that's still not AAA everywhere). So I would just do:
ImageRegionRange<const TInputImage> inputRange(*inputPtr, inputRegionForThread);
ImageRegionRange<TOutputImage> outputRange(*outputPtr, outputRegionForThread);|
|
||
| const unsigned int componentsPerPixel = outputPtr->GetNumberOfComponentsPerPixel(); | ||
|
|
||
| #if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the #if 0 code may be removed when this PR is ready, right? Otherwise if constexpr (false) might be nicer... 😺 But I would prefer the code to just be removed, eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (argc < 2) | ||
| { | ||
| std::cerr << "Missing parameters." << std::endl; | ||
| std::cerr << "Usage: " << itkNameOfTestExecutableMacro(argv) << " imageSize" << std::endl; | ||
| return EXIT_FAILURE; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run ITKImageFilterBaseTestDriver, it now says:
Available tests:
0. itkNeighborhoodOperatorImageFilterTest
1. itkImageToImageFilterTest
2. itkVectorNeighborhoodOperatorImageFilterTest
3. itkMaskNeighborhoodOperatorImageFilterTest
4. itkCastImageFilterTest
5. itkImageIterationPerformanceTest
To run a test, enter the test number: 5
CTEST_FULL_OUTPUT
Missing parameters.
Usage: <itkImageIterationPerformanceTest executable> imageSize
D:\X\Bin\ITK\bin\Release\ITKImageFilterBaseTestDriver.exe (process 46316) exited with code 1 (0x1).
Press any key to close this window . . .
Please consider a default imageSize, so that the test can be run that way as well... 🙏 (Of course I like GoogleTest even better!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just running ./bin/ITKImageFilterBaseTestDriver itkImageIterationPerformanceTest 512 easy peasy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, easy peasy. 👍 Still I think a default imageSize would make it even easier peasier 😄
|
This has been a useful exercise to gain understanding of the behavior of the iterators with the VectorImage. I'm looking into moving the performance test to the ITK benchmarking repo, and summarizing some thing in the #5668 issue. |
|
@blowekamp Very glad to see For regular images I saw that Of course, if you want to iterate over the entire image buffer, use Those |
PR Checklist
Refer to the ITK Software Guide for
further development details if necessary.