In-Place Dense Matrix Transposition #2199
Very cool. Thanks for the PR, and for finding bugs in my in-place transpose.
I have only a few comments.
    matrix[comp] = cval;
    break;
}
if(prevComp == start) {
else if?
int prevOrig = prevIndexCycle(orig, rows, (maxIndex + 1) / rows);
int prevComp = maxIndex - prevOrig;

while(true) {
You can probably help the compiler and improve performance a bit by moving the content of this loop into a separate function.
Furthermore, can this loop be run in parallel?
LibMatrixReorg.transposeInPlaceDenseBrenner(X, 1);

TestUtils.compareMatrices(X, tX, 1e-8);
Set epsilon to 0. A transpose only moves values and does no arithmetic on them, so the result should match exactly.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

Coverage Diff
                  main     #2199      +/-
Coverage        71.88%    72.30%   +0.42%
Complexity       44701     45023     +322
Files             1449      1452       +3
Lines           169182    169417     +235
Branches         32980     33059      +79
Hits            121617    122498     +881
Misses           38237     37602     -635
Partials          9328      9317      -11
Added a new kernel for In-Place Dense Matrix Transposition, based on Algorithm 467 by Brenner (DOI: 10.1145/355611.362542).
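For readers unfamiliar with the idea behind in-place transposition, here is a minimal sketch of plain cycle following. It is not the kernel added in this PR and not Brenner's Algorithm 467, which is more refined. The class name `InPlaceTransposeSketch` and its methods are hypothetical and exist only for illustration. The sketch relies on the standard fact that, for a row-major `rows x cols` array with `n = rows * cols - 1`, the element at index `i` (with `0 < i < n`) moves to index `(i * rows) mod n`, while indices `0` and `n` stay in place.

```java
final class InPlaceTransposeSketch {

    /** Transposes a row-major rows x cols matrix stored in a, in place. */
    static void transposeInPlace(double[] a, int rows, int cols) {
        if (a.length != rows * cols)
            throw new IllegalArgumentException("array length must equal rows * cols");
        final int n = rows * cols - 1;
        // Indices 0 and n are fixed points, so only 1..n-1 need to move.
        for (int start = 1; start < n; start++) {
            // Walk the cycle until it returns to start or drops below it.
            // A smaller index means the cycle was already rotated earlier.
            int next = dest(start, rows, n);
            while (next > start)
                next = dest(next, rows, n);
            if (next < start)
                continue;
            // start is the smallest index of its cycle: rotate the values.
            double carry = a[start];
            int i = start;
            do {
                i = dest(i, rows, n);
                double tmp = a[i];
                a[i] = carry;
                carry = tmp;
            } while (i != start);
        }
    }

    /** Target index of element i after transposition (assumes 0 < i < n). */
    private static int dest(int i, int rows, int n) {
        return (int) (((long) i * rows) % n); // long avoids int overflow
    }
}
```

The sketch uses no extra memory beyond a few scalars, but it pays for that by re-walking cycles to find their smallest index. Avoiding that kind of repeated work is one reason more elaborate schemes exist.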
Performance:
Compared to the existing kernel, the added method provides significant performance benefits in a single-threaded context:
Note:
Performance measurements were restricted to cases where the existing kernel yields correct results. Similar or even better performance could be observed across all cases.
Future Work:
The work items for different divisors operate on disjoint indices of the array, so they can be processed in parallel, which should offer additional performance improvements in multi-threaded scenarios.
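One possible shape for that parallelization is sketched below, under two loudly stated assumptions: that "divisors" means the divisors of `n = rows * cols - 1`, and that work is split by `gcd(i, n)`. The move `i -> (i * rows) mod n` preserves `gcd(i, n)` because `rows` is coprime to `n`, so indices with different gcd values never share a cycle. The class `DivisorParallelTransposeSketch` and all its methods are hypothetical, and this is not the PR's kernel.

```java
import java.util.stream.IntStream;

final class DivisorParallelTransposeSketch {

    /** Transposes a row-major rows x cols matrix in place, one task per divisor of n. */
    static void transposeInPlace(double[] a, int rows, int cols) {
        if (a.length != rows * cols)
            throw new IllegalArgumentException("array length must equal rows * cols");
        final int n = rows * cols - 1;
        if (n < 2)
            return; // at most two elements, both fixed points
        // Proper divisors of n, found by trial division (fine for a sketch).
        int[] divisors = IntStream.range(1, n).filter(d -> n % d == 0).toArray();
        // Each task touches only indices i with gcd(i, n) == d, so tasks are disjoint.
        IntStream.of(divisors).parallel().forEach(d -> {
            for (int start = d; start < n; start += d)
                if (gcd(start, n) == d)
                    rotateIfLeader(a, start, rows, n);
        });
    }

    /** Rotates the cycle through start, but only if start is its smallest index. */
    private static void rotateIfLeader(double[] a, int start, int rows, int n) {
        int next = dest(start, rows, n);
        while (next > start)
            next = dest(next, rows, n);
        if (next < start)
            return; // cycle is handled from its smaller leader
        double carry = a[start];
        int i = start;
        do {
            i = dest(i, rows, n);
            double tmp = a[i];
            a[i] = carry;
            carry = tmp;
        } while (i != start);
    }

    /** Target index of element i after transposition (assumes 0 < i < n). */
    private static int dest(int i, int rows, int n) {
        return (int) (((long) i * rows) % n);
    }

    private static int gcd(int x, int y) {
        while (y != 0) {
            int t = x % y;
            x = y;
            y = t;
        }
        return x;
    }
}
```

For a 4 x 9 matrix, `n = 35` and the tasks correspond to the divisors 1, 5 and 7. The split is uneven, since the class for divisor 1 holds most of the indices, so a real implementation would likely need finer-grained work distribution to scale well.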
@mboehm7