-
Notifications
You must be signed in to change notification settings - Fork 14
Return a BandedMatrix from a view of a BandedBlockBandedMatrix #223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #223 +/- ##
==========================================
- Coverage 88.16% 86.06% -2.10%
==========================================
Files 11 11
Lines 1115 1134 +19
==========================================
- Hits 983 976 -7
- Misses 132 158 +26 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@MikaelSlevinsky With this (and the just tagged BlockArrays v1.6.2) we have the following: julia> using AppleAccelerate # needed so banded matrices aren't insanely slow
julia> for n in 20:20:200
ax = BlockArrays.BlockedOneTo(ArrayLayouts.RangeCumsum(Base.OneTo(n)))
D = BandedBlockBandedMatrix{Float64}(I, (ax,ax), (0, 0), (0, 0))
x = BlockVector(randn(sum(1:n)), (ax,))
y = zero(x)
@time my_special_mul_through_data!(y, D, x);
@time my_special_mul!(y, D, x);
@time my_special_mul_with_a_view!(y, D, x);
@time mul!(y, D, x);
end
0.000003 seconds
0.000011 seconds (80 allocations: 5.938 KiB)
0.000003 seconds
0.000012 seconds (2 allocations: 64 bytes)
0.000002 seconds
0.000008 seconds (160 allocations: 18.219 KiB)
0.000002 seconds
0.000026 seconds (2 allocations: 64 bytes)
0.000002 seconds
0.000012 seconds (240 allocations: 37.188 KiB)
0.000003 seconds
0.000007 seconds (2 allocations: 64 bytes)
0.000003 seconds
0.000017 seconds (320 allocations: 62.438 KiB)
0.000006 seconds
0.000011 seconds (2 allocations: 64 bytes)
0.000004 seconds
0.000029 seconds (400 allocations: 94.500 KiB)
0.000009 seconds
0.000011 seconds (2 allocations: 64 bytes)
0.000005 seconds
0.000034 seconds (480 allocations: 133.469 KiB)
0.000013 seconds
0.000015 seconds (2 allocations: 64 bytes)
0.000007 seconds
0.000039 seconds (560 allocations: 178.156 KiB)
0.000013 seconds
0.000020 seconds (2 allocations: 64 bytes)
0.000009 seconds
0.000049 seconds (640 allocations: 229.531 KiB)
0.000022 seconds
0.000027 seconds (2 allocations: 64 bytes)
0.000012 seconds
0.000052 seconds (720 allocations: 287.469 KiB)
0.000059 seconds
0.000034 seconds (2 allocations: 64 bytes)
0.000017 seconds
0.000072 seconds (800 allocations: 351.938 KiB)
0.000029 seconds
0.000045 seconds (2 allocations: 64 bytes) So we have got rid of almost all allocations. The remaining speed difference is due to julia> n = 10_000; D = BandedMatrix(0 => 1:n); x = randn(n); @time D*x;
0.000042 seconds (3 allocations: 78.188 KiB)
julia> D = Diagonal(1:n); x = randn(n); @time D*x;
0.000010 seconds (3 allocations: 78.188 KiB) We could special case diagonal blocks not to call BLAS... actually I suspect the fastest would be to get rid of all calls to banded BLAS and just do naive for loops.... |
Awesome, thanks! |
The reason ApproxFun is broken is that is assuming that @jishnub thoughts? |
@MikaelSlevinsky This gets rid of the allocations (for lazy axes at least).
Still need to figure out why it's slow.