Working on Sparse Sign Matrix #23
Conversation
…ontract omega without explicitly forming it
@Israafkh Sorry to bother you near a holiday, but I think there is an issue in the SEQRCS algorithm. We never consider the sign of the elements in omega in our code (i.e. we accidentally make all values +1). Can you verify when you have time?
Hello Karl, sorry for the late reply, I was sick and wasn't able to catch up on things. I was checking the code that generates omega and it does consider the signs of the elements. When printing the values I can see +1 and -1.
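For reference, a minimal sketch (in Julia, using SparseArrays) of how a sparse sign matrix with random ±1 entries could be generated; the function name, signature, and per-column layout here are illustrative assumptions, not the actual SEQRCS.jl code:

```julia
# Hypothetical sketch (not the SEQRCS.jl implementation): build a sparse sign
# matrix where each column holds `s` nonzeros drawn uniformly from {-1, +1}.
using SparseArrays, Random

function sparse_sign_matrix(m::Int, n::Int, s::Int; rng = Random.default_rng())
    rows = Vector{Int}(undef, n * s)
    cols = Vector{Int}(undef, n * s)
    vals = Vector{Int8}(undef, n * s)
    for j in 1:n
        picked = randperm(rng, m)[1:s]        # s distinct row positions in column j
        for (k, i) in enumerate(picked)
            idx = (j - 1) * s + k
            rows[idx] = i
            cols[idx] = j
            vals[idx] = rand(rng, Bool) ? Int8(1) : Int8(-1)  # keep the sign, not just +1
        end
    end
    return sparse(rows, cols, vals, m, n)
end
```

On a matrix built this way, `nonzeros` should show a mix of +1 and -1, which matches the check described above.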
@kmp5VT I think there is also a problem in the indices, because most probably pos is exceeding the bounds of v. This is what I noticed and am trying to figure out.
@Israafkh Sorry, this is not a rush. I'm just making a comment here since you are an expert in the SE-QRCS method, so we can verify correctness. Please take time away if you need it!
Okay, these changes now fix the memory problem. There is no explosion of memory (my test with a large tensor shows a reduction of 24 GB). But the code is still very slow for large systems, so I will look into what's going on there.
src/algebra/SEQRCS.jl
Outdated
omega = nothing   # drop the reference to omega so it can be garbage collected
# indices of the columns of rows_sel that contain at least one nonzero entry
indices = findall(col -> any(!=(0), col), eachcol(rows_sel))
## TODO working here.
@Israafkh So I realized that this portion of the code scales exponentially with tensor dimensions. This is a significant bottleneck in the SEQRCS method. I'm thinking about how we can improve it. The section is trivially parallelizable, which can help performance, but right now this is the best I could come up with (it truncates early once the lists match on one number).
The "hard" part of the problem is that the pivot list is discontinuous even if it was ordered so finding the "kept" columns is slow in the limit of large problem size.
Thank you so much for making it more efficient! I can see that this part is taking a significant amount of time. So either, as you said, we can parallelize it easily and use OpenMP for the loop, or another idea I was trying is converting p_sk to a set, Set(p_sk). This way checking whether an element is in the set is faster than scanning the array each time. What do you think?
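A sketch of the Set idea mentioned here, reusing the assumed names from the scan above; membership tests against `Set(p_sk)` are O(1) on average rather than a linear pass over the pivot list:

```julia
# Sketch of the Set-based variant of the same column selection.
function kept_columns_set(ncols::Int, p_sk::AbstractVector{<:Integer})
    pivots = Set(p_sk)                              # hashed lookup of pivot indices
    return [col for col in 1:ncols if col in pivots]
end
```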
So I have been working on this problem a little bit. I will push my change now. I added a dict for faster lookup, which I think is more efficient than using a set. But I think we can probably write this more efficiently, since we are effectively writing a sparse matrix multiply of omega(m, n) with P(n, r).
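A rough illustration of the sparse-multiply view described here (not the pushed code): selecting the pivot columns of omega is equivalent to right-multiplying by a sparse selection matrix P that has a single 1 per column:

```julia
# Sketch of the "omega(m, n) * P(n, r)" view: P[p_sk[k], k] = 1 and zero
# elsewhere, so omega * P pulls out the pivot columns of omega.
using SparseArrays

function select_pivot_columns(omega::SparseMatrixCSC, p_sk::AbstractVector{<:Integer})
    n = size(omega, 2)
    r = length(p_sk)
    P = sparse(collect(p_sk), collect(1:r), ones(Int8, r), n, r)
    return omega * P
end
```

For a SparseMatrixCSC this is also just `omega[:, p_sk]`; the multiply form mainly makes the omega(m, n) P(n, r) structure explicit.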
Thanks for the update, I can see that the dict makes it faster. Sorry for not being able to work much on this as I am traveling today, but once I am back I will resume working on the project.
…'t explicitly construct omega. Also made a faster index code using map and iterators
… (i.e. k_sk is large)
So now my tests show that the sketching portion accounts for 99% of the SE-QRCS runtime (about 150 times slower than the next slowest component).
@Israafkh I think this is the best we can do without parallelizing the contraction in the sketched matricization. Do you have any objections to merging into main?
Hello Karl, thank you for the progress. I've just checked the updates and have no objections from my side to merging into main.
@Israafkh I see that forming omega is a major bottleneck for large problems. I don't think we need the tensor, since it is only used in 2 locations; instead we can use rows, s, and vals. Working on rewriting this. I also found an issue with our current SE-QRCS implementation: the @view call inside the map in the sparse matrix multiplication produced the wrong index list.
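As a hedged sketch of what contracting against omega without materializing it could look like: if column j of omega stores its nonzero row positions in rows[:, j] and its signs in vals[:, j], then omega' * A is a signed sum of rows of A. The array shapes, the transpose orientation, and the function name here are assumptions for illustration, not the rewrite described above:

```julia
# Assumed sketch of an implicit contraction: omega is never formed.  Column j
# of omega has nonzeros at rows[:, j] with signs vals[:, j], so row j of
# B = omega' * A is a signed sum of s rows of A.
function apply_sketch_transpose(rows::AbstractMatrix{<:Integer},
                                vals::AbstractMatrix{<:Real},
                                A::AbstractMatrix)
    s, n = size(rows)
    B = zeros(eltype(A), n, size(A, 2))
    for j in 1:n
        for k in 1:s
            @views B[j, :] .+= vals[k, j] .* A[rows[k, j], :]
        end
    end
    return B
end
```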