Conversation

kmp5VT (Owner) commented Dec 23, 2025

@Israafkh I see that forming omega is a major bottleneck for large problems. I don't think we need the tensor, since it is only used in two locations; instead we can use rows, s, and vals. I am working on rewriting this. I also found an issue with our current SE-QRCS implementation: the @view call in the map inside the sparse matrix multiplication produced the wrong index list.
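For illustration, a minimal sketch of that representation, assuming omega is a count-sketch style operator with one signed nonzero per column; the names `rows`, `vals`, and `omega_sparse` are placeholders here, not the PR's actual variables:

```julia
using SparseArrays

# Toy sizes; the real dimensions come from the matricized tensor.
m, n = 512, 100_000
rows = rand(1:m, n)            # row index of the single nonzero in column j
vals = rand([-1.0, 1.0], n)    # sign of that nonzero

# Only needed for debugging or verification; the algorithm can index `rows`
# and `vals` directly and never materialize omega at all.
omega_sparse = sparse(rows, collect(1:n), vals, m, n)
```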

kmp5VT requested a review from Israafkh on December 23, 2025 at 22:50
kmp5VT (Owner, Author) commented Dec 24, 2025

@Israafkh Sorry to bother you near a holiday, but I think there is an issue in the SE-QRCS algorithm: our code never considers the sign of each element in omega (i.e., we accidentally make all values +1). Can you verify when you have time?

Israafkh (Collaborator) commented Dec 24, 2025

> @Israafkh Sorry to bother you near a holiday, but I think there is an issue in the SE-QRCS algorithm: our code never considers the sign of each element in omega (i.e., we accidentally make all values +1). Can you verify when you have time?

Hello Karl, sorry for the late reply; I was sick and wasn't able to keep up with things. I checked the code that generates omega and it does consider the signs of the elements: when printing the values I can see both +1 and -1.
Oh, now I see that you mean the sketched_matricization files. Yes, the multiplication by the values is not included there; I just added it but wasn't able to handle it with @view.
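As a toy illustration of that sign bug (hypothetical names and data, not the actual sketched_matricization code), a map over @view slices loses the stored sign unless each slice is scaled by vals:

```julia
rows = [2, 1, 2]            # target sketch row for each input row
vals = [1.0, -1.0, 1.0]     # stored signs; dropping them turns every entry into +1
A    = rand(3, 4)

# Buggy: the map only accumulates the viewed rows, so the -1 sign is lost.
SA_buggy = zeros(2, 4)
foreach(j -> SA_buggy[rows[j], :] .+= @view(A[j, :]), axes(A, 1))

# Fixed: scale each viewed row by vals[j] so the sign of the omega entry is kept.
SA = zeros(2, 4)
foreach(j -> SA[rows[j], :] .+= vals[j] .* @view(A[j, :]), axes(A, 1))
```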

Israafkh (Collaborator) commented:

@kmp5VT I think there is also a problem with the indices: most probably pos is exceeding the bounds of v. This is what I noticed, and I am trying to figure it out.
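If it helps, a small hypothetical guard around the access would fail loudly and confirm or rule out the out-of-bounds suspicion:

```julia
# Hypothetical helper: fails with a descriptive message if `pos` leaves the
# bounds of `v`, instead of erroring deeper inside the sketching code.
function checked_get(v::AbstractVector, pos::Integer)
    @assert 1 <= pos <= length(v) "pos=$pos is out of bounds for length(v)=$(length(v))"
    return v[pos]
end
```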

kmp5VT (Owner, Author) commented Dec 24, 2025

@Israafkh Sorry, this is not a rush. I'm just commenting here since you are the expert on the SE-QRCS method, so we can verify correctness together. Please take time away if you need it!

kmp5VT (Owner, Author) commented Dec 24, 2025

Okay, these changes now fix the memory problem. There is no longer a memory explosion (my test with a large tensor shows a reduction of 24 GB). But the code is still very slow for large systems, so I will look into what's going on there.

```julia
# Drop the reference to omega so the dense tensor can be garbage collected.
omega = nothing;
# Keep only the columns of rows_sel that contain at least one nonzero entry.
indices = findall(col -> any(!=(0), col), eachcol(rows_sel))

## TODO working here.
```
kmp5VT (Owner, Author) commented:

@Israafkh I realized that this portion of the code scales exponentially with the tensor dimensions, which makes it a significant bottleneck in the SE-QRCS method. I am thinking about how we can improve this. The section is trivially parallelizable, which can help performance, but right now this is the best I could come up with (it truncates early as soon as the lists match on a number).
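As a hedged sketch of what this section does (placeholder names `cols` and `p_sk`, not the PR's actual code): each candidate column is scanned against the pivot list, the inner loop breaks on the first match, and the outer loop can be threaded since its iterations are independent.

```julia
# Rough sketch of the kept-column scan described above (placeholder names).
function kept_columns(cols::AbstractVector{Int}, p_sk::AbstractVector{Int})
    keep = zeros(Bool, length(cols))    # Vector{Bool}: safe for threaded writes
    Threads.@threads for i in eachindex(cols)
        for p in p_sk
            if cols[i] == p
                keep[i] = true
                break                    # truncate early on the first match
            end
        end
    end
    return findall(keep)
end
```

This keeps the O(|cols| * |p_sk|) worst case, which is why the faster lookups discussed below still matter for large problems.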

kmp5VT (Owner, Author) commented:

The "hard" part of the problem is that the pivot list is discontinuous even if it was ordered so finding the "kept" columns is slow in the limit of large problem size.

Israafkh (Collaborator) commented:

> @Israafkh I realized that this portion of the code scales exponentially with the tensor dimensions, which makes it a significant bottleneck in the SE-QRCS method. I am thinking about how we can improve this. The section is trivially parallelizable, which can help performance, but right now this is the best I could come up with (it truncates early as soon as the lists match on a number).

Thank you so much for making it more efficient! I can see that this part takes a significant amount of time. So either, as you said, we can parallelize it easily and use OpenMP-style threading for the loop, or another idea I was trying is converting p_sk to a set, Set(p_sk). That way, checking whether an element is in the set is faster than scanning the array each time. What do you think?
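A minimal sketch of that Set idea (same placeholder names): build `Set(p_sk)` once, so each membership test is an average O(1) hash lookup instead of a linear scan of `p_sk`.

```julia
# Same kept-column scan as above, but with a one-time Set for O(1) membership.
function kept_columns_set(cols::AbstractVector{Int}, p_sk::AbstractVector{Int})
    p_set = Set(p_sk)                    # built once, reused for every column
    return findall(c -> c in p_set, cols)
end
```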

kmp5VT (Owner, Author) commented:

I have been working on this problem a bit and will push my change now: I added a Dict for faster lookups, which I think is more efficient than using a Set. But I think we can probably write this more efficiently still, since we are effectively performing a sparse matrix multiply of omega(m, n) with P(n, r).
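A hedged sketch of both ideas with toy data; the names `omega_sparse`, `p_sk`, and `P` are placeholders and not the PR's API. The `Dict` gives O(1) lookups plus the pivot's position, and the kept-column step can equivalently be written as the sparse product omega * P:

```julia
using SparseArrays

# Toy data standing in for the real sketch operator and pivot list.
m, n = 8, 20
omega_sparse = sparse(rand(1:m, n), collect(1:n), rand([-1.0, 1.0], n), m, n)
p_sk = [3, 7, 11, 18]                       # pivot columns kept by the QRCS step

# Dict lookup: O(1) average membership test that also returns the pivot's position.
p_lookup = Dict(p => k for (k, p) in enumerate(p_sk))
kept = [get(p_lookup, c, 0) for c in 1:n]   # 0 marks a dropped column

# Equivalent sparse-multiply view: P(n, r) selects the pivot columns, so
# omega_sparse * P is the column-selected sketch in a single sparse product.
r = length(p_sk)
P = sparse(p_sk, collect(1:r), ones(r), n, r)
sketched = omega_sparse * P                 # m x r
```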

Israafkh (Collaborator) commented:

> I have been working on this problem a bit and will push my change now: I added a Dict for faster lookups, which I think is more efficient than using a Set. But I think we can probably write this more efficiently still, since we are effectively performing a sparse matrix multiply of omega(m, n) with P(n, r).

Thanks for the update; I can see that the Dict makes it faster. Sorry for not being able to work much on this, as I am traveling today, but once I am back I will resume working on the project.

kmp5VT (Owner, Author) commented Jan 3, 2026

My tests now show that the sketching portion takes 99% of the SE-QRCS algorithm's runtime (it is about 150 times slower than the next slowest component).

kmp5VT (Owner, Author) commented Jan 5, 2026

@Israafkh I think this is the best we can do without parallelizing the contraction in the sketched matricization. Do you have any objections to merging into main?

Israafkh (Collaborator) commented Jan 5, 2026

> @Israafkh I think this is the best we can do without parallelizing the contraction in the sketched matricization. Do you have any objections to merging into main?

Hello Karl, thank you for the progress. I've just checked the updates and have no objections on my side to merging into main.

kmp5VT merged commit a732451 into main on Jan 5, 2026 (3 of 6 checks passed).