BUG: <CUDA out of memory> #746
Comments
Are you sure there weren't any other GPU-intensive processes running at the time? Based on that error message, this happened because around 4 GB of video memory was reserved for something else. I would try restarting the machine and sorting again as a first step, if you haven't done that yet. |
I also have this happen to me regularly. I have tried many different CUDA versions and NVIDIA driver versions. Happy to provide any files you need; however, the main file this happens on is a 70 GB file. |
My .bin file is about 200 GB. |
@hemant22 I still don't see anything to indicate that Kilosort is causing this, especially if you're getting the error at different points in the pipeline. This error message: Is saying: Kilosort is currently using ~4.3 GB of video memory. It tried to allocate an additional 1.6 GB, but couldn't do that because there was no more video memory available. The only reason that would happen is if something else is running on your machine that is using up that memory, or otherwise preventing PyTorch from making use of it. Windows Task Manager is also not a reliable way to gauge memory usage for PyTorch. A better way to check is using the |
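The tool named at the end of the comment above is not preserved in this copy of the thread. As one way to see memory usage from PyTorch's side rather than from the operating system, the following is a minimal sketch that uses standard `torch.cuda` calls. The helper name `gpu_memory_report` is hypothetical and is not part of Kilosort.

```python
# Hedged sketch: report GPU memory as PyTorch itself sees it.
# `gpu_memory_report` is a hypothetical helper, not a Kilosort function.
try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def gpu_memory_report(device=0):
    """Return memory figures in GiB, or None if PyTorch/CUDA is unavailable."""
    if torch is None or not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    # Free and total memory on the device, as reported by the CUDA driver.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return {
        # Memory occupied by live tensors in this process.
        "allocated_gib": torch.cuda.memory_allocated(device) / gib,
        # Memory held by PyTorch's caching allocator (live tensors plus cache).
        "reserved_gib": torch.cuda.memory_reserved(device) / gib,
        "free_gib": free_bytes / gib,
        "total_gib": total_bytes / gib,
    }


if __name__ == "__main__":
    print(gpu_memory_report())
```

If `free_gib` is already low before sorting starts, some other process is probably holding video memory. If it only drops during sorting, the usage comes from the sorting process itself.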
@jacobpennington No one else is running anything on the machine that might be using the memory. That's for sure. The error happens mostly at this line: - "vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)". Should I try this that is suggested with the error: |
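To make the quoted line concrete: `Xg @ Xc.T` produces a dense matrix with one row per row of `Xg` and one column per row of `Xc`, so its size grows with the product of those two counts. The sketch below is plain arithmetic. The shapes are hypothetical values chosen only to show how an allocation on the order of the 1.6 GB mentioned earlier in the thread could arise; they are not taken from the reporter's data, and float32 storage is an assumption.

```python
def matmul_result_bytes(n_rows, n_cols, itemsize=4):
    """Bytes needed for an (n_rows x n_cols) result; itemsize=4 assumes float32."""
    return n_rows * n_cols * itemsize


# Hypothetical shapes, for illustration only (not from the issue):
n_points = 2_000_000   # rows of Xg
n_centers = 200        # rows of Xc
size_gb = matmul_result_bytes(n_points, n_centers) / 1e9
print(f"{size_gb:.1f} GB")  # 1.6 GB
```

Because the result is allocated in one piece, the request fails whenever the largest free block on the GPU is smaller than this size, even if the same step succeeds on a shorter recording.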
Okay. Can you please also try sorting a subset of the data, say with |
@jacobpennington I tried running with tmax=1800. It ran successfully. So what should I do/check next to figure out the problem? Copied below is the log file: |
@hemant22 Can you open the results in Phy and check if anything looks off with the waveforms or anything else? Screenshots from that would be helpful. |
@jacobpennington I checked the KS output (for tmax=1800) in Phy. I didn't find anything strange or different from the other sessions. Additionally, I was able to run the full session on KS3 without any problems, so the raw data looks fine to me. |
Okay, thanks. If you're comfortable modifying the code, can you please try the change in this pull request and see if you're able to sort the full recording? It just adds a couple lines to one file. |
Update to the previous comment: you no longer need to modify the code to try that. You can update to the latest version (v4.0.15) and use |
@jacobpennington Thank you for your help. It is working now. |
Hi Jacob, @jacobpennington
|
@Hobart10 Can you please provide a screenshot of what the KS4 GUI looks like when you load your data, and the |
Just found out it works in kilosort directly but not through spikeinterface in my case. Will consult there. Thank you!! |
I still get OOM from this line ( |
@RobertoDF Is that still the case? Just checking since you closed your pull requests. |
Hi! We have tried clearing the GPU cache, using the qr.kmeansplusplus version, and even tried using an older version of KS, but it always runs out of memory at the final clustering stage. We get the same error on two different machines and when running on an HPC cluster; on the HPC cluster, we used a GPU with ~18 GB of memory. Do you have any other suggestions? Since we seem to need more memory, is it currently possible to run a single instance of Kilosort4 across multiple GPUs at the same time? |
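For reference, "clearing the GPU cache" in a PyTorch program usually means something like the sketch below. This is a generic illustration and not Kilosort's internal code; the helper name `release_cached_gpu_memory` is hypothetical.

```python
import gc

try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def release_cached_gpu_memory():
    """Collect unreachable Python objects, then release PyTorch's unused cache.

    Returns True if a CUDA cache was cleared, False if PyTorch/CUDA is unavailable.
    """
    # Drop unreachable objects first so their tensors can actually be freed.
    gc.collect()
    if torch is None or not torch.cuda.is_available():
        return False
    # Hands unused cached blocks back to the driver; tensors that are still
    # referenced stay allocated.
    torch.cuda.empty_cache()
    return True
```

Clearing the cache only helps when memory is cached but unused. If a single step needs more memory than the GPU has, the error will persist regardless, which is consistent with the report above.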
Is the problem arising specifically at |
yep, without your pull request error occurs at when using it error is either at line 215: Seems to be very similar to the issues Peyton-D mentioned here: #775 |
Thanks! Working on a solution this week. |
Describe the issue:
Kilosort shows an error during the final clustering step. This has happened three times, always with the same session (data). KS works fine on other sessions.
Reproduce the bug:
No response
Error message:
Version information:
Kilosort v4.0.13
GPU: Nvidia GeForce RTX 3070