BUG: <CUDA out of memory> #746

hemant22 · 2024-07-26T18:07:16Z

Describe the issue:

Kilosort raises a CUDA out-of-memory error during the Final clustering step. This has happened three times, always with the same session (data). KS works fine on other sessions.

Reproduce the bug:

No response

Error message:

07-25 16:18 kilosort.gui.sorter INFO     Kilosort version 4.0.13
07-25 16:18 kilosort.gui.sorter INFO     Sorting Y:\Users\hsrivastava\BS\W5006\W5006_20240723_Loc1_g0_t0.imec0.ap.bin
07-25 16:18 kilosort.gui.sorter INFO     ----------------------------------------
07-25 16:18 kilosort.gui.sorter DEBUG    Initial ops:
{   'data_file_path': WindowsPath('Y:/Users/hsrivastava/BS/W5006/W5006_20240723_Loc1_g0_t0.imec0.ap.bin'),
    'results_dir': WindowsPath('Y:/Users/hsrivastava/BS/W5006/kilosort4'),
    'probe_name': 'neuropixPhase3B1_kilosortChanMap.mat',
    'data_dtype': 'int16',
    'n_chan_bin': 385,
    'fs': 30000.0,
    'batch_size': 60000,
    'nblocks': 1,
    'Th_universal': 9.0,
    'Th_learned': 8.0,
    'tmin': 0.0,
    'tmax': inf,
    'nt': 61,
    'shift': None,
    'scale': None,
    'artifact_threshold': inf,
    'nskip': 25,
    'whitening_range': 32,
    'highpass_cutoff': 300.0,
    'binning_depth': 5.0,
    'sig_interp': 20.0,
    'drift_smoothing': [0.5, 0.5, 0.5],
    'nt0min': 20,
    'dmin': None,
    'dminx': 32.0,
    'min_template_size': 10.0,
    'template_sizes': 5,
    'nearest_chans': 10,
    'nearest_templates': 100,
    'max_channel_distance': None,
    'templates_from_data': True,
    'n_templates': 6,
    'n_pcs': 6,
    'Th_single_ch': 6.0,
    'acg_threshold': 0.2,
    'ccg_threshold': 0.25,
    'cluster_downsampling': 20,
    'x_centers': None,
    'duplicate_spike_ms': 0.25,
    'save_preprocessed_copy': True,
    'data_dir': WindowsPath('Y:/Users/hsrivastava/BS/W5006'),
    'filename': WindowsPath('Y:/Users/hsrivastava/BS/W5006/W5006_20240723_Loc1_g0_t0.imec0.ap.bin'),
    'do_CAR': True,
    'invert_sign': False,
    'NTbuff': 60122,
    'Nchan': 383,
    'duplicate_spike_bins': 7,
    'torch_device': 'cuda',
    'xc': array([43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27.], dtype=float32),
    'yc': array([  20.,   20.,   40.,   40.,   60.,   60.,   80.,   80.,  100.,
        100.,  120.,  120.,  140.,  140.,  160.,  160.,  180.,  180.,
        200.,  200.,  220.,  220.,  240.,  240.,  260.,  260.,  280.,
        280.,  300.,  300.,  320.,  320.,  340.,  340.,  360.,  360.,
        380.,  380.,  400.,  400.,  420.,  420.,  440.,  440.,  460.,
        460.,  480.,  480.,  500.,  500.,  520.,  520.,  540.,  540.,
        560.,  560.,  580.,  580.,  600.,  600.,  620.,  620.,  640.,
        640.,  660.,  660.,  680.,  680.,  700.,  700.,  720.,  720.,
        740.,  740.,  760.,  760.,  780.,  780.,  800.,  800.,  820.,
        820.,  840.,  840.,  860.,  860.,  880.,  880.,  900.,  900.,
        920.,  920.,  940.,  940.,  960.,  960.,  980.,  980., 1000.,
       1000., 1020., 1020., 1040., 1040., 1060., 1060., 1080., 1080.,
       1100., 1100., 1120., 1120., 1140., 1140., 1160., 1160., 1180.,
       1180., 1200., 1200., 1220., 1220., 1240., 1240., 1260., 1260.,
       1280., 1280., 1300., 1300., 1320., 1320., 1340., 1340., 1360.,
       1360., 1380., 1380., 1400., 1400., 1420., 1420., 1440., 1440.,
       1460., 1460., 1480., 1480., 1500., 1500., 1520., 1520., 1540.,
       1540., 1560., 1560., 1580., 1580., 1600., 1600., 1620., 1620.,
       1640., 1640., 1660., 1660., 1680., 1680., 1700., 1700., 1720.,
       1720., 1740., 1740., 1760., 1760., 1780., 1780., 1800., 1800.,
       1820., 1820., 1840., 1840., 1860., 1860., 1880., 1880., 1900.,
       1900., 1920., 1940., 1940., 1960., 1960., 1980., 1980., 2000.,
       2000., 2020., 2020., 2040., 2040., 2060., 2060., 2080., 2080.,
       2100., 2100., 2120., 2120., 2140., 2140., 2160., 2160., 2180.,
       2180., 2200., 2200., 2220., 2220., 2240., 2240., 2260., 2260.,
       2280., 2280., 2300., 2300., 2320., 2320., 2340., 2340., 2360.,
       2360., 2380., 2380., 2400., 2400., 2420., 2420., 2440., 2440.,
       2460., 2460., 2480., 2480., 2500., 2500., 2520., 2520., 2540.,
       2540., 2560., 2560., 2580., 2580., 2600., 2600., 2620., 2620.,
       2640., 2640., 2660., 2660., 2680., 2680., 2700., 2700., 2720.,
       2720., 2740., 2740., 2760., 2760., 2780., 2780., 2800., 2800.,
       2820., 2820., 2840., 2840., 2860., 2860., 2880., 2880., 2900.,
       2900., 2920., 2920., 2940., 2940., 2960., 2960., 2980., 2980.,
       3000., 3000., 3020., 3020., 3040., 3040., 3060., 3060., 3080.,
       3080., 3100., 3100., 3120., 3120., 3140., 3140., 3160., 3160.,
       3180., 3180., 3200., 3200., 3220., 3220., 3240., 3240., 3260.,
       3260., 3280., 3280., 3300., 3300., 3320., 3320., 3340., 3340.,
       3360., 3360., 3380., 3380., 3400., 3400., 3420., 3420., 3440.,
       3440., 3460., 3460., 3480., 3480., 3500., 3500., 3520., 3520.,
       3540., 3540., 3560., 3560., 3580., 3580., 3600., 3600., 3620.,
       3620., 3640., 3640., 3660., 3660., 3680., 3680., 3700., 3700.,
       3720., 3720., 3740., 3740., 3760., 3760., 3780., 3780., 3800.,
       3800., 3820., 3820., 3840., 3840.], dtype=float32),
    'kcoords': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32),
    'chanMap': array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
       182, 183, 184, 185, 186, 187, 188, 189, 190, 192, 193, 194, 195,
       196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208,
       209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221,
       222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234,
       235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247,
       248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260,
       261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273,
       274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286,
       287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
       300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312,
       313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325,
       326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338,
       339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351,
       352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364,
       365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377,
       378, 379, 380, 381, 382, 383]),
    'n_chan': 384}

07-25 16:18 kilosort.run_kilosort INFO      
07-25 16:18 kilosort.run_kilosort INFO     Computing preprocessing variables.
07-25 16:18 kilosort.run_kilosort INFO     ----------------------------------------
07-25 16:18 kilosort.run_kilosort INFO     N samples: 256380588
07-25 16:18 kilosort.run_kilosort INFO     N seconds: 8546.0196
07-25 16:18 kilosort.run_kilosort INFO     N batches: 4274
07-25 16:19 kilosort.run_kilosort INFO     Preprocessing filters computed in  9.34s; total  9.35s
07-25 16:19 kilosort.run_kilosort DEBUG    hp_filter shape: torch.Size([30122])
07-25 16:19 kilosort.run_kilosort DEBUG    whiten_mat shape: torch.Size([383, 383])
07-25 16:19 kilosort.run_kilosort INFO      
07-25 16:19 kilosort.run_kilosort INFO     Computing drift correction.
07-25 16:19 kilosort.run_kilosort INFO     ----------------------------------------
07-25 16:19 kilosort.spikedetect INFO     Re-computing universal templates from data.
07-25 17:43 kilosort.run_kilosort INFO     drift computed in  5078.85s; total  5088.20s
07-25 17:43 kilosort.run_kilosort DEBUG    st shape: (43130834, 6)
07-25 17:43 kilosort.run_kilosort DEBUG    yblk shape: (1,)
07-25 17:43 kilosort.run_kilosort DEBUG    dshift shape: (4274, 1)
07-25 17:43 kilosort.run_kilosort DEBUG    iKxx shape: torch.Size([383, 383])
07-25 17:43 kilosort.gui.sorter DEBUG    First batch min, max: (-36.98598, 59.449707)
07-25 18:13 kilosort.io  INFO      
07-25 18:13 kilosort.io  INFO     ========================================
07-25 18:13 kilosort.io  INFO     Saving drift-corrected copy of data to: Y:\Users\hsrivastava\BS\W5006\kilosort4\temp_wh.dat...
07-25 18:13 kilosort.io  INFO     Writing batch 0/4274...
07-25 18:16 kilosort.io  INFO     Writing batch 100/4274...
07-25 18:19 kilosort.io  INFO     Writing batch 200/4274...
07-25 18:23 kilosort.io  INFO     Writing batch 300/4274...
07-25 18:26 kilosort.io  INFO     Writing batch 400/4274...
07-25 18:29 kilosort.io  INFO     Writing batch 500/4274...
07-25 18:33 kilosort.io  INFO     Writing batch 600/4274...
07-25 18:36 kilosort.io  INFO     Writing batch 700/4274...
07-25 18:40 kilosort.io  INFO     Writing batch 800/4274...
07-25 18:43 kilosort.io  INFO     Writing batch 900/4274...
07-25 18:47 kilosort.io  INFO     Writing batch 1000/4274...
07-25 18:50 kilosort.io  INFO     Writing batch 1100/4274...
07-25 18:53 kilosort.io  INFO     Writing batch 1200/4274...
07-25 18:57 kilosort.io  INFO     Writing batch 1300/4274...
07-25 19:00 kilosort.io  INFO     Writing batch 1400/4274...
07-25 19:04 kilosort.io  INFO     Writing batch 1500/4274...
07-25 19:07 kilosort.io  INFO     Writing batch 1600/4274...
07-25 19:11 kilosort.io  INFO     Writing batch 1700/4274...
07-25 19:14 kilosort.io  INFO     Writing batch 1800/4274...
07-25 19:18 kilosort.io  INFO     Writing batch 1900/4274...
07-25 19:21 kilosort.io  INFO     Writing batch 2000/4274...
07-25 19:25 kilosort.io  INFO     Writing batch 2100/4274...
07-25 19:28 kilosort.io  INFO     Writing batch 2200/4274...
07-25 19:32 kilosort.io  INFO     Writing batch 2300/4274...
07-25 19:35 kilosort.io  INFO     Writing batch 2400/4274...
07-25 19:39 kilosort.io  INFO     Writing batch 2500/4274...
07-25 19:42 kilosort.io  INFO     Writing batch 2600/4274...
07-25 19:46 kilosort.io  INFO     Writing batch 2700/4274...
07-25 19:49 kilosort.io  INFO     Writing batch 2800/4274...
07-25 19:53 kilosort.io  INFO     Writing batch 2900/4274...
07-25 19:57 kilosort.io  INFO     Writing batch 3000/4274...
07-25 20:00 kilosort.io  INFO     Writing batch 3100/4274...
07-25 20:04 kilosort.io  INFO     Writing batch 3200/4274...
07-25 20:07 kilosort.io  INFO     Writing batch 3300/4274...
07-25 20:11 kilosort.io  INFO     Writing batch 3400/4274...
07-25 20:14 kilosort.io  INFO     Writing batch 3500/4274...
07-25 20:18 kilosort.io  INFO     Writing batch 3600/4274...
07-25 20:22 kilosort.io  INFO     Writing batch 3700/4274...
07-25 20:25 kilosort.io  INFO     Writing batch 3800/4274...
07-25 20:29 kilosort.io  INFO     Writing batch 3900/4274...
07-25 20:33 kilosort.io  INFO     Writing batch 4000/4274...
07-25 20:36 kilosort.io  INFO     Writing batch 4100/4274...
07-25 20:40 kilosort.io  INFO     Writing batch 4200/4274...
07-25 20:42 kilosort.io  INFO     ========================================
07-25 20:42 kilosort.io  INFO     Copying finished.
07-25 20:42 kilosort.io  INFO      
07-25 20:42 kilosort.run_kilosort INFO      
07-25 20:42 kilosort.run_kilosort INFO     Extracting spikes using templates
07-25 20:42 kilosort.run_kilosort INFO     ----------------------------------------
07-25 20:42 kilosort.spikedetect INFO     Re-computing universal templates from data.
07-25 23:00 kilosort.run_kilosort INFO     40724351 spikes extracted in  8260.94s; total  24100.91s
07-25 23:00 kilosort.run_kilosort DEBUG    st0 shape: (40724351, 6)
07-25 23:00 kilosort.run_kilosort DEBUG    tF shape: torch.Size([40724351, 10, 6])
07-25 23:00 kilosort.run_kilosort INFO      
07-25 23:00 kilosort.run_kilosort INFO     First clustering
07-25 23:00 kilosort.run_kilosort INFO     ----------------------------------------
07-26 00:43 kilosort.run_kilosort INFO     1916 clusters found, in  6176.53s; total  30277.48s
07-26 00:43 kilosort.run_kilosort DEBUG    clu shape: (40724351,)
07-26 00:43 kilosort.run_kilosort DEBUG    Wall shape: torch.Size([1916, 383, 6])
07-26 00:43 kilosort.run_kilosort INFO      
07-26 00:43 kilosort.run_kilosort INFO     Extracting spikes using cluster waveforms
07-26 00:43 kilosort.run_kilosort INFO     ----------------------------------------
07-26 03:46 kilosort.run_kilosort INFO     110152832 spikes extracted in  10996.00s; total  41273.50s
07-26 03:46 kilosort.run_kilosort DEBUG    st shape: (110152832, 3)
07-26 03:46 kilosort.run_kilosort DEBUG    tF shape: torch.Size([110152832, 10, 6])
07-26 03:46 kilosort.run_kilosort DEBUG    iCC shape: torch.Size([10, 383])
07-26 03:46 kilosort.run_kilosort DEBUG    iU shape: torch.Size([1558])
07-26 03:46 kilosort.run_kilosort INFO      
07-26 03:46 kilosort.run_kilosort INFO     Final clustering
07-26 03:46 kilosort.run_kilosort INFO     ----------------------------------------
07-26 04:05 kilosort.gui.sorter ERROR    Encountered error in `run_kilosort`:
Traceback (most recent call last):
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\gui\sorter.py", line 124, in run
    clu, Wall = cluster_spikes(
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 563, in cluster_spikes
    clu, Wall = clustering_qr.run(ops, st, tF,  mode = 'template', device=device,
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 364, in run
    iclust, iclust0, M, iclust_init = cluster(Xd, nskip=nskip, lam=1,
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 151, in cluster
    iclust = assign_iclust(rows_neigh, isub, kn, tones2, nclust, lam, m, ki, kj, device=device)
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 81, in assign_iclust
    xN = xN - lam/m * (ki.unsqueeze(-1) * kN.to_dense())
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.61 GiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 4.26 GiB is allocated by PyTorch, and 814.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Version information:

Kilosort v4.0.13
GPU: Nvidia GeForce RTX 3070
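
For a sense of scale, the shapes in the DEBUG lines of the log above can be turned into rough memory footprints. This is only back-of-the-envelope arithmetic: the 4-byte (float32) element size is an assumption, and the log does not say which of these arrays are held on the GPU at any given time.

```python
from math import prod

GIB = 1024 ** 3

def tensor_gib(shape, bytes_per_element=4):
    """Approximate size of a dense tensor in GiB (float32 is assumed)."""
    return prod(shape) * bytes_per_element / GIB

# Shapes copied from the DEBUG lines of the failing run.
shapes = {
    "tF after template extraction": (40724351, 10, 6),
    "tF after cluster-waveform extraction": (110152832, 10, 6),
    "Wall after first clustering": (1916, 383, 6),
}

for name, shape in shapes.items():
    print(f"{name}: {tensor_gib(shape):.2f} GiB")
```

Under that assumption the second tF array alone comes to roughly 24.6 GiB, which is larger than the 8 GiB card, so the spike count of this session is a plausible thing to look at.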

jacobpennington · 2024-07-26T23:31:22Z

Are you sure there weren't any other GPU-intensive processes running at the time? Based on that error message, this happened because around 4 GB of video memory was reserved for something else. As a first step, I would try restarting the machine and sorting again, if you haven't done that yet.

Lathomas42 · 2024-07-29T17:23:55Z

This also happens to me regularly. I have tried many different CUDA versions and NVIDIA driver versions. Happy to provide any files you need; however, the main file this happens on is a 70 GB file.

hemant22 · 2024-08-02T19:17:48Z

@jacobpennington I tried again after restarting the computer, but it stopped again with the same error, 'cuda out of memory'.
Another session stopped at 'Saving to phy and computing refractory periods' with the same error. I have attached screenshots of GPU usage before and after launching the Kilosort GUI.
@Lathomas42 Thanks for your help. Can you please share more details, including the driver versions, that might help me?

hemant22 · 2024-08-02T19:23:29Z

My .bin file is about 200 GB.

jacobpennington · 2024-08-04T17:09:51Z

@hemant22 I still don't see anything to indicate that Kilosort is causing this, especially if you're getting the error at different points in the pipeline.

This error message:
CUDA out of memory. Tried to allocate 1.61 GiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 4.26 GiB is allocated by PyTorch, and 814.23 MiB is reserved by PyTorch but unallocated.

is saying: Kilosort is currently using ~4.3 GB of video memory. It tried to allocate an additional 1.6 GB, but couldn't because there was no more video memory available. The only reason that would happen is if something else running on your machine is using up that memory, or otherwise preventing PyTorch from making use of it.
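
The numbers in that message can be pulled apart programmatically. The sketch below parses the quoted text with regular expressions and derives how much of the card PyTorch itself holds; the `outside_pytorch` figure is just total minus free minus PyTorch's share, and reading it as memory held by something other than PyTorch's allocator is an interpretation, not something the message states. The helper names are hypothetical.

```python
import re

MESSAGE = (
    "CUDA out of memory. Tried to allocate 1.61 GiB. GPU 0 has a total "
    "capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory "
    "4.26 GiB is allocated by PyTorch, and 814.23 MiB is reserved by PyTorch "
    "but unallocated."
)

UNIT_TO_GIB = {"bytes": 1 / 1024**3, "KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0}
SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"
PATTERNS = {
    "requested": rf"Tried to allocate {SIZE}",
    "total": rf"total capacity of {SIZE}",
    "free": rf"of which {SIZE} is free",
    "allocated": rf"memory {SIZE} is allocated by PyTorch",
    "reserved_unallocated": rf"and {SIZE} is reserved by PyTorch",
}

def summarize_oom(message):
    """Return the sizes named in a PyTorch CUDA OOM message, in GiB."""
    out = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{key}' in message")
        value, unit = match.groups()
        out[key] = float(value) * UNIT_TO_GIB[unit]
    # Everything PyTorch's caching allocator holds, used or not.
    out["pytorch_total"] = out["allocated"] + out["reserved_unallocated"]
    # Whatever is neither free nor held by PyTorch's allocator.
    out["outside_pytorch"] = out["total"] - out["free"] - out["pytorch_total"]
    return out

for key, gib in summarize_oom(MESSAGE).items():
    print(f"{key:>22}: {gib:.2f} GiB")
```

For this message PyTorch holds about 5.06 GiB in total, which leaves about 2.94 GiB of the 8 GiB card neither free nor attributed to PyTorch.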

Windows task manager is also not a reliable way to gauge memory usage for pytorch. A better way to check is using the nvidia-smi command in a terminal / powershell.
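
If it is easier to log memory from a script than to watch the terminal, `nvidia-smi` can be queried in CSV form and parsed. This is a minimal sketch that assumes `nvidia-smi` is on the PATH; the function names are hypothetical and the sample string in the usage note is made up.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (values in MiB) into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append({"gpu": int(index), "used_mib": int(used), "total_mib": int(total)})
    return rows

def query_gpu_memory():
    """Return per-GPU memory usage, or an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` once before launching Kilosort and again during the clustering step shows whether memory is already in use before sorting starts. The parser can be checked offline with an invented row such as `parse_gpu_memory("0, 5120, 8192")`.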

hemant22 · 2024-08-04T18:53:50Z

@jacobpennington No one else is running anything on the machine that might be using the memory. That's for sure. The error happens mostly at this line: - "vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)".
I will run it again while monitoring the memory usage via nvidia-smi command.

Should I try the setting suggested in the error message?
"If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation."

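For anyone trying that suggestion from a script, the variable has to be in the environment before PyTorch initialises CUDA, so the safest place is before `import torch` (or before importing kilosort). A minimal sketch; whether the setting helps with this particular error, and whether it is supported on every platform, is not established in this thread.

```python
import os

# Set the allocator option before torch is imported anywhere in the process.
# setdefault keeps any value that was already exported in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

print("PYTORCH_CUDA_ALLOC_CONF =", os.environ["PYTORCH_CUDA_ALLOC_CONF"])

# Imports that initialise CUDA should come only after this point, e.g.
# import torch
```

When using the GUI, the equivalent would be to set the variable in the terminal session that launches Kilosort.
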
jacobpennington · 2024-08-06T18:38:27Z

Okay. Can you please also try sorting a subset of the data, say with tmax = 1800 (first 30 mins of data)? The number of spikes you're detecting seems much larger than expected for that size of recording, which might be why you're seeing this issue for this recording and not others. If you can sort a subset, that might reveal if there are some strange units or artifacts in the results that could be causing issues.
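
To make the size of that subset concrete, the sketch below works out how many batches the full recording and a tmax = 1800 run cover, using `fs`, `batch_size` and "N samples" from the log in the first post. Ceiling division reproduces the logged "N batches: 4274", but that is an observation about the numbers rather than a statement of Kilosort's internal rule. It also computes the overall spike rate of the failing run.

```python
import math

fs = 30000.0                 # 'fs' from the ops dump
batch_size = 60000           # 'batch_size' from the ops dump
n_samples_full = 256380588   # 'N samples' from the log
n_chan = 383                 # 'Nchan' from the ops dump
n_spikes_full = 110152832    # spikes extracted using cluster waveforms

def n_batches(n_samples, batch_size):
    """Number of batches needed to cover n_samples (last one may be partial)."""
    return math.ceil(n_samples / batch_size)

def samples_for_tmax(tmax_s, fs, n_samples_total):
    """Samples processed when sorting stops at tmax_s seconds."""
    return min(int(tmax_s * fs), n_samples_total)

full_batches = n_batches(n_samples_full, batch_size)
subset_batches = n_batches(samples_for_tmax(1800, fs, n_samples_full), batch_size)
print("full recording:", full_batches, "batches")
print("tmax = 1800   :", subset_batches, "batches")

duration_s = n_samples_full / fs
rate = n_spikes_full / duration_s
print(f"spike rate: {rate:.0f} spikes/s overall, {rate / n_chan:.1f} per channel")
```

With these values the subset covers 900 of the 4274 batches, about 21% of the recording.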

hemant22 · 2024-08-07T02:35:41Z

@jacobpennington I tried running with tmax=1800. It ran successfully. So what should I do/check next to figure out the problem?

Copied below is the log file:
08-06 16:50 kilosort.run_kilosort INFO
08-06 16:50 kilosort.run_kilosort INFO     Computing preprocessing variables.
08-06 16:50 kilosort.run_kilosort INFO     ----------------------------------------
08-06 16:50 kilosort.run_kilosort INFO     Preprocessing filters computed in 2.04s; total 2.04s
08-06 16:50 kilosort.run_kilosort DEBUG    hp_filter shape: torch.Size([30122])
08-06 16:50 kilosort.run_kilosort DEBUG    whiten_mat shape: torch.Size([383, 383])
08-06 16:50 kilosort.run_kilosort INFO
08-06 16:50 kilosort.run_kilosort INFO     Computing drift correction.
08-06 16:50 kilosort.run_kilosort INFO     ----------------------------------------
08-06 16:50 kilosort.spikedetect INFO     Re-computing universal templates from data.
08-06 18:00 kilosort.run_kilosort INFO     drift computed in 4227.40s; total 4229.44s
08-06 18:00 kilosort.run_kilosort DEBUG    st shape: (10527619, 6)
08-06 18:00 kilosort.run_kilosort DEBUG    yblk shape: (1,)
08-06 18:00 kilosort.run_kilosort DEBUG    dshift shape: (900, 1)
08-06 18:00 kilosort.run_kilosort DEBUG    iKxx shape: torch.Size([383, 383])
08-06 18:00 kilosort.gui.sorter DEBUG    First batch min, max: (-25.055134, 38.15639)
08-06 18:00 kilosort.run_kilosort INFO
08-06 18:00 kilosort.run_kilosort INFO     Extracting spikes using templates
08-06 18:00 kilosort.run_kilosort INFO     ----------------------------------------
08-06 18:00 kilosort.spikedetect INFO     Re-computing universal templates from data.
08-06 19:04 kilosort.run_kilosort INFO     9929708 spikes extracted in 3801.92s; total 8032.19s
08-06 19:04 kilosort.run_kilosort DEBUG    st0 shape: (9929708, 6)
08-06 19:04 kilosort.run_kilosort DEBUG    tF shape: torch.Size([9929708, 10, 6])
08-06 19:04 kilosort.run_kilosort INFO
08-06 19:04 kilosort.run_kilosort INFO     First clustering
08-06 19:04 kilosort.run_kilosort INFO     ----------------------------------------
08-06 19:07 kilosort.run_kilosort INFO     1533 clusters found, in 233.51s; total 8265.72s
08-06 19:07 kilosort.run_kilosort DEBUG    clu shape: (9929708,)
08-06 19:07 kilosort.run_kilosort DEBUG    Wall shape: torch.Size([1533, 383, 6])
08-06 19:07 kilosort.run_kilosort INFO
08-06 19:07 kilosort.run_kilosort INFO     Extracting spikes using cluster waveforms
08-06 19:07 kilosort.run_kilosort INFO     ----------------------------------------
08-06 19:25 kilosort.run_kilosort INFO     25892920 spikes extracted in 1065.07s; total 9330.80s
08-06 19:25 kilosort.run_kilosort DEBUG    st shape: (25892920, 3)
08-06 19:25 kilosort.run_kilosort DEBUG    tF shape: torch.Size([25892920, 10, 6])
08-06 19:25 kilosort.run_kilosort DEBUG    iCC shape: torch.Size([10, 383])
08-06 19:25 kilosort.run_kilosort DEBUG    iU shape: torch.Size([1200])
08-06 19:25 kilosort.run_kilosort INFO
08-06 19:25 kilosort.run_kilosort INFO     Final clustering
08-06 19:25 kilosort.run_kilosort INFO     ----------------------------------------
08-06 19:37 kilosort.run_kilosort INFO     1111 clusters found, in 710.35s; total 10041.17s
08-06 19:37 kilosort.run_kilosort DEBUG    clu shape: (25892920,)
08-06 19:37 kilosort.run_kilosort DEBUG    Wall shape: torch.Size([1111, 383, 6])
08-06 19:37 kilosort.run_kilosort INFO
08-06 19:37 kilosort.run_kilosort INFO     Merging clusters
08-06 19:37 kilosort.run_kilosort INFO     ----------------------------------------
08-06 19:38 kilosort.run_kilosort INFO     956 units found, in 43.23s; total 10084.42s
08-06 19:38 kilosort.run_kilosort DEBUG    clu shape: (25892920,)
08-06 19:38 kilosort.run_kilosort DEBUG    Wall shape: torch.Size([956, 383, 6])
08-06 19:38 kilosort.run_kilosort INFO
08-06 19:38 kilosort.run_kilosort INFO     Saving to phy and computing refractory periods
08-06 19:38 kilosort.run_kilosort INFO     ----------------------------------------
08-06 19:49 kilosort.run_kilosort INFO     417 units found with good refractory periods
08-06 19:49 kilosort.run_kilosort INFO     Total runtime: 10774.82s = 02:59:35 h:m:s
08-06 19:49 kilosort.run_kilosort INFO     Sorting output saved in: Z:\Users\Kyunghee\CN\Ephys\W5006\20240723\Loc1\Response\W5006_20240723_Loc1_g0\W5006_20240723_Loc1_g0_imec0\kilosort4.

jacobpennington · 2024-08-07T14:30:07Z

@hemant22 Can you open the results in Phy and check if anything looks off with the waveforms or anything else? Screenshots from that would be helpful.

hemant22 · 2024-08-07T22:29:25Z

@jacobpennington I checked KS output (for tmax=1800) in phy. I didn't find anything that is strange or different from the other sessions. Additionally, I was able to run full session on KS3 without any problems; so the raw data looks fine to me.

jacobpennington · 2024-08-07T23:11:34Z

Okay, thanks. If you're comfortable modifying the code, can you please try the change in this pull request and see if you're able to sort the full recording? It just adds a couple lines to one file.
https://github.com/MouseLand/Kilosort/pull/758/files

jacobpennington · 2024-08-11T03:08:30Z

Update to the previous comment: you no longer need to modify the code to try that. You can update to the latest version (v4.0.15) and use clear_cache=True.
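
For anyone running from a script rather than the GUI, a call with that option might look like the sketch below. The helper names are hypothetical, and the `run_kilosort` keyword arguments (`settings`, `filename`, `probe_name`, `clear_cache`) are written from memory of the v4 Python API, so they should be checked against the installed version. The file and probe names are the ones from the log in the first post.

```python
from pathlib import Path

def build_run_kwargs(data_file, probe_name, n_chan_bin, tmax=None):
    """Assemble keyword arguments for a run with cache clearing enabled."""
    settings = {"n_chan_bin": n_chan_bin}
    if tmax is not None:
        settings["tmax"] = tmax  # optional: sort only the first tmax seconds
    return {
        "settings": settings,
        "filename": Path(data_file),
        "probe_name": probe_name,
        "clear_cache": True,  # option mentioned above, available from v4.0.15
    }

def sort_recording(**kwargs):
    """Run the sorter; kilosort is imported lazily so the helper above works without it."""
    from kilosort import run_kilosort  # requires Kilosort >= 4.0.15
    return run_kilosort(**kwargs)

kwargs = build_run_kwargs(
    "W5006_20240723_Loc1_g0_t0.imec0.ap.bin",
    "neuropixPhase3B1_kilosortChanMap.mat",
    n_chan_bin=385,
)
print(kwargs)
```

The actual sort would then be started with `sort_recording(**kwargs)`.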

hemant22 · 2024-08-12T15:28:49Z

@jacobpennington Thank you for your help. It is working now.
@Lathomas42 Thanks a lot.

Hobart10 · 2024-08-20T13:57:55Z

Hi Jacob, @jacobpennington
I used clear_cache=True with v4.0.15, but I'm still encountering torch.OutOfMemoryError: CUDA out of memory. This occurs immediately after the first clustering. Could you look at the error and suggest how to fix it? Thank you!!

SpikeSortingError: Spike sorting error trace:
Traceback (most recent call last):
  File "C:\Users\Lenovo\.conda\envs\SI\spikeinterface\src\spikeinterface\sorters\basesorter.py", line 261, in run_from_folder
    SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
  File "C:\Users\Lenovo\.conda\envs\SI\spikeinterface\src\spikeinterface\sorters\external\kilosort4.py", line 273, in _run_from_folder
    st, tF, _, _ = detect_spikes(ops, device, bfile, tic0=tic0, progress_bar=progress_bar)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\kilosort\run_kilosort.py", line 611, in detect_spikes
    st, tF, ops = template_matching.extract(ops, bfile, Wall3, device=device,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\kilosort\template_matching.py", line 26, in extract
    ctc = prepare_matching(ops, U)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\kilosort\template_matching.py", line 108, in prepare_matching
    ctc = torch.einsum('ijkm, kml -> ijl', UtU, WtW)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\torch\functional.py", line 380, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 53.53 GiB. GPU 0 has a total capacity of 10.00 GiB of which 0 bytes is free. Of the allocated memory 31.35 GiB is allocated by PyTorch, and 43.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

jacobpennington · 2024-08-20T18:01:26Z

@Hobart10 Can you please provide a screenshot of what the KS4 GUI looks like when you load your data, and the kilosort4.log file from the results directory? The fact it's trying to allocate ~54 GB of video memory indicates there's some other problem causing this.

Hobart10 · 2024-08-21T04:22:41Z

Just found out it works in kilosort directly but not through spikeinterface in my case. Will consult there. Thank you!!

RobertoDF · 2024-09-03T15:04:56Z

I still get OOM from this line (vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)) even with clear_cache=True.

jacobpennington · 2024-09-03T22:31:05Z

@RobertoDF Is that still the case? Just checking since you closed your pull requests.

EmmettJT · 2025-02-10T10:56:08Z

Hi!
We are also getting a CUDA memory error, and unfortunately none of the suggestions above or in the other related threads solve it.

We have tried clearing the GPU cache, using the qr.kmeansplusplus version, and even using an older version of KS, but it always runs out of memory at the final clustering stage. We get the same error on two different machines and on an HPC cluster, where we used a GPU with ~18 GB of memory.
The datasets that give us issues are large (~300 GB), but we have successfully sorted larger datasets without issue. From looking at the data I don't see any obvious issues that might cause KS to fail (e.g. the data is not noisy).

Do you have any other suggestions? Since we seem to need more memory, is it currently possible to run a single instance of kilosort4 across multiple GPUs at the same time?

RobertoDF · 2025-02-10T11:00:41Z

Is the problem arising specifically at vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)? If you used my pull request at which line does it happen?

EmmettJT · 2025-02-10T16:47:12Z

Yep, without your pull request the error occurs at vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1).

With it, the error is either at line 215, mu[j] = Xg[ix].mean(0), or sometimes at line 171, vtot = (Xg**2).sum(1).

Seems to be very similar to the issues Peyton-D mentioned here: #775

jacobpennington · 2025-02-12T21:36:03Z

Thanks! Working on a solution this week.

Lathomas42 mentioned this issue Aug 7, 2024

Fixed bug where cuda reserved memory climbs throughout process while allocated memory stays low #758

Closed

Hobart10 mentioned this issue Aug 21, 2024

CUDA out of memory error from running kilosort SpikeInterface/spikeinterface#3321

Open

This was referenced Sep 3, 2024

clustering_qr.kmeans_plusplus explicit tensors deletion #773

Closed

clustering_qr.kmeans_plusplus explicit tensors deletion #774

Closed

RobertoDF mentioned this issue Sep 4, 2024

Improve memory management in clustering_qr.kmeans_plusplus #775

Open

jacobpennington closed this as completed Oct 7, 2024
