BUG: <CUDA out of memory> #746

Closed
hemant22 opened this issue Jul 26, 2024 · 22 comments

Comments

@hemant22

Describe the issue:

Kilosort raises an error during the Final clustering step. This has happened three times, always with the same session (data). Kilosort works fine on other sessions.

Reproduce the bug:

No response

Error message:

07-25 16:18 kilosort.gui.sorter INFO     Kilosort version 4.0.13
07-25 16:18 kilosort.gui.sorter INFO     Sorting Y:\Users\hsrivastava\BS\W5006\W5006_20240723_Loc1_g0_t0.imec0.ap.bin
07-25 16:18 kilosort.gui.sorter INFO     ----------------------------------------
07-25 16:18 kilosort.gui.sorter DEBUG    Initial ops:
{   'data_file_path': WindowsPath('Y:/Users/hsrivastava/BS/W5006/W5006_20240723_Loc1_g0_t0.imec0.ap.bin'),
    'results_dir': WindowsPath('Y:/Users/hsrivastava/BS/W5006/kilosort4'),
    'probe_name': 'neuropixPhase3B1_kilosortChanMap.mat',
    'data_dtype': 'int16',
    'n_chan_bin': 385,
    'fs': 30000.0,
    'batch_size': 60000,
    'nblocks': 1,
    'Th_universal': 9.0,
    'Th_learned': 8.0,
    'tmin': 0.0,
    'tmax': inf,
    'nt': 61,
    'shift': None,
    'scale': None,
    'artifact_threshold': inf,
    'nskip': 25,
    'whitening_range': 32,
    'highpass_cutoff': 300.0,
    'binning_depth': 5.0,
    'sig_interp': 20.0,
    'drift_smoothing': [0.5, 0.5, 0.5],
    'nt0min': 20,
    'dmin': None,
    'dminx': 32.0,
    'min_template_size': 10.0,
    'template_sizes': 5,
    'nearest_chans': 10,
    'nearest_templates': 100,
    'max_channel_distance': None,
    'templates_from_data': True,
    'n_templates': 6,
    'n_pcs': 6,
    'Th_single_ch': 6.0,
    'acg_threshold': 0.2,
    'ccg_threshold': 0.25,
    'cluster_downsampling': 20,
    'x_centers': None,
    'duplicate_spike_ms': 0.25,
    'save_preprocessed_copy': True,
    'data_dir': WindowsPath('Y:/Users/hsrivastava/BS/W5006'),
    'filename': WindowsPath('Y:/Users/hsrivastava/BS/W5006/W5006_20240723_Loc1_g0_t0.imec0.ap.bin'),
    'do_CAR': True,
    'invert_sign': False,
    'NTbuff': 60122,
    'Nchan': 383,
    'duplicate_spike_bins': 7,
    'torch_device': 'cuda',
    'xc': array([43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59.,
       27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27.,
       43., 11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43.,
       11., 59., 27., 43., 11., 59., 27., 43., 11., 59., 27., 43., 11.,
       59., 27., 43., 11., 59., 27.], dtype=float32),
    'yc': array([  20.,   20.,   40.,   40.,   60.,   60.,   80.,   80.,  100.,
        100.,  120.,  120.,  140.,  140.,  160.,  160.,  180.,  180.,
        200.,  200.,  220.,  220.,  240.,  240.,  260.,  260.,  280.,
        280.,  300.,  300.,  320.,  320.,  340.,  340.,  360.,  360.,
        380.,  380.,  400.,  400.,  420.,  420.,  440.,  440.,  460.,
        460.,  480.,  480.,  500.,  500.,  520.,  520.,  540.,  540.,
        560.,  560.,  580.,  580.,  600.,  600.,  620.,  620.,  640.,
        640.,  660.,  660.,  680.,  680.,  700.,  700.,  720.,  720.,
        740.,  740.,  760.,  760.,  780.,  780.,  800.,  800.,  820.,
        820.,  840.,  840.,  860.,  860.,  880.,  880.,  900.,  900.,
        920.,  920.,  940.,  940.,  960.,  960.,  980.,  980., 1000.,
       1000., 1020., 1020., 1040., 1040., 1060., 1060., 1080., 1080.,
       1100., 1100., 1120., 1120., 1140., 1140., 1160., 1160., 1180.,
       1180., 1200., 1200., 1220., 1220., 1240., 1240., 1260., 1260.,
       1280., 1280., 1300., 1300., 1320., 1320., 1340., 1340., 1360.,
       1360., 1380., 1380., 1400., 1400., 1420., 1420., 1440., 1440.,
       1460., 1460., 1480., 1480., 1500., 1500., 1520., 1520., 1540.,
       1540., 1560., 1560., 1580., 1580., 1600., 1600., 1620., 1620.,
       1640., 1640., 1660., 1660., 1680., 1680., 1700., 1700., 1720.,
       1720., 1740., 1740., 1760., 1760., 1780., 1780., 1800., 1800.,
       1820., 1820., 1840., 1840., 1860., 1860., 1880., 1880., 1900.,
       1900., 1920., 1940., 1940., 1960., 1960., 1980., 1980., 2000.,
       2000., 2020., 2020., 2040., 2040., 2060., 2060., 2080., 2080.,
       2100., 2100., 2120., 2120., 2140., 2140., 2160., 2160., 2180.,
       2180., 2200., 2200., 2220., 2220., 2240., 2240., 2260., 2260.,
       2280., 2280., 2300., 2300., 2320., 2320., 2340., 2340., 2360.,
       2360., 2380., 2380., 2400., 2400., 2420., 2420., 2440., 2440.,
       2460., 2460., 2480., 2480., 2500., 2500., 2520., 2520., 2540.,
       2540., 2560., 2560., 2580., 2580., 2600., 2600., 2620., 2620.,
       2640., 2640., 2660., 2660., 2680., 2680., 2700., 2700., 2720.,
       2720., 2740., 2740., 2760., 2760., 2780., 2780., 2800., 2800.,
       2820., 2820., 2840., 2840., 2860., 2860., 2880., 2880., 2900.,
       2900., 2920., 2920., 2940., 2940., 2960., 2960., 2980., 2980.,
       3000., 3000., 3020., 3020., 3040., 3040., 3060., 3060., 3080.,
       3080., 3100., 3100., 3120., 3120., 3140., 3140., 3160., 3160.,
       3180., 3180., 3200., 3200., 3220., 3220., 3240., 3240., 3260.,
       3260., 3280., 3280., 3300., 3300., 3320., 3320., 3340., 3340.,
       3360., 3360., 3380., 3380., 3400., 3400., 3420., 3420., 3440.,
       3440., 3460., 3460., 3480., 3480., 3500., 3500., 3520., 3520.,
       3540., 3540., 3560., 3560., 3580., 3580., 3600., 3600., 3620.,
       3620., 3640., 3640., 3660., 3660., 3680., 3680., 3700., 3700.,
       3720., 3720., 3740., 3740., 3760., 3760., 3780., 3780., 3800.,
       3800., 3820., 3820., 3840., 3840.], dtype=float32),
    'kcoords': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32),
    'chanMap': array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
       182, 183, 184, 185, 186, 187, 188, 189, 190, 192, 193, 194, 195,
       196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208,
       209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221,
       222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234,
       235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247,
       248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260,
       261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273,
       274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286,
       287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
       300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312,
       313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325,
       326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338,
       339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351,
       352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364,
       365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377,
       378, 379, 380, 381, 382, 383]),
    'n_chan': 384}

07-25 16:18 kilosort.run_kilosort INFO      
07-25 16:18 kilosort.run_kilosort INFO     Computing preprocessing variables.
07-25 16:18 kilosort.run_kilosort INFO     ----------------------------------------
07-25 16:18 kilosort.run_kilosort INFO     N samples: 256380588
07-25 16:18 kilosort.run_kilosort INFO     N seconds: 8546.0196
07-25 16:18 kilosort.run_kilosort INFO     N batches: 4274
07-25 16:19 kilosort.run_kilosort INFO     Preprocessing filters computed in  9.34s; total  9.35s
07-25 16:19 kilosort.run_kilosort DEBUG    hp_filter shape: torch.Size([30122])
07-25 16:19 kilosort.run_kilosort DEBUG    whiten_mat shape: torch.Size([383, 383])
07-25 16:19 kilosort.run_kilosort INFO      
07-25 16:19 kilosort.run_kilosort INFO     Computing drift correction.
07-25 16:19 kilosort.run_kilosort INFO     ----------------------------------------
07-25 16:19 kilosort.spikedetect INFO     Re-computing universal templates from data.
07-25 17:43 kilosort.run_kilosort INFO     drift computed in  5078.85s; total  5088.20s
07-25 17:43 kilosort.run_kilosort DEBUG    st shape: (43130834, 6)
07-25 17:43 kilosort.run_kilosort DEBUG    yblk shape: (1,)
07-25 17:43 kilosort.run_kilosort DEBUG    dshift shape: (4274, 1)
07-25 17:43 kilosort.run_kilosort DEBUG    iKxx shape: torch.Size([383, 383])
07-25 17:43 kilosort.gui.sorter DEBUG    First batch min, max: (-36.98598, 59.449707)
07-25 18:13 kilosort.io  INFO      
07-25 18:13 kilosort.io  INFO     ========================================
07-25 18:13 kilosort.io  INFO     Saving drift-corrected copy of data to: Y:\Users\hsrivastava\BS\W5006\kilosort4\temp_wh.dat...
07-25 18:13 kilosort.io  INFO     Writing batch 0/4274...
07-25 18:16 kilosort.io  INFO     Writing batch 100/4274...
07-25 18:19 kilosort.io  INFO     Writing batch 200/4274...
07-25 18:23 kilosort.io  INFO     Writing batch 300/4274...
07-25 18:26 kilosort.io  INFO     Writing batch 400/4274...
07-25 18:29 kilosort.io  INFO     Writing batch 500/4274...
07-25 18:33 kilosort.io  INFO     Writing batch 600/4274...
07-25 18:36 kilosort.io  INFO     Writing batch 700/4274...
07-25 18:40 kilosort.io  INFO     Writing batch 800/4274...
07-25 18:43 kilosort.io  INFO     Writing batch 900/4274...
07-25 18:47 kilosort.io  INFO     Writing batch 1000/4274...
07-25 18:50 kilosort.io  INFO     Writing batch 1100/4274...
07-25 18:53 kilosort.io  INFO     Writing batch 1200/4274...
07-25 18:57 kilosort.io  INFO     Writing batch 1300/4274...
07-25 19:00 kilosort.io  INFO     Writing batch 1400/4274...
07-25 19:04 kilosort.io  INFO     Writing batch 1500/4274...
07-25 19:07 kilosort.io  INFO     Writing batch 1600/4274...
07-25 19:11 kilosort.io  INFO     Writing batch 1700/4274...
07-25 19:14 kilosort.io  INFO     Writing batch 1800/4274...
07-25 19:18 kilosort.io  INFO     Writing batch 1900/4274...
07-25 19:21 kilosort.io  INFO     Writing batch 2000/4274...
07-25 19:25 kilosort.io  INFO     Writing batch 2100/4274...
07-25 19:28 kilosort.io  INFO     Writing batch 2200/4274...
07-25 19:32 kilosort.io  INFO     Writing batch 2300/4274...
07-25 19:35 kilosort.io  INFO     Writing batch 2400/4274...
07-25 19:39 kilosort.io  INFO     Writing batch 2500/4274...
07-25 19:42 kilosort.io  INFO     Writing batch 2600/4274...
07-25 19:46 kilosort.io  INFO     Writing batch 2700/4274...
07-25 19:49 kilosort.io  INFO     Writing batch 2800/4274...
07-25 19:53 kilosort.io  INFO     Writing batch 2900/4274...
07-25 19:57 kilosort.io  INFO     Writing batch 3000/4274...
07-25 20:00 kilosort.io  INFO     Writing batch 3100/4274...
07-25 20:04 kilosort.io  INFO     Writing batch 3200/4274...
07-25 20:07 kilosort.io  INFO     Writing batch 3300/4274...
07-25 20:11 kilosort.io  INFO     Writing batch 3400/4274...
07-25 20:14 kilosort.io  INFO     Writing batch 3500/4274...
07-25 20:18 kilosort.io  INFO     Writing batch 3600/4274...
07-25 20:22 kilosort.io  INFO     Writing batch 3700/4274...
07-25 20:25 kilosort.io  INFO     Writing batch 3800/4274...
07-25 20:29 kilosort.io  INFO     Writing batch 3900/4274...
07-25 20:33 kilosort.io  INFO     Writing batch 4000/4274...
07-25 20:36 kilosort.io  INFO     Writing batch 4100/4274...
07-25 20:40 kilosort.io  INFO     Writing batch 4200/4274...
07-25 20:42 kilosort.io  INFO     ========================================
07-25 20:42 kilosort.io  INFO     Copying finished.
07-25 20:42 kilosort.io  INFO      
07-25 20:42 kilosort.run_kilosort INFO      
07-25 20:42 kilosort.run_kilosort INFO     Extracting spikes using templates
07-25 20:42 kilosort.run_kilosort INFO     ----------------------------------------
07-25 20:42 kilosort.spikedetect INFO     Re-computing universal templates from data.
07-25 23:00 kilosort.run_kilosort INFO     40724351 spikes extracted in  8260.94s; total  24100.91s
07-25 23:00 kilosort.run_kilosort DEBUG    st0 shape: (40724351, 6)
07-25 23:00 kilosort.run_kilosort DEBUG    tF shape: torch.Size([40724351, 10, 6])
07-25 23:00 kilosort.run_kilosort INFO      
07-25 23:00 kilosort.run_kilosort INFO     First clustering
07-25 23:00 kilosort.run_kilosort INFO     ----------------------------------------
07-26 00:43 kilosort.run_kilosort INFO     1916 clusters found, in  6176.53s; total  30277.48s
07-26 00:43 kilosort.run_kilosort DEBUG    clu shape: (40724351,)
07-26 00:43 kilosort.run_kilosort DEBUG    Wall shape: torch.Size([1916, 383, 6])
07-26 00:43 kilosort.run_kilosort INFO      
07-26 00:43 kilosort.run_kilosort INFO     Extracting spikes using cluster waveforms
07-26 00:43 kilosort.run_kilosort INFO     ----------------------------------------
07-26 03:46 kilosort.run_kilosort INFO     110152832 spikes extracted in  10996.00s; total  41273.50s
07-26 03:46 kilosort.run_kilosort DEBUG    st shape: (110152832, 3)
07-26 03:46 kilosort.run_kilosort DEBUG    tF shape: torch.Size([110152832, 10, 6])
07-26 03:46 kilosort.run_kilosort DEBUG    iCC shape: torch.Size([10, 383])
07-26 03:46 kilosort.run_kilosort DEBUG    iU shape: torch.Size([1558])
07-26 03:46 kilosort.run_kilosort INFO      
07-26 03:46 kilosort.run_kilosort INFO     Final clustering
07-26 03:46 kilosort.run_kilosort INFO     ----------------------------------------
07-26 04:05 kilosort.gui.sorter ERROR    Encountered error in `run_kilosort`:
Traceback (most recent call last):
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\gui\sorter.py", line 124, in run
    clu, Wall = cluster_spikes(
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 563, in cluster_spikes
    clu, Wall = clustering_qr.run(ops, st, tF,  mode = 'template', device=device,
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 364, in run
    iclust, iclust0, M, iclust_init = cluster(Xd, nskip=nskip, lam=1,
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 151, in cluster
    iclust = assign_iclust(rows_neigh, isub, kn, tones2, nclust, lam, m, ki, kj, device=device)
  File "C:\Users\Baylor Medicine\anaconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 81, in assign_iclust
    xN = xN - lam/m * (ki.unsqueeze(-1) * kN.to_dense())
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.61 GiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 4.26 GiB is allocated by PyTorch, and 814.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Version information:

Kilosort v4.0.13
GPU: Nvidia GeForce RTX 3070
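
The error message above suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. A minimal sketch of doing that from Python follows, assuming the sort is launched from a script rather than the GUI. The variable has to be in place before PyTorch makes its first CUDA allocation, so the sketch sets it before any import of torch or kilosort. Whether the option is supported, or has any effect, depends on the platform and PyTorch version.

```python
import os

# Set the allocator option before torch is imported, so it is already in
# place when PyTorch makes its first CUDA allocation.
# setdefault() keeps any value that was already exported in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# Import torch / kilosort only after this point, for example:
# import torch
# from kilosort import run_kilosort

print("PYTORCH_CUDA_ALLOC_CONF =", os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

In this particular message the reserved-but-unallocated figure (814.23 MiB) is smaller than the 1.61 GiB request, so reducing fragmentation alone may not be enough here.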

@jacobpennington
Collaborator

jacobpennington commented Jul 26, 2024

Are you sure there weren't any other GPU-intensive processes running at the time? Based on that error message, this happened because around 4 GB of video memory was reserved for something else. I would try restarting the machine and sorting again as a first step, if you haven't done that yet.

@Lathomas42

This also happens to me regularly. I have tried many different CUDA versions and NVIDIA driver versions. I am happy to provide any files you need; however, the main file this happens on is a 70 GB file.

@hemant22
Author

hemant22 commented Aug 2, 2024

[Attached screenshots: gpu_usage_after_loading_data, gpu_usage_before_launching_kilosort_gui]
@jacobpennington I tried again after restarting the computer, but it stopped again with the same error, 'CUDA out of memory'.
Another session stopped at 'Saving to phy and computing refractory periods' with the same error. I have attached screenshots of GPU usage before and after launching the Kilosort GUI.
@Lathomas42 Thanks for your help. Could you please share more details, including the driver versions that might help me?

@hemant22
Author

hemant22 commented Aug 2, 2024

My .bin file is about 200 GB.

@jacobpennington
Collaborator

@hemant22 I still don't see anything to indicate that Kilosort is causing this, especially if you're getting the error at different points in the pipeline.

This error message:
CUDA out of memory. Tried to allocate 1.61 GiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 4.26 GiB is allocated by PyTorch, and 814.23 MiB is reserved by PyTorch but unallocated.

Is saying: Kilosort is currently using ~4.3 GB of video memory. It tried to allocate an additional 1.6 GB, but couldn't do that because there was no more video memory available. The only reason that would happen is if something else is running on your machine that is using up that memory, or otherwise preventing PyTorch from making use of it.
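
The arithmetic behind that reading can be checked directly from the message. The sketch below parses the quoted text and adds up what the message attributes to PyTorch; the helper name `find_gib` and the regular expressions are illustrative, not part of PyTorch or Kilosort.

```python
import re

# Error text quoted in this thread (only the part that carries the numbers).
message = (
    "CUDA out of memory. Tried to allocate 1.61 GiB. GPU 0 has a total "
    "capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory "
    "4.26 GiB is allocated by PyTorch, and 814.23 MiB is reserved by PyTorch "
    "but unallocated."
)

UNIT_TO_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024}


def find_gib(pattern, text):
    """Return the quantity captured by `pattern`, converted to GiB."""
    value, unit = re.search(pattern, text).groups()
    return float(value) * UNIT_TO_GIB[unit]


requested = find_gib(r"Tried to allocate ([\d.]+) (GiB|MiB)", message)
total = find_gib(r"total capacity of ([\d.]+) (GiB|MiB)", message)
allocated = find_gib(r"([\d.]+) (GiB|MiB) is allocated by PyTorch", message)
reserved_unused = find_gib(r"([\d.]+) (GiB|MiB) is reserved by PyTorch", message)

# Memory held by this PyTorch process: tensors in use plus its cache.
held_by_pytorch = allocated + reserved_unused
# Memory the message does not attribute to this PyTorch process.
unaccounted = total - held_by_pytorch

print(f"requested:            {requested:.2f} GiB")
print(f"held by PyTorch:      {held_by_pytorch:.2f} GiB")
print(f"not held by PyTorch:  {unaccounted:.2f} GiB")
```

With these numbers, roughly 2.9 GiB of the 8 GiB is not attributed to PyTorch, which is more than the 1.61 GiB request. The message itself does not say what holds that memory (another process, the driver, or something else).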

Windows task manager is also not a reliable way to gauge memory usage for pytorch. A better way to check is using the nvidia-smi command in a terminal / powershell.
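
For logging memory use during a long sort, nvidia-smi can also be polled from Python. This is a minimal sketch, assuming nvidia-smi is on the PATH; the function names are illustrative. It uses nvidia-smi's CSV query mode and returns an empty list when the tool is missing or fails.

```python
import shutil
import subprocess

# Ask nvidia-smi for used and total memory per GPU, in MiB, as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse lines such as '4361, 8192' into (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib():
    """Return one (used_mib, total_mib) tuple per GPU, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError:
        return []
    return parse_memory_csv(result.stdout)


for index, (used, total) in enumerate(gpu_memory_mib()):
    print(f"GPU {index}: {used} MiB used of {total} MiB")
```

Calling `gpu_memory_mib()` once before launching Kilosort and again periodically during the run shows whether memory is already in use before sorting starts.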

@hemant22
Author

hemant22 commented Aug 4, 2024

@jacobpennington No one else is running anything on the machine that might be using the memory, that's for sure. The error mostly happens at this line: "vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)".
I will run it again while monitoring memory usage with the nvidia-smi command.

Should I try what the error message suggests?
"If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation."

@jacobpennington
Collaborator

Okay. Can you please also try sorting a subset of the data, say with tmax = 1800 (first 30 mins of data)? The number of spikes you're detecting seems much larger than expected for that size of recording, which might be why you're seeing this issue for this recording and not others. If you can sort a subset, that might reveal if there are some strange units or artifacts in the results that could be causing issues.
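
As a rough guide to what a tmax = 1800 run covers, the batch count can be worked out from values already posted in this thread. The sketch below assumes batches are counted by rounding up, which reproduces the "N batches: 4274" line in the log above. The `settings` dictionary at the end is an illustrative override for a scripted run, not a complete configuration.

```python
import math

fs = 30000.0            # 'fs' from the posted ops
batch_size = 60000      # 'batch_size' from the posted ops
n_samples = 256380588   # 'N samples' from the posted log


def n_batches(samples, batch):
    """Number of batches needed to cover `samples` samples (rounded up)."""
    return math.ceil(samples / batch)


full = n_batches(n_samples, batch_size)

tmax = 1800.0           # seconds: the first 30 minutes, as suggested above
subset = n_batches(int(tmax * fs), batch_size)

# Illustrative settings override for the subset run.
settings = {"n_chan_bin": 385, "tmax": tmax}

print(full, subset)     # prints: 4274 900
```

The subset is therefore about a fifth of the full recording (900 of 4274 batches).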

@hemant22
Author

hemant22 commented Aug 7, 2024

@jacobpennington I tried running with tmax=1800. It ran successfully. So what should I do/check next to figure out the problem?

The log file is copied below:
08-06 16:50 kilosort.run_kilosort INFO
08-06 16:50 kilosort.run_kilosort INFO Computing preprocessing variables.
08-06 16:50 kilosort.run_kilosort INFO ----------------------------------------
08-06 16:50 kilosort.run_kilosort INFO Preprocessing filters computed in 2.04s; total 2.04s
08-06 16:50 kilosort.run_kilosort DEBUG hp_filter shape: torch.Size([30122])
08-06 16:50 kilosort.run_kilosort DEBUG whiten_mat shape: torch.Size([383, 383])
08-06 16:50 kilosort.run_kilosort INFO
08-06 16:50 kilosort.run_kilosort INFO Computing drift correction.
08-06 16:50 kilosort.run_kilosort INFO ----------------------------------------
08-06 16:50 kilosort.spikedetect INFO Re-computing universal templates from data.
08-06 18:00 kilosort.run_kilosort INFO drift computed in 4227.40s; total 4229.44s
08-06 18:00 kilosort.run_kilosort DEBUG st shape: (10527619, 6)
08-06 18:00 kilosort.run_kilosort DEBUG yblk shape: (1,)
08-06 18:00 kilosort.run_kilosort DEBUG dshift shape: (900, 1)
08-06 18:00 kilosort.run_kilosort DEBUG iKxx shape: torch.Size([383, 383])
08-06 18:00 kilosort.gui.sorter DEBUG First batch min, max: (-25.055134, 38.15639)
08-06 18:00 kilosort.run_kilosort INFO
08-06 18:00 kilosort.run_kilosort INFO Extracting spikes using templates
08-06 18:00 kilosort.run_kilosort INFO ----------------------------------------
08-06 18:00 kilosort.spikedetect INFO Re-computing universal templates from data.
08-06 19:04 kilosort.run_kilosort INFO 9929708 spikes extracted in 3801.92s; total 8032.19s
08-06 19:04 kilosort.run_kilosort DEBUG st0 shape: (9929708, 6)
08-06 19:04 kilosort.run_kilosort DEBUG tF shape: torch.Size([9929708, 10, 6])
08-06 19:04 kilosort.run_kilosort INFO
08-06 19:04 kilosort.run_kilosort INFO First clustering
08-06 19:04 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:07 kilosort.run_kilosort INFO 1533 clusters found, in 233.51s; total 8265.72s
08-06 19:07 kilosort.run_kilosort DEBUG clu shape: (9929708,)
08-06 19:07 kilosort.run_kilosort DEBUG Wall shape: torch.Size([1533, 383, 6])
08-06 19:07 kilosort.run_kilosort INFO
08-06 19:07 kilosort.run_kilosort INFO Extracting spikes using cluster waveforms
08-06 19:07 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:25 kilosort.run_kilosort INFO 25892920 spikes extracted in 1065.07s; total 9330.80s
08-06 19:25 kilosort.run_kilosort DEBUG st shape: (25892920, 3)
08-06 19:25 kilosort.run_kilosort DEBUG tF shape: torch.Size([25892920, 10, 6])
08-06 19:25 kilosort.run_kilosort DEBUG iCC shape: torch.Size([10, 383])
08-06 19:25 kilosort.run_kilosort DEBUG iU shape: torch.Size([1200])
08-06 19:25 kilosort.run_kilosort INFO
08-06 19:25 kilosort.run_kilosort INFO Final clustering
08-06 19:25 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:37 kilosort.run_kilosort INFO 1111 clusters found, in 710.35s; total 10041.17s
08-06 19:37 kilosort.run_kilosort DEBUG clu shape: (25892920,)
08-06 19:37 kilosort.run_kilosort DEBUG Wall shape: torch.Size([1111, 383, 6])
08-06 19:37 kilosort.run_kilosort INFO
08-06 19:37 kilosort.run_kilosort INFO Merging clusters
08-06 19:37 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:38 kilosort.run_kilosort INFO 956 units found, in 43.23s; total 10084.42s
08-06 19:38 kilosort.run_kilosort DEBUG clu shape: (25892920,)
08-06 19:38 kilosort.run_kilosort DEBUG Wall shape: torch.Size([956, 383, 6])
08-06 19:38 kilosort.run_kilosort INFO
08-06 19:38 kilosort.run_kilosort INFO Saving to phy and computing refractory periods
08-06 19:38 kilosort.run_kilosort INFO ----------------------------------------
08-06 19:49 kilosort.run_kilosort INFO 417 units found with good refractory periods
08-06 19:49 kilosort.run_kilosort INFO Total runtime: 10774.82s = 02:59:35 h:m:s
08-06 19:49 kilosort.run_kilosort INFO Sorting output saved in: Z:\Users\Kyunghee\CN\Ephys\W5006\20240723\Loc1\Response\W5006_20240723_Loc1_g0\W5006_20240723_Loc1_g0_imec0\kilosort4.
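
The two logs in this thread can be compared numerically. The sketch below takes the spike counts from the "Extracting spikes using cluster waveforms" step of each run and estimates the spike rate and the size of the tF feature tensor. The float32 storage (4 bytes per value) is an assumption, and the thread does not say how much of tF is on the GPU at any one time.

```python
# Figures copied from the two logs in this thread.
runs = {
    "full run (failed)": {"seconds": 8546.0196, "spikes": 110152832},
    "tmax=1800 (succeeded)": {"seconds": 1800.0, "spikes": 25892920},
}

N_NEAREST_CHANS = 10   # second dimension of tF in the logs
N_PCS = 6              # third dimension of tF in the logs
BYTES_PER_VALUE = 4    # assumption: tF is stored as float32

results = {}
for name, run in runs.items():
    rate = run["spikes"] / run["seconds"]
    tf_gib = run["spikes"] * N_NEAREST_CHANS * N_PCS * BYTES_PER_VALUE / 2**30
    results[name] = (rate, tf_gib)
    print(f"{name}: {rate:,.0f} spikes/s, tF is about {tf_gib:.1f} GiB")
```

The spike rates are similar (about 12,900 and 14,400 spikes/s), so the first 30 minutes are not unusually quiet. The full run simply has about 4.25 times as many spikes, and the feature tensor grows in proportion.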

@jacobpennington
Collaborator

@hemant22 Can you open the results in Phy and check if anything looks off with the waveforms or anything else? Screenshots from that would be helpful.

@hemant22
Author

hemant22 commented Aug 7, 2024

@jacobpennington I checked the KS output (for tmax=1800) in Phy. I didn't find anything strange or different from the other sessions. Additionally, I was able to run the full session in KS3 without any problems, so the raw data looks fine to me.

@jacobpennington
Collaborator

Okay, thanks. If you're comfortable modifying the code, can you please try the change in this pull request and see if you're able to sort the full recording? It just adds a couple lines to one file.
https://github.com/MouseLand/Kilosort/pull/758/files

@jacobpennington
Collaborator

Update to the previous comment: you no longer need to modify the code to try that. You can update to the latest version (v4.0.15) and use clear_cache=True.
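
For a scripted run, passing the option might look like the sketch below. It assumes Kilosort v4.0.15 or later and that `run_kilosort` accepts `settings`, `probe_name`, `filename` and `clear_cache` as keyword arguments; check these names against the docstring of the installed version. The helper functions are illustrative, and the file and probe names are the ones from the first post.

```python
def build_run_kwargs(data_file, probe_name, n_chan_bin):
    """Collect keyword arguments for a run with cache clearing enabled."""
    return {
        "settings": {"n_chan_bin": n_chan_bin},
        "probe_name": probe_name,
        "filename": data_file,
        "clear_cache": True,  # the option mentioned above (v4.0.15+)
    }


def sort_recording(**kwargs):
    """Run Kilosort with the given arguments (imported lazily)."""
    from kilosort import run_kilosort  # requires Kilosort to be installed
    return run_kilosort(**kwargs)


kwargs = build_run_kwargs(
    data_file="W5006_20240723_Loc1_g0_t0.imec0.ap.bin",
    probe_name="neuropixPhase3B1_kilosortChanMap.mat",
    n_chan_bin=385,
)

# Uncomment to start the sort:
# sort_recording(**kwargs)
print(kwargs)
```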

@hemant22
Author

@jacobpennington Thank you for your help. It is working now.
@Lathomas42 Thanks a lot.

@Hobart10

Hi Jacob, @jacobpennington
I used clear_cache=True with v4.0.15, but I'm still encountering torch.OutOfMemoryError: CUDA out of memory. This occurs immediately after the first clustering. Could you look at the error and suggest how to fix it? Thank you!!

SpikeSortingError: Spike sorting error trace:
Traceback (most recent call last):
  File "C:\Users\Lenovo\.conda\envs\SI\spikeinterface\src\spikeinterface\sorters\basesorter.py", line 261, in run_from_folder
    SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
  File "C:\Users\Lenovo\.conda\envs\SI\spikeinterface\src\spikeinterface\sorters\external\kilosort4.py", line 273, in _run_from_folder
    st, tF, _, _ = detect_spikes(ops, device, bfile, tic0=tic0, progress_bar=progress_bar)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\kilosort\run_kilosort.py", line 611, in detect_spikes
    st, tF, ops = template_matching.extract(ops, bfile, Wall3, device=device,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\kilosort\template_matching.py", line 26, in extract
    ctc = prepare_matching(ops, U)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\kilosort\template_matching.py", line 108, in prepare_matching
    ctc = torch.einsum('ijkm, kml -> ijl', UtU, WtW)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lenovo\.conda\envs\SI\Lib\site-packages\torch\functional.py", line 380, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 53.53 GiB. GPU 0 has a total capacity of 10.00 GiB of which 0 bytes is free. Of the allocated memory 31.35 GiB is allocated by PyTorch, and 43.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

@jacobpennington
Collaborator

@Hobart10 Can you please provide a screenshot of what the KS4 GUI looks like when you load your data, and the kilosort4.log file from the results directory? The fact it's trying to allocate ~54 GB of video memory indicates there's some other problem causing this.

@Hobart10

I just found out that, in my case, it works when running Kilosort directly but not through SpikeInterface. I will ask there. Thank you!!

@RobertoDF
Contributor

I still get OOM from this line (vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)) even if clear_cache=True.

@jacobpennington
Collaborator

@RobertoDF Is that still the case? Just checking since you closed your pull requests.

@EmmettJT

Hi!
We are also getting a CUDA memory error. Unfortunately, none of the suggestions above or in the other related threads solve it.

We have tried clearing the GPU cache, using the qr.kmeansplusplus version, and even using an older version of KS, but it always runs out of memory at the final clustering stage. We get the same error on two different machines and when running on an HPC cluster. On the HPC cluster, we used a GPU with ~18 GB of memory.
The datasets that give us issues are large (~300 GB), but we have successfully sorted larger datasets without issue. From looking at the data, I don't see any obvious issues that might cause KS to fail (e.g. the data is not noisy).

Do you have any other suggestions? Since we seem to need more memory, is it currently possible to run a single instance of Kilosort4 across multiple GPUs at the same time?

@RobertoDF
Contributor

Is the problem arising specifically at vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)? If you used my pull request at which line does it happen?

@EmmettJT

Yep, without your pull request the error occurs at vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1).

When using it, the error is either at line 215, mu[j] = Xg[ix].mean(0), or sometimes at line 171, vtot = (Xg**2).sum(1).

Seems to be very similar to the issues Peyton-D mentioned here: #775
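
To illustrate why the quoted vexp line is memory-hungry: the product Xg @ Xc.T creates a matrix with one row per row of Xg and one column per row of Xc, all at once. The NumPy sketch below shows the general idea of evaluating such an expression one block of rows at a time, so that only a small block exists at any moment. It is not Kilosort's code or its eventual fix. The shapes are hypothetical, and the row-wise argmax is only a stand-in for whatever reduction follows, since the thread does not show how vexp is used.

```python
import numpy as np


def vexp_full(Xg, Xc):
    """The expression quoted above, evaluated in one step."""
    return 2 * Xg @ Xc.T - (Xc**2).sum(1)


def best_column_chunked(Xg, Xc, chunk_rows=1024):
    """Row-wise argmax of vexp, computed one block of rows at a time.

    Only a (chunk_rows, len(Xc)) block is held in memory at once, instead
    of the full (len(Xg), len(Xc)) matrix.
    """
    c_norm = (Xc**2).sum(1)
    best = np.empty(len(Xg), dtype=np.int64)
    for start in range(0, len(Xg), chunk_rows):
        stop = start + chunk_rows
        block = 2 * Xg[start:stop] @ Xc.T - c_norm
        best[start:stop] = block.argmax(1)
    return best


rng = np.random.default_rng(0)
Xg = rng.standard_normal((5000, 60))   # hypothetical: rows x features
Xc = rng.standard_normal((200, 60))    # hypothetical: centers x features

full_result = vexp_full(Xg, Xc).argmax(1)
chunked_result = best_column_chunked(Xg, Xc)
print("results match:", np.array_equal(full_result, chunked_result))
```

The trade-off is a Python-level loop in exchange for a peak allocation that depends on `chunk_rows` rather than on the total number of rows.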

@jacobpennington
Collaborator

Thanks! Working on a solution this week.
