When I run train_ms.py on WSL2, I get a CUDA error #224

Open
ramune64 opened this issue Mar 22, 2025 · 0 comments

Hi, I tried to train my Japanese multi-speaker model on WSL2, but I got the following logs and error message.
I'm a beginner, so please tell me how to solve it.

==================================================================
......
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [82,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [83,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [84,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [85,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [86,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [87,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [88,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [89,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f9403b6c446 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f9403b166e4 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f9403f1ba18 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x1021c88 (0x7f93b987fc88 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x102a735 (0x7f93b9888735 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0x5faf70 (0x7f940299af70 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x6f69f (0x7f9403b4d69f in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7f9403b4637b in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f9403b46529 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #9: + 0x8c1a98 (0x7f9402c61a98 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #10: THPVariable_subclass_dealloc(_object*) + 0x2c6 (0x7f9402c61de6 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #11: /home/test/venv/bin/python() [0x504334]
frame #12: /home/test/venv/bin/python() [0x5102aa]
frame #13: /home/test/venv/bin/python() [0x600b4a]
frame #14: _PyEval_EvalFrameDefault + 0x5dd8 (0x51a858 in /home/test/venv/bin/python)
frame #15: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #16: _PyEval_EvalFrameDefault + 0x32d (0x514dad in /home/test/venv/bin/python)
frame #17: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x302b (0x517aab in /home/test/venv/bin/python)
frame #19: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x302b (0x517aab in /home/test/venv/bin/python)
frame #21: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x734 (0x5151b4 in /home/test/venv/bin/python)
frame #23: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x734 (0x5151b4 in /home/test/venv/bin/python)
frame #25: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x32d (0x514dad in /home/test/venv/bin/python)
frame #27: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x1451 (0x515ed1 in /home/test/venv/bin/python)
frame #29: /home/test/venv/bin/python() [0x5c9dd5]
frame #30: PyEval_EvalCode + 0x80 (0x5c9d30 in /home/test/venv/bin/python)
frame #31: /home/test/venv/bin/python() [0x5fea7c]
frame #32: /home/test/venv/bin/python() [0x5fa616]
frame #33: PyRun_StringFlags + 0x82 (0x5f03a2 in /home/test/venv/bin/python)
frame #34: PyRun_SimpleStringFlags + 0x42 (0x5f01c2 in /home/test/venv/bin/python)
frame #35: Py_RunMain + 0x3c4 (0x5ef6e4 in /home/test/venv/bin/python)
frame #36: Py_BytesMain + 0x2d (0x5bd16d in /home/test/venv/bin/python)
frame #37: + 0x2a1ca (0x7f940473a1ca in /lib/x86_64-linux-gnu/libc.so.6)
frame #38: __libc_start_main + 0x8b (0x7f940473a28b in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: _start + 0x25 (0x5bd065 in /home/test/venv/bin/python)

Traceback (most recent call last):
  File "/home/kense/vits/train_ms.py", line 297, in <module>
    main()
  File "/home/kense/vits/train_ms.py", line 52, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/home/test/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 328, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/home/test/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 284, in start_processes
    while not context.join():
  File "/home/test/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 184, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT

===================================================

{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 1234,
    "epochs": 10000,
    "learning_rate": 2e-4,
    "betas": [0.8, 0.99],
    "eps": 1e-9,
    "batch_size": 32,
    "fp16_run": true,
    "lr_decay": 0.999875,
    "segment_size": 8192,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0
  },
  "data": {
    "training_files": "filelists/train.txt.cleaned",
    "validation_files": "filelists/val.txt.cleaned",
    "text_cleaners": ["basic_cleaners"],
    "max_wav_value": 32768.0,
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 12,
    "cleaned_text": true
  },
  "model": {
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "upsample_rates": [8,8,2,2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16,16,4,4],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  }
}

===================================================

python==3.10.16
torch==2.5.1+cu124
If you need more information about my environment or logs, please tell me.
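
If a more exact location of the failing operation would help, I can re-run training with synchronous CUDA launches. My understanding is that setting CUDA_LAUNCH_BLOCKING=1 makes the Python traceback point at the kernel that actually fails instead of a later call; a minimal way to set it would be at the very top of train_ms.py, before anything touches CUDA (this is only a debugging sketch, not a fix):

import os

# Make CUDA kernel launches synchronous so the device-side assert surfaces
# at the op that triggers it rather than at a later, unrelated call.
# Must run before CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

The same thing should also be possible by exporting CUDA_LAUNCH_BLOCKING=1 in the shell before launching train_ms.py.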

The number of speakers is 13 (ids 0 to 12).
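
Since the config above has "n_speakers": 12, I wonder whether a speaker id of 12 in the filelists pushes the speaker embedding lookup out of range, which would match the indexSelectLargeIndex assertion. Here is a small check I could run (it assumes the usual multi-speaker filelist format of wav_path|speaker_id|text; the file names are the ones from my config):

# Sanity check: highest speaker id found in the cleaned filelists.
# Assumes the multi-speaker filelist format "wav_path|speaker_id|text".
n_speakers = 12  # value from the config above

max_sid = -1
for path in ["filelists/train.txt.cleaned", "filelists/val.txt.cleaned"]:
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("|")
            if len(parts) < 3:
                continue
            max_sid = max(max_sid, int(parts[1]))

print("highest speaker id:", max_sid)
if max_sid >= n_speakers:
    print("speaker id out of range for n_speakers =", n_speakers)

If the highest id really is 12, then n_speakers would need to be at least 13, since embedding indices must be strictly below the number of embeddings, but I am not sure this is the actual cause.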

I tried some of the methods proposed in similar issues, but none of them helped.
Please help me, and thanks for reading despite my poor English.
