When I run train_ms.py on WSL2, I get a CUDA error #224

Open
ramune64 opened this issue Mar 22, 2025 · 0 comments

Hi, I tried to train my Japanese multi-speaker model on WSL2, but I got the following logs and error message.
I'm a beginner, so please tell me how to solve it.

==================================================================
......
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [82,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [83,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [84,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [85,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [86,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [87,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [88,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [89,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f9403b6c446 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f9403b166e4 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f9403f1ba18 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x1021c88 (0x7f93b987fc88 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x102a735 (0x7f93b9888735 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0x5faf70 (0x7f940299af70 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x6f69f (0x7f9403b4d69f in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7f9403b4637b in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f9403b46529 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #9: + 0x8c1a98 (0x7f9402c61a98 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #10: THPVariable_subclass_dealloc(_object*) + 0x2c6 (0x7f9402c61de6 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #11: /home/test/venv/bin/python() [0x504334]
frame #12: /home/test/venv/bin/python() [0x5102aa]
frame #13: /home/test/venv/bin/python() [0x600b4a]
frame #14: _PyEval_EvalFrameDefault + 0x5dd8 (0x51a858 in /home/test/venv/bin/python)
frame #15: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #16: _PyEval_EvalFrameDefault + 0x32d (0x514dad in /home/test/venv/bin/python)
frame #17: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x302b (0x517aab in /home/test/venv/bin/python)
frame #19: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x302b (0x517aab in /home/test/venv/bin/python)
frame #21: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x734 (0x5151b4 in /home/test/venv/bin/python)
frame #23: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x734 (0x5151b4 in /home/test/venv/bin/python)
frame #25: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x32d (0x514dad in /home/test/venv/bin/python)
frame #27: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x1451 (0x515ed1 in /home/test/venv/bin/python)
frame #29: /home/test/venv/bin/python() [0x5c9dd5]
frame #30: PyEval_EvalCode + 0x80 (0x5c9d30 in /home/test/venv/bin/python)
frame #31: /home/test/venv/bin/python() [0x5fea7c]
frame #32: /home/test/venv/bin/python() [0x5fa616]
frame #33: PyRun_StringFlags + 0x82 (0x5f03a2 in /home/test/venv/bin/python)
frame #34: PyRun_SimpleStringFlags + 0x42 (0x5f01c2 in /home/test/venv/bin/python)
frame #35: Py_RunMain + 0x3c4 (0x5ef6e4 in /home/test/venv/bin/python)
frame #36: Py_BytesMain + 0x2d (0x5bd16d in /home/test/venv/bin/python)
frame #37: + 0x2a1ca (0x7f940473a1ca in /lib/x86_64-linux-gnu/libc.so.6)
frame #38: __libc_start_main + 0x8b (0x7f940473a28b in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: _start + 0x25 (0x5bd065 in /home/test/venv/bin/python)

Traceback (most recent call last):
  File "/home/kense/vits/train_ms.py", line 297, in <module>
    main()
  File "/home/kense/vits/train_ms.py", line 52, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/home/test/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 328, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/home/test/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 284, in start_processes
    while not context.join():
  File "/home/test/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 184, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT

===================================================

{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 1234,
    "epochs": 10000,
    "learning_rate": 2e-4,
    "betas": [0.8, 0.99],
    "eps": 1e-9,
    "batch_size": 32,
    "fp16_run": true,
    "lr_decay": 0.999875,
    "segment_size": 8192,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0
  },
  "data": {
    "training_files": "filelists/train.txt.cleaned",
    "validation_files": "filelists/val.txt.cleaned",
    "text_cleaners": ["basic_cleaners"],
    "max_wav_value": 32768.0,
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 12,
    "cleaned_text": true
  },
  "model": {
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "upsample_rates": [8,8,2,2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16,16,4,4],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  }
}

===================================================

python==3.10.16
torch==2.5.1+cu124
If you need more information about my environment or logs, please tell me.
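
If a more exact location of the failing operation would help, I can re-run training with synchronous CUDA launches. My understanding is that setting CUDA_LAUNCH_BLOCKING=1 makes the Python traceback point at the kernel that actually fails instead of a later call; a minimal way to set it would be at the very top of train_ms.py, before anything touches CUDA (this is only a debugging sketch, not a fix):

import os

# Make CUDA kernel launches synchronous so the device-side assert surfaces
# at the op that triggers it rather than at a later, unrelated call.
# Must run before CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

The same thing should also be possible by exporting CUDA_LAUNCH_BLOCKING=1 in the shell before launching train_ms.py.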

The number of speakers is 13 (ids 0 to 12).
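
Since the config above has "n_speakers": 12, I wonder whether a speaker id of 12 in the filelists pushes the speaker embedding lookup out of range, which would match the indexSelectLargeIndex assertion. Here is a small check I could run (it assumes the usual multi-speaker filelist format of wav_path|speaker_id|text; the file names are the ones from my config):

# Sanity check: highest speaker id found in the cleaned filelists.
# Assumes the multi-speaker filelist format "wav_path|speaker_id|text".
n_speakers = 12  # value from the config above

max_sid = -1
for path in ["filelists/train.txt.cleaned", "filelists/val.txt.cleaned"]:
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("|")
            if len(parts) < 3:
                continue
            max_sid = max(max_sid, int(parts[1]))

print("highest speaker id:", max_sid)
if max_sid >= n_speakers:
    print("speaker id out of range for n_speakers =", n_speakers)

If the highest id really is 12, then n_speakers would need to be at least 13, since embedding indices must be strictly below the number of embeddings, but I am not sure this is the actual cause.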

I tried some of the methods proposed in similar issues, but none of them helped.
Please help me, and thanks for reading despite my poor English.
