Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Uninitialized __global__ memory read of size 4 bytes in thrust::inclusive_scan #2104

Closed
1 task done
lilohuang opened this issue Jul 30, 2024 · 2 comments
Closed
1 task done
Labels
bug Something isn't working right.

Comments

@lilohuang
Copy link

Is this a duplicate?

Type of Bug

Silent Failure

Component

Thrust

Describe the bug

The similar issue appears in several Thrust/CUB APIs (#1891, #1790, #1889), and some of them might be due to the same root cause, as @elstehle mentioned in the issue comment about the decoupled look-back algorithm.

It would be great to find a way to appease the compute-sanitizer so that CUDA programmers can trust the NVIDIA compute-sanitizer to identify any potential bugs while using the NVIDIA CCCL library. Thank you so much. 👍

========= Uninitialized __global__ memory read of size 4 bytes
=========     at 0xa0a50 in /usr/local/cuda-12.2/targets/x86_64-linux/include/cub/thread/thread_load.cuh:272:ThreadLoad<LOAD_CG,const unsigned int *>
=========     by thread (30,0,0) in block (1,0,0)
=========     Address 0x7f65e5204710
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/thread/thread_load.cuh:119:Load<cub::CUB_200200_520_NS::LOAD_CG,DeviceWord> [0xa0a90]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/thread/thread_load.cuh:377:ThreadLoad<value_type,2> [0xa0d90]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/agent/single_pass_scan_operators.cuh:395:ThreadLoad<LOAD_CG,value_type *> [0xa0ed0]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/agent/single_pass_scan_operators.cuh:801:WaitForValid<delay_t> [0xa0ed0]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/agent/single_pass_scan_operators.cuh:1123:ProcessWindow<delay_t> [0xa10d0]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/agent/single_pass_scan_operators.cuh:1155:operator() [0xb59e0]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/block/specializations/block_scan_warp_scans.cuh:297:ExclusiveScan<struct lambda [] type at line 199391, col. 7,TilePrefixCallbackOpT> [0xd0980]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/block/specializations/block_scan_warp_scans.cuh:879:ExclusiveScan<struct lambda [] type at line 199391, col. 7,TilePrefixCallbackOpT> [0xd1bb0]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/block/block_scan.cuh:2114:InclusiveScan<6,struct lambda [] type at line 199391, col. 7,TilePrefixCallbackOpT> [0xd1be0]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/block/block_scan.cuh:282:ScanTile<TilePrefixCallbackOpT> [0xd4800]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/agent/agent_scan.cuh:385:ConsumeTile<true> [0xd4830]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/agent/agent_scan.cuh:438:ConsumeRange [0xdaa30]
=========     Device Frame:/usr/local/cuda-12.2/targets/x86_64-linux/include/cub/device/dispatch/dispatch_scan.cuh:195:void cub::CUB_200200_520_NS::DeviceScanKernel<cub::CUB_200200_520_NS::DeviceScanPolicy<thrust::pair<int, int>, main::[lambda(thrust::pair<int, int>, thrust::pair<int, int>) (instance 1)]>::Policy900, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int>>>, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int>>>, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, (bool)0>, main::[lambda(thrust::pair<int, int>, thrust::pair<int, int>) (instance 1)], cub::CUB_200200_520_NS::NullType, int, thrust::pair<int, int>>(T2, T3, T4, int, T5, T6, T7) [0xdaa80]
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x332560]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:libcudart_static_4d8b33a106dceb3c07a56e26de61f2d53bb62a68 [0x2dcde]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:cudaLaunchKernel [0x8deee]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) [0xd1f2]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:__device_stub__ZN3cub17CUB_200200_520_NS16DeviceScanKernelINS0_16DeviceScanPolicyIN6thrust4pairIiiEEZ4mainEUnvdl0_PFivE4main1_E9Policy900ENS3_6detail15normal_iteratorINS3_10device_ptrIS5_EEEESF_NS0_13ScanTileStateIS5_Lb0EEES8_NS0_8NullTypeEiS5_EEvT0_T1_T2_iT3_T4_T5_(thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >&, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >&, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, false>&, int, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>&, cub::CUB_200200_520_NS::NullType&, int) [0xc88b]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:void cub::CUB_200200_520_NS::__wrapper__device_stub_DeviceScanKernel<cub::CUB_200200_520_NS::DeviceScanPolicy<thrust::pair<int, int>, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>> >::Policy900, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, false>, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>, cub::CUB_200200_520_NS::NullType, int, thrust::pair<int, int> >(thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >&, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >&, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, false>&, int&, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>&, cub::CUB_200200_520_NS::NullType&, int&) [0xc8f6]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:void cub::CUB_200200_520_NS::DeviceScanKernel<cub::CUB_200200_520_NS::DeviceScanPolicy<thrust::pair<int, int>, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>> >::Policy900, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, false>, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>, cub::CUB_200200_520_NS::NullType, int, thrust::pair<int, int> >(thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, false>, int, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>, cub::CUB_200200_520_NS::NullType, int) [0xf867]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:cudaError thrust::cuda_cub::launcher::triple_chevron::doit_host<void (*)(thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, false>, int, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>, cub::CUB_200200_520_NS::NullType, int), thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, false>, int, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>, cub::CUB_200200_520_NS::NullType, int>(void (*)(thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, false>, int, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>, cub::CUB_200200_520_NS::NullType, int), thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > > const&, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > > const&, cub::CUB_200200_520_NS::ScanTileState<thrust::pair<int, int>, false> const&, int const&, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>> const&, cub::CUB_200200_520_NS::NullType const&, int const&) const [0x266e2]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > > thrust::cuda_cub::detail::inclusive_scan_n_impl<thrust::cuda_cub::par_t, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, long, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>> >(thrust::cuda_cub::execution_policy<thrust::cuda_cub::par_t>&, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, long, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>) [0x1bc14]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > > thrust::cuda_cub::inclusive_scan_n<thrust::cuda_cub::par_t, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, long, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>> >(thrust::cuda_cub::execution_policy<thrust::cuda_cub::par_t>&, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, long, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>) [0x1021b]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > > thrust::cuda_cub::inclusive_scan<thrust::cuda_cub::par_t, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>> >(thrust::cuda_cub::execution_policy<thrust::cuda_cub::par_t>&, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>) [0xfe95]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > > thrust::inclusive_scan<thrust::cuda_cub::par_t, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>> >(thrust::detail::execution_policy_base<thrust::cuda_cub::par_t> const&, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, thrust::detail::normal_iterator<thrust::device_ptr<thrust::pair<int, int> > >, __nv_dl_wrapper_t<__nv_dl_tag<int (*)(), &main, 1u>>) [0xf74e]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame:main [0xbf02]
=========                in /home/lilo/thrust_initcheck/./a.out
=========     Host Frame: [0x29d90]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main [0x29e40]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0xbd65]
=========                in /home/lilo/thrust_initcheck/./a.out

How to Reproduce

Step 1. nvcc --extended-lambda -G main.cu

#include <thrust/scan.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

int main() {
   thrust::device_vector<thrust::pair<int, int> > d_in(1024, thrust::make_pair(1, 1));
   thrust::device_vector<thrust::pair<int, int> > d_out(1024);
   thrust::inclusive_scan(thrust::device,
      d_in.begin(), d_in.end(), d_out.begin(),
      [] __device__(thrust::pair<int, int> a, thrust::pair<int, int> b) {
         return thrust::make_pair(a.first + b.first, a.second + b.second);
      });
   return 0;
}

Step 2. compute-sanitizer --tool initcheck --check-device-heap yes --leak-check full --padding 512 --track-stream-ordered-races all --check-warpgroup-mma yes --require-cuda-init no --check-exit-code yes --error-exitcode 1 --nvtx true ./a.out

Expected behavior

No diagnostics emitted.

Reproduction link

No response

Operating System

Ubuntu Linux 22.04.4 LTS

nvidia-smi output

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4000               On  | 00000000:07:00.0 Off |                  Off |
| 41%   41C    P8               6W / 140W |      3MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

NVCC version

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
@lilohuang lilohuang added the bug Something isn't working right. label Jul 30, 2024
@github-project-automation github-project-automation bot moved this to Todo in CCCL Jul 30, 2024
@lilohuang
Copy link
Author

@gevtushenko @alliepiper, I would like to know if the #2739 has fixed the issue I observed in thrust::inclusive_scan. If so, please close this issue, and I will rerun the test using the newer CCCL version.

@alliepiper
Copy link
Contributor

Sorry for the delayed response -- I tested this against the current main branch of CCCL, and it shows no errors:

$ compute-sanitizer --tool initcheck --check-device-heap yes --leak-check full --padding 512 --track-stream-ordered-races all --check-warpgroup-mma yes --require-cuda-init no --check-exit-code yes --error-exitcode 1 --nvtx true ./a.out
========= COMPUTE-SANITIZER
========= ERROR SUMMARY: 0 errors

@github-project-automation github-project-automation bot moved this from Todo to Done in CCCL Apr 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right.
Projects
Status: Done
Development

No branches or pull requests

2 participants