Gklee stuck while trying to check out kernel code #3

Open
Lebronmydx opened this issue Aug 9, 2018 · 0 comments

@Lebronmydx

Hello there!

I can run the test code successfully in GKLEE, but when I try to use GKLEE on other code I run into a strange issue: sesa seems to hang forever.

Here is my code sample:

```cuda
#define CU1DBLOCK 256
#include <stdio.h>
#include <stdlib.h>  // exit()


#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
  if (code != cudaSuccess) 
    {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
    }
}

__global__
void _trace_mat_mat_trans(const float* A, const float* B, int dA_rows, int dA_cols, int dA_stride,
                                 int B_stride, float* value) {
  __shared__ float ssum[CU1DBLOCK];
  // linear thread id
  const int tid = threadIdx.y * blockDim.x + threadIdx.x;
  const int j = blockIdx.x * blockDim.x + threadIdx.x;
  const int grid_height = gridDim.y * blockDim.y;
  int i = blockIdx.y * blockDim.y + threadIdx.y;

  // Grid reduce
  float tsum = 0.0;
  if (j < dA_cols) {
    while (i < dA_rows) {
      tsum += A[i * dA_stride + j] * B[i * B_stride + j];
      i += grid_height;
    }
  }
  ssum[tid] = tsum;
  __syncthreads();

  // Block reduce
  for (int shift = CU1DBLOCK / 2; shift > warpSize; shift >>= 1) {
    if (tid < shift)
      ssum[tid] += ssum[tid + shift];
    __syncthreads();
  }

  // Warp reduce. Relies on warp-synchronous execution; without volatile
  // shared memory or __syncwarp() this step is formally a data race.
  if (tid < warpSize) {
    for (int shift = warpSize; shift > 0; shift >>= 1) {
      ssum[tid] += ssum[tid + shift];
    }
  }

  // output 1 sum per thread block
  if (tid == 0) {
    value[blockIdx.y * gridDim.x + blockIdx.x] = ssum[0];
  }
}

int main(void) {
	int dA_rows = 5, dA_cols = 5, dA_stride = 1, B_stride = 1;
	int total_size = 100;
	int num_bytes = sizeof(float) * total_size;
	float *A = NULL;
	float *B = NULL;
	float *value = NULL;
	cudaMalloc((void **) &A, num_bytes);
	cudaMalloc((void **) &B, num_bytes);
	cudaMalloc((void **) &value, num_bytes);
	// Note: the kernel's block and warp reductions assume
	// blockDim.x * blockDim.y == CU1DBLOCK (256). With only 3 threads,
	// ssum[3..255] are never written but are still read.
	int block_size = 3, grid_size = 1;
	_trace_mat_mat_trans<<<grid_size, block_size>>>(A, B, dA_rows, dA_cols, dA_stride, B_stride, value);
	return 0;
}
```
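To cross-check what the kernel should produce, here is a host-side sketch in plain C++ (it also compiles as host code in a .cu file). `trace_mat_mat_trans_ref` and `emulate_one_block` are hypothetical helpers, not part of GKLEE or the CUDA runtime. The first computes the sum of A[i][j] * B[i][j], i.e. trace(A * B^T) for row-major matrices. The second replays the kernel's grid, block and warp reductions for a single block with gridDim = (1, 1), assuming the warp runs in lockstep. The `fill` argument is an assumption that stands in for whatever uninitialized shared memory happens to contain.

```cpp
// Host-side sketch of what _trace_mat_mat_trans computes.
// Hypothetical helpers; not part of GKLEE or the CUDA runtime.
#include <cassert>
#include <vector>

constexpr int kBlock = 256;  // mirrors CU1DBLOCK
constexpr int kWarp = 32;    // warpSize on NVIDIA GPUs

// Direct reference: sum of A[i][j] * B[i][j] over rows x cols,
// i.e. trace(A * B^T) for row-major A and B.
float trace_mat_mat_trans_ref(const std::vector<float>& A, const std::vector<float>& B,
                              int rows, int cols, int a_stride, int b_stride) {
  float sum = 0.0f;
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j)
      sum += A[i * a_stride + j] * B[i * b_stride + j];
  return sum;
}

// Replays the kernel for one block of bx * by threads with gridDim = (1, 1).
// `fill` models uninitialized shared memory: slots no thread writes keep it.
float emulate_one_block(const std::vector<float>& A, const std::vector<float>& B,
                        int rows, int cols, int a_stride, int b_stride,
                        int bx, int by, float fill) {
  const int n = bx * by;  // threads that actually exist in the block
  assert(n <= kBlock);    // the kernel writes ssum[tid] for every tid < n
  std::vector<float> ssum(kBlock, fill);

  // Grid reduce: thread (tx, ty) walks column j = tx over rows ty, ty + by, ...
  for (int ty = 0; ty < by; ++ty)
    for (int tx = 0; tx < bx; ++tx) {
      float tsum = 0.0f;
      if (tx < cols)
        for (int i = ty; i < rows; i += by)
          tsum += A[i * a_stride + tx] * B[i * b_stride + tx];
      ssum[ty * bx + tx] = tsum;
    }

  // Block reduce: each step ends with __syncthreads() in the kernel.
  for (int shift = kBlock / 2; shift > kWarp; shift >>= 1)
    for (int tid = 0; tid < shift && tid < n; ++tid)
      ssum[tid] += ssum[tid + shift];

  // Warp reduce under a lockstep assumption: visiting tid in ascending order
  // reads ssum[tid + shift] before lane tid + shift overwrites it.
  for (int shift = kWarp; shift > 0; shift >>= 1)
    for (int tid = 0; tid < kWarp && tid < n; ++tid)
      ssum[tid] += ssum[tid + shift];

  return ssum[0];
}
```

For example, with 5x5 matrices of ones and twos stored with stride 5 (an assumed layout, not the `dA_stride = 1` used above), a single 256-thread block reproduces the direct sum of 50. A 1-D block of 3 threads only covers columns 0..2, so even with `fill = 0` it returns 30, and with any other `fill` the unwritten `ssum` slots leak into the result.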

Here is the command I run:

```
sesa < new-func-gklee > new-func-gklee.sesa
```

Here is the output in Summary.txt:

```
****************************************
The 0 (pointer) argument of function _Z20_trace_mat_mat_transPKfS0_iiiiPf: 
A
The 1 (pointer) argument of function _Z20_trace_mat_mat_transPKfS0_iiiiPf: 
B
The 6 (pointer) argument of function _Z20_trace_mat_mat_transPKfS0_iiiiPf: 
value

Start evaluating 3 (pointer) arguments of function _Z20_trace_mat_mat_transPKfS0_iiiiPf
```

Nothing is written to the .sesa file.

Any tips?

Thanks for the help! :)
