Skip to content

Conversation

LourensVeen
Copy link

Here are some minimal updates to make sapporo2 compile with newer versions of CUDA and OpenCL.

The cmalloc function seems to be called by the OpenCL code, but is missing. Possibly it's present in the codes that use sapporo, and since everything is linked statically it gets taken from there when that code is linked? It seems like the allocate member function is what should have been called there, but I'm not sure.

I've tested this with CUDA, and can run the test codes, but with nVidia OpenCL they crash with

oclSafeCall() Runtime API error in file <./include/ocldev.h>, line 531 : Out of resources
.
test_gravity_block_ocl: ./include/ocldev.h:179: void dev::__oclsafeCall(cl_int, const char*, int): Assertion `false' failed.

Since OpenCL is obsolete anyway, maybe we don't bother going into this?

(I have more and bigger changes in the pipeline, but I'm going to try to split them into small, easy to review PRs to hopefully not take too much of your time. Thanks!)

@LourensVeen LourensVeen changed the title Cuda opencl fixes Cuda and OpenCL fixes Oct 31, 2023
Copy link
Collaborator

@jbedorf jbedorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these changes and fixes. It's been a while since anyone worked on this, glad that it's not completely broken. Just one comment on the ifdef/else construct.

fprintf(stderr, "ERROR: is not implemented in OpenCL, only in CUDA. Please");
fprintf(stderr, "ERROR: file an issue on GitHub if you need this combination.");
exit(1);
#else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's no need for the else? Given that there's the exit above? In that case maybe change the #else into #endif

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here actually was that float3 was undefined when compiling with OpenCL, causing a compiler error. So we could move the perThreadSM = ... line outside of the #ifdef, but then it wouldn't compile for OpenCL.

I think I later saw a header somewhere that aliases some OpenCL types to CUDA-like names, so maybe float3 can be added there to fix it instead, I'll have a look.

ocl_free();
cmalloc(src.n, DeviceMemFlags);
ocl_free();
allocate(src.n, DeviceMemFlags);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks to be the right thing todo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants