Windows compatibility #29
base: v2
Changes from 2 commits
```diff
@@ -7,6 +7,30 @@
 #include "texture.h"
 #include "backprojection.h"
 
+namespace
+{
+    template<typename T>
+    __device__ T toType(float);
+
+    template<>
+    __device__ float toType(float f)
+    {
+        return f;
+    };
+
+    template<>
+    __device__ __half toType(float f)
+    {
+        return __float2half(f);
+    };
+
+    template<>
+    __device__ unsigned short toType(float f)
+    {
+        return static_cast<unsigned short>(f);
+    };
+}
+
 template<bool parallel_beam, int channels, typename T>
 __global__ void
 radon_backward_kernel(T *__restrict__ output, cudaTextureObject_t texture, const float *__restrict__ angles,
```

Review thread on the `toType` templates:

> **Reviewer:** Why are these required?

> **Author:** The CUDA compiler on Windows doesn't allow implicit casts from 32-bit floats to 16-bit floats. In …

> **Author:** I just found out that …

> **Reviewer:** I am OK with keeping the templated casts. Please put them in a separate file in order to avoid duplicating code.
```diff
@@ -98,7 +122,7 @@ radon_backward_kernel(T *__restrict__ output, cudaTextureObject_t texture, const
 
 #pragma unroll
         for (int b = 0; b < channels; b++) {
-            output[base + b * pitch] = accumulator[b] * ids;
+            output[base + b * pitch] = toType<T>(accumulator[b] * ids);
         }
     }
 }
```
```diff
@@ -247,7 +271,7 @@ radon_backward_kernel_3d(T *__restrict__ output, cudaTextureObject_t texture, co
 
 #pragma unroll
         for (int b = 0; b < channels; b++) {
-            output[b * pitch + index] = accumulator[b] * ids;
+            output[b * pitch + index] = toType<T>(accumulator[b] * ids);
         }
     }
 }
```
```diff
@@ -1,6 +1,5 @@
 import numpy as np
 import torch
-import torch.fft
 
 try:
     import scipy.fft
```
Review thread on the setup changes:

> **Reviewer:** The setup is also used in the CI/CD pipeline to create the precompiled packages, so I would keep both the Linux version with the `make.py` file and the new Windows version, and add an `if os.name == 'posix'` to choose between the two versions of the code.

> **Author:** Ah, I see. I will add the `if os.name == 'posix'`. Since this will include setting `cuda_home = os.getenv("CUDA_HOME", "/usr/local/cuda")` again, what do you think about replacing this with `from torch.utils.cpp_extension import CUDA_HOME`? On Manjaro, for example, the path to CUDA is `CUDA_PATH="/opt/cuda"` and `CUDA_HOME` is not set. The torch import would take care of that.

> **Reviewer:** Perfect, thanks. Replacing the CUDA_HOME definition seems a good idea!
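The two suggestions above — branching on `os.name` and letting torch resolve the CUDA root — could be combined along these lines. This is a sketch, not the PR's actual `setup.py`; `resolve_cuda_home` and `build_style` are illustrative names:

```python
import os


def resolve_cuda_home():
    """Prefer torch's CUDA detection; fall back to the env-var default.

    torch.utils.cpp_extension.CUDA_HOME also checks CUDA_PATH and common
    install prefixes, so it covers distros such as Manjaro where CUDA
    lives in /opt/cuda and CUDA_HOME is unset.
    """
    try:
        from torch.utils.cpp_extension import CUDA_HOME
        if CUDA_HOME is not None:
            return CUDA_HOME
    except ImportError:
        pass
    # Fallback mirroring the original setup code.
    return os.getenv("CUDA_HOME", "/usr/local/cuda")


# Platform branch as suggested in the review: keep the existing
# make.py-based build on POSIX (used by CI/CD for precompiled
# packages), and use the new path on Windows.
if os.name == "posix":
    build_style = "make.py"   # existing Linux/CI build
else:
    build_style = "windows"   # new Windows build from this PR
```

The try/except keeps the sketch importable even where torch is absent; in the real `setup.py` torch is already a build dependency, so the fallback would rarely trigger.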