Bug
delay_forward.cu passes int64_t T (and sizes derived from it) directly to CUDA kernel parameters typed int (lines ~111 and ~119) without static_cast<int>, unlike the explicit-cast pattern used consistently in parallel_scan.cu, biquad_forward.cu, and compressor_forward.cu. The implicit narrowing is silent today and yields wrong values once T exceeds INT_MAX (2^31 - 1 samples).
Fix
Add explicit static_cast<int> plus a TORCH_CHECK(T <= INT_MAX, ...) guard, matching the other kernels. Needs cluster build validation.
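
A minimal host-side sketch of the guard-then-cast pattern described above, written in plain C++ so it compiles without the PyTorch headers. The helper name checked_narrow_to_int is hypothetical and does not come from the repository. In delay_forward.cu the range check would presumably be the TORCH_CHECK(T <= INT_MAX, ...) call, followed by static_cast<int>(T) at the kernel launch.

```cpp
#include <climits>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the repository): narrow an int64_t size
// to int only after confirming that it fits. The throw stands in for
// TORCH_CHECK, which is what the extension itself would use.
inline int checked_narrow_to_int(std::int64_t value, const char* name) {
    // Sizes are assumed to be non-negative; anything above INT_MAX would
    // change value if converted to int.
    if (value < 0 || value > static_cast<std::int64_t>(INT_MAX)) {
        throw std::out_of_range(std::string(name) + " does not fit in int: " +
                                std::to_string(value));
    }
    // Explicit cast, mirroring the pattern the issue attributes to the
    // other kernels.
    return static_cast<int>(value);
}
```

At a launch site this would be used as, for example, const int T_int = checked_narrow_to_int(T, "T"); before T_int is passed as the int kernel argument. Checking before the cast makes an oversized signal fail loudly instead of reaching the kernel with a truncated length.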