Split IndexToOffset() into offline and online versions #2136
base: main
Pull Request Overview
Split IndexToOffset into compile-time (static Dims) and runtime (-1) variants to reduce nvcc compilation time and refactor all kernel call sites to use the new template form. Key changes remove the previous contiguous fast path flag and introduce dimension template parameters across many XPU SYCL kernels.
- Introduced IndexToOffset<T, IndexType, Dims> and dynamic specialization with Dims = -1; removed strict/non-strict contiguous branching.
- Updated all kernel usages to pass an explicit Dims (positive, -1, or new sentinel -2) and added indexing_kind template parameters in RNN kernels.
- Replaced standard SYCL subgroup size attribute with Intel-specific attribute in one kernel.
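Below is a minimal host-side sketch of the two IndexToOffset variants described above. It assumes a simplified TensorInfo with fixed-capacity sizes/strides arrays and the conventional innermost-first loop. The real definitions in src/comm/TensorInfo.h may differ in field names, qualifiers, and device annotations. It is meant to show one thing: a positive Dims gives the compiler a constant loop trip count, and Dims = -1 reads the rank from info.dims at runtime.

```cpp
#include <cassert>
#include <cstdint>  // std::uint32_t / std::uint64_t are typical IndexType choices

// Hypothetical, simplified stand-in for TensorInfo (not the real type).
constexpr int kMaxDims = 8;

template <typename T, typename IndexType>
struct TensorInfo {
  T* data;
  IndexType sizes[kMaxDims];
  IndexType strides[kMaxDims];
  int dims;
};

// Primary template: Dims is a compile-time constant, so each instantiation
// can be fully unrolled by the compiler.
template <typename T, typename IndexType, int Dims>
struct IndexToOffset {
  static IndexType get(IndexType linearId, const TensorInfo<T, IndexType>& info) {
    IndexType offset = 0;
    // Peel off indices from the innermost dimension outward; dim 0 is last.
    for (int i = Dims - 1; i > 0; --i) {
      IndexType curDimIndex = linearId % info.sizes[i];
      offset += curDimIndex * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};

// Dims == -1: the rank is only known at runtime, so one instantiation
// serves tensors of every rank.
template <typename T, typename IndexType>
struct IndexToOffset<T, IndexType, -1> {
  static IndexType get(IndexType linearId, const TensorInfo<T, IndexType>& info) {
    assert(info.dims >= 1 && info.dims <= kMaxDims);
    IndexType offset = 0;
    for (int i = info.dims - 1; i > 0; --i) {
      IndexType curDimIndex = linearId % info.sizes[i];
      offset += curDimIndex * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};
```

Take a transposed 2x3 view with sizes {2, 3} and strides {1, 2}. For linear index 4, which is element (1, 1), both variants return offset 1*1 + 1*2 = 3. Routing more call sites through -1 presumably means fewer template instantiations per kernel, and that is the kind of saving the overview's compile-time goal points to.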
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
src/comm/TensorInfo.h | Replaced old IndexToOffset implementation with compile-time and runtime (-1) variants, removed contiguous fast path. |
src/ATen/native/xpu/sycl/WeightNormKernels.cpp | Updated all IndexToOffset calls to new API (runtime -1). |
src/ATen/native/xpu/sycl/TensorModeKernel.cpp | Switched subgroup size attribute to Intel-specific and updated IndexToOffset usage. |
src/ATen/native/xpu/sycl/TensorApplyUtils.h | Updated ApplyOp2 to use runtime (-1) IndexToOffset. |
src/ATen/native/xpu/sycl/SummaryOpsKernels.cpp | Added ADims/BDims template params and updated IndexToOffset calls. |
src/ATen/native/xpu/sycl/Sorting.cpp | Pass compile-time Dim to IndexToOffset. |
src/ATen/native/xpu/sycl/ScanUtils.h | Migrated to runtime (-1) IndexToOffset calls. |
src/ATen/native/xpu/sycl/RNNKernels.cpp | Added indexing_kind template parameter and adjusted macros to new IndexToOffset signature. |
src/ATen/native/xpu/sycl/Indexing.h | Updated offset calculations to new runtime form. |
src/ATen/native/xpu/sycl/Indexing.cpp | Added DstDim/SrcDim/IdxDim template params, macros now emit various Dims (including -2) for IndexToOffset. |
src/ATen/native/xpu/sycl/Dropout.cpp | Added ADims/BDims template-based offset computation. |
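The ADims/BDims parameters added in Dropout.cpp and SummaryOpsKernels.cpp point to a familiar pattern. The host picks a compile-time rank for common cases and falls back to -1 for everything else. The CPU-only sketch below illustrates that pattern. View, offsetOf, copyKernel, and launchCopy are hypothetical names, and specializing ranks 1 and 2 is an assumption, not something taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical strided view; a stand-in for TensorInfo, not the real type.
struct View {
  std::vector<std::int64_t> sizes;
  std::vector<std::int64_t> strides;
  int dims() const { return static_cast<int>(sizes.size()); }
};

// Offset with a compile-time rank; Dims == -1 reads the rank at runtime.
template <int Dims>
std::int64_t offsetOf(std::int64_t linearId, const View& v) {
  const int n = (Dims == -1) ? v.dims() : Dims;
  std::int64_t offset = 0;
  for (int i = n - 1; i > 0; --i) {
    offset += (linearId % v.sizes[i]) * v.strides[i];
    linearId /= v.sizes[i];
  }
  return offset + linearId * v.strides[0];
}

// Hypothetical "kernel" templated on each operand's rank (cf. ADims/BDims):
// copies b into a element by element.
template <int ADims, int BDims>
void copyKernel(float* a, const View& av, const float* b, const View& bv,
                std::int64_t numel) {
  for (std::int64_t id = 0; id < numel; ++id) {
    a[offsetOf<ADims>(id, av)] = b[offsetOf<BDims>(id, bv)];
  }
}

// Host-side dispatch: common ranks get a specialized instantiation,
// anything else uses the runtime (-1) path.
void launchCopy(float* a, const View& av, const float* b, const View& bv,
                std::int64_t numel) {
  assert(av.dims() >= 1 && bv.dims() >= 1);
  if (av.dims() == 1 && bv.dims() == 1) {
    copyKernel<1, 1>(a, av, b, bv, numel);
  } else if (av.dims() == 2 && bv.dims() == 2) {
    copyKernel<2, 2>(a, av, b, bv, numel);
  } else {
    copyKernel<-1, -1>(a, av, b, bv, numel);
  }
}
```

Each extra specialized pair adds another instantiation of every kernel. Limiting the specialized set, or sending more call sites to -1, trades a little per-element arithmetic for shorter builds.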
if (selfInfo.dims == 1 && sourceInfo.dims == 1 && indContig) {
  auto caller = SMALL_INDEX(
      scalar_t, index_t, unsigned int, 1, 1, -2, func_t);
Copilot (AI), Oct 16, 2025:
The sentinel value -2 for IdxDim is undocumented and differs from the established -1 dynamic case. It implicitly relies on the primary template skipping its loop, which produces offset = linearId * stride[0]; that would be incorrect if indices_ is not 1-D with stride[0] == 1. Replace -2 with an explicit dimension (e.g., 1) or unify on -1 with a clear fast path, and document the intent.
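The comment's description of -2 can be checked with a small host-side sketch. The sketch assumes the primary template uses the usual loop shape: iterate dimensions Dims-1 down to 1, then add linearId * strides[0]. The comment implies that shape, but this page does not show it. Info and indexToOffset are hypothetical names. With Dims = -2 the loop starts at i = -3 and never runs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal stand-in for TensorInfo (up to 4 dims).
struct Info {
  std::int64_t sizes[4];
  std::int64_t strides[4];
  int dims;
};

// Assumed primary-template loop shape; Dims == -1 reads the rank from
// info.dims. For Dims == -2, rank - 1 == -3, so the loop body is skipped
// and the result is simply linearId * strides[0].
template <int Dims>
std::int64_t indexToOffset(std::int64_t linearId, const Info& info) {
  assert(Dims != -1 || (info.dims >= 1 && info.dims <= 4));
  const int rank = (Dims == -1) ? info.dims : Dims;
  std::int64_t offset = 0;
  for (int i = rank - 1; i > 0; --i) {
    offset += (linearId % info.sizes[i]) * info.strides[i];
    linearId /= info.sizes[i];
  }
  return offset + linearId * info.strides[0];
}
```

Under these assumptions, a contiguous 1-D index tensor gets identical offsets from -2, 1, and -1. That is why the suggested explicit 1 would likely preserve behavior for the contiguous-index case while stating the intent. For a 2-D view with sizes {2, 3} and strides {1, 2}, linear index 4 maps to offset 3 under -1 but to 4 under -2, because dimension 1 is silently ignored.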
Divide IndexToOffset() into two versions, an offline (compile-time) one and an online (runtime) one, to reduce runtime overhead as much as possible.