
@yucai-intel (Contributor) commented Oct 7, 2025

Split IndexToOffset() into two versions, offline (compile-time) and online (runtime), to reduce runtime overhead as much as possible.

@CuiYifeng requested a review from Copilot on October 16, 2025 08:45

Copilot AI left a comment


Pull Request Overview

Split IndexToOffset into compile-time (static Dims) and runtime (-1) variants to reduce nvcc compilation time and refactor all kernel call sites to use the new template form. Key changes remove the previous contiguous fast path flag and introduce dimension template parameters across many XPU SYCL kernels.

  • Introduced IndexToOffset<T, IndexType, Dims> and dynamic specialization with Dims = -1; removed strict/non-strict contiguous branching.
  • Updated all kernel usages to pass an explicit Dims (positive, -1, or new sentinel -2) and added indexing_kind template parameters in RNN kernels.
  • Replaced standard SYCL subgroup size attribute with Intel-specific attribute in one kernel.
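The offline/online split can be sketched in plain C++ as follows. This is a minimal, hypothetical illustration of the pattern the overview describes (a primary template for a fixed, compile-time Dims plus a specialization selected by Dims = -1). It is not the code in src/comm/TensorInfo.h. TensorInfoSketch and kMaxDims are assumed names, and the SYCL kernel context is omitted so the sketch compiles on the host.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the real TensorInfo: only the fields needed
// for offset computation are modeled here.
constexpr int kMaxDims = 8;  // assumed upper bound on tensor rank

template <typename T, typename IndexType>
struct TensorInfoSketch {
  T* data = nullptr;
  std::array<IndexType, kMaxDims> sizes{};
  std::array<IndexType, kMaxDims> strides{};
  int dims = 0;
};

// Compile-time ("offline") variant: Dims is a template constant, so the
// loop has a fixed trip count that the compiler is free to unroll.
template <typename T, typename IndexType, int Dims>
struct IndexToOffset {
  static IndexType get(IndexType linearId,
                       const TensorInfoSketch<T, IndexType>& info) {
    IndexType offset = 0;
    for (int i = Dims - 1; i > 0; --i) {
      IndexType curDimIndex = linearId % info.sizes[i];
      offset += curDimIndex * info.strides[i];
      linearId /= info.sizes[i];
    }
    // Whatever remains of linearId is the index along dimension 0.
    return offset + linearId * info.strides[0];
  }
};

// Runtime ("online") variant selected by the sentinel Dims = -1: the rank
// is read from info.dims, so one instantiation serves tensors of any rank.
template <typename T, typename IndexType>
struct IndexToOffset<T, IndexType, -1> {
  static IndexType get(IndexType linearId,
                       const TensorInfoSketch<T, IndexType>& info) {
    IndexType offset = 0;
    for (int i = info.dims - 1; i > 0; --i) {
      IndexType curDimIndex = linearId % info.sizes[i];
      offset += curDimIndex * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};
```

For a 2x3 view with strides {1, 2} (a transpose of a contiguous 3x2 tensor), both variants map linear index 4 to element offset 3. The fixed-Dims form gives the compiler a constant loop bound. The -1 form gives up that bound so that a single instantiation can handle any rank.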

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/comm/TensorInfo.h | Replaced the old IndexToOffset implementation with compile-time and runtime (-1) variants; removed the contiguous fast path. |
| src/ATen/native/xpu/sycl/WeightNormKernels.cpp | Updated all IndexToOffset calls to the new API (runtime -1). |
| src/ATen/native/xpu/sycl/TensorModeKernel.cpp | Switched the subgroup size attribute to the Intel-specific one and updated IndexToOffset usage. |
| src/ATen/native/xpu/sycl/TensorApplyUtils.h | Updated ApplyOp2 to use runtime (-1) IndexToOffset. |
| src/ATen/native/xpu/sycl/SummaryOpsKernels.cpp | Added ADims/BDims template params and updated IndexToOffset calls. |
| src/ATen/native/xpu/sycl/Sorting.cpp | Passed a compile-time Dim to IndexToOffset. |
| src/ATen/native/xpu/sycl/ScanUtils.h | Migrated to runtime (-1) IndexToOffset calls. |
| src/ATen/native/xpu/sycl/RNNKernels.cpp | Added an indexing_kind template parameter and adjusted macros to the new IndexToOffset signature. |
| src/ATen/native/xpu/sycl/Indexing.h | Updated offset calculations to the new runtime form. |
| src/ATen/native/xpu/sycl/Indexing.cpp | Added DstDim/SrcDim/IdxDim template params; macros now emit various Dims (including -2) for IndexToOffset. |
| src/ATen/native/xpu/sycl/Dropout.cpp | Added ADims/BDims template-based offset computation. |


Comment on lines +1539 to +1541
```cpp
if (selfInfo.dims == 1 && sourceInfo.dims == 1 && indContig) {
  auto caller = SMALL_INDEX(
      scalar_t, index_t, unsigned int, 1, 1, -2, func_t);
```

Copilot AI Oct 16, 2025


The sentinel value -2 for IdxDim is undocumented and differs from the established -1 dynamic case; it implicitly relies on the primary template's loop skipping logic and produces offset = linearId * stride[0], which would be incorrect if indices_ is not 1D with stride[0]==1. Replace -2 with an explicit dimension (e.g., 1) or unify on -1 with a clear fast path, and document the intent.
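A small host-side sketch can check the reviewer's point about -2. It assumes, as the comment states, that Dims = -2 falls through to the primary template. In that case the loop starts at i = Dims - 1 = -3, never executes, and the result is just linearId * strides[0]. The names InfoSketch and IndexToOffsetSketch are hypothetical simplifications, not the PR's code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified tensor metadata: only what offset math needs.
constexpr int kMaxRank = 8;  // assumed upper bound on tensor rank

template <typename IndexType>
struct InfoSketch {
  std::array<IndexType, kMaxRank> sizes{};
  std::array<IndexType, kMaxRank> strides{};
  int dims = 0;
};

// Primary template. For Dims = -2 the loop condition (i = -3; i > 0) is
// false on entry, so only the dimension-0 term linearId * strides[0] remains.
template <typename IndexType, int Dims>
struct IndexToOffsetSketch {
  static IndexType get(IndexType linearId, const InfoSketch<IndexType>& info) {
    IndexType offset = 0;
    for (int i = Dims - 1; i > 0; --i) {
      offset += (linearId % info.sizes[i]) * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};

// Runtime variant (Dims = -1): walks info.dims at run time.
template <typename IndexType>
struct IndexToOffsetSketch<IndexType, -1> {
  static IndexType get(IndexType linearId, const InfoSketch<IndexType>& info) {
    IndexType offset = 0;
    for (int i = info.dims - 1; i > 0; --i) {
      offset += (linearId % info.sizes[i]) * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};
```

For a 1-D index tensor, both paths agree, because linearId * strides[0] is the exact offset for a rank-1 tensor. For a contiguous 2x3 tensor (strides {3, 1}), the -1 path maps linear index 4 to offset 4, but the -2 path returns 12. This mismatch is the one the reviewer warns about if a multi-dimensional index tensor ever reached this branch. The sketch does not show whether that can happen at this call site.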


@CuiYifeng requested a review from guangyey on October 17, 2025 05:30

Labels: None yet
Projects: None yet


4 participants