Open
28 commits
4d38b5e
[CI] Fix CICD test issues (#2027)
mengfei25 Sep 16, 2025
df1d7ad
[CI] Add ported distributed cases (#1945)
zxd1997066 Sep 16, 2025
74b11bf
support high priority stream (#1715)
Chao1Han Sep 17, 2025
6d1a476
Reduce tensor shape avoid timeout (#2051)
Chao1Han Sep 17, 2025
a5f73c1
Move checks from nonzero kernel to operator (#1991)
Silv3S Sep 17, 2025
9f08551
Check input contiguous before collectives call to low-level communica…
zhangxiaoli73 Sep 17, 2025
6508096
Fix hardswish gradients corner case (#2050)
Silv3S Sep 17, 2025
24fab67
Clean up getDeviceIndexOfCurrentQueue (#2060)
guangyey Sep 18, 2025
bc52e63
[CI] Cleanup after build to avoid permission issue (#2088)
mengfei25 Sep 22, 2025
6e5af1e
Revert tracking of Work status for FlightRecorder in ProcessGroupXCCL…
frost-intel Sep 22, 2025
65234bd
build: enable SYCL warnings on Linux (#2096)
dvrogozh Sep 23, 2025
b755d3c
[CI] Add deps for new LLM models (#2054)
mengfei25 Sep 23, 2025
9eed218
Fix accuracy issues with CTC loss (#2074)
SanityRemnants Sep 23, 2025
0df6a62
Implement aten::nonzero_static on XPU backend (#2061)
Silv3S Sep 23, 2025
229e8ba
Stop recursive calculations in polynomial kernels if tensor has NaNs …
Silv3S Sep 23, 2025
23cd584
Remove unused variables (#2044)
Silv3S Sep 25, 2025
304983a
[CI] Login docker hub to enlarge pull limitation (#2107)
mengfei25 Sep 26, 2025
26ae9e7
Install xpu internal headers to PyTorch (#2106)
guangyey Sep 26, 2025
eed1b8f
Fix error handling for BatchLinearAlgebra Ops (#2073)
CuiYifeng Sep 26, 2025
9e95ad0
Add dynamic skip template (#2115)
RUIJIEZHONG66166 Sep 26, 2025
09edbee
[CI] Modify ci test workflow (#2116)
mengfei25 Sep 26, 2025
fa1e391
Fix unnecessary double data type conversion (#2114)
guangyey Sep 29, 2025
d5a81e0
Fix overflow when calculating workgroups count (#2104)
CuiYifeng Sep 29, 2025
f301733
Fix segmentation fault and calculation error in AveragePool2dKernel (…
yucai-intel Sep 30, 2025
086f20a
Fix test_barrier hang by using static global rank in ProcessGroupXCCL…
frost-intel Oct 3, 2025
cae6ba3
Add FlightRecorder tests (#1971)
frost-intel Oct 3, 2025
fe324d3
Modify files in install_xpu_headers only if they changes (#2138)
PawelSwider2000 Oct 9, 2025
f3800e1
Merge branch 'main' into rebase/sycl-free-func
fengyuan14 Oct 9, 2025
63 changes: 63 additions & 0 deletions .github/ISSUE_TEMPLATE/dynamic-skip.yml
@@ -0,0 +1,63 @@
name: 🐛 Dynamic skip
description: Create an issue to skip PR unrelated failures dynamically
title: "[Bug Skip]: "
labels: ["skipped"]

body:
- type: markdown
attributes:
value: >
#### Before submitting a bug, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/pytorch/pytorch/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
attributes:
label: 🐛 Describe the bug with skip template
description: |
Please provide a clear and concise description of what the bug is.
The template for dynamic skip as below:

```python
# Template(Check in the github action summary)
Cases:
[Category],[Class name],[Test name]
```

```python
# example
Cases:
op_ut,third_party.torch-xpu-ops.test.xpu.test_transformers_xpu.TestTransformersXPU,test_scaled
```

If the code is too long (hopefully, it isn't), feel free to put it in a public gist and link it in the issue: https://gist.github.com.

Please also paste or describe the results you observe instead of the expected results. If you observe an error, please paste the error message including the **full** traceback of the exception. It may be relevant to wrap error messages in ```` ```triple quotes blocks``` ````.
placeholder: |
A clear and concise description of what the bug is and also align the dynamic template.
```
# Skipped cases with dynamic template
```

```python
# Sample code to reproduce the problem
```

```
The error message you got, with the full traceback.
```
validations:
required: true
- type: textarea
attributes:
label: Versions
description: |
Please run the following and paste the output below.
```sh
wget https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
validations:
required: true
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
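For illustration, the `Cases:` block that the template asks for could be parsed along the following lines. This is a minimal sketch, not the project's actual parser: the function name `parse_skip_cases`, the `|` output separator, and the rule that the block ends at the first line without exactly three comma-separated fields are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): read an issue body on stdin and
# print every "category,class,test" line that follows a "Cases:" marker.
parse_skip_cases() {
  awk -F',' '
    /^Cases:/ { collecting = 1; next }          # start after the marker line
    collecting && NF != 3 { collecting = 0 }    # any other shape ends the block
    collecting { print $1 "|" $2 "|" $3 }
  '
}

# Sample issue body that reuses the example case from the template.
issue_body='The test fails on XPU only.
Cases:
op_ut,third_party.torch-xpu-ops.test.xpu.test_transformers_xpu.TestTransformersXPU,test_scaled

Traceback follows here.'

parsed="$(printf '%s\n' "$issue_body" | parse_skip_cases)"
echo "$parsed"
```

Requiring exactly three fields keeps free-form description text out of the skip list, at the cost of rejecting test names that themselves contain commas.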
22 changes: 16 additions & 6 deletions .github/actions/linux-testenv/action.yml
@@ -57,7 +57,7 @@ runs:
run: |
# install pytorch
if [ $(echo "${{ inputs.pytorch }}" |grep -w "release_wheel" -c) -ne 0 ];then
- pip install torch torchvision torchaudio --pre --index-url https://download.pytorch.org/whl/xpu
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
elif [ $(echo "${{ inputs.pytorch }}" |grep -w "test_wheel" -c) -ne 0 ];then
pip install torch torchvision torchaudio --pre --index-url https://download.pytorch.org/whl/test/xpu
elif [ $(echo "${{ inputs.pytorch }}" |grep -w "nightly_wheel" -c) -ne 0 ];then
@@ -77,7 +77,7 @@ runs:
# apply extra PRs for stock pytorch
pip install requests
if [ "${{ github.event_name }}" == "pull_request" ];then
- python ../torch-xpu-ops/.github/scripts/apply_torch_pr.py -e https://github.com/pytorch/pytorch/pull/152940
+ python ../torch-xpu-ops/.github/scripts/apply_torch_pr.py -e https://github.com/mengfei25/pytorch/pull/27
else
python ../torch-xpu-ops/.github/scripts/apply_torch_pr.py
fi
@@ -99,7 +99,7 @@ runs:
TORCH_XPU_OPS_COMMIT="${{ inputs.torch_xpu_ops }}"
fi
fi
- if [ "${{ github.event_name }}" == "pull_request" ];then
+ if [ "${{ github.event_name }}" == "pull_request" ] && [[ "${{ inputs.pytorch }}" != *"_wheel" ]];then
cp -r ${{ github.workspace }}/torch-xpu-ops third_party/torch-xpu-ops
cd third_party/torch-xpu-ops
else
@@ -115,6 +115,8 @@ runs:
pip install pandas psutil scipy pyyaml
cd pytorch
if [[ "${{ inputs.suite }}" == *"huggingface"* ]];then
# for new LLM models
pip install accelerate
pip install -r .ci/docker/ci_commit_pins/huggingface-requirements.txt || pip install transformers==4.54.0 soxr==0.5.0
TRANSFORMERS_VERSION_ID="$(python -c 'import os; os.chdir("/tmp"); import transformers; print(transformers.__version__)')"
elif [[ "${{ inputs.suite }}" == *"timm_models"* ]];then
@@ -139,7 +141,7 @@ runs:
fi
# for dlrm
pip install pyre-extensions
- curl -fsSL https://raw.githubusercontent.com/facebookresearch/dlrm/refs/heads/torchrec-dlrm/requirements.txt |xargs pip install --no-deps
+ curl -fsSL https://raw.githubusercontent.com/facebookresearch/dlrm/refs/heads/torchrec-dlrm/requirements.txt |xargs pip install
# for soft_actor_critic, temp fix
pip install git+https://github.com/nocoding03/gym@fix-np
cd ../pytorch
@@ -152,10 +154,13 @@ runs:
TORCHBENCH_COMMIT_ID="$(git rev-parse --short HEAD)"
sed -i 's/^ *pynvml.*//' requirements.txt
pip install -r requirements.txt
- python install.py --continue_on_fail
+ # python install.py --continue_on_fail
echo "PYTHONPATH=${PWD}:${PYTHONPATH}" >> ${GITHUB_ENV}
pip install dominate
python install.py Super_SloMo
# for dlrm
pip install pyre-extensions
- curl -fsSL https://raw.githubusercontent.com/facebookresearch/dlrm/refs/heads/torchrec-dlrm/requirements.txt |xargs pip install --no-deps
+ curl -fsSL https://raw.githubusercontent.com/facebookresearch/dlrm/refs/heads/torchrec-dlrm/requirements.txt |xargs pip install
cd ../pytorch
else
pip install -r ./.ci/docker/requirements-ci.txt
@@ -170,6 +175,11 @@ runs:
else
pip install torchao --pre --index-url https://download.pytorch.org/whl/nightly/xpu
fi
if [ "${{ inputs.suite }}" != "None" ];then
# To install numpy 1.x for benchmarks as CUDA
# yolov requires numpy>=1.23
pip install -U numpy==1.26.4
fi
- name: Torch Config
shell: bash -xe {0}
run: |
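The new condition in the `@@ -99,7 +99,7 @@` hunk combines an event check with a glob-style suffix test. The sketch below isolates that logic so the three interesting cases can be tried locally. The function names `uses_pr_checkout` and `report` and the sample input values are hypothetical; the two arguments stand in for `${{ github.event_name }}` and `${{ inputs.pytorch }}`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the condition from the hunk above.
# $1 stands in for github.event_name, $2 for inputs.pytorch.
uses_pr_checkout() {
  local event_name="$1" pytorch_input="$2"
  # True only for pull requests whose pytorch input does not end in "_wheel".
  [ "$event_name" == "pull_request" ] && [[ "$pytorch_input" != *"_wheel" ]]
}

# Print which branch of the if/else a given input pair would take.
report() {
  if uses_pr_checkout "$1" "$2"; then
    echo "$1 / $2: copy the PR checkout into third_party"
  else
    echo "$1 / $2: take the else branch"
  fi
}

report pull_request main            # sample value, assumed
report pull_request nightly_wheel
report schedule main
```

Because the right-hand side of `!=` inside `[[ ]]` is an unquoted `*` followed by a quoted literal, bash treats it as a pattern, so any input ending in `_wheel` is excluded.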
6 changes: 5 additions & 1 deletion .github/actions/linux-uttest/action.yml
@@ -155,7 +155,7 @@ runs:
tee ${{ github.workspace }}/ut_log/xpu_profiling/test_profiler_tree.log

- name: xpu_distributed
- shell: timeout 3600 bash -xeu -o pipefail {0}
+ shell: timeout 36000 bash -xeu -o pipefail {0}
if: ${{ inputs.ut_name == 'xpu_distributed' }}
run: |
xpu-smi topology -m
@@ -166,9 +166,13 @@ runs:
echo -e "[ERROR] XCCL is not enabled"
exit 1
fi
export CCL_ROOT=$(dirname $(which python))/../
export PATH="${CCL_ROOT}/bin/libfabric:${PATH}"
export LD_LIBRARY_PATH="${CCL_ROOT}/lib:${LD_LIBRARY_PATH}"
python run_distributed.py \
2> ${{ github.workspace }}/ut_log/xpu_distributed/xpu_distributed_test_error.log | \
tee ${{ github.workspace }}/ut_log/xpu_distributed/xpu_distributed_test.log
find ../ -type f -name "*.xml" -exec cp {} ${{ github.workspace }}/ut_log/ \;

# Summary
- name: UT Test Results Summary
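The three `export` lines in the `xpu_distributed` step resolve `CCL_ROOT` to the parent of the directory that holds the active `python`. The sketch below reproduces that derivation against a throwaway directory tree so the resulting paths can be inspected. The `fake_env` layout is an assumption made for the demo, and `command -v` stands in for `which`.

```shell
#!/usr/bin/env bash
# Build a throwaway, hypothetical environment layout: <env>/bin/python and <env>/lib.
fake_env="$(mktemp -d)"
mkdir -p "${fake_env}/bin" "${fake_env}/lib"
printf '#!/bin/sh\nexit 0\n' > "${fake_env}/bin/python"
chmod +x "${fake_env}/bin/python"
PATH="${fake_env}/bin:${PATH}"

# Same derivation as the workflow step, with `command -v` in place of `which`.
CCL_ROOT="$(dirname "$(command -v python)")/../"
export CCL_ROOT
export PATH="${CCL_ROOT}/bin/libfabric:${PATH}"
export LD_LIBRARY_PATH="${CCL_ROOT}/lib:${LD_LIBRARY_PATH:-}"

echo "CCL_ROOT=${CCL_ROOT}"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```

The value keeps the literal `bin/../` segment rather than a normalized path, which is harmless for lookups but worth knowing when reading the logs.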
2 changes: 2 additions & 0 deletions .github/actions/pt2e/action.yml
@@ -29,13 +29,15 @@ runs:
shell: bash -xe {0}
run: |
# dataset
dataset_dir="${RUNNER_TEMP}/_datasets/imagenet"
if [ ! -d ${dataset_dir} ];then
rm -rf ${dataset_dir} && mkdir -p ${dataset_dir} && cd ${dataset_dir}
wget -O valprep.sh https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
wget -q https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
tar -xf ILSVRC2012_img_val.tar
bash valprep.sh
fi
echo "dataset_dir=${dataset_dir}" >> ${GITHUB_ENV}
- name: PT2E Test (${{ inputs.dt }} ${{ inputs.scenario }})
shell: bash -xe {0}
run: |
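The pt2e step hands `dataset_dir` to later steps by appending a `name=value` line to the file named by `GITHUB_ENV`. The sketch below imitates that handoff outside of Actions. The temporary files and the `while read` loop are stand-ins for what the runner does between steps, not the runner's real implementation.

```shell
#!/usr/bin/env bash
# Stand-ins for the values the Actions runner would normally provide.
GITHUB_ENV="$(mktemp)"
RUNNER_TEMP="$(mktemp -d)"

# "Step 1": compute the path and record it, as the dataset step does.
dataset_dir="${RUNNER_TEMP}/_datasets/imagenet"
echo "dataset_dir=${dataset_dir}" >> "${GITHUB_ENV}"
unset dataset_dir   # a later step starts in a fresh shell

# Imitation of the runner exporting each recorded name=value pair
# into the environment of the following steps.
while IFS='=' read -r name value; do
  export "${name}=${value}"
done < "${GITHUB_ENV}"

# "Step 2": the variable is visible again.
echo "dataset_dir=${dataset_dir}"
```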
@@ -45,3 +45,7 @@ TrOCRForCausalLM,pass,pass,pass,pass,pass
XGLMForCausalLM,pass,pass,pass,pass,pass
XLNetLMHeadModel,pass,pass,pass,pass,pass
YituTechConvBert,pass,pass,pass,pass,pass
Qwen/Qwen3-0.6B,pass,pass,pass,pass,pass
google/gemma-2-2b,pass,pass,pass,pass,pass
meta-llama/Llama-3.2-1B,pass,pass,pass,pass,pass
openai/whisper-tiny,pass,pass,pass,pass,pass
@@ -45,3 +45,8 @@ TrOCRForCausalLM,pass,pass,pass,pass,pass
XGLMForCausalLM,pass,pass,pass,pass,pass
XLNetLMHeadModel,pass,pass,pass,pass,pass
YituTechConvBert,pass,pass,pass,pass,pass
# https://github.com/intel/torch-xpu-ops/issues/2055
Qwen/Qwen3-0.6B,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run
google/gemma-2-2b,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run
meta-llama/Llama-3.2-1B,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run
openai/whisper-tiny,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run,eager_fail_to_run
3 changes: 2 additions & 1 deletion .github/scripts/ut_result_check.sh
@@ -198,7 +198,8 @@ if [[ "${ut_suite}" == 'op_regression' || "${ut_suite}" == 'op_regression_dev1'
fi

if [[ "${ut_suite}" == 'xpu_distributed' ]]; then
- grep -E "^FAILED" xpu_distributed_test.log | awk '{print $2}' > ./"${ut_suite}"_xpu_distributed_test_failed.log
+ grep -E "^FAILED" xpu_distributed_test.log | awk '{print $3}' > ./"${ut_suite}"_xpu_distributed_test_failed.log
+ sed -i '/^[^.]\+/d' ./"${ut_suite}"_xpu_distributed_test_failed.log
grep "PASSED" xpu_distributed_test.log | awk '{print $1}' > ./"${ut_suite}"_xpu_distributed_test_passed.log
echo -e "========================================================================="
echo -e "Show Failed cases in ${ut_suite} xpu distributed"
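To see what the revised extraction keeps, the same `grep | awk | sed` chain can be run against a small sample log. The log lines below are invented for the demo and their layout (status, bracketed duration, then a relative test path) is an assumption about the real log format. The sketch pipes into `sed` instead of editing a file in place, and `\+` relies on GNU `sed`.

```shell
#!/usr/bin/env bash
# Hypothetical sample log; the real xpu_distributed_test.log may look different.
log="$(mktemp)"
cat > "$log" <<'EOF'
FAILED [0.12s] ../test/demo/test_example.py::DemoTest::test_a - RuntimeError
FAILED (failures=1) in summary
PASSED [0.05s] ../test/demo/test_example.py::DemoTest::test_b
EOF

# Keep the third field of every FAILED line, then drop entries whose first
# character is not a dot, which leaves only the relative "../..." test paths.
failed_cases="$(grep -E "^FAILED" "$log" | awk '{print $3}' | sed '/^[^.]\+/d')"
echo "$failed_cases"
```

Note that the `sed` expression deletes any line that starts with a non-dot character, so a failed case reported with an absolute or bare path would also be dropped.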
5 changes: 3 additions & 2 deletions .github/workflows/_linux_accelerate.yml
@@ -48,15 +48,16 @@ concurrency:
defaults:
run:
shell: bash {0}
env:
GH_TOKEN: ${{ github.token }}
DOCKER_REGISTRY_AUTH_TOKEN: ${{ secrets.DOCKER_HUB_TOKEN }}

jobs:
conditions-filter:
name: conditions-filter
if: ${{ github.event.pull_request.draft == false }}
runs-on: ubuntu-24.04
timeout-minutes: 10
env:
GH_TOKEN: ${{ github.token }}
outputs:
disabled_tests: ${{ steps.check-pr-desc.outputs.disabled_tests }}
steps:
9 changes: 8 additions & 1 deletion .github/workflows/_linux_build.yml
@@ -31,6 +31,8 @@ permissions: read-all
defaults:
run:
shell: bash -xe {0}
env:
DOCKER_REGISTRY_AUTH_TOKEN: ${{ secrets.DOCKER_HUB_TOKEN }}

jobs:
runner:
@@ -94,6 +96,7 @@ jobs:
- name: Build Pytorch on ${{ needs.runner.outputs.hostname }}
run: |
export USE_XCCL=1
export IS_XPU_CI=1
# only build pvc for CI
if [ "${{ github.event_name }}" == "pull_request" ];then
export TORCH_XPU_ARCH_LIST='pvc'
@@ -200,7 +203,6 @@ jobs:
python -c "import torchaudio; print(torchaudio.__version__)"
python pytorch/torch/utils/collect_env.py
pip list |grep -E 'torch|intel'
chmod 777 /__w /github ./ -R
- name: Upload Torch XPU Wheel
if: ${{ success() }}
uses: actions/upload-artifact@v4
@@ -213,3 +215,8 @@ jobs:
with:
name: Torch-XPU-Build-Log-${{ github.event.pull_request.number || github.sha }}
path: ${{ github.workspace }}/build_*.log
- name: Cleanup workspace
if: ${{ always() }}
run: |
chmod 777 /__w /github ./ -R
find ./ |grep -v "^\./$" |xargs rm -rf
2 changes: 1 addition & 1 deletion .github/workflows/_linux_e2e.yml
@@ -50,6 +50,7 @@ env:
GH_TOKEN: ${{ github.token }}
HF_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
DOCKER_REGISTRY_AUTH_TOKEN: ${{ secrets.DOCKER_HUB_TOKEN }}

jobs:
runner:
@@ -87,7 +88,6 @@ jobs:
cpus_per_xpu: ${{ needs.runner.outputs.cpus_per_xpu }}
MODEL_ONLY_NAME: ${{ inputs.model }}
AGENT_TOOLSDIRECTORY: /tmp/xpu-tool
dataset_dir: ${{ runner.temp }}/../_datasets/imagenet
steps:
- name: Checkout torch-xpu-ops
uses: actions/checkout@v4
19 changes: 10 additions & 9 deletions .github/workflows/_linux_e2e_summary.yml
@@ -30,14 +30,15 @@ jobs:
steps:
- name: Checkout torch-xpu-ops
uses: actions/checkout@v4
- name: Install gh-cli
run: |
sudo apt-get update
sudo apt-get install gh rsync ca-certificates -y
- name: Setup python-${{ inputs.python }}
uses: actions/setup-python@v5
with:
python-version: ${{ inputs.python }}
- name: Install gh-cli
run: |
sudo apt-get update
sudo apt-get install gh rsync ca-certificates -y
pip install pandas requests
- name: Download Target Artifact
run: |
mkdir target/
@@ -64,30 +65,30 @@ jobs:
- name: Get summary
if: ${{ ! cancelled() }}
run: |
pip install pandas requests
exit_label=0
e2e_summary_csv="$(find ./target/ -name "inductor_*.csv" |head -n 1)"
if [ -f "${e2e_summary_csv}" ];then
bash ./.github/scripts/e2e_summary.sh ./target ./baseline >> ${GITHUB_STEP_SUMMARY}
exit_label=$(awk 'BEGIN{sum=0}{if($2>0){sum++}}END{print sum}' /tmp/tmp-result.txt)
if [ ${exit_label} -ne 0 ];then
grep -E "(Real failed|to passed|Warning timeout).*: [1-9]|Summary for" /tmp/tmp-*.txt |grep -E "failed|passed|timeout" -B 1
echo "There are ${exit_label} cases that need look into!!! Please check them"
exit ${exit_label}
fi
fi
pt2e_summary_csv="$(find ./target/ -name "summary.csv")"
if [ -f "${pt2e_summary_csv}" ];then
cat ${pt2e_summary_csv}
- failed_num=$(grep -c ',failed' ${pt2e_summary_csv})
+ failed_num=$(grep -c ',failed' ${pt2e_summary_csv} || true)
if [ ${failed_num} -ne 0 ];then
echo "[Warning] PT2E has failures!"
grep 'failed' ${pt2e_summary_csv}
fi
fi
exit ${exit_label}
- name: Upload Reference Run ID
if: ${{ endsWith(inputs.test_type, 'ly') }}
run: |
gh --repo ${GITHUB_REPOSITORY} issue view ${REFERENCE_ISSUE_ID} --json body -q .body 2>&1 |tee new_body.txt 2>&1
- has_or_not="$(grep -c 'Inductor-${{ inputs.test_type }}-LTS2' new_body.txt)"
+ has_or_not="$(grep -c 'Inductor-${{ inputs.test_type }}-LTS2' new_body.txt || true)"
if [ ${has_or_not} -ne 0 ];then
sed -i "s/Inductor-${{ inputs.test_type }}-LTS2:.*/Inductor-${{ inputs.test_type }}-LTS2: ${GITHUB_RUN_ID}/" new_body.txt
else
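The `|| true` additions matter because `grep -c` prints `0` but exits with status 1 when nothing matches, and a shell running with `-e` aborts on that status. The sketch below shows both behaviours in child `bash -e` processes. The CSV content and variable names are made up for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical summary file with no ",failed" rows.
summary="$(mktemp)"
printf 'model_a,passed\nmodel_b,passed\n' > "$summary"

# With the guard: the count is captured as 0 and the script carries on.
guarded="$(bash -ec 'n=$(grep -c ",failed" "$1" || true); echo "$n"' _ "$summary")"

# Without the guard: the assignment inherits grep's exit status 1,
# so the -e shell stops before reaching the echo.
unguarded_status=0
bash -ec 'n=$(grep -c ",failed" "$1"); echo "reached: $n"' _ "$summary" || unguarded_status=$?

echo "guarded count: ${guarded}"
echo "unguarded exit status: ${unguarded_status}"
```

Since the count is still written to stdout in the zero-match case, the later `-ne 0` comparison keeps working once the exit status is neutralized.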
5 changes: 3 additions & 2 deletions .github/workflows/_linux_op_benchmark.yml
@@ -26,6 +26,7 @@ env:
GH_TOKEN: ${{ github.token }}
HF_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
DOCKER_REGISTRY_AUTH_TOKEN: ${{ secrets.DOCKER_HUB_TOKEN }}
reference_issue: 1689

jobs:
@@ -50,8 +51,6 @@ jobs:
op_benchmark:
needs: runner
runs-on: ${{ needs.runner.outputs.runner_id }}
permissions:
issues: write
timeout-minutes: 900
container:
image: mengfeili/intel-pvc-driver:1146-1136
@@ -93,6 +92,8 @@ jobs:
op_benchmark_test_results_check:
needs: op_benchmark
runs-on: ubuntu-24.04
permissions:
issues: write
steps:
- name: Install gh-cli
run: |
1 change: 1 addition & 0 deletions .github/workflows/_linux_transformers.yml
@@ -56,6 +56,7 @@ concurrency:
cancel-in-progress: true
env:
HF_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
DOCKER_REGISTRY_AUTH_TOKEN: ${{ secrets.DOCKER_HUB_TOKEN }}
HF_HUB_ETAG_TIMEOUT: 120
HF_HUB_DOWNLOAD_TIMEOUT: 120
python: ${{ inputs.python != '' && inputs.python || '3.10' }}
4 changes: 3 additions & 1 deletion .github/workflows/_linux_ut.yml
@@ -33,6 +33,7 @@ env:
GH_TOKEN: ${{ github.token }}
HF_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
DOCKER_REGISTRY_AUTH_TOKEN: ${{ secrets.DOCKER_HUB_TOKEN }}
UT_SKIP_ISSUE: 1624

jobs:
@@ -99,11 +100,12 @@ jobs:

test-in-baremetal:
needs: runner
timeout-minutes: 600
if: ${{ contains(inputs.ut, 'distributed') }}
runs-on: ${{ needs.runner.outputs.runner_id }}
env:
AGENT_TOOLSDIRECTORY: /tmp/xpu-tool
- PYTEST_ADDOPTS: -v --timeout 600 --timeout_method=thread -n 1
+ PYTEST_ADDOPTS: -v --timeout 3600 --timeout_method=thread -n 1
steps:
- name: Checkout torch-xpu-ops
uses: actions/checkout@v4