
Drivers 4885 and newer break IPEX on native Windows #442

Open
@Mindset-Official

Description


Describe the issue

I have tried running IPEX in both SD.Next and ComfyUI, and both fail when trying to generate an image. There is no error message; the WebUI just crashes completely. Drivers 4676 and older worked perfectly fine. Since there is no error message, I can't tell what is broken. I believe the driver team has been notified, but I'm not sure what they can do since this setup isn't officially supported, so I figured I would post here as well.

WSL2 still seems to work fine.

GPU: Intel Arc A750
OS: Windows 11
IPEX: AOT-compiled build for Windows
CPU: AMD Ryzen 5 5600
RAM: 32 GB DDR4-3200

Activity

jingxu10 (Contributor) commented on Oct 11, 2023

Nuullll commented on Oct 11, 2023

+1.

> there is no error message

Under some circumstances, I can see "Abort was called at 198 line in file:" -- I believe this is raised from compute runtime.

I'm trying to isolate the issue.

Mindset-Official (Author) commented on Oct 12, 2023

> Under some circumstances, I can see "Abort was called at 198 line in file:" -- I believe this is raised from compute runtime.

Just to confirm, I also got this a few times.

Vipitis commented on Oct 12, 2023

Running accelerate with --use_xpu, or with IPEX enabled in the config, also fails with exit status 3221225477 on an A750 with Windows 10 and driver 4887.
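Exit status 3221225477 is easier to read in hexadecimal: it is 0xC0000005, the Windows NTSTATUS code STATUS_ACCESS_VIOLATION, which fits a native crash inside the runtime rather than a Python exception. A small sketch for decoding such statuses; `describe_exit_status` and `KNOWN_NTSTATUS` are hypothetical names, and the table deliberately holds only this one code.

```python
# Decode a Windows process exit status into its NTSTATUS hex form.
# KNOWN_NTSTATUS is a hypothetical, deliberately tiny lookup table.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def describe_exit_status(status: int) -> str:
    """Return e.g. '0xC0000005 (STATUS_ACCESS_VIOLATION)' for a raw exit status."""
    # Some tools report the status as a signed 32-bit value (-1073741819);
    # masking normalizes it to the unsigned form used in Windows documentation.
    code = status & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


if __name__ == "__main__":
    print(describe_exit_status(3221225477))
    print(describe_exit_status(-1073741819))
```

Both calls print `0xC0000005 (STATUS_ACCESS_VIOLATION)`.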

Nuullll commented on Oct 15, 2023

It seems that driver 4885 breaks backward compatibility with previous drivers.

The officially released IPEX Windows JIT wheels work fine with the following reproducer (the image was generated as expected):

import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch
from diffusers import StableDiffusionPipeline

# Load the fp16 weights and move the whole pipeline to the Intel GPU
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16).to("xpu")
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")

However, if I use IPEX AOT wheels built from source with driver 4676 or earlier (for example, https://github.com/Nuullll/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu-master%2Bdll-bundle), the program crashes.

pip install https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.0.110%2Bxpu-master%2Bdll-bundle/torch-2.0.0a0+gite9ebda2-cp310-cp310-win_amd64.whl
pip install https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.0.110%2Bxpu-master%2Bdll-bundle/intel_extension_for_pytorch-2.0.110+gitc6ea20b-cp310-cp310-win_amd64.whl
pip install diffusers transformers
set SYCL_PI_TRACE=2
python reproducer.py 1> trace.log 2>&1
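If a Python launcher is more convenient than cmd's `set`, the same trace can be captured with the standard library's `subprocess` module. A minimal sketch, assuming the SYCL runtime reads `SYCL_PI_TRACE` from the environment at startup, as the `set` command above implies; `run_with_pi_trace` is a hypothetical helper.

```python
# Run a script with SYCL_PI_TRACE set and send stdout and stderr to one file,
# mirroring `python reproducer.py 1> trace.log 2>&1` in cmd.
import os
import subprocess
import sys


def run_with_pi_trace(script: str, log_path: str, level: str = "2") -> int:
    """Run `script` with the current interpreter; return its exit status."""
    # Copy the parent environment and add the trace level for the child only.
    env = dict(os.environ, SYCL_PI_TRACE=level)
    with open(log_path, "wb") as log:
        result = subprocess.run(
            [sys.executable, script],
            env=env,
            stdout=log,
            stderr=subprocess.STDOUT,  # equivalent of 2>&1
        )
    return result.returncode
```

For example, `run_with_pi_trace("reproducer.py", "trace.log")` should produce a trace like the one below; a crash in native code shows up as a nonzero return value rather than a Python traceback.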

trace.log

---> piKernelCreate(
	<unknown> : 000001CA0D158570
	<const char *>: _ZTSZZN2at15AtenIpexTypeXPUL20launch_legacy_kernelIZNS0_18dpcpp_loops_kernelIZZZNS_4impl21copy_device_to_deviceERNS_14TensorIteratorEbENKUlvE3_clEvENKUlvE9_clEvEUlN3c104HalfEE_Lb0ELb1EEEvRNS_18TensorIteratorBaseET_EUliE_EEvxRKSD_ENKUlRN4sycl3_V17handlerEE_clESK_EUlNSI_7nd_itemILi1EEEE_
	<unknown> : 0000000DD9BE8C78
PI ---> (*RetKernel)->initialize()
PI ---> piProgramRetain(Program)
) ---> 	pi_result : PI_SUCCESS
	[out]<unknown> ** : 0000000DD9BE8C78[ 000001C9B5B0E8D0 ... ]

...

---> piEnqueueKernelLaunch(
	<unknown> : 000001C8A3717830
	<unknown> : 000001C9B5B0E8D0
	<unknown> : 1
	<unknown> : 0000000DD9BEA658
	<unknown> : 0000000DD9BEA628
	<unknown> : 0000000DD9BEA640
	<unknown> : 0
	pi_event * : 0000000000000000[ nullptr ]
	pi_event * : 000001C8A7A7D1D8[ 0000000000000000 ... ]
PI ---> Queue->insertStartBarrierIfDiscardEventsMode(CommandList)
PI ---> EventCreate(Queue->Context, Queue, ForceHostVisible, Event)
PI ---> piEventRetain(*Event)
PI ---> piKernelRetain(Kernel)

Crashed while executing piEnqueueKernelLaunch for kernel _ZTSZZN2at15AtenIpexTypeXPUL20launch_legacy_kernelIZNS0_18dpcpp_loops_kernelIZZZNS_4impl21copy_device_to_deviceERNS_14TensorIteratorEbENKUlvE3_clEvENKUlvE9_clEvEUlN3c104HalfEE_Lb0ELb1EEEvRNS_18TensorIteratorBaseET_EUliE_EEvxRKSD_ENKUlRN4sycl3_V17handlerEE_clESK_EUlNSI_7nd_itemILi1EEEE_
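The key detail in such a log is the last PI call entered before the process died. A rough sketch for locating it in a long trace; the line format is inferred from the excerpt above rather than from a documented specification, and `last_pi_call` and `read_trace` are hypothetical helpers.

```python
# Scan a SYCL_PI_TRACE=2 log and report the last PI call that was entered.
# Assumes each call begins with a line like "---> piName(" as in the excerpt
# above; lines starting with "PI ---> " are internal plugin steps and are skipped.
import re
from typing import Iterable, Optional

PI_CALL = re.compile(r"^---> (pi\w+)\(")


def last_pi_call(lines: Iterable[str]) -> Optional[str]:
    """Return the name of the last PI API call found in the trace, if any."""
    last = None
    for line in lines:
        match = PI_CALL.match(line)
        if match:
            last = match.group(1)
    return last


def read_trace(path: str) -> Optional[str]:
    # errors="replace" tolerates stray bytes in a log cut off by a crash
    with open(path, encoding="utf-8", errors="replace") as f:
        return last_pi_call(f)
```

On the excerpt above it reports `piEnqueueKernelLaunch`, matching the kernel launch that crashed.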

Probably I should compile IPEX with driver 4885?

Mindset-Official (Author) commented on Oct 15, 2023

> Probably I should compile IPEX with driver 4885?

You could try it and see, but as far as I know the official wheels haven't been updated, so I don't think they were compiled against the latest drivers. Maybe the new drivers break something in AOT?

Nuullll commented on Oct 16, 2023

I tried compiling IPEX AOT for Arc with driver 4887. The reproducer still crashes with the same SYCL_PI_TRACE log.

Mindset-Official (Author) commented on Oct 26, 2023

Are there any updates on what's going on with the newest drivers? I personally haven't tried the very latest, but I've heard from others that it doesn't work either (I may give it a shot if someone says otherwise). Any progress on figuring out what's happening?

Nuullll commented on Oct 26, 2023

I can confirm that drivers 4885, 4887, and 4900 all fail with IPEX AOT, simply because they ship the same Level Zero compute runtime, version 1.3.27193.

Mindset-Official (Author) commented on Oct 26, 2023

I take it this is entirely at the driver level, and there is no way to override it and install the older runtime version?

Nuullll commented on Oct 26, 2023

> I take it this is completely driver level and no way to override and install the older runtime version?

I tried replacing the driver storage files ze_intel_gpu64.dll, ze_loader.dll, ze_tracing_layer.dll, and ze_validation_layer.dll under C:\Windows\System32 with the older DLLs. But I apparently missed something: the compute runtime library failed to load.

Mindset-Official (Author) commented on Oct 26, 2023

That's way above my level. However, I don't see ze_intel_gpu64.dll in the main System32 folder, only in one of the driver store repository folders. This is on driver 4676.

Nuullll commented on Oct 26, 2023

> that's way above my level, however in my folder I do not see a ze_intel_gpu64.dll in the main folder but only in one of the driver state repository folders, this is driver 4676

Yes, correct: there are four ze_*.dll files in the driver storage folder and three in System32. I replaced them all but still had no luck :-(
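Before swapping DLLs as described above, it may help to inventory what each driver install actually ships. A read-only sketch, assuming the driver storage folder is the standard `DriverStore\FileRepository` directory under System32; `find_ze_dlls` is a hypothetical helper, and it lists paths only (it does not read DLL version resources).

```python
# List the Level Zero DLLs (ze_*.dll) directly in System32 and under the
# driver store, e.g. to compare two driver installs. Read-only: nothing is
# copied or modified.
from pathlib import Path
from typing import Dict, List

DEFAULT_ROOT = Path(r"C:\Windows\System32")


def find_ze_dlls(system32: Path = DEFAULT_ROOT) -> Dict[str, List[Path]]:
    """Return ze_*.dll paths found in System32 itself and under DriverStore."""
    top_level = sorted(system32.glob("ze_*.dll"))
    # Driver packages are unpacked into per-package folders under FileRepository
    store = system32 / "DriverStore" / "FileRepository"
    in_store = sorted(store.rglob("ze_*.dll")) if store.is_dir() else []
    return {"System32": top_level, "DriverStore": in_store}
```

Running it once per driver version and diffing the two results shows which files a manual swap would have to cover.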

Nuullll commented on Nov 2, 2023

The issue is gone with Driver 4952
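The findings in this thread can be condensed into a lookup, for example for a hypothetical pre-flight check in a launcher. This is a sketch only: `REPORTED`, `driver_build`, and `classify` are hypothetical names, the table holds just the builds mentioned here, the version string is assumed to end in the build number (as in the full form 31.0.101.4887), and treating every build before 4676 as working relies on the original report.

```python
# Classify a Windows Arc driver build against the results reported in this
# thread for IPEX AOT wheels. Builds not mentioned here are "untested".
REPORTED = {
    4676: "works",    # newest build reported working before the regression
    4885: "crashes",  # ships Level Zero compute runtime 1.3.27193
    4887: "crashes",  # same runtime
    4900: "crashes",  # same runtime
    4952: "works",    # issue reported gone
}


def driver_build(version: str) -> int:
    """Extract the build number from a version string like '31.0.101.4887'."""
    return int(version.strip().split(".")[-1])


def classify(version: str) -> str:
    build = driver_build(version)
    if build in REPORTED:
        return REPORTED[build]
    if build < 4676:
        # The original report says 4676 "and older" worked.
        return "works"
    return "untested"
```

Returning "untested" instead of interpolating between known builds keeps the check from claiming more than the thread actually reports.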



          Drivers from 4885 and newer break IPEX for native windows. · Issue #442 · intel/intel-extension-for-pytorch