Skip to content

Conversation

@mateuszpn
Copy link
Contributor

@mateuszpn mateuszpn commented Nov 21, 2025

Fix llvm-lit to enable the simultaneous use of _v1 and _v2 to select the l0 adapter and the use of arch to select the GPU, eg.:
llvm-lit --param "sycl_devices=level_zero_v2:arch-intel_gpu_mtl_u" sycl/test-e2e
Necessary to run tests on integrated GPU on system with another GPU
Includes fixed setting of ZE_AFFINITY_MASK in multi-gpu systems

@mateuszpn mateuszpn requested a review from a team as a code owner November 21, 2025 15:33
@aelovikov-intel aelovikov-intel changed the title [SYCL][e2e-tests] fix lit for arch selection and l0v2 adapter [SYCL][E2E] Fix lit for arch selection and l0v2 adapter Nov 21, 2025
Comment on lines 337 to 341
backend, device = parsed_dev_name.split(":", 1)
device_selector = parsed_dev_name
if backend == "level_zero" and device.isdigit():
extra_env.append(f"ZE_AFFINITY_MASK={device}")
device_selector = f"{backend}:0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some local lit configs, eg. sycl/test-e2e/AddressSanitizer/lit.local.cfg set affinity mask to 0. It needs to be overridden if we do not use level_zero:0 device. Comment added in the code

Copy link
Contributor

@aelovikov-intel aelovikov-intel Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we do that here and not fix AddressSanitizer? IMO, this file should just save {device} somewhere in the config and the AddressSanitizer/lit.local.cfg should use that for setting ZE_AFFINITY_MASK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I missed that this was specifically for asan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aelovikov-intel Setting of ZE_AFFINITY_MASK moved to local configs.

"opencl": ("cpu", "gpu", "fpga"),
"cuda": "gpu",
"level_zero": "gpu",
"level_zero": ("gpu", "0", "1"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trying to allow llvm-lit --param sycl_devices=level_zero:0? We intentionally decided to disallow that in review of #18197

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the arch- setting ends up pointing at particular device, eg. level_zero:1, we do not do it in the params. Removed.

expanded = "env"

extra_env = get_extra_env([parsed_dev_name])
backend, device = parsed_dev_name.split(":", 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example of the intended lit string after the change in the PR description is

llvm-lit --param "sycl_devices=level_zero_v2:arch-intel_gpu_mtl_u" sycl/test-e2e

but I was expecting something like

llvm-lit --param "sycl_devices='level_zero_v1:gpu;level_zero_v2:gpu" sycl/test-e2e

Can you clarify on the intended use case? Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to force tests to run on specified device, eg. integrated gpu, even when other devices are present. The option was available before (see lit.cpg.py:901). Although it is rarely used, it is still perfect for testing integrated GPUs. Unfortunately, due to a bug, it did not work with the _v1 or _v2 postfixes in the backend name. The PR fixes this bug.

Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment if ZE_AFFINITY_MASK is the right way to pin it to a specific device but other than that lgtm, thanks!

Signed-off-by: Mateusz P. Nowak <[email protected]>
features.update(sg_size_features)
features.update(architecture_feature)
features.update(device_family)
features.update(aspects)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this line? Isn't 1120 enough?

Comment on lines -12 to -14
import lit.formats
import lit.util

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated.

config.unsupported=True

config.environment["ZE_AFFINITY_MASK"] = "0"
if hasattr(config, 'ze_affinity_mask') and config.ze_affinity_mask is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have

# Determine ZE_AFFINITY_MASK for Level Zero devices.
# Sanitizer tests need to set ZE_AFFINITY_MASK to a single device index
config.ze_affinity_mask = None

below, is hasattr check needed?

Comment on lines +36 to +37
else:
config.environment["ZE_AFFINITY_MASK"] = "0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is else really needed? IIUC, config.ze_affinity_mask would be set for L0 devices, and for CPU this env. variable is meaningless anyway.

Comment on lines +354 to +358
# If ZE_AFFINITY_MASK is set, it filters devices so we should use :0
device_selector = parsed_dev_name
if "ZE_AFFINITY_MASK" in test.config.environment:
backend, _ = parsed_dev_name.split(":", 1)
device_selector = f"{backend}:0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I recommend this way, having code trying to play nice with existing envvars seems bad

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants