@AlexeySachkov
Contributor

This PR aims to address #780 by introducing a new KHR extension that documents more headers that SYCL applications can include for better control of what they pay for in terms of compile time.

Contributor Author

@AlexeySachkov AlexeySachkov left a comment

Leaving some comments for reviewers in advance.

See my talk at IWOCL 2025 for more background/context/motivation for this extension.

As of right now, we haven't prototyped the extension ourselves in Intel's SYCL implementation, but that will be done soon after first feedback on this proposal.

The CTS hasn't been updated yet to have a mode that uses this extension. What I discovered is that very few CTS tests (in fact, just one!) directly include the <sycl/sycl.hpp> header; the rest get it through things like common.h. So, before the CTS can use this approach and benefit from it, some refactoring has to be done (which is already somewhat in motion, see my PRs like KhronosGroup/SYCL-CTS#1083 or KhronosGroup/SYCL-CTS#1077).

Proper testing for this extension could actually be tricky. For example, Intel's SYCL headers are tightly interconnected, and even if you include just one of them there is a high chance that you are getting a lot more. Therefore, it's hard to use them to make sure that this specification did not forget about some class or function. I had an idea about taking synopsis headers from the spec and compiling the CTS with them, but even without actually doing that I see that many synopsis headers are missing and it won't be the simplest thing to do (although it is probably a good exercise to make sure that the CTS does not use any undocumented APIs).

The initial version of the proposal presented here provides a separate header for almost every feature, bundling some of them together (or providing a higher-level aggregation header in some cases). From a compile-time savings point of view, it is not necessary to split some of those headers out (context, device, platform, etc. are very cheap), but it is not immediately clear how to bundle them together for simplicity of use (or whether that is needed at all).

Note that there are already some open issues recorded at the end of the proposed document.

Comment on lines 137 to 141
|This header provides access to functionality related to reductions (such as
[code]#reduction# interface and [code]#reducer# class)

|[code]#<sycl/khr/includes/stream>#
|Contains definition of [code]#stream# class
Contributor Author

The wording that describes content of the headers changes all over the place. It took some time to prepare the PR and I was experimenting with different approaches as well.

I think the first step is to agree on the header contents and naming; then we can debate the wording that describes them. That said, any immediate comments or suggestions about the wording are also welcome.

- [code]#get_native#
- [code]#make_*#

Note that even though functions defined in the header operate with various
Contributor Author

Some kind of leftover from a previous version. The intent for this note is to apply to all headers, i.e. the extension does not mandate a strict split. It is a valid (even though useless) implementation option to just include existing <sycl/sycl.hpp> from every new header.

Therefore, the note should probably be moved into some common section and generalized.

Also highlighting this to double-check that everyone is on board with the suggested non-strict approach.

Contributor

I agree that we should go with the non-strict approach. I don't think there is precedent anywhere for "if you include X, I promise you won't also get Y", it would limit what implementations can do, and it would be really hard to test. I think phrasing that says "You must include <sycl/sycl.hpp> or <sycl/header> to guarantee features X, Y and Z are available" makes the most sense.

Contributor Author

Generalized this note in ba0812d; feedback on its wording is still welcome.

then any macro defined by other extensions must be made available through
[code]#<sycl/khr/includes/version># header.

== Open issues/questions
Contributor Author

Recording a point from @illuhad right away so that we can gather feedback on it: we may want to have a header that provides forward declarations only.

Comment on lines 454 to 457
Note that the simple swizzle functions ([code]#XYZW_SWIZZLE# and
[code]#RGBA_SWIZZLE# defined by the table 123) are only available when the macro
[code]#SYCL_SIMPLE_SWIZZLES# is defined before including
[code]#<sycl/khr/includes/vec>#.
Contributor

Perhaps we could add a non-normative note here saying that it is recommended for developers to define SYCL_SIMPLE_SWIZZLES via the command-line and not in code, since this is the only way to guarantee that the macro is defined before the first include of vec (which may not be in user code)?

@AlexeySachkov
Contributor Author

The topic was discussed at the June 5th, 2025 SYCL WG call. My recollection of the discussion:

Headers should have .hpp suffix.
There were no strong opinions, but it seemed like there is a preference towards having a file extension.
It seems like no other library (except for the standard library itself) provides headers without file extensions. Plus, its presence helps with syntax highlighting.

SYCL_SIMPLE_SWIZZLES.
If simple swizzles are needed, the macro must be defined before any of the proposed headers (or the existing <sycl/sycl.hpp>) is included. The recommendation should be to set it on the command line to avoid possible mistakes with the order of preprocessor directives.

Wording.
The C++ spec seems to use the word "provides", so we should stick with it as well.

Forward declarations.
Not a huge problem at the moment and it can be added as a separate thing later on.

Let me know if anything is wrong or missing. Otherwise, I'm going to apply comments above to the PR
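
To illustrate the SYCL_SIMPLE_SWIZZLES ordering concern from the list above, here is a minimal sketch that does not use SYCL at all. The `DEMO_*` names and the inlined "header" are hypothetical stand-ins, assuming only that the real header is protected by an ordinary include guard:

```cpp
// Hypothetical stand-in for the contents of a guarded header such as vec.hpp.
// All DEMO_* names are illustrative and are not part of SYCL.
#ifndef DEMO_VEC_HPP
#define DEMO_VEC_HPP
#ifdef DEMO_SIMPLE_SWIZZLES
constexpr bool demo_swizzles_enabled = true;
#else
constexpr bool demo_swizzles_enabled = false;
#endif
#endif  // DEMO_VEC_HPP

// The user defines the macro only now, after the first "include".
#define DEMO_SIMPLE_SWIZZLES

// A second "include" of the same header is skipped by the include guard,
// so the late definition has no effect.
#ifndef DEMO_VEC_HPP
#define DEMO_VEC_HPP
constexpr bool demo_swizzles_enabled = true;
#endif  // DEMO_VEC_HPP

static_assert(!demo_swizzles_enabled,
              "the macro was defined too late to take effect");
```

Passing the definition on the compiler command line instead (for example `-DSYCL_SIMPLE_SWIZZLES` with GCC- or Clang-style drivers) sidesteps the problem, because the macro is then defined before any header is processed.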

@PeterTh

PeterTh commented Jun 11, 2025

I have prototyped this extension in SimSYCL.
Good news overall: much of it is a trivial 1:1 or 1:2 match with the existing internal structure, which is one indication that the proposed grouping makes sense.

Only the following points came up that I think are worth discussing:

  1. Should sycl.hpp define SYCL_KHR_INCLUDES if the extension is supported?
  2. Do we want submit.hpp? It's the only instance of this pattern where the extension provides both a pre-made grouping and individual headers. I have no strong opinion here.
  3. Should index_space.hpp be split up further?
  4. Should groups.hpp be split up further?
    This is actually the only one I feel a bit more strongly about. I think it should be possible for libraries to provide functionality which interacts with group types without pulling in all algorithms.
  5. Should including half.hpp do anything for math on halfs?
  6. For reduction.hpp, I think the table should be clarified to note that it does not include reduction-related (sub)group stuff.
  7. While making my notes, I proposed replacing property_list.hpp with property.hpp and making it also include all the properties. But I'm not really sure about that one after considering it a bit more.

A nice side-effect of having individual headers is that it allows a better user experience in a few select scenarios. For example, if the user explicitly includes half.hpp, it's pretty certain that they want to use halfs, so if the SimSYCL installation doesn't support them we can provide an immediate error.
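
A minimal sketch of how such an immediate error could look inside an implementation's half.hpp. This is not SimSYCL's actual code; `DEMO_IMPL_HAS_HALF` is an invented configuration macro, set to 1 here so that the sketch compiles:

```cpp
// Hypothetical configuration macro; a real implementation would derive this
// from its build settings. It is set to 1 here so the sketch compiles.
#define DEMO_IMPL_HAS_HALF 1

// Including the dedicated header signals intent to use half, so a missing
// feature can be reported right at the include instead of at first use.
#if !DEMO_IMPL_HAS_HALF
#error "half.hpp was included, but this installation does not support half"
#endif

// Illustrative flag that the rest of the header could rely on.
constexpr bool demo_half_available = (DEMO_IMPL_HAS_HALF != 0);
```

With a monolithic header the same check could only fire for every user, whether or not they use half, which is why it is usually deferred there.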

My detailed notes on each header
  • version.hpp
    • 1:1 mapping (macros.hh)
    • (obviously with the inclusion of SYCL_KHR_INCLUDES)
    • Should sycl.hpp define SYCL_KHR_INCLUDES if the extension is supported?
  • submit.hpp
    • 1:3 mapping (event.hh, handler.hh, queue.hh)
    • This seems to be the only instance of this particular pattern (i.e. gathering things available in other individual headers) in the proposal. I don't have a particularly strong opinion about it, but it sticks out.
  • queue.hpp
    • 1:1 mapping (queue.hh)
  • handler.hpp
    • 1:1 mapping (handler.hh)
  • event.hpp
    • 1:1 mapping (event.hh)
  • device.hpp
    • 1:1 mapping (device.hh) (Even with selectors!)
  • platform.hpp
    • 1:1 mapping (platform.hh)
  • context.hpp
    • 1:1 mapping (context.hh)
  • index_space.hpp
    • 1:N mapping (id.hh, item.hh, nd_item.hh, range.hh, nd_range.hh)
    • I see why these are grouped, but on the other hand, it is quite easy to imagine applications that only need a subset of these types.
  • usm.hpp
    • 1:1 mapping (usm.hh)
  • buffer.hpp
    • 1:1 mapping (buffer.hh)
  • accessor.hpp
    • 1:1 mapping (accessor.hh)
  • atomic.hpp
    • 1:2 mapping (atomic_ref.hh, atomic_fence.hh) - I think this is fine
  • half.hpp
    • Not a 1:1 mapping; if supported, SimSYCL defines this in forward.hh
    • Individual headers actually can improve user experience (error if not supported).
    • "Contains definition of half class" -- clarify what this means for math
  • vec.hpp
    • 1:1 mapping (vec.hh)
  • marray.hpp
    • 1:1 mapping (marray.hh)
  • math.hpp
    • 1:1 mapping (math.hh)
  • groups.hpp
    • 1:N mapping (group.hh, sub_group.hh, group_algorithms.hh, group_functions.hh)
    • I feel like we should consider splitting this up further; these aren't necessarily tiny headers.
      And e.g. a library interface will need the group types, but not the algorithms or functions.
  • multi_ptr.hpp
    • 1:1 mapping (multi_ptr.hh)
  • functional.hpp
    • Maps to binary_ops.hh in SimSYCL
    • Not sure about this name, but I also don't have a better suggestion.
  • reduction.hpp
    • 1:1 mapping (reduction.hh)
    • Table should be clarified w.r.t subgroup reductions.
      "functionality related to reductions" could be interpreted to include that, but I don't think that's intended.
  • stream.hpp
    • 1:1 mapping (stream.hh)
  • type_traits.hpp
    • 1:1 mapping (type_traits.hh)
  • property_list.hpp
    • almost 1:1 mapping (property.hh)
    • I would be in favor of renaming this to property.hpp and also have it define properties (i.e. everything in the simsycl::sycl::property namespace). But this is not a strong opinion.
  • kernel_bundle.hpp
    • N:1 mapping (kernel.hh) - I think this is fine
  • images.hpp
    • 1:2 mapping (image.hh, image_accessor.hh) - I think this is fine
  • exception.hpp
    • 1:2 mapping (exception.hh, async_handler.hh) - I think this is fine
  • hierarchical_parallelism.hpp
    • 1:2 mapping (h_item.hh, private_memory.hh) - I think this is fine
  • interop_handle.hpp
    • 1:1 mapping (interop_handle.hh)
  • backend.hpp
    • 1:1 mapping (backend.hh)
  • bit.hpp
    • 1:1 mapping to <bit>; SimSYCL requires C++20
  • span.hpp
    • 1:1 mapping to <span>; SimSYCL requires C++20
  • byte.hpp
    • Defined in forward.hh in SimSYCL

@AlexeySachkov
Contributor Author

Thanks for the feedback, @PeterTh!

  1. Should sycl.hpp define SYCL_KHR_INCLUDES if the extension is supported?

Yes, I think that it is already covered by the proposal: the version.hpp header provides the extension macro, and that header should be included by the existing sycl/sycl.hpp.
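
A minimal sketch of how user code might probe for the extension, assuming the header path `<sycl/khr/includes/version.hpp>` and the macro `SYCL_KHR_INCLUDES` from the proposal under review. The names `has_khr_includes` and `header_strategy` are illustrative only, and the snippet compiles even when no SYCL implementation is installed:

```cpp
// Include the proposed version header only if it exists on this system.
#if defined(__has_include)
#if __has_include(<sycl/khr/includes/version.hpp>)
#include <sycl/khr/includes/version.hpp>
#endif
#endif

// The extension macro tells us whether the fine-grained headers are
// formally supported by the implementation.
#ifdef SYCL_KHR_INCLUDES
constexpr bool has_khr_includes = true;
#else
constexpr bool has_khr_includes = false;
#endif

// Illustrative helper describing which include strategy a project would pick.
constexpr const char* header_strategy() {
  return has_khr_includes ? "fine-grained headers"
                          : "monolithic <sycl/sycl.hpp>";
}
```

Note that the presence of the header file alone is not treated as support here; only the macro is, since an implementation may ship the headers before it formally claims the extension.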

  2. Do we want submit.hpp? It's the only instance of this pattern where the extension provides both a pre-made grouping and individual headers. I have no strong opinion here.

I think that we need feedback from @Pennycook here. I'm not opposed to dropping it, nor am I opposed to having it. Both queue and handler always go side by side if accessors are in use, but with USM and in-order queues you can save on skipping handler as it is quite expensive. I.e. if you use queue.submit, you can just use the submit header and save yourself a single include.

  3. Should index_space.hpp be split up further?

My motivation for splitting those headers is to improve compile times. From what I see in Intel's implementation, those index space classes are quite small and cheap in terms of compile time, and they often go together in users' applications. However, I'm looking once again at the data I gathered from zjin-lcf/HeCBench and I see that only nd_range and nd_item go together often, but not id, range and item:

[image: result2]

This chart shows how many benchmarks that use feature A (let's say rows) also used feature B (columns).

That said, my data gathering approach wasn't perfect: it was a plain grep, which definitely missed all uses of id/item when a kernel lambda was defined using the auto keyword.

I'm not opposed to splitting this header, but I do not see much value in such a split.
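
For reference, the kind of co-occurrence count behind such a chart can be sketched as follows. This is not the script used for the HeCBench data; the type names, function names, and the three-benchmark input are hypothetical and only illustrate the counting:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using FeatureSet = std::set<std::string>;
using CoOccurrence =
    std::map<std::pair<std::string, std::string>, std::size_t>;

// counts[{a, b}] is the number of benchmarks that use both a and b;
// the diagonal counts[{a, a}] is the number of benchmarks that use a.
inline CoOccurrence count_co_occurrence(
    const std::vector<FeatureSet>& benchmarks) {
  CoOccurrence counts;
  for (const FeatureSet& features : benchmarks)
    for (const std::string& a : features)
      for (const std::string& b : features)
        ++counts[std::make_pair(a, b)];
  return counts;
}

// Made-up input: each set lists the features found in one benchmark.
inline std::vector<FeatureSet> demo_benchmarks() {
  return {FeatureSet{"nd_range", "nd_item"},
          FeatureSet{"nd_range", "nd_item", "range"},
          FeatureSet{"range", "id"}};
}

// Small self-check on the made-up input.
inline void demo_self_check() {
  CoOccurrence c = count_co_occurrence(demo_benchmarks());
  assert(c[std::make_pair(std::string("nd_range"), std::string("nd_item"))] == 2);
  assert(c[std::make_pair(std::string("range"), std::string("id"))] == 1);
}
```

A grep-based feature detector feeding this count would still have the blind spot mentioned above: a kernel lambda that takes its index argument as `auto` never names the type in the source text.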

  4. Should groups.hpp be split up further?
    This is actually the only one I feel a bit more strongly about. I think it should be possible for libraries to provide functionality which interacts with group types without pulling in all algorithms.

I can imagine that group algorithms are way heavier than any group class itself. However, what would be a good split here?
I imagine that fences and barriers are cheap and may be used more often than scans - will we have a separate header for group synchronization functions?

  5. Should including half.hpp do anything for math on halfs?

The intent here was that half.hpp contains only the definition of half itself, without the corresponding math functions. If you need math, then you would also need to include math.hpp.

  6. For reduction.hpp, I think the table should be clarified to note that it does not include reduction-related (sub)group stuff.

Noted. I will take a deeper look into this header; I'm not very familiar with the reductions functionality.

  7. While making my notes, I proposed replacing property_list.hpp with property.hpp and making it also include all the properties. But I'm not really sure about that one after considering it a bit more.

I wonder if we want to have dedicated headers for different properties, because I have many questions when it comes to extensions. There could be ones which add new properties (and in fact, we have many of those at intel/llvm):

  • does it mean that all those properties should be exposed through a header that corresponds to a class those properties apply to?
  • what if a property is applicable to multiple classes?
  • what if it isn't applicable to any class at all and is instead passed to a free function (some group algorithm, for example)?

Similar questions are applicable to properties documented by the core spec, so it warrants a thorough look at them.

@Pennycook
Contributor

I think that we need feedback from @Pennycook here. I'm not opposed to dropping it, nor am I opposed to having it. Both queue and handler always go side by side if accessors are in use, but with USM and in-order queues you can save on skipping handler as it is quite expensive. I.e. if you use queue.submit, you can just use the submit header and save yourself a single include.

I think we should split it up.

I can't find where I posted it, but I said something to the effect of "We should provide the most fine-grained split we're comfortable with, and let users group things if they want to." If we don't have submit.hpp in the first extension, people can define their own version, we can get feedback on the initial split, and potentially include it later. If we do have submit.hpp in the first extension, we're stuck with it forever.

I can imagine that group algorithms are way heavier than any group class itself. However, what would be a good split here? I imagine that fences and barriers are cheap and may be used more often than scans - will we have a separate header for group synchronization functions?

What if we put the groups in <group> and the group algorithms in <algorithm>?

I think it would make sense to put group_broadcast and group_barrier in <group>, because they're defined as top-level group functions. Everything else can be an opt-in where you need both <group> and <algorithm>. (I say that because we could imagine adding some device-wide algorithms in the future, so <algorithm> needn't pull in <group> if it doesn't want to.)

@PeterTh

PeterTh commented Jun 11, 2025

I can imagine that group algorithms are way heavier than any group class itself. However, what would be a good split here?
I imagine that fences and barriers are cheap and may be used more often than scans - will we have a separate header for group synchronization functions?

What we currently have in SimSYCL is group, sub_group, group_algorithms, and group_functions. I think that's a bit overkill, but I would like, e.g., group.hpp for both class definitions and group_algorithms.hpp for all functions on them.

Also, I did some basic analysis on the current implementation in SimSYCL:
[image: header_overview]
Note that this is without opting into swizzles. I'm not quite sure why some of the top ones are so heavy, but it's probably a quality of implementation issue (as in, our internal implementation pulls in things that don't strictly need to be pulled in).

@gmlueck
Contributor

gmlueck commented Jun 12, 2025

I wonder if we want to have dedicated headers for different properties, because I have many questions when it comes to extensions. There could be ones which add new properties (and in fact, we have many of those at intel/llvm)

I agree. I think it makes more sense to include the properties along with the interface(s) that use them. For example, enable_profiling would be provided by the same header that provides the queue class. In cases where a property is used by more than one interface, the property might be provided by more than one header. For example, no_init is used by buffer accessors and image accessors, so it would be provided by both accessor.hpp and image.hpp. There is precedent in C++ for identifiers which are provided by multiple headers. For example, there are 7 C++ headers that can provide size_t.

What would be the advantage of combining all of the properties into a single header?
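
The size_t precedent mentioned above can be checked with a minimal sketch that never includes `<cstddef>`. The function name `demo_length` is illustrative only:

```cpp
#include <cstring>      // provides std::size_t (alongside std::strlen)
#include <ctime>        // also provides std::size_t
#include <type_traits>

// Both headers name the same type, so including several of them is harmless
// and the identifier is usable without <cstddef>.
static_assert(std::is_same<std::size_t, decltype(sizeof(int))>::value,
              "std::size_t is the type of a sizeof expression");

// Illustrative helper that relies on std::size_t from the headers above.
inline std::size_t demo_length(const char* s) { return std::strlen(s); }
```

A property such as no_init provided by both accessor.hpp and image.hpp would follow the same pattern: either include is enough to guarantee that the name is available.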

@AlexeySachkov
Contributor Author

Headers should have .hpp suffix.

Added .hpp suffix in 736fe3e

SYCL_SIMPLE_SWIZZLES.

Recorded suggested resolution in 7e6881d

Wording.

Switched most of the headers to use "provides" wording in 6aaa8df

@gmlueck, I haven't resolved all your comments yet, but most of them are addressed.

@PeterTh, @Pennycook,

Items which are still on my TODO list:

  • remaining comments from @gmlueck
  • reduction.hpp clarification from @PeterTh
  • Each property should be in its own separate header

@AlexeySachkov
Contributor Author

AlexeySachkov commented Jul 22, 2025

@PeterTh, I made an attempt at reduction.hpp clarification in 7187f28 - let me know if you had something else in mind.

Properties were outlined into distinct headers in 59b5e14, see also #892

With those recent commits I believe that I've addressed all feedback. While I wait for other comments, I will be working on revamping the prototype I had for intel/llvm and preparing a dedicated test sub-suite in the CTS for the extension.

@PeterTh

PeterTh commented Jul 24, 2025

Looks good to me!

@tomdeakin
Contributor

Looks good to me!

Would you be able to formally review it and give approval if you're happy, @PeterTh ? Thanks!

|Provides definition of the [code]#context# class.

|[code]#<sycl/khr/includes/index_space.hpp>#
|Provides definition of the most index space identifiers from
Contributor

Suggested change
- |Provides definition of the most index space identifiers from
+ |Provides definitions of most of the index space identifiers from

class aliases.

|[code]#<sycl/khr/includes/functional.hpp>#
|Provides definitions of function objects like [code]#plus# or
Contributor

Suggested change
- |Provides definitions of function objects like [code]#plus# or
+ |Provides definitions of function objects like [code]#plus# and

* [code]#usm::alloc# enumeration
* [code]#usm_allocator# class
* Free functions like [code]#malloc_device#, [code]#aligned_alloc_host#,
[code]#malloc# and [code]#get_pointer_type# as from sections
Contributor

Suggested change
- [code]#malloc# and [code]#get_pointer_type# as from sections
+ [code]#malloc# and [code]#get_pointer_type# from sections


Any extension which does not explicitly document how it can be accessed through
header files should be assumed to be available only through
[code]#<sycl/sycl.hpp>#.
Contributor

I think it would be useful to sketch out how our other KHRs will interact with this extension.

These KHRs only add new members to existing classes, so I presume they would not introduce any new headers:

  • sycl_khr_default_context
  • sycl_khr_queue_empty_query
  • sycl_khr_queue_flush

The sycl_khr_group_interface KHR adds a bunch of new classes. I presume that KHR would provide a new header named <sycl/khr/group_interface.hpp>?

What about sycl_khr_max_work_group_queries? This KHR just adds two new device descriptors, sycl::khr::info::device::max_work_group_range and khr::info::device::max_work_group_range_size. Would we add a new header file named <sycl/khr/max_work_group_queries.hpp> with just those two descriptors? Or would we document that <sycl/khr/includes/device.hpp> also includes these descriptors?

{endnote}

[[sec:khr-includes-version]]
=== [code]#<sycl/khr/includes/version.hpp># header
Contributor

The TOC "level" is not right here. Currently, this and all the other sections that describe each header are under "Extension overview", which isn't right. I think there are two possibilities:

  • Create a new section that is a peer of "Extension overview" named something like "New headers". Then, each section that describes a header will be under "New headers". Or,
  • Raise the level of each of these sections, so they are all peers of "Extension overview".

I have a preference for the first option.

This header contains definition of [code]#byte# type alias

[[sec:khr-includes-other-extensions]]
=== Co-existence with other extensions
Contributor

Suggested change
- === Co-existence with other extensions
+ == Co-existence with other extensions

This should not be under "Extension overview". If you decide to add a new section named "New headers", then it should not be under that either.

If an implementation supports this extension, then any macro defined by other
supported extensions must be defined in [code]#<sycl/khr/includes/version.hpp>#.

== Open issues/questions
Contributor

If we do not have any open issues, let's remove this section.

AlexeySachkov added a commit to intel/llvm that referenced this pull request Oct 22, 2025
The specification can be found in KhronosGroup/SYCL-Docs#814

The implementation is a non-functional change for the existing codebase,
but definitions of macros documented by the SYCL 2020 specification were
moved around a little bit to provide them all through the new
`sycl/khr/version.hpp`.

New headers are always present and their content is always available.
However, we do not define the `SYCL_KHR_INCLUDES` macro yet because the
extension is still under review. That indicates that it is not formally
supported by the implementation.
* Backend macros in the form of [code]#SYCL_BACKEND_<backend_name># defined by
<<sec:backend-macros>>.

[code]#<sycl/khr/includes/version.hpp># header is included by every other header
Contributor

A minor bikeshedding point: Is version.hpp the right name for this header? As I understand it, it would provide all macro definitions of SYCL. It seems a bit confusing to me to have to include version.hpp if I want to get e.g. SYCL_EXTERNAL. Perhaps macros.hpp might be more appropriate? Or, since it's included by all other headers, something less descriptive like core_defs.hpp?

Development

Successfully merging this pull request may close these issues.

Consider providing an alternative to the monolith sycl/sycl.hpp header
Add forward declaration header

6 participants