160 changes: 38 additions & 122 deletions adoc/chapters/device_compiler.adoc
@@ -204,6 +204,7 @@ Amongst other things, this restriction makes it illegal for a
However, a function may be defined in another translation unit if the
implementation defines the [code]#SYCL_EXTERNAL# macro as described in
<<subsec:syclexternal>>.
* Use of the [code]#long double# type results in undefined behavior.
Contributor

FWIW, I'd say:

> Device code must not use the `long double` type.

Statements in this list are inconsistent. Some say that using a prohibited feature causes undefined behavior, while others say that an application should not use the feature. It seems to me that we should consistently say that device code must not use these features.

You may argue that an extension might lift some of these restrictions. That's fine. An extension is allowed to lift any prohibition in the spec and assign some defined meaning. There is no need to use the "undefined behavior" wording to allow an extension to do this.

I had previously suggested that we address long double via #371. I think undefined behavior is the status quo.

Does "device code must not use these features" mean that such code is ill-formed? I think ISO standards reserve "must" for use in implication (A must be B because of C). I understand that other specifications use the term to specify requirements (e.g., RFC 2119), but I think its use can be confusing. I think "shall" communicates the intent better.

It wouldn't be a bad thing for the SYCL specification to explicitly state its requirement terminology; perhaps via reference to RFC 2119 or similar.

Contributor

> I had previously suggested that we address long double via #371. I think undefined behavior is the status quo.

I still like the direction proposed in #371 -- that is that we make long double an optional device feature that is tied to some new aspect. However, that is probably something that needs to wait for SYCL-Next.

I think the status quo in DPC++ is that use of long double in device code is ill-formed. I think the compiler diagnoses a compile-time error for this. (Sorry, haven't checked lately, but this is my memory.)

Contributor Author

> I think the status quo in DPC++ is that use of long double in device code is ill-formed. I think the compiler diagnoses a compile-time error for this. (Sorry, haven't checked lately, but this is my memory.)

I was curious about this so I just went to check.

By default, DPC++ throws an error because long double defaults to 128-bit, which isn't supported in SPIR-V:

test.cpp:10:51: error: 'max' requires 128 bit size 'long double' type support, but target 'spir64-unknown-unknown' does not support it
   10 |         ld[0] = std::numeric_limits<long double>::max();

You can circumvent that check by adding the -mlong-double-64 flag. In other words, DPC++ doesn't check for long double, it just fails if you try to use 128- or 80-bit data types.

SimSYCL also doesn't do any checks for use of long double, and because it's a header-only implementation I can't immediately think of a way they could add these checks. It's also worth noting that device code using long double always works in SimSYCL, because the code is running on the host CPU.

So I think the status quo is undefined behavior. Sometimes it's ill-formed, sometimes you can make it work (by accident), and sometimes it works.
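
For anyone who wants to see what their own toolchain does with `long double`, the format can be inspected with standard C++ alone. The following is a minimal host-only sketch, not SYCL device code; the names `long_double_storage_bits` and `long_double_is_wider_than_double` are hypothetical and were chosen only for illustration.

```cpp
#include <climits>
#include <limits>

// Storage size of long double in bits. This can include padding: the
// x86-64 80-bit extended format is commonly stored in 128 bits.
constexpr int long_double_storage_bits =
    static_cast<int>(sizeof(long double) * CHAR_BIT);

// True when long double carries more mantissa digits than double.
// With a flag such as -mlong-double-64 this is expected to be false.
constexpr bool long_double_is_wider_than_double =
    std::numeric_limits<long double>::digits >
    std::numeric_limits<double>::digits;

// The values of double are a subset of the values of long double, so
// long double can never have fewer mantissa digits than double.
static_assert(std::numeric_limits<long double>::digits >=
                  std::numeric_limits<double>::digits,
              "long double must provide at least the precision of double");
```

If `long_double_is_wider_than_double` is true for the host compilation, the same source is likely to hit the 80- or 128-bit case described above when it is compiled for a device target.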

Contributor

> You can circumvent that check by adding the -mlong-double-64 flag. In other words, DPC++ doesn't check for long double, it just fails if you try to use 128- or 80-bit data types.

Yes, and some people at Argonne are using this flag in production.

Contributor Author

> I'm confused. What SPIR-V does DPC++ generate if you compile with -mlong-double-64?

It just treats every instance of long double as if it were double, so everything is generated using OpTypeFloat 64.

I'm not convinced it works properly, because all of the tests I've tried so far have returned nan. But the code compiles without producing any errors or warnings, and runs without producing any errors or warnings.

Contributor

Oh, I see.

> Yes, and some people at Argonne are using this flag in production.

Why? Is it just because you have existing code that uses long double and it's easier to pass the flag than to rewrite the code to use double?

Contributor

Exactly. It begs the question of why it even works, and why they use long double in the first place, but 🤷🏽

Perhaps Argonne needs it because they are using user-defined literals (UDLs). A request for UDL support is what led me to file #371 in the first place.

Contributor

If so, there may be a cleaner solution now that we have changed the spec to allow long double in manifestly constant-evaluated expressions. As I noted in #379, I think an application can define the UDL `operator""` as `consteval`, and this will cause the evaluation of UDLs to happen at compile time as a constant expression. SYCL will allow this even in device code because we lifted the restriction about long double (and all other "forbidden" C++ features) when they are used in manifestly constant-evaluated expressions.

Of course, consteval requires the application to compile in C++20 mode.
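
A rough sketch of that idea, in plain C++ with no SYCL headers, might look like the following. The literal suffix `_deg` and the `UDL_CONSTEVAL` macro are hypothetical names introduced only for this illustration; the macro falls back to `constexpr` so the sketch still compiles before C++20, which is an assumption of the sketch and not part of the proposal. Whether a particular SYCL implementation accepts this in device code is not something the sketch verifies.

```cpp
// Use consteval when the compiler supports it, otherwise constexpr.
#if defined(__cpp_consteval) && __cpp_consteval >= 201811L
#define UDL_CONSTEVAL consteval
#else
#define UDL_CONSTEVAL constexpr
#endif

// Hypothetical literal that converts degrees to radians. The C++ grammar
// requires the long double parameter for floating-point UDLs; the result
// is narrowed to double so no long double value survives to run time.
UDL_CONSTEVAL double operator""_deg(long double degrees) {
  return static_cast<double>(degrees * 3.14159265358979323846L / 180.0L);
}

// Initializing a constexpr variable forces constant evaluation of the
// literal, so the long double arithmetic happens in the compiler.
constexpr double right_angle = 90.0_deg;
```

With real `consteval`, every use of the literal is an immediate invocation, so the `long double` arithmetic never has to be emitted for the device.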


Inside a <<discarded-statement>> or in the case of a
<<manifestly-constant-evaluated>>, any code accepted by the C++ standard in this
@@ -216,132 +217,23 @@ The restriction waiver in <<discarded-statement>> or
<<device-function>>.
====

[[subsec:scalartypes]]
Contributor

Note that the latest version of the "main" branch has text that refers to this section identifier. (This was part of the recent vec clarification.) You will need to reword that new text somehow to avoid the reference.

== Built-in scalar data types
== Data types

In a SYCL device compiler, the device definition of all standard {cpp}
fundamental types from <<table.types.fundamental>> must match the host
definition of those types, in both size and alignment.
A device compiler may have this preconfigured so that it can match them based on
the definitions of those types on the platform, or there may be a necessity for
a device compiler command-line option to ensure the types are the same.
All fundamental types defined by the {cpp} core language must be available in
device code except for those listed in <<sec:language.restrictions.kernels>>.
Whether extended integer and/or extended floating-point types are available in
device code is implementation-defined.

The standard {cpp} fixed width types, e.g. [code]#int8_t#, [code]#int16_t#,
[code]#int32_t#,[code]#int64_t#, should have the same size as defined by the
{cpp} standard for host and device.


[[table.types.fundamental]]
.Fundamental data types supported by SYCL
[width="100%",options="header",separator="@",cols="65%,35%"]
|====
@ Fundamental data type @ Description
a@
[source]
----
bool
----
a@ A conditional data type which can be either true or false. The value
true expands to the integer constant 1 and the value false expands to the
integer constant 0.

a@
[source]
----
char
----
a@ A signed or unsigned 8-bit integer, as defined by the {cpp} core language

a@
[source]
----
signed char
----
a@ A signed 8-bit integer, as defined by the {cpp} core language

a@
[source]
----
unsigned char
----
a@ An unsigned 8-bit integer, as defined by the {cpp} core language

a@
[source]
----
short int
----
a@ A signed integer of at least 16-bits, as defined by the {cpp} core language

a@
[source]
----
unsigned short int
----
a@ An unsigned integer of at least 16-bits, as defined by the {cpp} core language

a@
[source]
----
int
----
a@ A signed integer of at least 16-bits, as defined by the {cpp} core language

a@
[source]
----
unsigned int
----
a@ An unsigned integer of at least 16-bits, as defined by the {cpp} core language

a@
[source]
----
long int
----
a@ A signed integer of at least 32-bits, as defined by the {cpp} core language

a@
[source]
----
unsigned long int
----
a@ An unsigned integer of at least 32-bits, as defined by the {cpp} core language

a@
[source]
----
long long int
----
a@ An integer of at least 64-bits, as defined by the {cpp} core language

a@
[source]
----
unsigned long long int
----
a@ An unsigned integer of at least 64-bits, as defined by the {cpp} core language

a@
[source]
----
float
----
a@ A 32-bit floating-point. The float data type must conform to the IEEE 754
single precision storage format.

a@
[source]
----
double
----
a@ A 64-bit floating-point. The double data type must conform to the IEEE 754
double precision storage format. This type is only supported on devices
that have [code]#aspect::fp64#.

|====
The availability of a type in device code does not guarantee that it will be
supported by all devices.
Some types are only supported on devices with specific device aspects, as
described in <<sec:device-aspects>>.

All types which are available in device code must have the same size, alignment
requirements, and representation on both host and device.
Comment on lines +232 to +233

I think enumeration types should be addressed in this section as well. There is an implementation requirement as well as a user requirement. For enumerations with a fixed underlying type (e.g., enum X : char), there should be a requirement that the same fixed type be specified for host and device code. Similarly, for enumerations without a fixed underlying type, there should be a requirement that the implementation select the same underlying type for host and device code.
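
To make the two cases concrete, here is a small host-only sketch in standard C++. The enumeration names `FixedColor` and `UnfixedFlag` are hypothetical. It only shows what a single compiler reports; it cannot by itself demonstrate a host/device mismatch.

```cpp
#include <cstddef>
#include <type_traits>

// Fixed underlying type: the program text pins it, so every compiler
// that accepts this declaration must use char.
enum FixedColor : char { red, green, blue };

// No fixed underlying type: the compiler chooses an integral type that
// can represent all enumerators, so two compilers could choose differently.
enum UnfixedFlag { flag_off = 0, flag_on = 1 };

static_assert(
    std::is_same<std::underlying_type<FixedColor>::type, char>::value,
    "the fixed underlying type is exactly the one written in the source");

static_assert(
    std::is_integral<std::underlying_type<UnfixedFlag>::type>::value,
    "the chosen underlying type is some integral type");

// The size of the unfixed enumeration follows from the compiler's choice.
constexpr std::size_t unfixed_flag_size = sizeof(UnfixedFlag);
```

The second case is the one that needs coordination between host and device compilers, because nothing in the source determines the answer.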

Contributor Author

That makes sense to me. I've added wording very similar to what you wrote here in f4373e2.

The updates for enumeration types look good to me.

Contributor

I'm unresolving this conversation. It seems confusing to me to say that applications are required to choose the same underlying type for enumerations. Why state this specifically when there are other things an application could do that cause types to be different on host and device? For example, if an application defines members of a struct to be different on host and device, you would have a similar problem.

If we feel we need to say something about this, perhaps we could say it more generally like:

> Applications that define types to have a different size, alignment requirement, or representation on host vs. device via the `__SYCL_DEVICE_ONLY__` macro have undefined behavior.

Or maybe it should be "... implementation defined behavior"?

I think it is useful to specifically list __SYCL_DEVICE_ONLY__ here because that is the only way that an application possibly could define types differently. At least I can't think of any other way. You might point out that there are other macros like this in DPC++, but those are vendor extensions and are therefore out of scope of this spec.
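
As an illustration of the kind of code such a note would cover, consider the sketch below. The struct names `Sample` and `PortableSample` are hypothetical, and the byte sizes in the comments are typical values, not guarantees.

```cpp
// Anti-pattern: the layout of Sample depends on which compilation pass
// sees the definition.
struct Sample {
  int id;
#ifdef __SYCL_DEVICE_ONLY__
  float value;   // device pass: typically 4 bytes
#else
  double value;  // host pass: typically 8 bytes
#endif
};

// Safer alternative: a single definition shared by both passes.
struct PortableSample {
  int id;
  float value;
};
```

Copying a `Sample` object between host and device would reinterpret bytes under a different layout, which is the situation the suggested wording is trying to rule out.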

I think it is reasonable to relax the requirement that (user specified) fixed underlying types must match so long as device copyability rules retain a requirement for the same (or at least a compatible) fixed underlying type.

Contributor Author

I think a general note like the one @gmlueck suggests would be useful for application developers, but I didn't think that was what @tahonermann was talking about here.

The wording I added in f4373e2 is in the "Device compiler" section, and I didn't intend it to be advice for application developers. The intent was to clarify that the implementation must guarantee that enumerations have matching types. The first sentence just says that enumerations with different fixed underlying types are invalid -- that means a device compiler (and more broadly, the implementation) shouldn't introduce any of those. The second sentence is probably the more important one, because it means that a device compiler may require specific logic to ensure that it selects the same underlying type as the host compiler if an enumeration has no fixed underlying type.

With regard to explicit fixed types, there is both an implementation requirement and a requirement on application developers.

I expect device code restrictions to be relaxed over time to allow more use of the C and C++ standard libraries in device code. Device code can already link with device-only TUs via SYCL_EXTERNAL. Maintaining a single C and C++ standard library that works for both host and device code isn't particularly realistic, especially when the host implementations are provided by, e.g., GNU C, libstdc++, Microsoft, etc. This increases the likelihood that enumeration types defined in these libraries will be differently specified for the host and the device. This imposes a requirement on the SYCL implementation to ensure such enumeration types are consistently defined if objects of those types might transition the host/device divide. We can either require all such types to be consistently defined, or we can require specific ones to be consistently defined so that those specific types satisfy device copyability rules, or we can say it's all implementation-defined and code that uses such types is non-portable.

Likewise, user-defined enumeration types must be consistently defined to be copied between the host and device. For those cases, we can require all such types to be consistently defined, or we can place the requirement in the device copyable restrictions (with or without a compatible size/alignment allowance and corresponding std::underlying_type observability).


For enumerations without a fixed underlying type, the implementation must select
the same underlying type on host and device.

== Preprocessor directives and macros

@@ -387,6 +279,30 @@ literal with greater value.
In addition, for each <<backend>> supported, the preprocessor macros described
in <<sec:backends>> must be defined by all conformant implementations.

[[sec:standard-library-support]]
== {cpp} standard library support

In general, any use of functions or types from the [code]#std# namespace in
device code produces undefined behavior.
However, the functions and types listed in this section are guaranteed to have
defined semantics when used in device code.

=== Type aliases

The following type aliases must alias the same types on host and device:

* [code]#std::size_t#;
* [code]#std::ptrdiff_t#;
* [code]#std::max_align_t#;
* [code]#std::nullptr_t#; and
Member

Thinking about it, I was wondering about how this is implemented in C++, looking at /usr/include/x86_64-linux-gnu/c++/14/bits/c++config.h on my laptop, I can see:

  typedef decltype(nullptr)	nullptr_t;

which follows https://en.cppreference.com/w/cpp/types/nullptr_t

But how can an implementation using, for example, a different host and device compiler ensure that these are the same types on the host and device side? 🤔
Probably in the end, we do not care and the only important aspect is that sizeof(std::nullptr_t) is equal to sizeof(void *).

Contributor

> Probably in the end, we do not care and the only important aspect is that sizeof(std::nullptr_t) is equal to sizeof(void *).

I think we do care that they are the same type on host vs. device. Otherwise, you could construct constexpr values that are different on host vs. device via decltype.
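
A small sketch of what "different constexpr values via decltype" could look like, using only standard C++. The names `size_t_is_unsigned_long` and `table_entries` are hypothetical. The first two assertions restate guarantees from the C++ standard; the rest shows a constant whose value depends on what an alias resolves to.

```cpp
#include <cstddef>
#include <type_traits>

// std::nullptr_t is defined as the type of the nullptr literal, and the
// standard requires it to have the same size as void*.
static_assert(std::is_same<std::nullptr_t, decltype(nullptr)>::value,
              "nullptr_t is the type of nullptr");
static_assert(sizeof(std::nullptr_t) == sizeof(void*),
              "nullptr_t has the size of an object pointer");

// A constant that depends on which type std::size_t aliases. If the host
// and device compilers disagreed about the alias, this value would differ.
constexpr bool size_t_is_unsigned_long =
    std::is_same<std::size_t, unsigned long>::value;

// Hypothetical downstream constant derived from the alias.
constexpr int table_entries = size_t_is_unsigned_long ? 64 : 32;
```

Any code that sizes a buffer or selects a branch from `table_entries` would then behave differently on the two sides, even though the source text is identical.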

It is also important that the types of the type alias targets match so that mangled names can be correlated across host and device compilation. Consider a SYCL kernel name type that is a class template specialization with a std::size_t template argument.
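
To sketch the kernel name case: the class template `CopyKernel` below is a hypothetical kernel name type, declared but never defined, as kernel names commonly are. The example uses only standard C++ and assumes nothing about any particular name mangling scheme.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical kernel name type parameterized on an index type.
template <typename IndexT>
class CopyKernel;

// std::size_t is the type produced by sizeof, so these two names refer
// to the same specialization within one compilation.
using AliasView = CopyKernel<std::size_t>;
using SizeofView = CopyKernel<decltype(sizeof(0))>;

static_assert(std::is_same<AliasView, SizeofView>::value,
              "both spellings name one specialization");

// Distinct integer types always give distinct specializations, and
// therefore distinct mangled names, even when the types have equal size.
static_assert(!std::is_same<CopyKernel<unsigned int>,
                            CopyKernel<unsigned long>>::value,
              "different argument types are different kernel names");
```

If `std::size_t` resolved to `unsigned int` in one compilation and `unsigned long` in the other, the host would look up a kernel under a name the device compilation never produced.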

Member

The problem with nullptr_t is that there is a chicken-and-egg problem: it is defined as the type of nullptr, and to pass such a value to a device you need to have the device type in the first place, but this is defined from the value...
The mangled name aspect is interesting. Probably the C++ ABI has some mangling for nullptr_t too.

* The type aliases for integer types defined in [code]#<cstdint>#.

=== Type traits

For any type available in device code, specializations of the following type
traits must have identical definitions on host and device:

* [code]#std::numeric_limits#

[[sec:optional-kernel-features]]
== Optional kernel features
22 changes: 5 additions & 17 deletions adoc/chapters/programming_interface.adoc
@@ -17809,25 +17809,13 @@ std::error_code make_error_code(errc e) noexcept;

== Data types

SYCL as a {cpp} programming model supports the {cpp} core language data types,
and it also provides the ability for all SYCL applications to be executed on
SYCL compatible devices.
The scalar and vector data types that are supported by the SYCL system are
defined below.
More details about the SYCL device compiler support for fundamental and backend
interoperability types are found in <<subsec:scalartypes>>.
All types which are available in device code must have the same size, alignment
requirements, and representation on both host and device.

=== Scalar types

=== Scalar data types

The fundamental {cpp} data types which are supported in SYCL are described in
<<table.types.fundamental>>.
Note these types are fundamental and therefore do not exist within the
[code]#sycl# namespace.

Additional scalar data types which are supported by SYCL within the [code]#sycl#
namespace are described in <<table.types.additional>>.

In addition to the scalar types defined by {cpp}, SYCL defines a number of
scalar types as described in <<table.types.additional>>.

[[table.types.additional]]
.Additional scalar data types supported by SYCL