Is Changing the Floating-point environment for intrinsic/assembly code UB? #471

Open

Labels

A-floats, S-pending-documentation

chorman0773

opened

on Oct 18, 2023

Contributor

Disclaimer: This is not attempting to solve the general fe_setenv issue and allow changing the floating-point environment for floating-point code.

Based on rust-lang/rust#72252, it seems the following code is currently UB:

pub unsafe fn div_3_1() -> f32{
    use core::arch::x86_64::*;
    let x = _mm_set_ss(1.0);
    let y = _mm_set_ss(3.0);
    _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);
    let z = _mm_div_ss(x,y);
    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);
    let mut k = 0.0f32;
    _mm_store_ss(&mut k,z);
    k
}

Likewise, the following code is also considered UB:

pub unsafe fn div_3_1() -> f32{
    use core::arch::asm; // needed for the asm! macro below
    use core::arch::x86_64::*;
    let x = _mm_set_ss(1.0);
    let y = _mm_set_ss(3.0);
    _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);
    let z;
    asm!("divss {}, {}", inlateout(xmm_reg) x => z, in(xmm_reg) y);
    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);
    let mut k = 0.0f32;
    _mm_store_ss(&mut k,z);
    k
}

Both of these are surprising, as no Rust-level floating-point operations are performed that would be affected by the rounding mode - only platform intrinsics or inline assembly.

These are limited examples, but a much more general example of this is a library that implements floating-point operations (including those in non-default floating-point environments from, e.g. C or C++) using a combination of software emulation, inline assembly, and platform intrinsics.

Assuming the LLVM issue mentioned in 72252 is fixed (and llvm's code generation for the llvm.x86.sse.div.ss intrinsic is fixed), can we call these examples defined behaviour, or is this code simply UB for some other reason?

chorman0773

ContributorAuthor

@rustbot label +A-floats

added

Member

Both of these are surprising, as no Rust-level floating-point operations are performed that would be affected by the rounding mode - only platform intrinsics or inline assembly.

You can't know that no Rust-level floating-point operations are affected. The compiler is allowed to move floating-point operations from elsewhere to in between the SET_ROUNDING_MODE calls. This is not a bug, floating-point operations are pure operations in the Rust AM and can be reordered arbitrarily. rustc has to thus implement them in a way that their behavior does not depend on any environmental flags, and it does that by having "the rounding mode is round-to-nearest" in its representation invariant that relates the low-level target state to the high-level AM state.

So yes, this is UB.
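
For the specific 1/3 example, a round-toward-zero quotient can also be obtained without ever leaving the default floating-point environment, by emulating the rounding in software. The following is a minimal sketch, not something proposed in this thread: the name `div_toward_zero` is hypothetical, and it assumes the code runs under the default round-to-nearest mode, as Rust requires.

```rust
/// Hypothetical helper: divide two `f32` values and round the quotient
/// toward zero, without changing the hardware rounding mode.
pub fn div_toward_zero(a: f32, b: f32) -> f32 {
    // The f64 quotient of two f32 values has far more precision than f32,
    // so it never lands exactly on an f32 value unless the true quotient does.
    let wide = a as f64 / b as f64;
    // `as f32` rounds to nearest, which may move the value away from zero.
    let nearest = wide as f32;
    if (nearest as f64).abs() > wide.abs() {
        // Rounded away from zero: step back by one ULP. Decrementing the bit
        // pattern shrinks the magnitude for both positive and negative values.
        f32::from_bits(nearest.to_bits() - 1)
    } else {
        // Exact, already rounded toward zero, infinite, or NaN.
        nearest
    }
}
```

Calling `div_toward_zero(1.0, 3.0)` yields the `f32` just below the round-to-nearest quotient, which is what `divss` would produce under `_MM_ROUND_TOWARD_ZERO`. This does not address the intrinsic or inline-assembly use case in general; it only shows that this particular result does not need FP environment mutation.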

chorman0773

ContributorAuthor

My next question is whether it should be UB.

If even being in (any) rust code with an incorrect floating-point environment is UB, then that implicates quite a few things:

The aforementioned floating-point implementation library, which must support modifications to the floating-point environment to comply with ISO 9899 and platform ABI specifications
This also really affects any compiler runtime support library that can be called from non-Rust code. libatomic and libunwind implementations are most suspect IMO (I've 100% written a C++ RAII wrapper that sets the floating-point environment in a constructor, then resets it in the destructor, and thrown out of that function). I would not be surprised if compiler_builtins is linked into a mixed C/Rust project and called by the compiler from inside a function that manipulates the floating-point environment, thus causing UB by virtue of existing in the project tree.
This also means that rust code can never be entered from a signal handler, as the floating-point environment is unspecified upon entry, and modifying it from a signal handler is undefined behaviour.

RalfJung

Member

FWIW C code compiled with LLVM (and likely most other compilers) has the same UB. This is not just a Rust problem. I don't know anything about any of your examples, but if a platform ABI requires FP env mutation (as you claim for point 1) then that's already a highly problematic ABI at best. Which ABI requires FP env mutation? Therefore I also don't buy your second example; if that runtime support is implemented in C, then it already implicitly assumes a default FP environment. The C ABI (on pretty much any target) requires the FP env to be in default state.

Allowing modification of the FP environment without compromising optimizations on the 99.9% of functions that are happy with the default FP settings is not easy. We'd have to add some notion of "scope within which the compiler does not assume that the FP environment is in the default state". This would have to interact properly with inlining and things like that. It's not impossible, but it requires a bunch of design work.

chorman0773

ContributorAuthor

FWIW C code compiled with LLVM (and likely most other compilers) has the same UB

clang supports the STDC FENV_ACCESS pragma per requirements of ISO 9899. It uses constrained floating-point intrinsics. gcc also supports this pragma (again, as required by the standard). It is also not considered undefined behaviour by C to modify the floating-point environment w/o the pragma, though floating-point operations issued in a non-default FP Env may yield incorrect results according to the current FP Env - e.g. by constant-folding ops under the default rounding mode (whether it does is unspecified).
Any C compiler that considers it UB to do so is not compliant with the standard and I am under no obligation to write code that supports it, nor to have my own implementation follow such broken compilers. This is also true of C++, though C++ does not require support of the pragma (all C++ compilers I'm aware of, even MSVC, support it, though).

Which ABI requires FP env mutation

Moreso where the FP env is stored on the ABI, so I cannot emulate the fp env in the library. Practically every ABI I'm aware of specifies the platform effects of the fesetenv and fegetenv C functions.

I don't know anything about any of your examples

The main example is libsfp, a runtime support library used by lccc to implement floating-point operations (of various sizes, among others), including those in a function marked #[fpenv(dynamic)]. To comply with ISO 9899, it must respect the floating-point environment in such functions, and must not cause UB period.

The C ABI (on pretty much any target) requires the FP env to be in default state.

This is not the case for x86_64 Sys-V or MSABI. Both require that the floating-point environment is initialized to default (and specify that default), and then mark the control bits of the relevant registers (mxcsr and the x87 fcw) as callee saved, with the exception of functions that intentionally modify the floating-point environment (fesetenv is one). This is to support the aforementioned function, which is required by ISO 9899. Any C ABI that makes fesetenv immediate undefined behaviour is not a compliant ABI.
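
To make "the control bits of mxcsr" concrete, here is a small sketch that decodes an MXCSR value without reading or writing the real register. The bit positions and the power-on default `0x1F80` follow the x86 architecture manuals; the type and function names are hypothetical and only for illustration.

```rust
/// Rounding-control field of MXCSR (bits 13 and 14).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Rounding {
    Nearest,
    Down,
    Up,
    TowardZero,
}

/// Default MXCSR value: all exceptions masked, round-to-nearest, no FTZ/DAZ.
pub const MXCSR_DEFAULT: u32 = 0x1F80;
/// Rounding-control mask (same value as `_MM_ROUND_TOWARD_ZERO`).
pub const MXCSR_RC_MASK: u32 = 0x6000;
/// Flush-to-zero (bit 15) and denormals-are-zero (bit 6).
pub const MXCSR_FTZ: u32 = 0x8000;
pub const MXCSR_DAZ: u32 = 0x0040;
/// Bits 6..=15 are control bits; bits 0..=5 are sticky status flags.
pub const MXCSR_CONTROL_MASK: u32 = 0xFFC0;

/// Decode the rounding mode from an MXCSR value.
pub fn rounding_of(mxcsr: u32) -> Rounding {
    match (mxcsr & MXCSR_RC_MASK) >> 13 {
        0 => Rounding::Nearest,
        1 => Rounding::Down,
        2 => Rounding::Up,
        _ => Rounding::TowardZero,
    }
}

/// True if the control bits match the default, ignoring the status flags.
pub fn has_default_control_bits(mxcsr: u32) -> bool {
    mxcsr & MXCSR_CONTROL_MASK == MXCSR_DEFAULT
}
```

For instance, `rounding_of(MXCSR_DEFAULT | 0x6000)` decodes to `Rounding::TowardZero`, while a value that differs from the default only in its status flags still passes `has_default_control_bits`.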

chorman0773

ContributorAuthor

It is also not considered undefined behaviour by C to modify the floating-point environment w/o the pragma, though floating-point operations issued in a non-default FP Env may yield incorrect results according to the current FP Env

If this was the case, then being in an async-signal-handler is immediate UB (as I noted is the case for Rust), which is most definitely not the case.

RalfJung

Member

As far as I know, C/C++ code compiled with clang without special markers behaves exactly like Rust code wrt float operations, and hence has the same UB. I assume basically all C/C++ compilers will treat float operations as pure (unless some special marker is set to indicate that a scope has a non-default FP env) and hence move them around (including out of loops and out of potentially dead code). This makes mutating the FP environment UB in the real C that compilers implement, whether or not the standard agrees.

I have no interest in specifying Rust in a way that is disconnected from reality, so we should call this UB as that's what it is. Maybe it's UB because compilers are not compliant, but how's that helpful? It isn't, unless you have a proposal for how the standard can be implemented in a reasonable way.

This is not the case for x86_64 Sys-V or MSABI. Both require that the floating-point environment is initialized to default (and specify that default), and then mark the control bits of the relevant registers (mxcsr and the x87 fcw) as callee saved, with the exception of functions that intentionally modify the floating-point environment (fesetenv is one). This is to support the aforementioned function, which is required by ISO 9899. Any C ABI that makes fesetenv immediate undefined behaviour is not a compliant ABI.

That can't be true. When I write an extern "C" function, I must be able to rely on the FP env being in default state. If that wasn't the case then every single externally callable function that might use FP operations had to start by setting the rounding mode. If I set the FP env to a non-default state and then call some library and it misbehaves, I don't get to complain. There is no general expectation that libraries are resilient against non-default FP envs. (If any of this is wrong please let me know, I certainly haven't seen any evidence to the contrary.)

All of this shows that the rounding mode is part of the de-facto ABI. If the documentation disagrees then the documentation doesn't reflect the contract used in real-world software.

If this was the case, then being in an async-signal-handler is immediate UB (as I noted is the case for Rust), which is most definitely not the case.

If the async signal handler uses any float operation, then it's most definitely UB. There's also nothing in the LLVM LangRef that would forbid the compiler from introducing a new float operation into code that doesn't use a float operation. This can be as subtle as

if !in_signal { some_float_op(); }

and hoisting the operation out of the if (which is obviously legal for a pure operation).
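
Spelled out as a sketch, the transformation looks like this. The function names and the use of a division as `some_float_op` are illustrative assumptions; the second function is written by hand to mirror what an optimizer is allowed to produce from the first.

```rust
/// Stand-in for an arbitrary pure float operation.
fn some_float_op(x: f32, y: f32) -> f32 {
    x / y
}

/// The code as written: the float op only runs outside the signal handler.
pub fn as_written(in_signal: bool, x: f32, y: f32) -> Option<f32> {
    if !in_signal {
        Some(some_float_op(x, y))
    } else {
        None
    }
}

/// The same code after hoisting: the float op now executes on every path,
/// including the one where `in_signal` is true.
pub fn after_hoisting(in_signal: bool, x: f32, y: f32) -> Option<f32> {
    let q = some_float_op(x, y);
    if !in_signal {
        Some(q)
    } else {
        None
    }
}
```

Under the default FP environment the two functions are indistinguishable, which is exactly why the compiler may swap one for the other. With a non-default environment on entry to the signal handler, the hoisted version executes a float operation that the source never asked for.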

Sounds like async signal handlers need to be compiled with such a "FP env might be in non-default state" kind of a scope, otherwise there's no way they can be sound.

Also sounds like nobody really thought this entire FP env thing through to the end and different parts of the ecosystem made mutually incompatible choices, and now it's all busted. 🤷 It doesn't get better by pretending that it's not UB, though.

chorman0773

ContributorAuthor

As far as I am aware gcc (and msvc) implement the behaviour as prescribed. If clang does not, I would consider that a bug in clang and certainly not any behaviour I would desire to emulate in lccc.

That can't be true. When I write an extern "C" function, I must be able to rely on the FP env being in default state.

This is either an extra constraint imposed by Rust that does not reflect any actual C ABI, or an incorrect reliance. As far as I am aware, no C ABI is non-compliant with the relevant sections of ISO 9899 in this regard. Knowing the precise behaviour of the Clever-ISA ABI, I can quote the relevant text, though x86_64 Sys-V is similar (albeit less formal).

The fpcrw register is set to 0 upon start up. The fpcrw.EXCEPT bits are caller-saved. No function should modify the value of the fpcrw register unless indicated by the behaviour of the function (in particular, the fesetenv function sets the value as per the "floating-point env" section of this document). Otherwise, the register shall be treated as reserved (must be restored to value at entry prior to calling another function or returning if modified).

chorman0773

ContributorAuthor

(if you'd like, I can find the relevant portions of the x86_64 sys-v spec and msvc abi)

RalfJung

Member

As far as I am aware gcc (and msvc) implement the behaviour as prescribed. If clang does not, I would consider that a bug in clang

GCC says "Without any explicit options, GCC assumes round to nearest or even". It's unclear what that means, but it's far from obvious that it means "GCC guarantees that the code will work correctly under all FP environments".

LLVM is very clear: "The default LLVM floating-point environment assumes that traps are disabled and status flags are not observable. Therefore, floating-point math operations do not have side effects and may be speculated freely. Results assume the round-to-nearest rounding mode, and subnormals are assumed to be preserved." You might want to bring this up with the LLVM people if you think that's an issue.
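
One visible consequence of "results assume the round-to-nearest rounding mode" is compile-time evaluation. In the sketch below (names are hypothetical), the constant is computed by the compiler, so no runtime rounding mode could ever influence it; the runtime function produces the same bits only as long as the default environment is in effect.

```rust
/// Evaluated at compile time, always with round-to-nearest.
pub const THIRD: f32 = 1.0 / 3.0;

/// Evaluated at run time; the compiler is free to assume the result matches
/// what round-to-nearest would give, and may fold it if the inputs are known.
pub fn third_at_runtime(one: f32, three: f32) -> f32 {
    one / three
}
```

The round-to-nearest quotient has the bit pattern `0x3EAAAAAB`, one ULP above the round-toward-zero result.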

Usually @comex is very good at getting compilers to apply the right optimizations in the right order to demonstrate an end-to-end miscompilation, maybe they can do it here, too? :)

This also means that rust code can never be entered from a signal handler, as the floating-point environment is unspecified upon entry, and modifying it from a signal handler is undefined behaviour.

What is this claim based on? Is it some standard that says so, or are there really targets and OSes where the kernel doesn't save and restore the FP environment when switching from a thread to its signal handler and back?

(if you'd like, I can find the relevant portions of the x86_64 sys-v spec and msvc abi)

Do you have any evidence that every single library with a C ABI is actually expected to be working correctly under arbitrary FP environments, and that library authors consider it a bug when their library misbehaves under non-default FP environments?

As I said before, I care not only about what it says on some piece of paper but also about what is actually done in the real world. When standard and reality disagree, it's not automatically reality that's wrong. Sometimes the standard is just making completely unrealistic prescriptions that everybody ignores, and the standard should be fixed.

It's also unclear to me which alternative you are suggesting. Could you make a constructive proposal? Here are some options, and you can already immediately see why many people won't like them:

FP operations are specified to be using a non-deterministically chosen rounding mode. Or:
The compiler may not move FP operations to different parts of the code unless it can prove both use the same FP rounding mode, and the compiler may also not synthesize new FP operations. Since the compiler cannot know which functions are "documented" to change the FP status register, it has to conservatively assume that any function that is called changes the FP rounding mode, and never reorder an FP operation across a function call.

You are asking everyone to pay for a feature that hardly anyone needs. Is that your position, or do you see a better way out here?

If you further want to claim that even other aspects besides the rounding mode may be changed, such as making sNaNs trigger a trap, then either passing an sNaN to an FP operation is UB, or FP operations cannot be reordered at all with anything any more (e.g., reordering an FP operation and a store becomes illegal since the trap makes it fully observable when exactly an operation happens).

RalfJung

mentioned this

on Oct 20, 2023

add float semantics RFC rust-lang/rfcs#3514

Muon

Linux definitely restores the FP environment when it enters a signal handler. There was a big kerfuffle about it back in 2002 when SSE2 arrived (https://yarchive.net/comp/linux/fp_state_save.html). I think FreeBSD might be a target that actually does not restore the FP environment when entering a signal handler, but I am unsure (https://reviews.freebsd.org/D33599). In any case, glibc says that fesetenv is async-signal-safe (https://www.gnu.org/software/libc/manual/html_node/Control-Functions.html), so fixing this shouldn't be a problem.

The SYSV ABI (https://gitlab.com/x86-psABIs/x86-64-ABI) stipulates that the FP control bits are callee-saved, meaning that the callee needs to restore them if it changes them. (Presumably an exception is intended for fesetenv and fesetround, but it seems to have been forgotten.) This doesn't mean that publicly-accessible library functions using FP instructions have to be built defensively to be correct, just that they have an undocumented assumption (that the FP environment is default).

The main consideration for Rust is that it is ultimately bound by LLVM's quirks. LLVM has made (is still making?) progress towards letting Clang support #pragma STDC FENV_ACCESS ON, so something similar would be good to implement in Rust eventually. My preference would be an attribute applicable to blocks and functions that describes how floating-point arithmetic behaves within them, similar to #pragma STDC FENV_ROUND direction.

Additionally, the C23 standard (and possibly earlier revisions) specifies in Section 7.6.1 "The FENV_ACCESS pragma" that it is UB to, under a non-default floating-point environment, execute any code that was compiled with the pragma set to off.

RalfJung

Member

@Muon thanks! So looks like in practice, signal handlers are fine on our tier 1 targets, but other targets are having issues. (Also I heard that some versions of WSL do not save and restore the FP env for signal handlers.)

(Presumably an exception is intended for fesetenv and fesetround, but it seems to have been forgotten.)

For this to be useful for compilers, the exception needs to be compiler-readable. Connor quoted above some wording saying that if the function is "documented" to change the FP state then it may do so, but of course that's not very useful.

This is similar to how setjmp needs an attribute so that the compiler can understand that something very weird is going on.

(Though floats are different in that as far as I can see, even with such an attribute there'd be a global cost.)

This doesn't mean that publicly-accessible library functions using FP instructions have to be built defensively to be correct, just that they have an undocumented assumption (that the FP environment is default).

For Rust (and code compiled with clang) this means all functions have such an undocumented assumption.

RalfJung

Member

FWIW in my opinion this is a case of bad ISA design. ISAs chose to introduce some global mutable state and as usual, global mutable state is causing problems. ISAs should provide opcodes that entirely ignore the FP status register so that languages can implement the desired semantics (floating-point operations that do not depend on global mutable state) properly. But it seems like even RISCV repeats this mistake, so we'll be stuck with hacks and quirks for many decades to come. Languages can choose to either make those ISA features basically inaccessible, to penalize all users for the benefit of the tiny fraction that actually wants a non-default FP status register, or to introduce syntactic quirks that mark where in the code floating-point opcodes behave in non-default ways.

22 remaining items

chorman0773

ContributorAuthor

But I wouldn't suggest hanging your program's correctness on such assumptions.

In my case, at least, there aren't any floating-point operations in sight (beyond stuff in inline-assembly). Inlining would be a thing, but this code is on the other side of a staticlib/dylib and LTO is off (not that the calls that care about fp-env could possibly LTO with llvm-compiled code anyways - this is being called by lccc's codegen). I'd prefer a more well-defined solution, though this is probably good until said solution exists.

Muon

Sure thing, boss! Here is a program that segfaults in release mode, where the only unsafe code is a call to fesetround (which changes the floating point environment's rounding mode). Tested on Linux and macOS. In short, LLVM removes a bounds check based on assumptions that are valid in the default rounding mode, but we execute the code in a different mode where those assumptions are invalid, and the program ends up indexing out of bounds.

That's delightful. I am surprised to learn that LLVM does not perform any range tracking on floating-point variables. Though I suppose if it did optimize things like that more aggressively perhaps that would expose too many bugs with its x87 handling.

RalfJung

Member

Sure thing, boss! Here is a program that segfaults in release mode, where the only unsafe code is a call to fesetround (which changes the floating point environment's rounding mode). Tested on Linux and macOS. In short, LLVM removes a bounds check based on assumptions that are valid in the default rounding mode, but we execute the code in a different mode where those assumptions are invalid, and the program ends up indexing out of bounds.

That's amazing, thanks a ton. :) If I truly were your boss, you'd get a promotion. :D

If it were a constant, LLVM would just replace the argument to get_unchecked with whatever the constant value is, and not execute the floating point computation at runtime at all.

It would still index the wrong element though? So one could then unsafely assert that we saw the right element and we would reach an unreachable_unchecked that should be unreachable, and that'd still be a miscompilation?

EDIT: Ah no it would of course index the right element, since it'd do the computation with default rounding mode. Yeah that is quite tricky, amazing that you found an example!

beetrees

mentioned this in 3 issues

on Apr 25, 2024

HadrienG2

Assuming our beloved compiler backends can fix their broken semantics to allow it, I would argue that it makes a lot of sense for Rust to provide opt-in support for FTZ/DAZ mode (ideally in selected code regions so the rest of the code is not penalized) because...

On Intel CPUs, the performance impact of denormals is huge (10~100x slowdown in FMA-heavy code). Though interestingly, AMD somehow managed to cut it down to much less according to my tests. I wish Intel could learn from them here, but in any case we'll need to support Intel's current CPUs for many years to come at my workplace...
The amount of affected code is equally enormous because anything that looks like a decaying exponential will eventually enter denormal range if left at rest long enough. And unfortunately, decaying exponentials are everywhere, because they are the analytical solution of time-based differential equations describing real-world systems with negative feedback effects. Which means that you will often find them in the output of common signal processing algorithms (anything that looks like a low-pass filter) and numerical simulations of many phenomena in natural sciences (physics, biology, chemistry...).
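
As a rough illustration of how quickly a decaying signal reaches the denormal range, the sketch below counts how many multiplications by a constant decay factor it takes for an `f32` to become subnormal. The function name and the `max_steps` parameter are hypothetical, and the code assumes the default FP environment (no FTZ/DAZ).

```rust
/// Repeatedly multiply `x` by `decay` and report the first step at which the
/// value becomes subnormal, or `None` if that does not happen within
/// `max_steps` steps (or the value reaches zero first).
pub fn steps_until_subnormal(mut x: f32, decay: f32, max_steps: u32) -> Option<u32> {
    for step in 1..=max_steps {
        x *= decay;
        if x.is_subnormal() {
            return Some(step);
        }
        if x == 0.0 {
            return None;
        }
    }
    None
}
```

Starting from 1.0 with a decay factor of 0.5, the value is `2^-126` (the smallest normal `f32`) after 126 steps and becomes subnormal on step 127. A filter running at audio sample rates with a slower decay still gets there within a second or so of silence, after which every operation on that state pays the denormal penalty.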

HadrienG2

I think what I'd love to have is something like this:

fn do_compute() {
   // ... normal rust code ...

   // Denormals-sensitive code path starts here
   #[flush_denormals]
   // At this opening brace, three things happen:
   //
   // 1. A backend optimization barrier akin to an AcqRel atomic op is inserted,
   //    preventing floating-point code and constructs like function calls that
   //    can indirectly lead to the execution of floating-point code to be
   //    reordered after the upcoming change in FP environment configuration.
   // 2. The CPU floating-point environment is saved if needed (see below) then
   //    modified so that denormals start being flushed to zero.
   // 3. A backend optimization barrier akin to an AcqRel atomic op is inserted,
   //    preventing FP code (as defined above) inside the following block to be
   //    reordered before the floating-point environment change.
   {
       // The code that is generated inside of this code block is annotated at
       // the backend level to disable the backend's assumption that the
       // floating-point environment is in the default configuration. Indeed, if
       // the backend provides the appropriate annotations for that, we can even
       // explicitly tell it that we're using a denormals-are-zero environment
       // to reduce the loss of backend optimizations.
       //
       // Note that escaping this code block's scope via e.g. calls to functions
       // that are built in the default "assume default FP env" compiler backend
       // configuration is UB. We could handle this much like we handle
       // functions with `#[target_features]` annotations in regular code that
       // does not have these annotations (i.e. make all function calls unsafe
       // unless the functions are themselves annotated with some kind of
       // `#[flush_denormals]`-like attribute), but the ergonomics would be very
       // poor as any nontrivial use of `#[flush_denormals]` would be full of
       // unsafe even when we can trivially have the compiler backend Do The
       // Right Thing with e.g. FP arithmetic.
       //
       // Instead, it would be better to have a way to automagically force the
       // compiler backend to generate two copies of every function that is
       // invoked here, one with normal FP semantics and one with
       // `#[flush_denormals]` semantics. In that case, the only thing that
       // would be unsafe would be calling to a function that cannot be
       // transparently duplicated (think `extern`, `#[no_mangle]`...).

   // At this closing brace (or if the scope is exited in any other way like
   // panics etc), the floating point environment is restored using a procedure
   // similar to that used to set it:
   //
   // 1. A backend optimization barrier akin to an AcqRel atomic op is inserted
   //    to prevent FP code inside the previous block to be reordered after the
   //    upcoming floating-point environment change.
   // 2. The CPU floating-point environment is restored using the previously
   //    saved copy, or just reset to the expected default if we fully assume a default
   //    FP environment like LLVM seemingly does.
   // 3. A backend optimization barrier akin to an AcqRel atomic op is inserted
   //    to prevent FP code after end of the block to be reordered before the
   //    floating-point environment change.
   }

    // ... back to normal rust code ...
}

But if that's too difficult to implement, I can totally live with a function-scoped attribute (#[flush_denormals] fn gotta_go_fast() {}).

A global FTZ/DAZ compiler option would be more problematic on the other hand because some numerical algorithms do depend on proper denormals behavior for correctness. Think about e.g. iterative algorithms that run until estimated error gets below a certain threshold: in this case the error estimate computation can easily end up relying on Sterbenz's lemma for correctness, as nicely highlighted by this amazing bug report.

RalfJung

Member

There's quite a big design space here, e.g. one could also imagine specifying the rounding mode and other aspects like denormal handling for each operation. That'd make a lot more sense semantically, and at least some ISAs (RISC-V) I hear are designed in a reasonable (non-stateful) way and support setting such flags on each instruction.
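
To give a feel for what "denormal handling for each operation" could mean, here is a purely software sketch that flushes subnormal inputs and results to zero around a single multiplication. It is not an existing Rust API, and it only approximates hardware FTZ/DAZ behaviour (for example, it sets no status flags); the function names are hypothetical.

```rust
/// Replace a subnormal value with a zero of the same sign.
fn flush(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// Multiply with denormals-are-zero on the inputs and flush-to-zero on the
/// result, emulated in software under the default FP environment.
pub fn mul_flushing(a: f32, b: f32) -> f32 {
    flush(flush(a) * flush(b))
}
```

Because the mode is an explicit part of the operation rather than hidden global state, such a function stays pure and can be reordered freely. A hardware-backed version would need per-instruction mode bits or compiler support to be competitive in speed.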

So, this will require someone or a small group of people proposing a t-lang project and working out some reasonable solutions here. It might require work on the LLVM side, too. t-opsem / UCG can help figure out the spec for concrete proposals, but we don't have the capacity to push for entirely new language extensions like this ourselves.

I don't think this issue is the right place to discuss the solution space here. I think the original question has been answered (yes, this is UB). The thing that's left before closing the issue is making sure this is properly documented. I am not entirely sure where such docs would go though... somewhere in the reference where we explain the assumptions Rust makes about the surrounding execution environment, but I don't think we have such a place yet?

RalfJung

added

S-pending-documentation

on Dec 13, 2024

HadrienG2

Thanks for the feedback anyway. I must admit that I'm a bit lost in the communication channels that the Rust project uses. What do you think is the best place to bring this discussion to see if there are enough other interested people ? t-lang at rust-lang zulip ? internals.rust-lang.org ? Somewhere else ?

RalfJung

Member

I'd start by writing up some pre-RFC draft and circulating it on Zulip and/or IRLO.