Floats are problematic

It is well known that floating point code merits extra caution because of the details of rounding math. The difficulty is not limited to software that interacts with floating point numbers and the floating point environment (all of which can affect our API significantly). It also extends to hardware implementations of both vector and scalar floating point units, and so to both our SIMD code and our scalar fallbacks. These aren't bugs per se because [it's not a bug, it's a **feature**](https://www.wired.com/story/its-not-a-bug-its-a-feature/), but they are features requiring enhanced attention to detail. This is related to https://github.com/rust-lang/unsafe-code-guidelines/issues/237 and https://github.com/rust-lang/rust/issues/73328 as well.

Most domains involving SIMD code use floats extensively, so this library has a "front-row seat" to problems with floating point operations. Thus, in order for SIMD code to be reasonably portable, we should try to discover where these... unique... implementations lie and decide how to work around them, or at least inform our users of the Fun Facts we learn.

Arch-specific concerns remain for:
- [ ] 32-bit x86
- [ ] 32-bit ARM
- [ ] MIPS
- [ ] Wasm

### 32-bit x86
The x87 80-bit "long double" float registers [can do interesting things to NaN](https://github.com/rust-lang/rust/issues/73288), and in general if a floating point value ends up in them and experiences an operation, this can introduce extra precision that may lead to incorrect mathematical conclusions later on. Further, Rust no longer supports MMX code at all because its interaction with the x87 registers was just entirely too much trouble. Altogether, this is probably why the [x86-64 System V ABI] specifies usage of the XMM registers for handling floating point values, which do not have these problems.

### 32-bit ARM
There are [so many different floating point implementations on 32-bit ARM][arm-floating-point] that `armclang` includes a compiler flag [for FPU architecture][arm-mfpu] and another one for [float ABI][arm-mfloat-abi].
- **[VFP] AKA Vector Floating Point**: VFP units that appear on ARMv7 seem to default to flushing denormals to zero unless the "FZ bit" in the relevant control register is configured otherwise.
- **[Neon] AKA Advanced SIMD**: Vector registers flush denormals to zero always. This is **not** true of aarch64's Enhanced Neon AKA Advanced SIMD v2.
- **Aarch32**: Lest we imagine that ARMv8-A is completely free of problems, the "aarch32" execution mode has [an unspecified default value for the FZ bit][aarch32-fz-bit] even if Neon v2 is available.
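
A rough way to observe the flush-to-zero behavior described above is to produce a denormal at runtime and inspect its bits. This is only a sketch: `is_subnormal_bits` and `subnormals_survive_arithmetic` are hypothetical helper names, not part of any existing API, and the probe only exercises whichever scalar float unit the compiler happens to target.

```rust
use std::hint::black_box;

/// Classifies raw binary32 bits using integer math only, so the answer
/// does not depend on how the float unit treats denormals.
fn is_subnormal_bits(bits: u32) -> bool {
    let exponent = (bits >> 23) & 0xff;
    let mantissa = bits & 0x007f_ffff;
    exponent == 0 && mantissa != 0
}

/// Hypothetical probe: halve the smallest normal f32 at runtime and check
/// whether the result is still a denormal or has been flushed to zero.
/// `black_box` is used to discourage the compiler from folding the
/// multiplication at compile time.
fn subnormals_survive_arithmetic() -> bool {
    let smallest_normal = black_box(f32::MIN_POSITIVE);
    let halved = black_box(smallest_normal * 0.5);
    is_subnormal_bits(halved.to_bits())
}

fn main() {
    if subnormals_survive_arithmetic() {
        println!("denormals are preserved by scalar f32 multiplication here");
    } else {
        println!("denormals appear to be flushed to zero here");
    }
}
```

The same probe would need a vector variant to say anything about Neon, since the scalar and vector units can disagree.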

### MIPS
NaNs sometimes do weird things on this platform, [resulting in some of the packed_simd tests getting the very interesting number `-3.0`](https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/packed_simd.20maintenance/near/211342192).

### Wasm
Technically Wasm is IEEE754 compliant, but it is very "...technically!" here: it specifies a canonicalizing behavior on NaNs. That is an interesting choice amongst architectures, and it may surprise a programmer who is used to e.g. relying on NaN bitfields being fairly stable. We will want to watch out for it when implementing our initial floating point test suite.
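
To make the NaN concern concrete, here is a minimal sketch that pushes a quiet NaN with a recognizable payload through one arithmetic operation and prints the payload before and after. `nan_payload` is a hypothetical helper written for this illustration. The only thing asserted is that the result is still a NaN; whether the payload survives is exactly the part that should not be relied on across targets.

```rust
use std::hint::black_box;

/// Mantissa bits below the quiet bit of a binary32 NaN.
const PAYLOAD_MASK: u32 = 0x003f_ffff;

/// Hypothetical helper: returns the payload if `bits` encodes a NaN
/// (exponent all ones, mantissa nonzero), or `None` otherwise.
fn nan_payload(bits: u32) -> Option<u32> {
    let is_nan = (bits & 0x7f80_0000) == 0x7f80_0000 && (bits & 0x007f_ffff) != 0;
    if is_nan {
        Some(bits & PAYLOAD_MASK)
    } else {
        None
    }
}

fn main() {
    // A quiet NaN carrying the payload 0x1234.
    let input_bits: u32 = 0x7fc0_1234;
    let nan = f32::from_bits(input_bits);

    // One ordinary operation; `black_box` discourages constant folding.
    let result = black_box(nan) + black_box(1.0f32);

    // The result must be a NaN, but its payload is not portable.
    assert!(result.is_nan());
    println!(
        "payload before: {:x?}, payload after: {:x?}",
        nan_payload(input_bits),
        nan_payload(result.to_bits())
    );
}
```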

## The Good News
We have reasonable confidence in float behavior in x86's XMM-and-later registers (so from SSE2 onwards), aarch64's Neon v2 registers, PowerPC, and z/Architecture (`s390x`). As far as we know, on these architectures ordinary binary32 and binary64 values are what they say they are, support basic operations in a reasonably consistent fashion, and nothing particularly weird occurs even with NaNs. So we are in a pretty good position for using most vector ISAs! It's just all the edge cases that pile up.

## Main Takeaway
We want to extensively test even "simple" scalar operations on floating point numbers, especially casts and bitwhacking, or anything else that might possibly be affected by denormal numbers or NaN, so as to surface known and unknown quirks in floating point architectures and how LLVM handles them.
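
As a starting point, a scalar check along these lines could cover bit round-trips and casts. It is a sketch only: `bits_round_trip` and `cast_to_i32` are hypothetical names, the pattern list is deliberately small, and it assumes the saturating float-to-int `as` casts that Rust has had since 1.45.

```rust
use std::hint::black_box;

/// Checks that a bit pattern survives a trip through `f32` and back.
fn bits_round_trip(bits: u32) -> bool {
    black_box(f32::from_bits(bits)).to_bits() == bits
}

/// Runtime float-to-int cast; `black_box` discourages constant folding
/// so the target's actual conversion path is exercised.
fn cast_to_i32(x: f32) -> i32 {
    black_box(x) as i32
}

fn main() {
    // Zeroes, denormals, the normal range limits, infinities, a quiet NaN.
    let patterns: [u32; 9] = [
        0x0000_0000, // +0.0
        0x8000_0000, // -0.0
        0x0000_0001, // smallest denormal
        0x007f_ffff, // largest denormal
        0x0080_0000, // smallest normal
        0x7f7f_ffff, // largest finite
        0x7f80_0000, // +infinity
        0xff80_0000, // -infinity
        0x7fc0_0000, // quiet NaN
    ];
    for &bits in &patterns {
        assert!(bits_round_trip(bits), "round trip failed for {:#010x}", bits);
    }

    // `as` casts saturate and map NaN to zero.
    assert_eq!(cast_to_i32(f32::NAN), 0);
    assert_eq!(cast_to_i32(f32::INFINITY), i32::MAX);
    assert_eq!(cast_to_i32(f32::NEG_INFINITY), i32::MIN);
    println!("all scalar checks passed");
}
```

Signaling NaN patterns are left out on purpose, since those are among the cases most likely to behave differently between targets and deserve their own dedicated tests.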

[arm-floating-point]: https://developer.arm.com/architectures/instruction-sets/floating-point
[arm-mfpu]: https://developer.arm.com/documentation/dui0774/k/Compiler-Command-line-Options/-mfpu?lang=en
[arm-mfloat-abi]: https://developer.arm.com/documentation/dui0774/k/Compiler-Command-line-Options/-mfloat-abi?lang=en
[aarch32-fz-bit]: https://developer.arm.com/docs/ddi0595/f/aarch32-system-registers/fpscr#FZ_24
[VFP]: https://developer.arm.com/documentation/dui0472/m/compiler-coding-practices/vector-floating-point--vfp--architectures?lang=en
[Neon]: https://developer.arm.com/documentation/dui0472/m/using-the-neon-vectorizing-compiler/the-neon-unit?lang=en
[x86-64 System V ABI]: https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/x86-64-psABI-1.0.pdf
