Skip to content

Some improvements around Q_rsqrt() #1458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 17, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions src/engine/qcommon/q_math_test.cpp
Original file line number Diff line number Diff line change
@@ -219,4 +219,18 @@ TEST(QSharedMathTest, InverseSquareRoot)
EXPECT_THAT(Q_rsqrt(1e6), RsqrtEq(1e-3));
}

TEST(QSharedMathTest, FastInverseSquareRoot)
{
constexpr float relativeTolerance = 6.50196699e-4;
auto RsqrtEq = [=](float expected) { return FloatNear(expected, expected * relativeTolerance); };

EXPECT_THAT(Q_rsqrt_fast(1e-6), RsqrtEq(1e3));
EXPECT_THAT(Q_rsqrt_fast(0.036), RsqrtEq(5.270463));
EXPECT_THAT(Q_rsqrt_fast(0.2), RsqrtEq(2.236068));
EXPECT_THAT(Q_rsqrt_fast(1), RsqrtEq(1));
EXPECT_THAT(Q_rsqrt_fast(3), RsqrtEq(0.5773503));
EXPECT_THAT(Q_rsqrt_fast(29.1), RsqrtEq(0.1853760));
EXPECT_THAT(Q_rsqrt_fast(1e6), RsqrtEq(1e-3));
}

} // namespace
108 changes: 92 additions & 16 deletions src/engine/qcommon/q_shared.h
Original file line number Diff line number Diff line change
@@ -338,25 +338,101 @@ extern const quat_t quatIdentity;

#define Q_ftol(x) ((long)(x))

// Overall relative error bound (ignoring unknown powerpc case): 5 * 10^-6
// https://en.wikipedia.org/wiki/Fast_inverse_square_root#/media/File:2nd-iter.png
inline float Q_rsqrt( float number )
{
float x = 0.5f * number;
float y;
/* The original Q_rsqrt algorithm is:

float Q_rsqrt( float n )
{
uint32_t magic = 0x5f3759dful;
float a = 0.5f;
float b = 3.0f;
union { float f; uint32_t u; } o = {n};
o.u = magic - ( o.u >> 1 );
return a * o.f * ( b - n * o.f * o.f );
}

It could be written like this, this is what Quake 3 did:

float Q_rsqrt( float n )
{
uint32_t magic = 0x5f3759dful;
float a = 0.5f;
float b = 3.0f;
float c = a * b; // 1.5f
union { float f; uint32_t u; } o = {n};
o.u = magic - ( o.u >> 1);
float x = n * a;
return o.f * ( c - ( x * o.f * o.f ) );
o.f *= c - ( x * o.f * o.f );
// o.f *= c - ( x * o.f * o.f );
return o.f;
}

It was written with a second iteration commented out.

The relative error bound after the initial iteration was: 1.8×10⁻³
The relative error bound after a second iteration was: 5×10⁻⁶

The 0x5f3759df magic constant comes from the Quake 3 source code:
https://github.com/id-Software/Quake-III-Arena/blob/dbe4ddb/code/game/q_math.c#L56

That magic constant was good enough but better ones can be used.

// compute approximate inverse square root
Chris lomont computed a better magic constant of 0x5f375a86 while
keeping the other values of 0.5 and 3.0 for all iterations:
https://www.lomont.org/papers/2003/InvSqrt.pdf

Jan Kadlec computed an ever better magic constant but it requires
different values for the first iteration: http://rrrola.wz.cz/inv_sqrt.html

float Q_rsqrt( float n )
{
uint32_t magic = 0x5f1ffff9ul:
float a = 0.703952253f;
float b = 2.38924456f;
union { float f; uint32_t u; } o = {n};
o.u = magic - ( o.u >> 1 );
return a * o.f * ( b - n * y.f * y.f );
}

The relative error bound is: 6.50196699×10⁻⁴ */

// Compute approximate inverse square root.
inline float Q_rsqrt_fast( const float n )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what is the error bound for Q_rsqrt_fast?

Copy link
Member Author

@illwieckz illwieckz Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The page http://rrrola.wz.cz/inv_sqrt.html says:

max rel_error avg rel_error²
6.50196699×10⁻⁴ 2.00010826×10⁻⁷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what means the 2 in avg rel_error²

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it may simple be the squared relative error, annoying.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I guess the relative error is √2.00010826×10⁻⁷ so 4.4722569917212941×10⁻⁴.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max rel error is what you want for a unit test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, when compiled with -DUSE_ARCH_INTRINSICS=OFF, with a relative tolerance of 4.4722569917212941e-4, the tests for Q_rsqrt_fast() fail this way:

src/engine/qcommon/q_math_test.cpp:228: Failure
Value of: Q_rsqrt_fast(1e-6)
Expected: is approximately 1000 (absolute error <= 0.44722569)
  Actual: 1000.5 (of type float), which is 0.502686 from 1000
src/engine/qcommon/q_math_test.cpp:229: Failure
Value of: Q_rsqrt_fast(0.036)
Expected: is approximately 5.270463 (absolute error <= 0.0023570864)
  Actual: 5.27387 (of type float), which is 0.00340557 from 5.27046
src/engine/qcommon/q_math_test.cpp:232: Failure
Value of: Q_rsqrt_fast(3)
Expected: is approximately 0.57735032 (absolute error <= 0.0002582059)
  Actual: 0.576975 (of type float), which is -0.00037539 from 0.57735
[  FAILED  ] QSharedMathTest.FastInverseSquareRoot (0 ms)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max rel error is what you want for a unit test

Ah OK.

Copy link
Member Author

@illwieckz illwieckz Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the tests pass (both with SSE and without).

{
#if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
// SSE rsqrt relative error bound: 3.7 * 10^-4
_mm_store_ss( &y, _mm_rsqrt_ss( _mm_load_ss( &number ) ) );
float o;
// The SSE rsqrt relative error bound is 3.7×10⁻⁴.
_mm_store_ss( &o, _mm_rsqrt_ss( _mm_load_ss( &n ) ) );
#else
y = Util::bit_cast<float>( 0x5f3759df - ( Util::bit_cast<uint32_t>( number ) >> 1 ) );
y *= ( 1.5f - ( x * y * y ) ); // initial iteration
// relative error bound after the initial iteration: 1.8 * 10^-3
/* Magic constants by Jan Kadlec, with a relative error bound
of 6.50196699×10⁻⁴.
See: http://rrrola.wz.cz/inv_sqrt.html */
constexpr float a = 0.703952253f;
constexpr float b = 2.38924456f;
constexpr uint32_t magic = 0x5f1ffff9ul;
float o = Util::bit_cast<float>( magic - ( Util::bit_cast<uint32_t>( n ) >> 1 ) );
o *= a * ( b - n * o * o );
#endif
y *= ( 1.5f - ( x * y * y ) ); // second iteration for higher precision
return y;
}
return o;
}

inline float Q_rsqrt( const float n )
{
/* When using the magic constants, the relative error bound after the
iteration is expected to be at most 5×10⁻⁶. It was achieved with the
less-good Quake 3 constants with a first iteration having originally
a relative error bound of 1.8×10⁻³.
Since the new magic constants provide a better relative error bound of
6.50196699×10⁻⁴, the relative error bound is now expected to be smaller.
When using the SSE rsqrt, the initial error bound is 3.7×10⁻⁴ so after
the iteration it is also expected to be smaller. */
constexpr float a = 0.5f;
constexpr float b = 3.0f;
float o = Q_rsqrt_fast( n );
// Do an iteration of Newton's method for finding the zero of: f(x) = 1÷x² - n
o *= a * ( b - n * o * o );
return o;
}

inline float Q_fabs( float x )
{
@@ -617,7 +693,7 @@ inline vec_t VectorNormalize( vec3_t v )
// that length != 0, nor does it return length
inline void VectorNormalizeFast( vec3_t v )
{
vec_t ilength = Q_rsqrt( DotProduct( v, v ) );
vec_t ilength = Q_rsqrt_fast( DotProduct( v, v ) );

VectorScale( v, ilength, v );
}
8 changes: 4 additions & 4 deletions src/engine/renderer/tr_main.cpp
Original file line number Diff line number Diff line change
@@ -246,7 +246,7 @@ void R_TBNtoQtangents( const vec3_t tangent, const vec3_t binormal,
if ( ( trace = tangent2[ 0 ] + binormal2[ 1 ] + normal2[ 2 ] ) > 0.0f )
{
trace += 1.0f;
scale = 0.5f * Q_rsqrt( trace );
scale = 0.5f * Q_rsqrt_fast( trace );

q[ 3 ] = trace * scale;
q[ 2 ] = ( tangent2 [ 1 ] - binormal2[ 0 ] ) * scale;
@@ -257,7 +257,7 @@ void R_TBNtoQtangents( const vec3_t tangent, const vec3_t binormal,
else if ( tangent2[ 0 ] > binormal2[ 1 ] && tangent2[ 0 ] > normal2[ 2 ] )
{
trace = tangent2[ 0 ] - binormal2[ 1 ] - normal2[ 2 ] + 1.0f;
scale = 0.5f * Q_rsqrt( trace );
scale = 0.5f * Q_rsqrt_fast( trace );

q[ 0 ] = trace * scale;
q[ 1 ] = ( tangent2 [ 1 ] + binormal2[ 0 ] ) * scale;
@@ -268,7 +268,7 @@ void R_TBNtoQtangents( const vec3_t tangent, const vec3_t binormal,
else if ( binormal2[ 1 ] > normal2[ 2 ] )
{
trace = -tangent2[ 0 ] + binormal2[ 1 ] - normal2[ 2 ] + 1.0f;
scale = 0.5f * Q_rsqrt( trace );
scale = 0.5f * Q_rsqrt_fast( trace );

q[ 1 ] = trace * scale;
q[ 0 ] = ( tangent2 [ 1 ] + binormal2[ 0 ] ) * scale;
@@ -279,7 +279,7 @@ void R_TBNtoQtangents( const vec3_t tangent, const vec3_t binormal,
else
{
trace = -tangent2[ 0 ] - binormal2[ 1 ] + normal2[ 2 ] + 1.0f;
scale = 0.5f * Q_rsqrt( trace );
scale = 0.5f * Q_rsqrt_fast( trace );

q[ 2 ] = trace * scale;
q[ 3 ] = ( tangent2 [ 1 ] - binormal2[ 0 ] ) * scale;