Skip to content

[Optimization] Faster string reversal with bitshift (and bswap/rev/movbe)? #20692

@richardleach

Description

@richardleach

Description
In a nutshell, pp_reverse reverses strings using a loop that in each iteration swaps a byte from the front of the char buffer with a byte from the back.

            while (down > up) {
                const char tmp = *up;
                *up++ = *down;
                *down-- = tmp;
            }

That seems like a common and portable string reversal technique. However, Fast array reversal with SIMD! is a lovely little explainer that shows how string reversal can be done much more quickly by reversing and swapping whole registers full of bytes at a time. Maximum speedup (see the SSSE3, AVX, AVX2 sections) might involve using intrinsics for which CPU support is far from guaranteed. (I imagine that the implementation complexity and maintenance overhead would therefore be too great at this time.)

However, it looks like it should be possible to use a generic and portable bit-shifting implementation (see the bswap section), which should be faster than pp_reverse's current method AND decent compilers can recognize this implementation and replace it with faster CPU instructions (bswap, rev, movbe) where possible.

The C++ code from that post:

inline std::uint64_t Swap64(std::uint64_t x)
{
    return (
        ((x & 0x00000000000000FF) << 56) |
        ((x & 0x000000000000FF00) << 40) |
        ((x & 0x0000000000FF0000) << 24) |
        ((x & 0x00000000FF000000) <<  8) |
        ((x & 0x000000FF00000000) >>  8) |
        ((x & 0x0000FF0000000000) >> 24) |
        ((x & 0x00FF000000000000) >> 40) |
        ((x & 0xFF00000000000000) >> 56)
    );
}

inline std::uint32_t Swap32(std::uint32_t x)
{
    return(
        ((x & 0x000000FF) << 24) |
        ((x & 0x0000FF00) <<  8) |
        ((x & 0x00FF0000) >>  8) |
        ((x & 0xFF000000) >> 24)
    );
}

inline std::uint16_t Swap16(std::uint16_t x)
{
    // This tends to emit a 16-bit `rol` or `ror` instruction
    return (
        ((x & 0x00FF) <<  8) |
        ((x & 0xFF00) >>  8)
    );
}

Note: I've had this on my ideas queue for ages and am just not likely to get around to it any time soon. So it's totally waiting for someone enthusiastic to take a look.

Perl configuration
pp_reverse's 2-bytes-at-a-time implementation goes way back.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions