Speeding up intrinsic-test #1851

Open
@folkertdev

Description

The intrinsic-test crate runs incredibly slowly, and takes a long time both on CI and locally. I'd like to speed that up.

Looking at the code, it also shows signs of its age, and I think we can do a much better job today. Based on some rough profiling, the main bottleneck appears to be compiling 3K+ C++ files into executables. On my machine each file takes roughly 280ms to compile. By emitting C instead, and compiling to an object file rather than an executable, I'm able to get a roughly 4x speedup.

My idea is to emit C files like this (we emit many C files because clang won't parallelize its workload by itself):

#include <arm_neon.h>
#include <arm_acle.h>
#include <arm_fp16.h>

const uint32_t a_vals[] = {
    0x0,
    0x800000,
    0x3effffff,
    0x3f000000,
    // ...
};
const uint8_t b_vals[] = {
    0x0,
    0x1,
    0x2,
    0x3,
    // ...
};


uint32_t __crc32b_output[20] = {0};

extern uint32_t *c___crc32b_generate(void) {
    for (int i=0; i<20; i++) {
        __crc32b_output[i] = __crc32b(a_vals[i], b_vals[i]);
    }

    return __crc32b_output;
}

Then for Rust, we can generate all the tests in one binary (not sure if splitting into files is useful there; it might be), and link it against the C object files. The final check of the output can then happen in Rust (calling the Rust and C versions of each test and comparing the results). Crucially, this means we only need to compile the formatting logic once (and in Rust, so it's trivially consistent).
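To make the comparison step concrete, here is a rough sketch of what the Rust-side check could look like. It is only an illustration under stated assumptions: `placeholder_op`, `c_generate`, `rust_generate` and `compare` are hypothetical names, and the C side is simulated by a plain Rust function so the snippet runs anywhere. In the real setup, `c_generate` would presumably wrap a call to the linked `c___crc32b_generate` symbol, and `rust_generate` would call the actual intrinsic.

```rust
// Sketch only. In the real harness the C side would be declared roughly as
//     extern "C" { fn c___crc32b_generate() -> *const u32; }
// and resolved at link time against the C object file. Here it is simulated
// in Rust so that the example is self-contained.

const N: usize = 4;
const A_VALS: [u32; N] = [0x0, 0x800000, 0x3effffff, 0x3f000000];
const B_VALS: [u8; N] = [0x0, 0x1, 0x2, 0x3];

/// Placeholder for the operation under test (NOT the real `__crc32b`).
fn placeholder_op(a: u32, b: u8) -> u32 {
    a.wrapping_add(u32::from(b))
}

/// Hypothetical stand-in for the output buffer produced by the C object file.
fn c_generate() -> Vec<u32> {
    (0..N).map(|i| placeholder_op(A_VALS[i], B_VALS[i])).collect()
}

/// Hypothetical Rust side: would call the Rust intrinsic on the same inputs.
fn rust_generate() -> Vec<u32> {
    A_VALS
        .iter()
        .zip(B_VALS.iter())
        .map(|(&a, &b)| placeholder_op(a, b))
        .collect()
}

/// Compare both outputs element by element and report the first mismatch.
/// The formatting of the report lives here, on the Rust side only.
fn compare(name: &str, rust_out: &[u32], c_out: &[u32]) -> Result<(), String> {
    if rust_out.len() != c_out.len() {
        return Err(format!(
            "{name}: length mismatch: rust={}, c={}",
            rust_out.len(),
            c_out.len()
        ));
    }
    for (i, (r, c)) in rust_out.iter().zip(c_out).enumerate() {
        if r != c {
            return Err(format!("{name}: mismatch at index {i}: rust={r:#x}, c={c:#x}"));
        }
    }
    Ok(())
}

fn main() {
    match compare("__crc32b", &rust_generate(), &c_generate()) {
        Ok(()) => println!("__crc32b: ok"),
        Err(msg) => {
            eprintln!("{msg}");
            std::process::exit(1);
        }
    }
}
```

Because both sides hand back raw values and only `compare` formats them, the C files never need any printing code, which is the point of doing the final check in Rust.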

cc @adamgemmell @Jamesbarford if you have thoughts on this idea, or other ways to speed up this program.
