xorsimd provides high-throughput XOR operations for byte slices.
On amd64 it selects AVX-512/AVX2/SSE2 implementations, and falls back to
portable Go code on other architectures.
go get github.com/templexxx/xorsimdpackage main
import "github.com/templexxx/xorsimd"
func main() {
a := []byte{1, 2, 3, 4}
b := []byte{9, 8, 7, 6}
dst := make([]byte, 4)
// dst = a ^ b
n := xorsimd.Bytes(dst, a, b)
_ = n
// dst = src[0] ^ src[1] ^ src[2] ...
src := [][]byte{a, b, []byte{1, 1, 1, 1}}
n = xorsimd.Encode(dst, src)
_ = n
}Encode(dst, src)XORs all rows insrcintodstand returns the number of bytes processed.Encoderequireslen(src) >= 1.Bytes(dst, a, b)is a convenience wrapper aroundEncode.Bytes8,Bytes16,Bytes8Align, andBytes16Alignare fixed-size helpers.BytesAprocesseslen(a)bytes;BytesBprocesseslen(b)bytes.
All APIs may reuse overlapping slices. For correctness, callers must ensure the input slices are long enough for the selected function variant.
Performance mainly depends on:
- CPU SIMD instruction support.
- Number of input source rows.
- Slice size.
Benchmark formula:
I/O = (src_num + 1) * vector_size / cost
Test environment:
- AWS c5d.xlarge
- Intel Xeon Platinum 8124M @ 3.00GHz
- Single physical core
| Src Num | Vector Size | AVX512 I/O (MB/s) | AVX2 I/O (MB/s) | SSE2 I/O (MB/s) |
|---|---|---|---|---|
| 5 | 4KB | 270403.73 | 142825.25 | 74443.91 |
| 5 | 1MB | 26948.34 | 26887.37 | 26950.65 |
| 5 | 8MB | 17881.32 | 17212.56 | 16402.97 |
| 10 | 4KB | 190445.30 | 102953.59 | 53244.04 |
| 10 | 1MB | 26424.44 | 26618.65 | 26094.39 |
| 10 | 8MB | 15471.31 | 14866.72 | 13565.80 |
go test ./...
go test -bench=. -benchmem