internal/lz4block: Speed up noasm decoder
When the compiler is told exactly how many bytes a copy call should copy, and that number is at most 16, it will inline the call.

Also, the old code only took the short match shortcut when the short literal shortcut was also taken. But long literals with short matches are common.

Benchmark results on older Intel:

    goos: linux
    goarch: amd64
    pkg: github.com/pierrec/lz4/v4
    cpu: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
                        │     old      │                 new                 │
                        │     B/s      │     B/s       vs base               │
    UncompressPg1661-8    327.9Mi ± 1%   549.7Mi ± 0%  +67.61% (p=0.000 n=10)
    UncompressDigits-8    1.111Gi ± 1%   1.499Gi ± 1%  +34.94% (p=0.000 n=10)
    UncompressTwain-8     348.3Mi ± 0%   579.4Mi ± 0%  +66.32% (p=0.000 n=10)
    UncompressRand-8      3.296Gi ± 0%   3.309Gi ± 1%        ~ (p=0.739 n=10)
    geomean               813.8Mi        1.108Gi       +39.40%

On newer AMD:

    goos: linux
    goarch: amd64
    pkg: github.com/pierrec/lz4/v4
    cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
                         │     old      │                 new                  │
                         │     B/s      │     B/s        vs base               │
    UncompressPg1661-16    643.6Mi ± 2%   1076.9Mi ± 1%  +67.33% (p=0.000 n=10)
    UncompressDigits-16    2.808Gi ± 1%    3.786Gi ± 0%  +34.82% (p=0.000 n=10)
    UncompressTwain-16     702.8Mi ± 1%   1309.5Mi ± 7%  +86.32% (p=0.000 n=10)
    UncompressRand-16      6.878Gi ± 0%    6.850Gi ± 1%   -0.42% (p=0.009 n=10)
    geomean                1.699Gi         2.430Gi       +43.04%
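The fixed-size copy idea described in the message can be sketched roughly as follows. This is an illustration only, not the code from this commit: `copyShort` and its parameters are hypothetical names. It assumes the caller guarantees `n <= 16` and that any bytes written past `n` in `dst` are scratch space that later output will overwrite.

```go
package main

import "fmt"

// copyShort is a hypothetical helper, not part of the lz4 package.
// It copies n bytes (the caller guarantees n <= 16) from src[si:] to dst[di:].
func copyShort(dst, src []byte, di, si, n int) {
	if di+16 <= len(dst) && si+16 <= len(src) {
		// Fast path: the length is the constant 16, which is the
		// situation the commit message describes as inlinable.
		// Up to 16-n extra bytes land in dst past the requested range.
		copy(dst[di:di+16], src[si:si+16])
		return
	}
	// Slow path near the end of either buffer: exact, variable-length copy.
	copy(dst[di:di+n], src[si:si+n])
}

func main() {
	src := []byte("abcdefghijklmnopqrstuvwxyz")
	dst := make([]byte, 32)
	copyShort(dst, src, 0, 0, 5)
	// Only the first 5 bytes are meaningful to the caller.
	fmt.Printf("%q\n", dst[:5])
}
```

The over-copy is only safe because the bounds check guarantees 16 bytes of room on both sides; when either buffer is too short, the sketch falls back to the ordinary variable-length copy.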