forked from golang/go
compress/flate: improve compression speed #88
Open
austinderek wants to merge 5 commits into master from deflate-improve-comp
+3,901 −1,551
Conversation
Fixes golang#75532
This improves the compression speed of the flate package.
This is a cleaned-up version of github.com/klauspost/compress/flate.
Overall changes:
* Compression levels 2-6 are custom implementations.
* Compression levels 7-9 are tweaked to match levels 2-6 with minor improvements.
* Tokens are encoded and indexed when added.
* Huffman encoding attempts to continue blocks instead of always starting a new one.
* Loads/Stores are in separate functions and can be made to use unsafe (see the sketch after this list).
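As a minimal sketch of the separated load helpers mentioned in the last bullet (the names are illustrative, not necessarily the ones used in this change), loads can go through encoding/binary, which leaves a single place that could later be swapped for an unsafe variant:

```go
package flate

import "encoding/binary"

// load32 and load64 are a sketch of the kind of separated load helpers
// described above. Keeping loads behind small functions means only these
// bodies would need to change to use an unsafe, bounds-check-free variant.
func load32(b []byte, i int) uint32 {
	return binary.LittleEndian.Uint32(b[i:])
}

func load64(b []byte, i int) uint64 {
	return binary.LittleEndian.Uint64(b[i:])
}
```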
In overall terms this attempts to better balance out the compression levels,
which tended to have little spread in the top levels.
The intention is to place "default" at the point where performance drops off
considerably without a proportional improvement in compression ratio.
In my package I have set "5" to be the default, but this keeps it at level 6.
"Unsafe" operations have been removed for now.
They can trivially be added back.
This is an approximately 10% speed penalty.
There are built-in benchmarks, using the standard library's benchmark setup, with results shown below.
I do not think this is a particularly good representation of different
data types, so I have also done benchmarks on various data types.
I have compiled the benchmarks on https://stdeflate.klauspost.com/
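For reference, a benchmark of roughly this shape drives the encoder the same way as the results below; the helper name and the digit data here are illustrative, not the standard library's actual benchmark code:

```go
package flate_test

import (
	"bytes"
	"compress/flate"
	"io"
	"testing"
)

// benchmarkEncodeLevel is a hypothetical helper (not the stdlib's benchmark
// code): it compresses the same input repeatedly at one compression level.
func benchmarkEncodeLevel(b *testing.B, data []byte, level int) {
	b.SetBytes(int64(len(data)))
	b.ReportAllocs()
	w, err := flate.NewWriter(io.Discard, level)
	if err != nil {
		b.Fatal(err)
	}
	for i := 0; i < b.N; i++ {
		w.Reset(io.Discard)
		if _, err := w.Write(data); err != nil {
			b.Fatal(err)
		}
		if err := w.Close(); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkEncodeDigitsSpeed1e4(b *testing.B) {
	data := bytes.Repeat([]byte("0123456789"), 1000) // ~1e4 bytes of digit data
	benchmarkEncodeLevel(b, data, flate.BestSpeed)
}
```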
The main focus has been on level 1 (fastest),
level 5+6 (default) and level 9 (smallest).
It is quite rare that levels outside of these are used, but they should still
fit their roles reasonably well.
Level 9 will attempt more aggressive compression,
but will also typically be slightly slower than before.
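For context, these levels are selected through the existing compress/flate API; a small usage sketch (with made-up sample data) shows how a caller picks the speed/ratio trade-off:

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"log"
)

// Compress the same input at the fast, default, and smallest levels and
// print the resulting sizes. The data here is only for illustration.
func main() {
	data := bytes.Repeat([]byte("the quick brown fox "), 1024)
	for _, level := range []int{flate.BestSpeed, flate.DefaultCompression, flate.BestCompression} {
		var buf bytes.Buffer
		w, err := flate.NewWriter(&buf, level)
		if err != nil {
			log.Fatal(err)
		}
		if _, err := w.Write(data); err != nil {
			log.Fatal(err)
		}
		if err := w.Close(); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("level %d: %d -> %d bytes\n", level, len(data), buf.Len())
	}
}
```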
I hope the graphs linked above show that focusing on a few data types
doesn't always give the full picture.
My own observations:
Levels 1 and 2 often "trade places" depending on the data type.
Since level 1 is usually the lower compressing of the two -
and mostly slightly faster, with lower memory usage -
it is placed as the lowest.
The switchover between level 6 and 7 is not always smooth,
since the search method changes significantly.
Random data is now ~100x faster on levels 2-6, and ~3x faster on levels 7-9.
You can feed pre-compressed data with no significant speed penalty.
benchmark old ns/op new ns/op delta
BenchmarkEncode/Digits/Huffman/1e4-32 11431 8001 -30.01%
BenchmarkEncode/Digits/Huffman/1e5-32 123175 74780 -39.29%
BenchmarkEncode/Digits/Huffman/1e6-32 1260402 750022 -40.49%
BenchmarkEncode/Digits/Speed/1e4-32 35100 23758 -32.31%
BenchmarkEncode/Digits/Speed/1e5-32 675355 385954 -42.85%
BenchmarkEncode/Digits/Speed/1e6-32 6878375 4873784 -29.14%
BenchmarkEncode/Digits/Default/1e4-32 63411 40974 -35.38%
BenchmarkEncode/Digits/Default/1e5-32 1815762 801563 -55.86%
BenchmarkEncode/Digits/Default/1e6-32 18875894 8101836 -57.08%
BenchmarkEncode/Digits/Compression/1e4-32 63859 85275 +33.54%
BenchmarkEncode/Digits/Compression/1e5-32 1803745 2752174 +52.58%
BenchmarkEncode/Digits/Compression/1e6-32 18931995 30727403 +62.30%
BenchmarkEncode/Newton/Huffman/1e4-32 15770 11108 -29.56%
BenchmarkEncode/Newton/Huffman/1e5-32 134567 85103 -36.76%
BenchmarkEncode/Newton/Huffman/1e6-32 1663889 1030186 -38.09%
BenchmarkEncode/Newton/Speed/1e4-32 32749 22934 -29.97%
BenchmarkEncode/Newton/Speed/1e5-32 565609 336750 -40.46%
BenchmarkEncode/Newton/Speed/1e6-32 5996011 3815437 -36.37%
BenchmarkEncode/Newton/Default/1e4-32 70505 34148 -51.57%
BenchmarkEncode/Newton/Default/1e5-32 2374066 570673 -75.96%
BenchmarkEncode/Newton/Default/1e6-32 24562355 5975917 -75.67%
BenchmarkEncode/Newton/Compression/1e4-32 71505 77670 +8.62%
BenchmarkEncode/Newton/Compression/1e5-32 3345768 3730804 +11.51%
BenchmarkEncode/Newton/Compression/1e6-32 35770364 39768939 +11.18%
benchmark old MB/s new MB/s speedup
BenchmarkEncode/Digits/Huffman/1e4-32 874.80 1249.91 1.43x
BenchmarkEncode/Digits/Huffman/1e5-32 811.86 1337.25 1.65x
BenchmarkEncode/Digits/Huffman/1e6-32 793.40 1333.29 1.68x
BenchmarkEncode/Digits/Speed/1e4-32 284.90 420.91 1.48x
BenchmarkEncode/Digits/Speed/1e5-32 148.07 259.10 1.75x
BenchmarkEncode/Digits/Speed/1e6-32 145.38 205.18 1.41x
BenchmarkEncode/Digits/Default/1e4-32 157.70 244.06 1.55x
BenchmarkEncode/Digits/Default/1e5-32 55.07 124.76 2.27x
BenchmarkEncode/Digits/Default/1e6-32 52.98 123.43 2.33x
BenchmarkEncode/Digits/Compression/1e4-32 156.59 117.27 0.75x
BenchmarkEncode/Digits/Compression/1e5-32 55.44 36.33 0.66x
BenchmarkEncode/Digits/Compression/1e6-32 52.82 32.54 0.62x
BenchmarkEncode/Newton/Huffman/1e4-32 634.13 900.25 1.42x
BenchmarkEncode/Newton/Huffman/1e5-32 743.12 1175.04 1.58x
BenchmarkEncode/Newton/Huffman/1e6-32 601.00 970.70 1.62x
BenchmarkEncode/Newton/Speed/1e4-32 305.35 436.03 1.43x
BenchmarkEncode/Newton/Speed/1e5-32 176.80 296.96 1.68x
BenchmarkEncode/Newton/Speed/1e6-32 166.78 262.09 1.57x
BenchmarkEncode/Newton/Default/1e4-32 141.83 292.84 2.06x
BenchmarkEncode/Newton/Default/1e5-32 42.12 175.23 4.16x
BenchmarkEncode/Newton/Default/1e6-32 40.71 167.34 4.11x
BenchmarkEncode/Newton/Compression/1e4-32 139.85 128.75 0.92x
BenchmarkEncode/Newton/Compression/1e5-32 29.89 26.80 0.90x
BenchmarkEncode/Newton/Compression/1e6-32 27.96 25.15 0.90x
Static Memory Usage:
Before:
Level -2: Memory Used: 704KB, 8 allocs
Level -1: Memory Used: 776KB, 7 allocs
Level 0: Memory Used: 704KB, 7 allocs
Level 1: Memory Used: 1160KB, 13 allocs
Level 2: Memory Used: 776KB, 8 allocs
Level 3: Memory Used: 776KB, 8 allocs
Level 4: Memory Used: 776KB, 8 allocs
Level 5: Memory Used: 776KB, 8 allocs
Level 6: Memory Used: 776KB, 8 allocs
Level 7: Memory Used: 776KB, 8 allocs
Level 8: Memory Used: 776KB, 9 allocs
Level 9: Memory Used: 776KB, 8 allocs
After:
Level -2: Memory Used: 272KB, 12 allocs
Level -1: Memory Used: 1016KB, 7 allocs
Level 0: Memory Used: 304KB, 6 allocs
Level 1: Memory Used: 760KB, 13 allocs
Level 2: Memory Used: 1144KB, 8 allocs
Level 3: Memory Used: 1144KB, 8 allocs
Level 4: Memory Used: 888KB, 14 allocs
Level 5: Memory Used: 1016KB, 8 allocs
Level 6: Memory Used: 1016KB, 8 allocs
Level 7: Memory Used: 952KB, 7 allocs
Level 8: Memory Used: 952KB, 7 allocs
Level 9: Memory Used: 1080KB, 9 allocs
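The exact measurement behind these numbers is not part of this change; as a rough illustration only, similar figures could be approximated by sampling heap stats around constructing one Writer per level:

```go
package main

import (
	"compress/flate"
	"fmt"
	"io"
	"runtime"
)

// Rough sketch: read heap stats before and after creating a single Writer
// at each level. This only approximates the per-level static cost.
func main() {
	keep := make([]*flate.Writer, 0, 12)
	for level := flate.HuffmanOnly; level <= flate.BestCompression; level++ {
		runtime.GC()
		var before, after runtime.MemStats
		runtime.ReadMemStats(&before)
		w, err := flate.NewWriter(io.Discard, level)
		if err != nil {
			fmt.Printf("Level %d: %v\n", level, err)
			continue
		}
		runtime.ReadMemStats(&after)
		fmt.Printf("Level %d: Memory Used: %dKB, %d allocs\n",
			level, (after.HeapAlloc-before.HeapAlloc)/1024, after.Mallocs-before.Mallocs)
		keep = append(keep, w) // keep writers reachable so earlier ones are not collected mid-loop
	}
	runtime.KeepAlive(keep)
}
```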
This package has been fuzz tested for about 24 hours.
Currently, there is about 1h between new "interesting" finds.
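As an illustration of the shape such a harness can take (an assumption, not the harness used for the 24h run), a minimal round-trip target with Go's native fuzzing:

```go
package flate_test

import (
	"bytes"
	"compress/flate"
	"io"
	"testing"
)

// FuzzRoundTrip compresses the fuzzer's input at every level, decompresses
// it, and requires the output to match the input exactly.
func FuzzRoundTrip(f *testing.F) {
	f.Add([]byte("hello, deflate"))
	f.Fuzz(func(t *testing.T, data []byte) {
		for level := flate.HuffmanOnly; level <= flate.BestCompression; level++ {
			var buf bytes.Buffer
			w, err := flate.NewWriter(&buf, level)
			if err != nil {
				t.Fatalf("level %d: NewWriter: %v", level, err)
			}
			if _, err := w.Write(data); err != nil {
				t.Fatalf("level %d: Write: %v", level, err)
			}
			if err := w.Close(); err != nil {
				t.Fatalf("level %d: Close: %v", level, err)
			}
			got, err := io.ReadAll(flate.NewReader(&buf))
			if err != nil {
				t.Fatalf("level %d: decompress: %v", level, err)
			}
			if !bytes.Equal(got, data) {
				t.Fatalf("level %d: round trip mismatch", level)
			}
		}
	})
}
```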
Change-Id: Icb4c9839dc8f1bb96fd6d548038679a7554a559b
🔄 This is a mirror of upstream PR golang#75624