forked from golang/go
cmd/internal/obj/riscv: support zawrs assembly on riscv64 #145
Closed
…kages Intrinsifying things inside the module (crypto/internal/fips140/subtle) is asking for trouble, as the import paths are rewritten by the GOFIPS140 mechanism, and we might have to support multiple modules in the future. Importing crypto/subtle from inside a FIPS 140-3 module is not allowed, and is basically asking for circular dependencies. Instead, break off the intrinsics into their own package (crypto/internal/constanttime), and keep the byte slice operations in crypto/internal/fips140/subtle. crypto/subtle then becomes a thin dispatch layer. Change-Id: I6a6a6964cd5cb5ad06e9d1679201447f5a811da4 Reviewed-on: https://go-review.googlesource.com/c/go/+/716120 Reviewed-by: Keith Randall <[email protected]> Reviewed-by: Michael Knyszek <[email protected]> Reviewed-by: Keith Randall <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Filippo Valsorda <[email protected]> Reviewed-by: Jorropo <[email protected]>
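To illustrate the layering described above (a primitive layer, a byte-slice layer, and a thin public dispatch layer), here is a minimal single-file sketch. The names `byteEq`, `compareBytes`, and `ConstantTimeCompare` are stand-ins for the three layers and are hypothetical in this file; the branch-free trick in `byteEq` is a well-known pattern, but this is not the code of crypto/internal/constanttime or crypto/internal/fips140/subtle.

```go
package main

import "fmt"

// Layer 1 (stand-in for the intrinsics package): branch-free byte equality.
// Returns 1 if x == y and 0 otherwise.
func byteEq(x, y uint8) int {
	z := uint32(x ^ y) // zero iff x == y
	// z == 0: z-1 wraps to 0xFFFFFFFF, so the shift yields 1.
	// 1 <= z <= 255: z-1 < 2^31, so the shift yields 0.
	return int((z - 1) >> 31)
}

// Layer 2 (stand-in for the byte-slice operations kept inside the module):
// compares two slices without branching on their contents.
func compareBytes(a, b []byte) int {
	if len(a) != len(b) {
		return 0 // lengths are not treated as secret
	}
	var v uint8
	for i := range a {
		v |= a[i] ^ b[i] // accumulate any differing bits
	}
	return byteEq(v, 0)
}

// Layer 3 (stand-in for the public package): a thin dispatch layer.
func ConstantTimeCompare(a, b []byte) int {
	return compareBytes(a, b)
}

func main() {
	fmt.Println(ConstantTimeCompare([]byte("secret"), []byte("secret"))) // 1
	fmt.Println(ConstantTimeCompare([]byte("secret"), []byte("secreT"))) // 0
}
```

Keeping the primitive in its own package means the inner and outer layers can both depend on it without either importing the other.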
Since we can always get the address of `CALL runtime.deferreturn(SB)` from the unwinder, it is not necessary to record the caller's pc in the _defer struct. For stack-allocated _defer records, this CL makes the frame smaller. Change-Id: I0fd347e4bc07cf8a9b954816323df30fc52552b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/716720 Reviewed-by: Keith Randall <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Keith Randall <[email protected]> Reviewed-by: Michael Knyszek <[email protected]>
Fixes golang#73794 Change-Id: I0a57db05aacfa805213fe8278fc727e76eb8a65e GitHub-Last-Rev: 3494d93 GitHub-Pull-Request: golang#73795 Reviewed-on: https://go-review.googlesource.com/c/go/+/674415 Reviewed-by: Sean Liao <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Michael Pratt <[email protected]> Reviewed-by: Michael Knyszek <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
In our case, it greatly improves the performance of continuously collecting diff profiles from the net/http/pprof endpoint, such as /debug/pprof/allocs?seconds=30.
This CL is a cherry-pick of my PR upstream: google/pprof#951
Benchmark of profile Parse func:
goos: linux
goarch: amd64
pkg: github.com/google/pprof/profile
cpu: 13th Gen Intel(R) Core(TM) i7-1360P
│ old-parse.txt │ new-parse.txt │
│ sec/op │ sec/op vs base │
Parse-16 62.07m ± 13% 55.54m ± 13% -10.52% (p=0.035 n=10)
│ old-parse.txt │ new-parse.txt │
│ B/op │ B/op vs base │
Parse-16 47.56Mi ± 0% 41.09Mi ± 0% -13.59% (p=0.000 n=10)
│ old-parse.txt │ new-parse.txt │
│ allocs/op │ allocs/op vs base │
Parse-16 272.9k ± 0% 175.8k ± 0% -35.58% (p=0.000 n=10)
Change-Id: I737ff9b9f815fdc56bc3b5743403717c4b6f07fd
GitHub-Last-Rev: a09108f
GitHub-Pull-Request: golang#76145
Reviewed-on: https://go-review.googlesource.com/c/go/+/717081
Reviewed-by: Michael Pratt <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Florian Lehner <[email protected]>
Change-Id: I26302d801732f40b1fe6b30ff69d222047bca490 Reviewed-on: https://go-review.googlesource.com/c/go/+/716740 Reviewed-by: Robert Griesemer <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Knyszek <[email protected]>
Change-Id: I0ea4d15da163cec6fe2a703376ce5a6032e15484 Reviewed-on: https://go-review.googlesource.com/c/go/+/714861 Reviewed-by: Keith Randall <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Keith Randall <[email protected]> Auto-Submit: Keith Randall <[email protected]>
Knowing how many times cgo is used is useful information to have in the local telemetry database. It also opens the door for uploading them in the future if desired. Change-Id: Ia92b11fc489f015bbface7f28ed5a5c2871c44f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/707055 Reviewed-by: Michael Matloob <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Robert Findley <[email protected]> Reviewed-by: Michael Matloob <[email protected]> Reviewed-by: Florian Lehner <[email protected]>
Move global variable to a field on the State type. Change-Id: I1edd32e1d28ce814bcd75501098ee4b22227546b Reviewed-on: https://go-review.googlesource.com/c/go/+/716162 Reviewed-by: Michael Matloob <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Matloob <[email protected]>
[git-generate]
cd src/cmd/go/internal/modload
rf '
mv InitWorkfile State.InitWorkfile
mv FindGoWork State.FindGoWork
mv WillBeEnabled State.WillBeEnabled
mv Enabled State.Enabled
mv inWorkspaceMode State.inWorkspaceMode
mv HasModRoot State.HasModRoot
mv MustHaveModRoot State.MustHaveModRoot
mv ModFilePath State.ModFilePath
'
Change-Id: I207113868af037c9c0049f4207c3d3b4c19468bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/716602
Reviewed-by: Michael Matloob <[email protected]>
Reviewed-by: Michael Matloob <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Change-Id: I43ea575aff87a3e420477cb26d35185d03df5ccc Reviewed-on: https://go-review.googlesource.com/c/go/+/713283 Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Michael Knyszek <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
We've been seeing flakes where we get 'no errors' output on freebsd, in addition to windows and solaris. Allow that case too, to avoid flakes. For golang#73976 Change-Id: I6a6a696445ec908b55520d8d75e7c1f867b9c092 Reviewed-on: https://go-review.googlesource.com/c/go/+/715640 Reviewed-by: Alan Donovan <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Matloob <[email protected]> Reviewed-by: Ian Alexander <[email protected]>
Support arm64 FMOVQ with a large immediate offset, which is encoded using the register-offset instruction in opldrr or opstrr. This will help allow folding immediates into the new SSA ops FMOVQload and FMOVQstore.
For example:
FMOVQ F0, -20000(R0)
is encoded as follows:
MOVD 3(PC), R27
FMOVQ F0, (R0)(R27)
RET
ffff b1e0 # constant value
Change-Id: Ib71f92f6ff4b310bda004a440b1df41ffe164523
Reviewed-on: https://go-review.googlesource.com/c/go/+/716960
Reviewed-by: Cherry Mui <[email protected]>
Auto-Submit: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
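The constant word `ffff b1e0` in the FMOVQ listing is -20000 written as a 32-bit two's-complement value. A quick arithmetic check in Go (it only verifies the number; `offsetWord` is a hypothetical helper, not assembler code):

```go
package main

import "fmt"

// offsetWord returns the 32-bit two's-complement bit pattern of a signed
// offset, printed as 8 hex digits.
func offsetWord(off int32) string {
	return fmt.Sprintf("%08x", uint32(off))
}

func main() {
	fmt.Println(offsetWord(-20000)) // ffffb1e0
}
```

Since 20000 = 0x4e20, the pattern is 0x100000000 - 0x4e20 = 0xffffb1e0.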
This commit adds test coverage of path building and name constraint verification using the suite of test data provided by Netflix's BetterTLS project. Since the uncompressed raw JSON test data exported by BetterTLS for external test integrations is ~31MB we use a similar approach to the BoGo and ACVP test integrations and fetch the BetterTLS Go module, and run its export tool on-the-fly to generate the test data in a tempdir. As expected, all tests pass currently and this coverage is mainly helpful in catching regressions, especially with tricky/cursed name constraints. Change-Id: I23d7c24232e314aece86bcbfd133b7f02c9e71b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/717420 TryBot-Bypass: Daniel McCarney <[email protected]> Reviewed-by: Roland Shoemaker <[email protected]> Auto-Submit: Daniel McCarney <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
This makes the user experience better. Previously, users would receive an error saying the GODEBUG is unknown; now we explicitly mention that it was removed and link to go.dev/doc/godebug, where users can find more information about the removal. Additionally, we keep all removed GODEBUGs in the source, making sure we do not reuse a GODEBUG name after it is removed. Updates golang#72111 Updates golang#75316 Change-Id: I6a6a6964cce1c100108fdba4bfba7d13cd9a893a Reviewed-on: https://go-review.googlesource.com/c/go/+/701875 Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Mateusz Poliwczak <[email protected]> Reviewed-by: Michael Matloob <[email protected]> Reviewed-by: Michael Matloob <[email protected]>
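A minimal sketch of the idea: removed settings stay in the table, so a lookup can tell "unknown" apart from "removed", and a uniqueness check stops a removed name from being reused. The table entries (including the version string "go1.99"), the `setting` type, `lookup`, `checkUnique`, and the message wording are all hypothetical, not the actual internal/godebugs code:

```go
package main

import "fmt"

// setting describes a GODEBUG; Removed is the Go version in which it was
// removed, or "" if it is still supported. All entries here are made up.
type setting struct {
	Name    string
	Removed string
}

var settings = []setting{
	{Name: "examplenew"},
	{Name: "exampleold", Removed: "go1.99"},
}

// lookup returns nil for a supported setting, a "removed" error that points
// to the documentation for a removed one, and an "unknown" error otherwise.
func lookup(name string) error {
	for _, s := range settings {
		if s.Name != name {
			continue
		}
		if s.Removed != "" {
			return fmt.Errorf("godebug %s was removed in %s; see https://go.dev/doc/godebug", name, s.Removed)
		}
		return nil
	}
	return fmt.Errorf("unknown godebug %q", name)
}

// checkUnique panics if a name appears twice, e.g. a removed name reused.
func checkUnique() {
	seen := map[string]bool{}
	for _, s := range settings {
		if seen[s.Name] {
			panic("duplicate godebug name: " + s.Name)
		}
		seen[s.Name] = true
	}
}

func main() {
	checkUnique()
	fmt.Println(lookup("examplenew"))  // <nil>
	fmt.Println(lookup("exampleold"))  // godebug exampleold was removed in go1.99; see https://go.dev/doc/godebug
	fmt.Println(lookup("nosuchthing")) // unknown godebug "nosuchthing"
}
```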
When an unpinned Go pointer (or a pointer to an unpinned Go pointer) is
returned from Go to C,
1 package main
2
3 import (
4 "C"
5 )
6
7 //export foo
8 func foo(CLine *C.char) string {
9 return C.GoString(CLine)
10 }
11
12
13 func main() {
14 }
The error message mentions the file/line of the cgo wrapper,
panic: runtime error: cgo result is unpinned Go pointer or points to unpinned Go pointer
goroutine 17 [running, locked to thread]:
panic({0x798f2341a4c0?, 0xc000112000?})
/usr/lib/go/src/runtime/panic.go:802 +0x168
runtime.cgoCheckArg(0x798f23417e20, 0xc000066e50, 0x0?, 0x0, {0x798f233f5a62, 0x42})
/usr/lib/go/src/runtime/cgocall.go:679 +0x35b
runtime.cgoCheckResult({0x798f23417e20, 0xc000066e50})
/usr/lib/go/src/runtime/cgocall.go:795 +0x4b
_cgoexp_3c910ddb72c4_foo(0x7ffc9fa9bfa0)
_cgo_gotypes.go:65 +0x5d
runtime.cgocallbackg1(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0)
/usr/lib/go/src/runtime/cgocall.go:446 +0x289
runtime.cgocallbackg(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0)
/usr/lib/go/src/runtime/cgocall.go:350 +0x132
runtime.cgocallbackg(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0)
<autogenerated>:1 +0x2b
runtime.cgocallback(0x0, 0x0, 0x0)
/usr/lib/go/src/runtime/asm_amd64.s:1082 +0xcd
runtime.goexit({})
/usr/lib/go/src/runtime/asm_amd64.s:1693 +0x1
The cgo wrapper (_cgoexp_3c910ddb72c4_foo) is located in a temporary
build artifact (_cgo_gotypes.go):
$ go tool cgo -objdir objdir parse.go
$ cat -n objdir/_cgo_gotypes.go | sed -n '55,70p'
55 //go:cgo_export_dynamic foo
56 //go:linkname _cgoexp_d48770e267d1_foo _cgoexp_d48770e267d1_foo
57 //go:cgo_export_static _cgoexp_d48770e267d1_foo
58 func _cgoexp_d48770e267d1_foo(a *struct {
59 p0 *_Ctype_char
60 r0 string
61 }) {
62 a.r0 = foo(a.p0)
63 _cgoCheckResult(a.r0)
64 }
Users expect the error message to report the file/line of the //export'ed function.
Use that position in error messages instead:
panic: runtime error: cgo result is unpinned Go pointer or points to unpinned Go pointer
goroutine 17 [running, locked to thread]:
panic({0x7df72b1d8ae0?, 0x3ec8a1790030?})
/mnt/go/src/runtime/panic.go:877 +0x16f
runtime.cgoCheckArg(0x7df72b1d62c0, 0x3ec8a16eee50, 0x68?, 0x0, {0x7df72b1ad44c, 0x42})
/mnt/go/src/runtime/cgocall.go:679 +0x35b
runtime.cgoCheckResult({0x7df72b1d62c0, 0x3ec8a16eee50})
/mnt/go/src/runtime/cgocall.go:795 +0x4b
_cgoexp_3c910ddb72c4_foo(0x7ffca1b21020)
/mnt/tmp/parse.go:8 +0x5d
runtime.cgocallbackg1(0x7df72b1a4360, 0x7ffca1b21020, 0x0)
/mnt/go/src/runtime/cgocall.go:446 +0x289
runtime.cgocallbackg(0x7df72b1a4360, 0x7ffca1b21020, 0x0)
/mnt/go/src/runtime/cgocall.go:350 +0x132
runtime.cgocallbackg(0x7df72b1a4360, 0x7ffca1b21020, 0x0)
<autogenerated>:1 +0x2b
runtime.cgocallback(0x0, 0x0, 0x0)
/mnt/go/src/runtime/asm_amd64.s:1101 +0xcd
runtime.goexit({})
/mnt/go/src/runtime/asm_amd64.s:1712 +0x1
While here, fix typos in comments.
Link: https://web.archive.org/web/20251008114504/https://dave.cheney.net/2018/01/08/gos-hidden-pragmas
Suggested-by: Keith Randall <[email protected]>
For golang#75856
Change-Id: I0bf36d5c8c5c0c7df13b00818bc4641009058979
GitHub-Last-Rev: e65839c
GitHub-Pull-Request: golang#76118
Reviewed-on: https://go-review.googlesource.com/c/go/+/716441
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Auto-Submit: Michael Pratt <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Florian Lehner <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
The panic calls gopanic which may have write barriers, but castogscanstatus is called from //go:nowritebarrier contexts. The panic is dead code anyway, and appears immediately before a call to 'throw'. Change-Id: I4a8e296b71bf002295a3aa1db4f723c305ed939a Reviewed-on: https://go-review.googlesource.com/c/go/+/717406 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of all goroutines, and searches for a specific line in that stack trace. It currently sometimes fails because it encounters the goroutine it's looking for in the small window where a goroutine might be in _Grunning while in a syscall, introduced in CL 646198. In that case, the traceback will give up, failing to print the stack TestCopyError is expecting. This represents a general regression, since previously runtime.Stack could never fail to take a goroutine's stack; giving up was only possible in fatal panic cases. Fix this the same way we fixed goroutine profiles: allow the stack trace to proceed if the g's syscallsp != 0. This is safe in any stop-the-world-related context, because syscallsp won't be mutated while the goroutine fails to acquire a P, and thus fails to fully exit the syscall context. This also means the stack below syscallsp won't be mutated, and thus taking a traceback is also safe. Fixes golang#66639. Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d Reviewed-on: https://go-review.googlesource.com/c/go/+/716680 Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Should make cmd/link/internal/ld.TestAbstractOriginSanity happier. Change-Id: I121927d42e527ff23d996e7387066f149b11cc59 Reviewed-on: https://go-review.googlesource.com/c/go/+/717480 Reviewed-by: Cherry Mui <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
…upport
Go asm syntax:
VPERMIW $0x1b, vj, vd
XVPERMI{W,V,Q} $0x1b, xj, xd
Equivalent platform assembler syntax:
vpermi.w vd, vj, $0x1b
xvpermi.{w,d,q} xd, xj, $0x1b
Change-Id: Ie23b2fdd09b4c93801dc804913206f1c5a496268
Reviewed-on: https://go-review.googlesource.com/c/go/+/716800
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Meidan Li <[email protected]>
Reviewed-by: sophie zhao <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
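The immediate `$0x1b` in the VPERMIW/XVPERMI examples packs four 2-bit element selectors. A sketch that decodes it and applies it to one 4-element source, assuming the lowest field selects the lowest destination element (the xvpermi.d-style reading dst[i] = src[sel[i]]). That reading is our assumption: the .q form uses the immediate differently, and which source register each .w selector indexes should be checked against the LoongArch manual. `selectors` and `permute4` are hypothetical helpers:

```go
package main

import "fmt"

// selectors splits an 8-bit permute immediate into four 2-bit fields,
// lowest field first.
func selectors(imm uint8) [4]int {
	var s [4]int
	for i := range s {
		s[i] = int(imm>>(2*i)) & 3
	}
	return s
}

// permute4 applies the selectors to a single 4-element source:
// dst[i] = src[sel[i]] (an assumption, see above).
func permute4(src [4]uint64, imm uint8) [4]uint64 {
	var dst [4]uint64
	for i, sel := range selectors(imm) {
		dst[i] = src[sel]
	}
	return dst
}

func main() {
	fmt.Println(selectors(0x1b))                           // [3 2 1 0]
	fmt.Println(permute4([4]uint64{10, 11, 12, 13}, 0x1b)) // [13 12 11 10]
}
```

Under this reading, 0x1b = 0b00_01_10_11 reverses the four elements.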
…en vector registers
Go asm syntax:
VMOVQ Vj, Vd
XVMOVQ Xj, Xd
Equivalent platform assembler syntax:
vslli.d vd, vj, 0x0
xvslli.d xd, xj, 0x0
Change-Id: Ifddc3d4d3fbaa6fee2e079bf2ebfe96a2febaa1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/716801
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Reviewed-by: Meidan Li <[email protected]>
Reviewed-by: sophie zhao <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
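Why a shift can serve as a move: `vslli.d vd, vj, 0x0` shifts each 64-bit lane of vj left by zero bits, so every lane, and therefore the whole register, is copied unchanged into vd. A tiny lane-wise model in Go (`vslliD` is a hypothetical illustration, not an emulator):

```go
package main

import "fmt"

// vslliD models a lane-wise 64-bit shift-left-immediate on a 128-bit
// vector held as two uint64 lanes. The shift amount is masked to 0..63
// in this model.
func vslliD(vj [2]uint64, imm uint) [2]uint64 {
	var vd [2]uint64
	for i, lane := range vj {
		vd[i] = lane << (imm & 63)
	}
	return vd
}

func main() {
	v := [2]uint64{0x0123456789abcdef, 0xfedcba9876543210}
	fmt.Println(vslliD(v, 0) == v) // true: shifting by zero is a copy
}
```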
The exact meaning of pow10 was not defined nor tested directly. Define it so that pow10(e) returns mant, exp where mant/2^128 * 2^exp = 10^e. This is the most natural definition but is off-by-one from what it had been returning. Fix the off-by-one and then adjust the call sites to stop compensating for it. Change-Id: I9ee475854f30be4bd0d4f4d770a6b12ec68281fe Reviewed-on: https://go-review.googlesource.com/c/go/+/717180 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Alan Donovan <[email protected]> Auto-Submit: Russ Cox <[email protected]>
ftoaFixed is in the next CL; this proves the tests are correct against the current implementation, and it adds a benchmark for comparison with the new implementation. Change-Id: I7ac8a1f699b693ea6d11a7122b22fc70cc135af6 Reviewed-on: https://go-review.googlesource.com/c/go/+/717181 Auto-Submit: Russ Cox <[email protected]> Reviewed-by: Alan Donovan <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
The fixed-precision ftoa algorithm is not actually
documented in the Ryū paper, and it is fairly
straightforward: multiply by a power of 10 to get
an integer that contains the digits we need.
There is also no need for separate float32 and float64
implementations.
This CL implements a new fixedFtoa, separate from Ryū.
The overall algorithm is the same, but the new code
is simpler, faster, and better documented.
Now ftoaryu.go is only about shortest-output formatting,
so if and when yet another algorithm comes along, it will
be clearer what should be replaced (all of ftoaryu.go)
and what should not (all of ftoafixed.go).
benchmark \ host linux-arm64 local linux-amd64 s7 linux-386 s7:GOARCH=386
vs base vs base vs base vs base vs base vs base
AppendFloat/Decimal -0.18% ~ ~ -0.68% +0.49% -0.79%
AppendFloat/Float +0.09% ~ +1.50% +0.84% -0.37% -0.69%
AppendFloat/Exp -0.51% ~ ~ +1.20% -1.27% -1.01%
AppendFloat/NegExp -1.01% ~ +3.43% +1.35% -2.33% ~
AppendFloat/LongExp -1.22% +0.77% ~ ~ -1.48% ~
AppendFloat/Big -2.07% ~ -2.07% -1.97% -2.89% -2.93%
AppendFloat/BinaryExp -0.28% +1.06% ~ +1.35% -0.64% -1.64%
AppendFloat/32Integer ~ ~ ~ -0.79% ~ -0.66%
AppendFloat/32ExactFraction -0.50% ~ +5.69% ~ -1.24% +0.69%
AppendFloat/32Point ~ -1.19% +2.59% +1.03% -1.37% +0.80%
AppendFloat/32Exp -3.39% -2.79% -8.36% -0.94% -5.72% -5.92%
AppendFloat/32NegExp -0.63% ~ ~ +0.98% -1.34% -0.73%
AppendFloat/32Shortest -1.00% +1.36% +2.94% ~ ~ ~
AppendFloat/32Fixed8Hard -5.91% -12.45% -6.62% ~ +18.46% +11.61%
AppendFloat/32Fixed9Hard -6.53% -11.35% -6.01% -0.97% -18.31% -9.16%
AppendFloat/64Fixed1 -13.84% -16.90% -13.13% -10.71% -24.52% -18.94%
AppendFloat/64Fixed2 -11.12% -16.97% -12.13% -9.88% -22.73% -15.48%
AppendFloat/64Fixed2.5 -21.98% -20.75% -19.08% -14.74% -28.11% -24.92%
AppendFloat/64Fixed3 -11.53% -16.21% -10.75% -7.53% -23.11% -15.78%
AppendFloat/64Fixed4 -12.89% -12.36% -11.07% -9.79% -14.51% -13.44%
AppendFloat/64Fixed5Hard -47.62% -38.59% -40.83% -37.06% -60.51% -55.29%
AppendFloat/64Fixed12 -7.40% ~ -8.56% -4.31% -13.82% -8.61%
AppendFloat/64Fixed16 -9.10% -8.95% -6.92% -3.92% -12.99% -9.03%
AppendFloat/64Fixed12Hard -9.14% -5.24% -6.23% -4.82% -13.58% -8.99%
AppendFloat/64Fixed17Hard -6.80% ~ -4.03% -2.84% -19.81% -10.27%
AppendFloat/64Fixed18Hard -0.12% ~ ~ ~ ~ ~
AppendFloat/64FixedF1 ~ ~ ~ ~ -0.40% +2.72%
AppendFloat/64FixedF2 -0.18% ~ -1.98% -0.95% ~ +1.25%
AppendFloat/64FixedF3 -0.29% ~ ~ ~ ~ +1.22%
AppendFloat/Slowpath64 -1.16% ~ ~ ~ ~ -2.16%
AppendFloat/SlowpathDenormal64 -1.09% ~ ~ -0.88% -0.83% ~
host: linux-arm64
goos: linux
goarch: arm64
pkg: internal/strconv
cpu: unknown
│ 14b7e09f493 │ f9bf7fcb8e2 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-8 60.35n ± 0% 60.24n ± 0% -0.18% (p=0.000 n=20)
AppendFloat/Float-8 88.83n ± 0% 88.91n ± 0% +0.09% (p=0.000 n=20)
AppendFloat/Exp-8 93.55n ± 0% 93.06n ± 0% -0.51% (p=0.000 n=20)
AppendFloat/NegExp-8 94.01n ± 0% 93.06n ± 0% -1.01% (p=0.000 n=20)
AppendFloat/LongExp-8 101.00n ± 0% 99.77n ± 0% -1.22% (p=0.000 n=20)
AppendFloat/Big-8 106.1n ± 0% 103.9n ± 0% -2.07% (p=0.000 n=20)
AppendFloat/BinaryExp-8 47.48n ± 0% 47.35n ± 0% -0.28% (p=0.000 n=20)
AppendFloat/32Integer-8 60.45n ± 0% 60.43n ± 0% ~ (p=0.150 n=20)
AppendFloat/32ExactFraction-8 86.65n ± 0% 86.22n ± 0% -0.50% (p=0.000 n=20)
AppendFloat/32Point-8 83.26n ± 0% 83.21n ± 0% ~ (p=0.046 n=20)
AppendFloat/32Exp-8 92.55n ± 0% 89.42n ± 0% -3.39% (p=0.000 n=20)
AppendFloat/32NegExp-8 87.89n ± 0% 87.34n ± 0% -0.63% (p=0.000 n=20)
AppendFloat/32Shortest-8 77.05n ± 0% 76.28n ± 0% -1.00% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-8 55.73n ± 0% 52.44n ± 0% -5.91% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-8 64.80n ± 0% 60.57n ± 0% -6.53% (p=0.000 n=20)
AppendFloat/64Fixed1-8 53.72n ± 0% 46.29n ± 0% -13.84% (p=0.000 n=20)
AppendFloat/64Fixed2-8 52.64n ± 0% 46.79n ± 0% -11.12% (p=0.000 n=20)
AppendFloat/64Fixed2.5-8 56.01n ± 0% 43.70n ± 0% -21.98% (p=0.000 n=20)
AppendFloat/64Fixed3-8 53.38n ± 0% 47.23n ± 0% -11.53% (p=0.000 n=20)
AppendFloat/64Fixed4-8 50.62n ± 0% 44.10n ± 0% -12.89% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-8 98.94n ± 0% 51.82n ± 0% -47.62% (p=0.000 n=20)
AppendFloat/64Fixed12-8 84.70n ± 0% 78.44n ± 0% -7.40% (p=0.000 n=20)
AppendFloat/64Fixed16-8 71.68n ± 0% 65.16n ± 0% -9.10% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-8 68.41n ± 0% 62.16n ± 0% -9.14% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-8 79.31n ± 0% 73.92n ± 0% -6.80% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-8 4.290µ ± 0% 4.285µ ± 0% -0.12% (p=0.000 n=20)
AppendFloat/64FixedF1-8 216.0n ± 0% 216.1n ± 0% ~ (p=0.090 n=20)
AppendFloat/64FixedF2-8 228.2n ± 0% 227.8n ± 0% -0.18% (p=0.000 n=20)
AppendFloat/64FixedF3-8 208.8n ± 0% 208.2n ± 0% -0.29% (p=0.000 n=20)
AppendFloat/Slowpath64-8 98.56n ± 0% 97.42n ± 0% -1.16% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-8 95.81n ± 0% 94.77n ± 0% -1.09% (p=0.000 n=20)
geomean 93.81n 87.87n -6.33%
host: local
goos: darwin
cpu: Apple M3 Pro
│ 14b7e09f493 │ f9bf7fcb8e2 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-12 21.14n ± 0% 21.15n ± 0% ~ (p=0.963 n=20)
AppendFloat/Float-12 32.48n ± 1% 32.43n ± 0% ~ (p=0.358 n=20)
AppendFloat/Exp-12 31.85n ± 0% 31.94n ± 1% ~ (p=0.634 n=20)
AppendFloat/NegExp-12 31.75n ± 0% 32.04n ± 0% ~ (p=0.004 n=20)
AppendFloat/LongExp-12 33.55n ± 0% 33.81n ± 0% +0.77% (p=0.000 n=20)
AppendFloat/Big-12 35.62n ± 1% 35.73n ± 1% ~ (p=0.888 n=20)
AppendFloat/BinaryExp-12 19.26n ± 0% 19.46n ± 1% +1.06% (p=0.000 n=20)
AppendFloat/32Integer-12 21.41n ± 0% 21.46n ± 1% ~ (p=0.733 n=20)
AppendFloat/32ExactFraction-12 31.23n ± 1% 31.30n ± 1% ~ (p=0.857 n=20)
AppendFloat/32Point-12 31.39n ± 1% 31.02n ± 0% -1.19% (p=0.000 n=20)
AppendFloat/32Exp-12 32.42n ± 1% 31.52n ± 1% -2.79% (p=0.000 n=20)
AppendFloat/32NegExp-12 30.66n ± 1% 30.66n ± 1% ~ (p=0.380 n=20)
AppendFloat/32Shortest-12 26.88n ± 1% 27.25n ± 1% +1.36% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-12 19.52n ± 0% 17.09n ± 1% -12.45% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-12 21.55n ± 2% 19.11n ± 1% -11.35% (p=0.000 n=20)
AppendFloat/64Fixed1-12 18.64n ± 0% 15.49n ± 0% -16.90% (p=0.000 n=20)
AppendFloat/64Fixed2-12 18.65n ± 0% 15.49n ± 0% -16.97% (p=0.000 n=20)
AppendFloat/64Fixed2.5-12 19.23n ± 1% 15.24n ± 0% -20.75% (p=0.000 n=20)
AppendFloat/64Fixed3-12 18.61n ± 0% 15.59n ± 1% -16.21% (p=0.000 n=20)
AppendFloat/64Fixed4-12 17.55n ± 1% 15.38n ± 0% -12.36% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-12 29.27n ± 1% 17.97n ± 0% -38.59% (p=0.000 n=20)
AppendFloat/64Fixed12-12 28.26n ± 1% 28.17n ± 10% ~ (p=0.941 n=20)
AppendFloat/64Fixed16-12 23.56n ± 0% 21.46n ± 0% -8.95% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-12 21.85n ± 2% 20.70n ± 1% -5.24% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-12 26.91n ± 1% 27.10n ± 0% ~ (p=0.059 n=20)
AppendFloat/64Fixed18Hard-12 2.197µ ± 1% 2.169µ ± 1% ~ (p=0.013 n=20)
AppendFloat/64FixedF1-12 103.7n ± 1% 103.3n ± 0% ~ (p=0.035 n=20)
AppendFloat/64FixedF2-12 114.8n ± 1% 114.1n ± 1% ~ (p=0.234 n=20)
AppendFloat/64FixedF3-12 107.8n ± 1% 107.1n ± 1% ~ (p=0.180 n=20)
AppendFloat/Slowpath64-12 32.05n ± 1% 32.00n ± 0% ~ (p=0.952 n=20)
AppendFloat/SlowpathDenormal64-12 29.98n ± 1% 30.20n ± 0% ~ (p=0.004 n=20)
geomean 33.83n 31.91n -5.68%
host: linux-amd64
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
│ 14b7e09f493 │ f9bf7fcb8e2 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-16 64.00n ± 1% 63.67n ± 1% ~ (p=0.784 n=20)
AppendFloat/Float-16 95.99n ± 1% 97.42n ± 1% +1.50% (p=0.000 n=20)
AppendFloat/Exp-16 97.59n ± 1% 97.72n ± 1% ~ (p=0.984 n=20)
AppendFloat/NegExp-16 97.80n ± 1% 101.15n ± 1% +3.43% (p=0.000 n=20)
AppendFloat/LongExp-16 103.1n ± 1% 104.5n ± 1% ~ (p=0.006 n=20)
AppendFloat/Big-16 110.8n ± 1% 108.5n ± 1% -2.07% (p=0.000 n=20)
AppendFloat/BinaryExp-16 47.82n ± 1% 47.33n ± 1% ~ (p=0.007 n=20)
AppendFloat/32Integer-16 63.65n ± 1% 63.51n ± 0% ~ (p=0.560 n=20)
AppendFloat/32ExactFraction-16 91.81n ± 1% 97.03n ± 1% +5.69% (p=0.000 n=20)
AppendFloat/32Point-16 89.84n ± 1% 92.16n ± 1% +2.59% (p=0.000 n=20)
AppendFloat/32Exp-16 103.80n ± 1% 95.12n ± 1% -8.36% (p=0.000 n=20)
AppendFloat/32NegExp-16 93.70n ± 1% 94.87n ± 1% ~ (p=0.003 n=20)
AppendFloat/32Shortest-16 83.98n ± 1% 86.45n ± 1% +2.94% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-16 61.91n ± 1% 57.81n ± 1% -6.62% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-16 71.08n ± 0% 66.81n ± 1% -6.01% (p=0.000 n=20)
AppendFloat/64Fixed1-16 59.27n ± 2% 51.49n ± 1% -13.13% (p=0.000 n=20)
AppendFloat/64Fixed2-16 57.89n ± 1% 50.87n ± 1% -12.13% (p=0.000 n=20)
AppendFloat/64Fixed2.5-16 61.04n ± 1% 49.40n ± 1% -19.08% (p=0.000 n=20)
AppendFloat/64Fixed3-16 58.42n ± 1% 52.14n ± 1% -10.75% (p=0.000 n=20)
AppendFloat/64Fixed4-16 56.52n ± 1% 50.27n ± 1% -11.07% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-16 97.79n ± 1% 57.86n ± 1% -40.83% (p=0.000 n=20)
AppendFloat/64Fixed12-16 90.78n ± 1% 83.01n ± 1% -8.56% (p=0.000 n=20)
AppendFloat/64Fixed16-16 76.11n ± 1% 70.84n ± 0% -6.92% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16 73.56n ± 1% 68.98n ± 2% -6.23% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16 83.20n ± 1% 79.85n ± 1% -4.03% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-16 4.947µ ± 1% 4.915µ ± 1% ~ (p=0.229 n=20)
AppendFloat/64FixedF1-16 242.4n ± 1% 239.4n ± 1% ~ (p=0.038 n=20)
AppendFloat/64FixedF2-16 257.7n ± 2% 252.6n ± 1% -1.98% (p=0.000 n=20)
AppendFloat/64FixedF3-16 237.5n ± 0% 237.5n ± 1% ~ (p=0.440 n=20)
AppendFloat/Slowpath64-16 99.75n ± 1% 99.78n ± 1% ~ (p=0.995 n=20)
AppendFloat/SlowpathDenormal64-16 97.41n ± 1% 98.20n ± 1% ~ (p=0.006 n=20)
geomean 100.7n 95.60n -5.05%
host: s7
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 14b7e09f493 │ f9bf7fcb8e2 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-32 22.19n ± 0% 22.04n ± 0% -0.68% (p=0.000 n=20)
AppendFloat/Float-32 34.59n ± 0% 34.88n ± 0% +0.84% (p=0.000 n=20)
AppendFloat/Exp-32 34.47n ± 0% 34.88n ± 0% +1.20% (p=0.000 n=20)
AppendFloat/NegExp-32 34.85n ± 0% 35.32n ± 0% +1.35% (p=0.000 n=20)
AppendFloat/LongExp-32 37.23n ± 0% 37.09n ± 0% ~ (p=0.003 n=20)
AppendFloat/Big-32 39.27n ± 0% 38.50n ± 0% -1.97% (p=0.000 n=20)
AppendFloat/BinaryExp-32 17.38n ± 0% 17.61n ± 0% +1.35% (p=0.000 n=20)
AppendFloat/32Integer-32 22.26n ± 0% 22.08n ± 0% -0.79% (p=0.000 n=20)
AppendFloat/32ExactFraction-32 32.82n ± 0% 32.91n ± 0% ~ (p=0.018 n=20)
AppendFloat/32Point-32 32.88n ± 0% 33.22n ± 0% +1.03% (p=0.000 n=20)
AppendFloat/32Exp-32 34.95n ± 0% 34.62n ± 0% -0.94% (p=0.000 n=20)
AppendFloat/32NegExp-32 33.23n ± 0% 33.55n ± 0% +0.98% (p=0.000 n=20)
AppendFloat/32Shortest-32 30.19n ± 0% 30.12n ± 0% ~ (p=0.122 n=20)
AppendFloat/32Fixed8Hard-32 22.94n ± 0% 22.88n ± 0% ~ (p=0.124 n=20)
AppendFloat/32Fixed9Hard-32 26.20n ± 0% 25.94n ± 1% -0.97% (p=0.000 n=20)
AppendFloat/64Fixed1-32 21.10n ± 0% 18.84n ± 0% -10.71% (p=0.000 n=20)
AppendFloat/64Fixed2-32 20.75n ± 0% 18.70n ± 0% -9.88% (p=0.000 n=20)
AppendFloat/64Fixed2.5-32 21.07n ± 0% 17.96n ± 0% -14.74% (p=0.000 n=20)
AppendFloat/64Fixed3-32 21.24n ± 0% 19.64n ± 0% -7.53% (p=0.000 n=20)
AppendFloat/64Fixed4-32 20.63n ± 0% 18.61n ± 0% -9.79% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-32 34.48n ± 0% 21.70n ± 0% -37.06% (p=0.000 n=20)
AppendFloat/64Fixed12-32 32.26n ± 0% 30.87n ± 1% -4.31% (p=0.000 n=20)
AppendFloat/64Fixed16-32 27.95n ± 0% 26.86n ± 0% -3.92% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32 27.30n ± 0% 25.98n ± 1% -4.82% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-32 30.80n ± 0% 29.93n ± 0% -2.84% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-32 1.833µ ± 0% 1.831µ ± 0% ~ (p=0.663 n=20)
AppendFloat/64FixedF1-32 83.42n ± 1% 84.00n ± 1% ~ (p=0.003 n=20)
AppendFloat/64FixedF2-32 90.10n ± 0% 89.23n ± 1% -0.95% (p=0.001 n=20)
AppendFloat/64FixedF3-32 84.42n ± 1% 84.39n ± 0% ~ (p=0.878 n=20)
AppendFloat/Slowpath64-32 35.72n ± 0% 35.59n ± 0% ~ (p=0.007 n=20)
AppendFloat/SlowpathDenormal64-32 35.36n ± 0% 35.05n ± 0% -0.88% (p=0.000 n=20)
geomean 36.05n 34.69n -3.77%
host: linux-386
goarch: 386
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
│ 14b7e09f493 │ f9bf7fcb8e2 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-16 132.8n ± 0% 133.5n ± 0% +0.49% (p=0.001 n=20)
AppendFloat/Float-16 242.6n ± 0% 241.7n ± 0% -0.37% (p=0.000 n=20)
AppendFloat/Exp-16 252.2n ± 0% 249.1n ± 0% -1.27% (p=0.000 n=20)
AppendFloat/NegExp-16 253.6n ± 0% 247.7n ± 0% -2.33% (p=0.000 n=20)
AppendFloat/LongExp-16 260.9n ± 0% 257.1n ± 0% -1.48% (p=0.000 n=20)
AppendFloat/Big-16 293.7n ± 0% 285.2n ± 0% -2.89% (p=0.000 n=20)
AppendFloat/BinaryExp-16 89.63n ± 1% 89.06n ± 0% -0.64% (p=0.000 n=20)
AppendFloat/32Integer-16 132.6n ± 0% 133.2n ± 0% ~ (p=0.016 n=20)
AppendFloat/32ExactFraction-16 216.9n ± 0% 214.2n ± 0% -1.24% (p=0.000 n=20)
AppendFloat/32Point-16 205.0n ± 0% 202.2n ± 0% -1.37% (p=0.000 n=20)
AppendFloat/32Exp-16 250.2n ± 0% 235.9n ± 0% -5.72% (p=0.000 n=20)
AppendFloat/32NegExp-16 213.5n ± 0% 210.6n ± 0% -1.34% (p=0.000 n=20)
AppendFloat/32Shortest-16 198.3n ± 0% 197.8n ± 0% ~ (p=0.147 n=20)
AppendFloat/32Fixed8Hard-16 114.9n ± 1% 136.0n ± 1% +18.46% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-16 189.8n ± 0% 155.0n ± 1% -18.31% (p=0.000 n=20)
AppendFloat/64Fixed1-16 175.8n ± 0% 132.7n ± 0% -24.52% (p=0.000 n=20)
AppendFloat/64Fixed2-16 166.6n ± 0% 128.7n ± 0% -22.73% (p=0.000 n=20)
AppendFloat/64Fixed2.5-16 176.5n ± 0% 126.8n ± 0% -28.11% (p=0.000 n=20)
AppendFloat/64Fixed3-16 165.3n ± 0% 127.1n ± 0% -23.11% (p=0.000 n=20)
AppendFloat/64Fixed4-16 141.3n ± 0% 120.8n ± 1% -14.51% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-16 344.6n ± 0% 136.0n ± 0% -60.51% (p=0.000 n=20)
AppendFloat/64Fixed12-16 184.2n ± 0% 158.7n ± 0% -13.82% (p=0.000 n=20)
AppendFloat/64Fixed16-16 174.0n ± 0% 151.3n ± 0% -12.99% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16 169.7n ± 0% 146.7n ± 0% -13.58% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16 207.7n ± 0% 166.6n ± 0% -19.81% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-16 10.66µ ± 0% 10.63µ ± 0% ~ (p=0.030 n=20)
AppendFloat/64FixedF1-16 615.9n ± 0% 613.5n ± 0% -0.40% (p=0.000 n=20)
AppendFloat/64FixedF2-16 846.6n ± 0% 847.4n ± 0% ~ (p=0.551 n=20)
AppendFloat/64FixedF3-16 609.9n ± 0% 609.5n ± 0% ~ (p=0.213 n=20)
AppendFloat/Slowpath64-16 254.1n ± 0% 252.6n ± 1% ~ (p=0.048 n=20)
AppendFloat/SlowpathDenormal64-16 251.5n ± 0% 249.4n ± 0% -0.83% (p=0.000 n=20)
geomean 249.2n 225.4n -9.54%
host: s7:GOARCH=386
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 14b7e09f493 │ f9bf7fcb8e2 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-32 42.65n ± 0% 42.31n ± 0% -0.79% (p=0.000 n=20)
AppendFloat/Float-32 71.56n ± 0% 71.06n ± 0% -0.69% (p=0.000 n=20)
AppendFloat/Exp-32 75.61n ± 1% 74.85n ± 1% -1.01% (p=0.000 n=20)
AppendFloat/NegExp-32 74.36n ± 0% 74.30n ± 0% ~ (p=0.482 n=20)
AppendFloat/LongExp-32 75.82n ± 0% 75.73n ± 0% ~ (p=0.490 n=20)
AppendFloat/Big-32 85.10n ± 0% 82.61n ± 0% -2.93% (p=0.000 n=20)
AppendFloat/BinaryExp-32 33.02n ± 0% 32.48n ± 1% -1.64% (p=0.000 n=20)
AppendFloat/32Integer-32 41.54n ± 1% 41.27n ± 1% -0.66% (p=0.000 n=20)
AppendFloat/32ExactFraction-32 62.48n ± 0% 62.91n ± 0% +0.69% (p=0.000 n=20)
AppendFloat/32Point-32 60.17n ± 0% 60.65n ± 0% +0.80% (p=0.000 n=20)
AppendFloat/32Exp-32 73.34n ± 0% 68.99n ± 0% -5.92% (p=0.000 n=20)
AppendFloat/32NegExp-32 63.29n ± 0% 62.83n ± 0% -0.73% (p=0.000 n=20)
AppendFloat/32Shortest-32 58.97n ± 0% 59.07n ± 0% ~ (p=0.029 n=20)
AppendFloat/32Fixed8Hard-32 37.42n ± 0% 41.76n ± 1% +11.61% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32 55.18n ± 0% 50.13n ± 1% -9.16% (p=0.000 n=20)
AppendFloat/64Fixed1-32 50.89n ± 1% 41.25n ± 0% -18.94% (p=0.000 n=20)
AppendFloat/64Fixed2-32 48.33n ± 1% 40.85n ± 1% -15.48% (p=0.000 n=20)
AppendFloat/64Fixed2.5-32 52.46n ± 0% 39.39n ± 0% -24.92% (p=0.000 n=20)
AppendFloat/64Fixed3-32 48.28n ± 1% 40.66n ± 0% -15.78% (p=0.000 n=20)
AppendFloat/64Fixed4-32 44.57n ± 0% 38.58n ± 0% -13.44% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-32 96.16n ± 0% 42.99n ± 1% -55.29% (p=0.000 n=20)
AppendFloat/64Fixed12-32 56.84n ± 0% 51.95n ± 1% -8.61% (p=0.000 n=20)
AppendFloat/64Fixed16-32 54.23n ± 0% 49.33n ± 0% -9.03% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32 53.47n ± 0% 48.67n ± 0% -8.99% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-32 61.76n ± 0% 55.42n ± 1% -10.27% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-32 3.998µ ± 1% 4.001µ ± 0% ~ (p=0.449 n=20)
AppendFloat/64FixedF1-32 161.8n ± 0% 166.2n ± 1% +2.72% (p=0.000 n=20)
AppendFloat/64FixedF2-32 223.4n ± 2% 226.2n ± 1% +1.25% (p=0.000 n=20)
AppendFloat/64FixedF3-32 159.6n ± 0% 161.6n ± 1% +1.22% (p=0.000 n=20)
AppendFloat/Slowpath64-32 76.69n ± 0% 75.03n ± 0% -2.16% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-32 75.02n ± 0% 74.36n ± 1% ~ (p=0.003 n=20)
geomean 74.66n 69.39n -7.06%
Change-Id: I9db46471a93bd2aab3c2796e563d154cb531d4cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/717182
Reviewed-by: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Russ Cox <[email protected]>
We re-slice the data being processed at the start of each loop. If the variable that we use to calculate where to re-slice is < 0 or > the length of the remaining data, return instead of attempting to re-slice. Change-Id: I1d6c2b6c596feedeea8feeaace370ea73ba02c4c Reviewed-on: https://go-review.googlesource.com/c/go/+/715260 LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Roland Shoemaker <[email protected]> Reviewed-by: Damien Neil <[email protected]>
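A minimal sketch of the guard described in the commit above, assuming a made-up record format (a 2-byte big-endian signed length followed by that many payload bytes). `splitRecords` is hypothetical and is not the code this commit changed; the point is the check before each re-slice, where a negative length or one past the end of the remaining data ends the loop early instead of panicking.

```go
package main

import "encoding/binary"

// splitRecords walks data as a sequence of records, each prefixed by a
// 2-byte big-endian signed length, and returns the record payloads.
// Before every re-slice it checks that the length is neither negative nor
// larger than the remaining data; on a bad length it returns the records
// parsed so far instead of panicking with an out-of-range slice.
func splitRecords(data []byte) [][]byte {
	var out [][]byte
	for len(data) >= 2 {
		n := int(int16(binary.BigEndian.Uint16(data[:2])))
		data = data[2:]
		if n < 0 || n > len(data) {
			return out // malformed length: stop rather than re-slice out of bounds
		}
		out = append(out, data[:n])
		data = data[n:]
	}
	return out
}
```

Returning the records parsed so far is one design choice; a real parser would likely also report an error to the caller.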
This lets us remove useAvg and useHmul from the division rules.
The compiler is simpler and the generated code is faster.
goos: wasip1
goarch: wasm
pkg: internal/strconv
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal 192.8n ± 1% 194.6n ± 0% +0.91% (p=0.000 n=10)
AppendFloat/Float 328.6n ± 0% 279.6n ± 0% -14.93% (p=0.000 n=10)
AppendFloat/Exp 335.6n ± 1% 289.2n ± 1% -13.80% (p=0.000 n=10)
AppendFloat/NegExp 336.0n ± 0% 289.1n ± 1% -13.97% (p=0.000 n=10)
AppendFloat/LongExp 332.4n ± 0% 285.2n ± 1% -14.20% (p=0.000 n=10)
AppendFloat/Big 348.2n ± 0% 300.1n ± 0% -13.83% (p=0.000 n=10)
AppendFloat/BinaryExp 137.4n ± 0% 138.2n ± 0% +0.55% (p=0.001 n=10)
AppendFloat/32Integer 193.3n ± 1% 196.5n ± 0% +1.66% (p=0.000 n=10)
AppendFloat/32ExactFraction 283.3n ± 0% 268.9n ± 1% -5.08% (p=0.000 n=10)
AppendFloat/32Point 279.9n ± 0% 266.5n ± 0% -4.80% (p=0.000 n=10)
AppendFloat/32Exp 300.1n ± 0% 288.3n ± 1% -3.90% (p=0.000 n=10)
AppendFloat/32NegExp 288.2n ± 1% 277.9n ± 1% -3.59% (p=0.000 n=10)
AppendFloat/32Shortest 261.7n ± 0% 250.2n ± 0% -4.39% (p=0.000 n=10)
AppendFloat/32Fixed8Hard 173.3n ± 1% 158.9n ± 1% -8.31% (p=0.000 n=10)
AppendFloat/32Fixed9Hard 180.0n ± 0% 167.9n ± 2% -6.70% (p=0.000 n=10)
AppendFloat/64Fixed1 167.1n ± 0% 149.6n ± 1% -10.50% (p=0.000 n=10)
AppendFloat/64Fixed2 162.4n ± 1% 146.5n ± 0% -9.73% (p=0.000 n=10)
AppendFloat/64Fixed2.5 165.5n ± 0% 149.4n ± 1% -9.70% (p=0.000 n=10)
AppendFloat/64Fixed3 166.4n ± 1% 150.2n ± 0% -9.74% (p=0.000 n=10)
AppendFloat/64Fixed4 163.7n ± 0% 149.6n ± 1% -8.62% (p=0.000 n=10)
AppendFloat/64Fixed5Hard 182.8n ± 1% 167.1n ± 1% -8.61% (p=0.000 n=10)
AppendFloat/64Fixed12 222.2n ± 0% 208.8n ± 0% -6.05% (p=0.000 n=10)
AppendFloat/64Fixed16 197.6n ± 1% 181.7n ± 0% -8.02% (p=0.000 n=10)
AppendFloat/64Fixed12Hard 194.5n ± 0% 181.0n ± 0% -6.99% (p=0.000 n=10)
AppendFloat/64Fixed17Hard 205.1n ± 1% 191.9n ± 0% -6.44% (p=0.000 n=10)
AppendFloat/64Fixed18Hard 6.269µ ± 0% 6.643µ ± 0% +5.97% (p=0.000 n=10)
AppendFloat/64FixedF1 211.7n ± 1% 197.0n ± 0% -6.95% (p=0.000 n=10)
AppendFloat/64FixedF2 189.4n ± 0% 174.2n ± 0% -8.08% (p=0.000 n=10)
AppendFloat/64FixedF3 169.0n ± 0% 154.9n ± 0% -8.32% (p=0.000 n=10)
AppendFloat/Slowpath64 321.2n ± 0% 274.2n ± 1% -14.63% (p=0.000 n=10)
AppendFloat/SlowpathDenormal64 307.4n ± 1% 261.2n ± 0% -15.03% (p=0.000 n=10)
AppendInt 3.367µ ± 1% 3.376µ ± 0% ~ (p=0.517 n=10)
AppendUint 675.5n ± 0% 676.9n ± 0% ~ (p=0.196 n=10)
AppendIntSmall 28.13n ± 1% 28.17n ± 0% +0.14% (p=0.015 n=10)
AppendUintVarlen/digits=1 20.70n ± 0% 20.51n ± 1% -0.89% (p=0.018 n=10)
AppendUintVarlen/digits=2 20.43n ± 0% 20.27n ± 0% -0.81% (p=0.001 n=10)
AppendUintVarlen/digits=3 38.48n ± 0% 37.93n ± 0% -1.43% (p=0.000 n=10)
AppendUintVarlen/digits=4 41.10n ± 0% 38.78n ± 1% -5.62% (p=0.000 n=10)
AppendUintVarlen/digits=5 42.25n ± 1% 42.11n ± 0% -0.32% (p=0.041 n=10)
AppendUintVarlen/digits=6 45.40n ± 1% 43.14n ± 0% -4.98% (p=0.000 n=10)
AppendUintVarlen/digits=7 46.81n ± 1% 46.03n ± 0% -1.66% (p=0.000 n=10)
AppendUintVarlen/digits=8 48.88n ± 1% 46.59n ± 1% -4.68% (p=0.000 n=10)
AppendUintVarlen/digits=9 49.94n ± 2% 49.41n ± 1% -1.06% (p=0.000 n=10)
AppendUintVarlen/digits=10 57.28n ± 1% 56.92n ± 1% -0.62% (p=0.045 n=10)
AppendUintVarlen/digits=11 60.09n ± 1% 58.11n ± 2% -3.30% (p=0.000 n=10)
AppendUintVarlen/digits=12 62.22n ± 0% 61.85n ± 0% -0.59% (p=0.000 n=10)
AppendUintVarlen/digits=13 64.94n ± 0% 62.92n ± 0% -3.10% (p=0.000 n=10)
AppendUintVarlen/digits=14 65.42n ± 1% 65.19n ± 1% -0.34% (p=0.005 n=10)
AppendUintVarlen/digits=15 68.17n ± 0% 66.13n ± 0% -2.99% (p=0.000 n=10)
AppendUintVarlen/digits=16 70.21n ± 1% 70.09n ± 1% ~ (p=0.517 n=10)
AppendUintVarlen/digits=17 72.93n ± 0% 70.49n ± 0% -3.34% (p=0.000 n=10)
AppendUintVarlen/digits=18 73.01n ± 0% 72.75n ± 0% -0.35% (p=0.000 n=10)
AppendUintVarlen/digits=19 79.27n ± 1% 79.49n ± 1% ~ (p=0.671 n=10)
AppendUintVarlen/digits=20 82.18n ± 0% 80.43n ± 1% -2.14% (p=0.000 n=10)
geomean 143.4n 136.0n -5.20%
Change-Id: I8245814a0259ad13cf9225f57db8e9fe3d2e4267
Reviewed-on: https://go-review.googlesource.com/c/go/+/717407
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Everyone writes papers about fast shortest-output formatting.
Eventually we also sped up fixed-length formatting %e and %g.
But we've neglected %f, which falls back to the slow general code
even for relatively trivial things like %.2f on 1.23.
This CL uses the fast path fixedFtoa for %f when possible by
estimating the number of digits needed.
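The digit estimate can be illustrated with a small sketch. It assumes that %.Nf needs roughly (digits before the decimal point) + N significant digits and uses a hypothetical cap `maxFastDigits`; `estimateFDigits` and `canUseFastFixed` are illustrative names, not strconv internals (`fixedFtoa` is the real fast path the commit refers to). For %.2f on 1.23 the estimate is 1 + 2 = 3 digits.

```go
package main

import "math"

// maxFastDigits is an assumed cap on how many digits the fixed-digit fast
// path can produce; it is illustrative, not strconv's actual limit.
const maxFastDigits = 18

// estimateFDigits estimates how many significant digits %.<prec>f prints for
// x: the digits before the decimal point plus prec digits after it. It
// returns 0 for zero, infinities and NaN, which this sketch leaves to the
// general (slow) path. math.Log10 may land just below an integer at exact
// powers of ten, so the result is only an estimate.
func estimateFDigits(x float64, prec int) int {
	x = math.Abs(x)
	if x == 0 || math.IsInf(x, 0) || math.IsNaN(x) {
		return 0
	}
	intDigits := int(math.Floor(math.Log10(x))) + 1 // may be <= 0 for |x| < 1
	return intDigits + prec
}

// canUseFastFixed reports whether the estimate fits the assumed fast path.
func canUseFastFixed(x float64, prec int) bool {
	d := estimateFDigits(x, prec)
	return d > 0 && d <= maxFastDigits
}
```

Because the estimate can be off by one, a real implementation would need the fast path to tolerate or detect that case and fall back; the sketch does not model it.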
benchmark \ host                linux-arm64   local         linux-amd64   s7            linux-386     s7:GOARCH=386
                                vs base       vs base       vs base       vs base       vs base       vs base
AppendFloat/Decimal             ~             ~             ~             +0.30%        ~             ~
AppendFloat/Float               -0.45%        ~             -2.20%        ~             -2.19%        ~
AppendFloat/Exp                 +0.12%        ~             +4.11%        ~             ~             ~
AppendFloat/NegExp              +0.53%        ~             ~             ~             ~             ~
AppendFloat/LongExp             +0.41%        -1.42%        +4.50%        ~             ~             ~
AppendFloat/Big                 ~             -1.25%        +3.69%        ~             ~             ~
AppendFloat/BinaryExp           +0.38%        +1.68%        ~             ~             +2.65%        +0.97%
AppendFloat/32Integer           ~             ~             ~             ~             ~             ~
AppendFloat/32ExactFraction     ~             ~             -2.61%        ~             ~             ~
AppendFloat/32Point             -0.41%        ~             -2.65%        ~             ~             ~
AppendFloat/32Exp               ~             ~             +5.35%        ~             +1.44%        +0.39%
AppendFloat/32NegExp            +0.30%        ~             +2.31%        ~             ~             +0.82%
AppendFloat/32Shortest          +0.28%        -0.85%        ~             ~             -3.20%        ~
AppendFloat/32Fixed8Hard        -0.29%        ~             ~             -1.75%        ~             +4.30%
AppendFloat/32Fixed9Hard        ~             ~             ~             ~             ~             +1.52%
AppendFloat/64Fixed1            +0.61%        -2.03%        ~             ~             ~             +4.36%
AppendFloat/64Fixed2            ~             -3.43%        ~             ~             ~             +1.03%
AppendFloat/64Fixed2.5          +0.57%        -2.23%        ~             ~             ~             +2.66%
AppendFloat/64Fixed3            ~             -1.64%        ~             +0.31%        +2.32%        +2.10%
AppendFloat/64Fixed4            +0.15%        -2.11%        ~             ~             +1.48%        +1.58%
AppendFloat/64Fixed5Hard        +0.45%        ~             +1.58%        ~             ~             +1.73%
AppendFloat/64Fixed12           -0.16%        ~             +1.63%        -1.23%        +3.93%        +2.42%
AppendFloat/64Fixed16           -0.33%        -0.49%        ~             ~             +3.67%        +2.33%
AppendFloat/64Fixed12Hard       -0.58%        ~             ~             ~             +4.98%        +0.62%
AppendFloat/64Fixed17Hard       +0.27%        -0.94%        ~             ~             +2.07%        +1.79%
AppendFloat/64Fixed18Hard       ~             ~             ~             ~             ~             ~
AppendFloat/64FixedF1           -69.59%       -76.08%       -70.94%       -68.26%       -75.27%       -69.88%
AppendFloat/64FixedF2           -76.28%       -81.82%       -76.95%       -77.34%       -83.53%       -80.04%
AppendFloat/64FixedF3           -77.30%       -84.51%       -77.82%       -77.81%       -78.77%       -73.69%
AppendFloat/Slowpath64          ~             -1.30%        +1.64%        ~             -2.66%        -0.44%
AppendFloat/SlowpathDenormal64  +0.11%        -1.69%        ~             ~             -2.90%        ~
host: linux-arm64
goos: linux
goarch: arm64
pkg: internal/strconv
cpu: unknown
│ 1cc918cc725 │ b66c604f523 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-8 60.22n ± 0% 60.21n ± 0% ~ (p=0.416 n=20)
AppendFloat/Float-8 88.93n ± 0% 88.53n ± 0% -0.45% (p=0.000 n=20)
AppendFloat/Exp-8 93.09n ± 0% 93.20n ± 0% +0.12% (p=0.000 n=20)
AppendFloat/NegExp-8 93.06n ± 0% 93.56n ± 0% +0.53% (p=0.000 n=20)
AppendFloat/LongExp-8 99.79n ± 0% 100.20n ± 0% +0.41% (p=0.000 n=20)
AppendFloat/Big-8 103.9n ± 0% 104.0n ± 0% ~ (p=0.004 n=20)
AppendFloat/BinaryExp-8 47.34n ± 0% 47.52n ± 0% +0.38% (p=0.000 n=20)
AppendFloat/32Integer-8 60.43n ± 0% 60.40n ± 0% ~ (p=0.006 n=20)
AppendFloat/32ExactFraction-8 86.21n ± 0% 86.24n ± 0% ~ (p=0.634 n=20)
AppendFloat/32Point-8 83.20n ± 0% 82.87n ± 0% -0.41% (p=0.000 n=20)
AppendFloat/32Exp-8 89.43n ± 0% 89.45n ± 0% ~ (p=0.193 n=20)
AppendFloat/32NegExp-8 87.31n ± 0% 87.58n ± 0% +0.30% (p=0.000 n=20)
AppendFloat/32Shortest-8 76.28n ± 0% 76.49n ± 0% +0.28% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-8 52.44n ± 0% 52.29n ± 0% -0.29% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-8 60.57n ± 0% 60.54n ± 0% ~ (p=0.285 n=20)
AppendFloat/64Fixed1-8 46.27n ± 0% 46.55n ± 0% +0.61% (p=0.000 n=20)
AppendFloat/64Fixed2-8 46.77n ± 0% 46.80n ± 0% ~ (p=0.060 n=20)
AppendFloat/64Fixed2.5-8 43.70n ± 0% 43.95n ± 0% +0.57% (p=0.000 n=20)
AppendFloat/64Fixed3-8 47.22n ± 0% 47.19n ± 0% ~ (p=0.008 n=20)
AppendFloat/64Fixed4-8 44.07n ± 0% 44.13n ± 0% +0.15% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-8 51.81n ± 0% 52.04n ± 0% +0.45% (p=0.000 n=20)
AppendFloat/64Fixed12-8 78.41n ± 0% 78.29n ± 0% -0.16% (p=0.000 n=20)
AppendFloat/64Fixed16-8 65.14n ± 0% 64.93n ± 0% -0.33% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-8 62.12n ± 0% 61.76n ± 0% -0.58% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-8 73.93n ± 0% 74.13n ± 0% +0.27% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-8 4.285µ ± 0% 4.283µ ± 0% ~ (p=0.039 n=20)
AppendFloat/64FixedF1-8 216.10n ± 0% 65.71n ± 0% -69.59% (p=0.000 n=20)
AppendFloat/64FixedF2-8 227.70n ± 0% 54.02n ± 0% -76.28% (p=0.000 n=20)
AppendFloat/64FixedF3-8 208.20n ± 1% 47.25n ± 0% -77.30% (p=0.000 n=20)
AppendFloat/Slowpath64-8 97.40n ± 0% 97.45n ± 0% ~ (p=0.018 n=20)
AppendFloat/SlowpathDenormal64-8 94.75n ± 0% 94.86n ± 0% +0.11% (p=0.000 n=20)
geomean 87.86n 76.99n -12.37%
host: local
goos: darwin
cpu: Apple M3 Pro
│ 1cc918cc725 │ b66c604f523 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-12 21.05n ± 1% 20.91n ± 1% ~ (p=0.051 n=20)
AppendFloat/Float-12 32.13n ± 0% 32.04n ± 1% ~ (p=0.457 n=20)
AppendFloat/Exp-12 31.84n ± 0% 31.72n ± 0% ~ (p=0.151 n=20)
AppendFloat/NegExp-12 31.78n ± 1% 31.79n ± 1% ~ (p=0.867 n=20)
AppendFloat/LongExp-12 33.70n ± 0% 33.22n ± 1% -1.42% (p=0.000 n=20)
AppendFloat/Big-12 35.52n ± 1% 35.07n ± 1% -1.25% (p=0.000 n=20)
AppendFloat/BinaryExp-12 19.32n ± 1% 19.64n ± 0% +1.68% (p=0.000 n=20)
AppendFloat/32Integer-12 21.32n ± 0% 21.18n ± 1% ~ (p=0.025 n=20)
AppendFloat/32ExactFraction-12 30.88n ± 0% 31.07n ± 0% ~ (p=0.087 n=20)
AppendFloat/32Point-12 30.88n ± 0% 30.95n ± 1% ~ (p=0.250 n=20)
AppendFloat/32Exp-12 31.57n ± 0% 31.67n ± 2% ~ (p=0.126 n=20)
AppendFloat/32NegExp-12 30.50n ± 1% 30.76n ± 1% ~ (p=0.087 n=20)
AppendFloat/32Shortest-12 27.14n ± 0% 26.91n ± 1% -0.85% (p=0.001 n=20)
AppendFloat/32Fixed8Hard-12 17.11n ± 0% 17.08n ± 0% ~ (p=0.027 n=20)
AppendFloat/32Fixed9Hard-12 19.16n ± 1% 19.31n ± 1% ~ (p=0.062 n=20)
AppendFloat/64Fixed1-12 15.50n ± 0% 15.18n ± 1% -2.03% (p=0.000 n=20)
AppendFloat/64Fixed2-12 15.46n ± 0% 14.93n ± 0% -3.43% (p=0.000 n=20)
AppendFloat/64Fixed2.5-12 15.28n ± 0% 14.94n ± 1% -2.23% (p=0.000 n=20)
AppendFloat/64Fixed3-12 15.58n ± 0% 15.32n ± 1% -1.64% (p=0.000 n=20)
AppendFloat/64Fixed4-12 15.39n ± 0% 15.06n ± 1% -2.11% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-12 18.00n ± 0% 18.07n ± 1% ~ (p=0.011 n=20)
AppendFloat/64Fixed12-12 27.97n ± 8% 29.05n ± 3% ~ (p=0.107 n=20)
AppendFloat/64Fixed16-12 21.48n ± 0% 21.38n ± 0% -0.49% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-12 20.79n ± 1% 21.05n ± 2% ~ (p=0.784 n=20)
AppendFloat/64Fixed17Hard-12 27.21n ± 1% 26.95n ± 1% -0.94% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-12 2.166µ ± 1% 2.182µ ± 1% ~ (p=0.031 n=20)
AppendFloat/64FixedF1-12 103.35n ± 0% 24.72n ± 0% -76.08% (p=0.000 n=20)
AppendFloat/64FixedF2-12 114.30n ± 1% 20.78n ± 0% -81.82% (p=0.000 n=20)
AppendFloat/64FixedF3-12 107.10n ± 0% 16.58n ± 0% -84.51% (p=0.000 n=20)
AppendFloat/Slowpath64-12 32.01n ± 0% 31.59n ± 0% -1.30% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-12 30.21n ± 0% 29.70n ± 0% -1.69% (p=0.000 n=20)
geomean 31.84n 27.00n -15.20%
host: linux-amd64
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
│ 1cc918cc725 │ b66c604f523 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-16 63.62n ± 1% 64.05n ± 1% ~ (p=0.753 n=20)
AppendFloat/Float-16 97.12n ± 1% 94.98n ± 1% -2.20% (p=0.000 n=20)
AppendFloat/Exp-16 98.12n ± 1% 102.15n ± 1% +4.11% (p=0.000 n=20)
AppendFloat/NegExp-16 101.1n ± 1% 101.5n ± 1% ~ (p=0.089 n=20)
AppendFloat/LongExp-16 104.5n ± 1% 109.2n ± 1% +4.50% (p=0.000 n=20)
AppendFloat/Big-16 108.5n ± 0% 112.5n ± 1% +3.69% (p=0.000 n=20)
AppendFloat/BinaryExp-16 47.68n ± 1% 47.44n ± 1% ~ (p=0.143 n=20)
AppendFloat/32Integer-16 63.77n ± 2% 63.45n ± 1% ~ (p=0.015 n=20)
AppendFloat/32ExactFraction-16 97.69n ± 1% 95.14n ± 1% -2.61% (p=0.000 n=20)
AppendFloat/32Point-16 92.17n ± 1% 89.72n ± 1% -2.65% (p=0.000 n=20)
AppendFloat/32Exp-16 95.63n ± 1% 100.75n ± 1% +5.35% (p=0.000 n=20)
AppendFloat/32NegExp-16 94.53n ± 1% 96.72n ± 0% +2.31% (p=0.000 n=20)
AppendFloat/32Shortest-16 86.43n ± 0% 86.95n ± 0% ~ (p=0.010 n=20)
AppendFloat/32Fixed8Hard-16 57.75n ± 1% 57.95n ± 1% ~ (p=0.098 n=20)
AppendFloat/32Fixed9Hard-16 66.56n ± 2% 66.97n ± 1% ~ (p=0.380 n=20)
AppendFloat/64Fixed1-16 51.02n ± 1% 50.99n ± 1% ~ (p=0.473 n=20)
AppendFloat/64Fixed2-16 50.94n ± 1% 51.01n ± 1% ~ (p=0.136 n=20)
AppendFloat/64Fixed2.5-16 49.27n ± 1% 49.37n ± 1% ~ (p=0.218 n=20)
AppendFloat/64Fixed3-16 51.85n ± 1% 52.55n ± 1% ~ (p=0.045 n=20)
AppendFloat/64Fixed4-16 50.30n ± 1% 50.43n ± 1% ~ (p=0.794 n=20)
AppendFloat/64Fixed5Hard-16 57.57n ± 1% 58.48n ± 1% +1.58% (p=0.000 n=20)
AppendFloat/64Fixed12-16 82.67n ± 1% 84.02n ± 1% +1.63% (p=0.000 n=20)
AppendFloat/64Fixed16-16 71.10n ± 1% 70.94n ± 1% ~ (p=0.569 n=20)
AppendFloat/64Fixed12Hard-16 68.36n ± 1% 68.64n ± 1% ~ (p=0.155 n=20)
AppendFloat/64Fixed17Hard-16 80.16n ± 1% 80.10n ± 1% ~ (p=0.836 n=20)
AppendFloat/64Fixed18Hard-16 4.916µ ± 1% 4.919µ ± 1% ~ (p=0.507 n=20)
AppendFloat/64FixedF1-16 239.75n ± 1% 69.67n ± 1% -70.94% (p=0.000 n=20)
AppendFloat/64FixedF2-16 252.50n ± 1% 58.20n ± 1% -76.95% (p=0.000 n=20)
AppendFloat/64FixedF3-16 238.00n ± 1% 52.79n ± 1% -77.82% (p=0.000 n=20)
AppendFloat/Slowpath64-16 100.4n ± 1% 102.0n ± 1% +1.64% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-16 97.92n ± 1% 98.01n ± 1% ~ (p=0.304 n=20)
geomean 95.58n 84.00n -12.12%
host: s7
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 1cc918cc725 │ b66c604f523 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-32 22.00n ± 0% 22.06n ± 0% +0.30% (p=0.001 n=20)
AppendFloat/Float-32 34.83n ± 0% 34.76n ± 0% ~ (p=0.159 n=20)
AppendFloat/Exp-32 34.91n ± 0% 34.89n ± 0% ~ (p=0.188 n=20)
AppendFloat/NegExp-32 35.24n ± 0% 35.32n ± 0% ~ (p=0.026 n=20)
AppendFloat/LongExp-32 37.02n ± 0% 37.02n ± 0% ~ (p=0.317 n=20)
AppendFloat/Big-32 38.51n ± 0% 38.43n ± 0% ~ (p=0.060 n=20)
AppendFloat/BinaryExp-32 17.57n ± 0% 17.59n ± 0% ~ (p=0.278 n=20)
AppendFloat/32Integer-32 22.06n ± 0% 22.09n ± 0% ~ (p=0.762 n=20)
AppendFloat/32ExactFraction-32 32.91n ± 0% 33.00n ± 0% ~ (p=0.055 n=20)
AppendFloat/32Point-32 33.24n ± 0% 33.18n ± 0% ~ (p=0.068 n=20)
AppendFloat/32Exp-32 34.50n ± 0% 34.55n ± 0% ~ (p=0.030 n=20)
AppendFloat/32NegExp-32 33.53n ± 0% 33.61n ± 0% ~ (p=0.045 n=20)
AppendFloat/32Shortest-32 30.10n ± 0% 30.10n ± 0% ~ (p=0.931 n=20)
AppendFloat/32Fixed8Hard-32 22.89n ± 0% 22.49n ± 0% -1.75% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32 25.82n ± 0% 25.75n ± 1% ~ (p=0.143 n=20)
AppendFloat/64Fixed1-32 18.80n ± 0% 18.70n ± 0% ~ (p=0.004 n=20)
AppendFloat/64Fixed2-32 18.64n ± 1% 18.54n ± 0% ~ (p=0.001 n=20)
AppendFloat/64Fixed2.5-32 17.89n ± 0% 17.81n ± 0% ~ (p=0.001 n=20)
AppendFloat/64Fixed3-32 19.62n ± 0% 19.68n ± 0% +0.31% (p=0.000 n=20)
AppendFloat/64Fixed4-32 18.64n ± 0% 18.82n ± 0% ~ (p=0.010 n=20)
AppendFloat/64Fixed5Hard-32 21.62n ± 0% 21.57n ± 0% ~ (p=0.058 n=20)
AppendFloat/64Fixed12-32 30.98n ± 1% 30.61n ± 1% -1.23% (p=0.000 n=20)
AppendFloat/64Fixed16-32 26.89n ± 0% 27.08n ± 1% ~ (p=0.003 n=20)
AppendFloat/64Fixed12Hard-32 26.03n ± 0% 26.20n ± 1% ~ (p=0.344 n=20)
AppendFloat/64Fixed17Hard-32 30.03n ± 1% 29.72n ± 1% ~ (p=0.001 n=20)
AppendFloat/64Fixed18Hard-32 1.824µ ± 0% 1.825µ ± 1% ~ (p=0.567 n=20)
AppendFloat/64FixedF1-32 83.58n ± 1% 26.52n ± 0% -68.26% (p=0.000 n=20)
AppendFloat/64FixedF2-32 89.68n ± 1% 20.32n ± 1% -77.34% (p=0.000 n=20)
AppendFloat/64FixedF3-32 84.84n ± 0% 18.82n ± 0% -77.81% (p=0.000 n=20)
AppendFloat/Slowpath64-32 35.55n ± 0% 35.61n ± 0% ~ (p=0.394 n=20)
AppendFloat/SlowpathDenormal64-32 35.03n ± 0% 35.02n ± 0% ~ (p=0.733 n=20)
geomean 34.67n 30.31n -12.56%
host: linux-386
goarch: 386
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
│ 1cc918cc725 │ b66c604f523 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-16 133.6n ± 1% 130.5n ± 1% ~ (p=0.002 n=20)
AppendFloat/Float-16 242.3n ± 1% 237.0n ± 1% -2.19% (p=0.000 n=20)
AppendFloat/Exp-16 249.1n ± 3% 252.5n ± 1% ~ (p=0.005 n=20)
AppendFloat/NegExp-16 248.7n ± 3% 253.8n ± 2% ~ (p=0.006 n=20)
AppendFloat/LongExp-16 258.4n ± 2% 253.0n ± 6% ~ (p=0.185 n=20)
AppendFloat/Big-16 285.6n ± 1% 279.2n ± 5% ~ (p=0.012 n=20)
AppendFloat/BinaryExp-16 89.47n ± 1% 91.85n ± 2% +2.65% (p=0.000 n=20)
AppendFloat/32Integer-16 133.5n ± 1% 129.9n ± 1% ~ (p=0.004 n=20)
AppendFloat/32ExactFraction-16 213.7n ± 1% 212.2n ± 2% ~ (p=0.071 n=20)
AppendFloat/32Point-16 202.0n ± 0% 200.4n ± 1% ~ (p=0.223 n=20)
AppendFloat/32Exp-16 236.4n ± 1% 239.8n ± 1% +1.44% (p=0.000 n=20)
AppendFloat/32NegExp-16 212.5n ± 1% 211.9n ± 1% ~ (p=0.995 n=20)
AppendFloat/32Shortest-16 200.3n ± 1% 193.9n ± 1% -3.20% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-16 136.0n ± 1% 133.2n ± 4% ~ (p=0.323 n=20)
AppendFloat/32Fixed9Hard-16 155.6n ± 1% 156.7n ± 2% ~ (p=0.022 n=20)
AppendFloat/64Fixed1-16 132.8n ± 1% 133.0n ± 3% ~ (p=0.199 n=20)
AppendFloat/64Fixed2-16 128.9n ± 1% 129.7n ± 3% ~ (p=0.018 n=20)
AppendFloat/64Fixed2.5-16 127.0n ± 1% 126.5n ± 3% ~ (p=0.825 n=20)
AppendFloat/64Fixed3-16 127.3n ± 1% 130.3n ± 4% +2.32% (p=0.001 n=20)
AppendFloat/64Fixed4-16 121.4n ± 1% 123.2n ± 2% +1.48% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-16 136.2n ± 1% 136.2n ± 3% ~ (p=0.256 n=20)
AppendFloat/64Fixed12-16 159.0n ± 1% 165.2n ± 2% +3.93% (p=0.000 n=20)
AppendFloat/64Fixed16-16 151.4n ± 0% 156.9n ± 1% +3.67% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16 146.5n ± 1% 153.8n ± 1% +4.98% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16 166.3n ± 1% 169.8n ± 1% +2.07% (p=0.001 n=20)
AppendFloat/64Fixed18Hard-16 10.59µ ± 2% 10.60µ ± 0% ~ (p=0.499 n=20)
AppendFloat/64FixedF1-16 614.4n ± 1% 152.0n ± 1% -75.27% (p=0.000 n=20)
AppendFloat/64FixedF2-16 845.0n ± 0% 139.1n ± 1% -83.53% (p=0.000 n=20)
AppendFloat/64FixedF3-16 608.8n ± 1% 129.3n ± 1% -78.77% (p=0.000 n=20)
AppendFloat/Slowpath64-16 251.7n ± 1% 245.0n ± 1% -2.66% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-16 248.4n ± 1% 241.2n ± 1% -2.90% (p=0.000 n=20)
geomean 225.7n 193.8n -14.14%
host: s7:GOARCH=386
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 1cc918cc725 │ b66c604f523 │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal-32 41.88n ± 0% 42.02n ± 1% ~ (p=0.004 n=20)
AppendFloat/Float-32 71.05n ± 0% 71.24n ± 0% ~ (p=0.044 n=20)
AppendFloat/Exp-32 74.91n ± 1% 74.80n ± 0% ~ (p=0.433 n=20)
AppendFloat/NegExp-32 74.10n ± 0% 74.20n ± 0% ~ (p=0.867 n=20)
AppendFloat/LongExp-32 75.73n ± 0% 75.84n ± 0% ~ (p=0.147 n=20)
AppendFloat/Big-32 82.47n ± 0% 82.36n ± 0% ~ (p=0.490 n=20)
AppendFloat/BinaryExp-32 32.31n ± 1% 32.62n ± 0% +0.97% (p=0.000 n=20)
AppendFloat/32Integer-32 41.38n ± 1% 41.40n ± 1% ~ (p=0.106 n=20)
AppendFloat/32ExactFraction-32 62.72n ± 0% 62.92n ± 0% ~ (p=0.009 n=20)
AppendFloat/32Point-32 60.36n ± 0% 60.33n ± 0% ~ (p=0.050 n=20)
AppendFloat/32Exp-32 68.97n ± 0% 69.24n ± 0% +0.39% (p=0.000 n=20)
AppendFloat/32NegExp-32 62.63n ± 0% 63.15n ± 0% +0.82% (p=0.000 n=20)
AppendFloat/32Shortest-32 58.76n ± 0% 58.87n ± 0% ~ (p=0.053 n=20)
AppendFloat/32Fixed8Hard-32 41.67n ± 1% 43.46n ± 1% +4.30% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32 49.78n ± 1% 50.53n ± 1% +1.52% (p=0.000 n=20)
AppendFloat/64Fixed1-32 41.15n ± 0% 42.95n ± 1% +4.36% (p=0.000 n=20)
AppendFloat/64Fixed2-32 40.83n ± 1% 41.24n ± 1% +1.03% (p=0.000 n=20)
AppendFloat/64Fixed2.5-32 39.42n ± 0% 40.47n ± 1% +2.66% (p=0.000 n=20)
AppendFloat/64Fixed3-32 40.73n ± 1% 41.58n ± 1% +2.10% (p=0.000 n=20)
AppendFloat/64Fixed4-32 38.68n ± 0% 39.29n ± 0% +1.58% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-32 42.88n ± 1% 43.62n ± 1% +1.73% (p=0.000 n=20)
AppendFloat/64Fixed12-32 51.67n ± 1% 52.92n ± 1% +2.42% (p=0.000 n=20)
AppendFloat/64Fixed16-32 49.15n ± 0% 50.30n ± 0% +2.33% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32 48.51n ± 0% 48.81n ± 0% +0.62% (p=0.001 n=20)
AppendFloat/64Fixed17Hard-32 54.62n ± 1% 55.60n ± 1% +1.79% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-32 3.979µ ± 1% 3.980µ ± 1% ~ (p=0.569 n=20)
AppendFloat/64FixedF1-32 165.90n ± 1% 49.97n ± 0% -69.88% (p=0.000 n=20)
AppendFloat/64FixedF2-32 225.50n ± 0% 45.02n ± 1% -80.04% (p=0.000 n=20)
AppendFloat/64FixedF3-32 160.20n ± 1% 42.16n ± 1% -73.69% (p=0.000 n=20)
AppendFloat/Slowpath64-32 75.55n ± 0% 75.23n ± 0% -0.44% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-32 74.84n ± 0% 75.00n ± 0% ~ (p=0.268 n=20)
geomean 69.22n 61.13n -11.69%
Change-Id: I722d2e2621e74e32cb3fc34a2df5b16cc595715c
Reviewed-on: https://go-review.googlesource.com/c/go/+/717183
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Alan Donovan <[email protected]>
Now that the bootstrap compiler is 1.24, it's no longer needed. Change-Id: I9b3d6b7176af10fbc580173d50130120b542e7f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/717060 Reviewed-by: David Chase <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Propagate "unread" across OpMoves. If the address of this auto is used only
as the source argument of an OpMove whose target argument is the address of
another auto, and that second auto can be eliminated, then this auto can be
eliminated as well.
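To make the propagation rule concrete, here is a minimal sketch over a hypothetical, simplified model in which each auto has a list of address uses. It is not the compiler's SSA code; `addrUse`, `useKind` and `unreadAutos` are invented for illustration. An auto is marked unread when it has no direct reads and every OpMove out of it targets an auto already marked unread, and the set is grown until it stops changing. Autos that copy into each other in a cycle are conservatively never marked.

```go
package main

// useKind classifies how an auto's address is used in this toy model.
type useKind int

const (
	useStore   useKind = iota // the auto is only written through this use
	useRead                   // the auto's contents are read directly
	useMoveSrc                // the auto is the source of an OpMove into dst
)

// addrUse is one use of an auto's address.
type addrUse struct {
	kind useKind
	dst  string // target auto of an OpMove; empty for other kinds
}

// unreadAutos returns the set of autos whose contents are never observed.
// "Unread" propagates backwards along chains of moves: once the target of a
// move is known to be unread, the move's source can be marked too.
func unreadAutos(uses map[string][]addrUse) map[string]bool {
	unread := make(map[string]bool)
	for changed := true; changed; {
		changed = false
		for name, us := range uses {
			if unread[name] {
				continue
			}
			ok := true
			for _, u := range us {
				if u.kind == useRead || (u.kind == useMoveSrc && !unread[u.dst]) {
					ok = false
					break
				}
			}
			if ok {
				unread[name] = true
				changed = true
			}
		}
	}
	return unread
}
```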
This CL eliminates unnecessary memory copies and makes the frame smaller
in the following code snippet:
func contains(m map[string][16]int, k string) bool {
	_, ok := m[k]
	return ok
}
These are the benchmark results followed by the benchmark code:
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Map1Access2Ok-8 9.582n ± 2% 9.226n ± 0% -3.72% (p=0.000 n=20)
Map2Access2Ok-8 13.79n ± 1% 10.24n ± 1% -25.77% (p=0.000 n=20)
Map3Access2Ok-8 68.68n ± 1% 12.65n ± 1% -81.58% (p=0.000 n=20)
package main_test

import "testing"

var (
	m1 = map[int]int{}
	m2 = map[int][16]int{}
	m3 = map[int][256]int{}
)

func init() {
	for i := range 1000 {
		m1[i] = i
		m2[i] = [16]int{15: i}
		m3[i] = [256]int{255: i}
	}
}

func BenchmarkMap1Access2Ok(b *testing.B) {
	for i := range b.N {
		_, ok := m1[i%1000]
		if !ok {
			b.Errorf("%d not found", i)
		}
	}
}

func BenchmarkMap2Access2Ok(b *testing.B) {
	for i := range b.N {
		_, ok := m2[i%1000]
		if !ok {
			b.Errorf("%d not found", i)
		}
	}
}

func BenchmarkMap3Access2Ok(b *testing.B) {
	for i := range b.N {
		_, ok := m3[i%1000]
		if !ok {
			b.Errorf("%d not found", i)
		}
	}
}
Fixes golang#75398
Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771
Reviewed-on: https://go-review.googlesource.com/c/go/+/702875
Reviewed-by: Keith Randall <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Keith Randall <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Change-Id: I9c5a8f8a031e368bda312c830dc266f5986e8b1a GitHub-Last-Rev: 23145e8 GitHub-Pull-Request: golang#76160 Reviewed-on: https://go-review.googlesource.com/c/go/+/717341 Reviewed-by: Keith Randall <[email protected]> Auto-Submit: Keith Randall <[email protected]> Reviewed-by: Keith Randall <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
We don't need to check that the bit patterns of the constants match; it is sufficient to check that the constant is equal to the given value. While we're here, also change the FCLASSD rules to use a bit pattern for the mask. I think this improves readability, particularly as more uses of FCLASSD get added (e.g. CL 717560). These changes should not affect codegen. Change-Id: I92a6338dc71e6a71e04306f67d7d86016c6e9c47 Reviewed-on: https://go-review.googlesource.com/c/go/+/717580 Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Keith Randall <[email protected]> Reviewed-by: Keith Randall <[email protected]> Reviewed-by: Keith Randall <[email protected]>
ReOpenFile is documented to return INVALID_HANDLE_VALUE on error, but the previous definition was checking for 0 instead. Change-Id: Idec5e75e40b9f6c409e068d63a9b606781e80a46 Reviewed-on: https://go-review.googlesource.com/c/go/+/717320 Auto-Submit: Quim Muntal <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Damien Neil <[email protected]> Reviewed-by: Alex Brainman <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
Instead of iterating until dataflow stabilization, fill in values known to be live at the beginning of loop headers. This reduces the number of passes over the CFG from the depth of the loop nest to just 2. In an instrumented test version of this change, run against cmd/compile/internal/ssa, it brought the time spent in liveness analysis down to 150.52ms from 225.49ms on my machine. Change-Id: Ic72762eedfd1f10b1ba74c430ed62ab4ebd3ec5c Reviewed-on: https://go-review.googlesource.com/c/go/+/695255 Reviewed-by: Keith Randall <[email protected]> Auto-Submit: Keith Randall <[email protected]> Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Keith Randall <[email protected]>
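For reference, this is a minimal sketch of the textbook iterate-to-fixpoint liveness that the commit moves away from; it is neither the compiler's code nor the new two-pass algorithm. It runs over a toy CFG whose variables (at most 64) are bitmask sets, and the `block` type and `iterativeLiveness` function are hypothetical. A back edge can invalidate earlier results and force another full pass, which is why the pass count of this style depends on the loop structure.

```go
package main

// block is a basic block in a toy CFG. Variables are numbered 0..63 and
// sets of them are stored as bitmasks.
type block struct {
	use   uint64 // variables read before any write in the block
	def   uint64 // variables written in the block
	succs []int  // successor block indices
}

// iterativeLiveness solves the backward dataflow equations
//
//	out[b] = union of in[s] for each successor s of b
//	in[b]  = use[b] | (out[b] &^ def[b])
//
// by repeating full passes over the blocks (in reverse index order) until
// nothing changes. It also returns how many passes were needed.
func iterativeLiveness(blocks []block) (liveIn, liveOut []uint64, passes int) {
	n := len(blocks)
	liveIn = make([]uint64, n)
	liveOut = make([]uint64, n)
	for changed := true; changed; {
		changed = false
		passes++
		for i := n - 1; i >= 0; i-- {
			b := blocks[i]
			var out uint64
			for _, s := range b.succs {
				out |= liveIn[s]
			}
			in := b.use | (out &^ b.def)
			if in != liveIn[i] || out != liveOut[i] {
				liveIn[i], liveOut[i] = in, out
				changed = true
			}
		}
	}
	return liveIn, liveOut, passes
}
```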
Change-Id: Ie297a19a59362e0f32eae20e511e298a0a87ab6b Reviewed-on: https://go-review.googlesource.com/c/go/+/715540 Reviewed-by: Than McIntosh <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]> Reviewed-by: Cherry Mui <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
Last reference appears to have been removed in CL 227759. Change-Id: Ieb9da0a69a8beb96dcb5309ca43cf1df61d39bce Reviewed-on: https://go-review.googlesource.com/c/go/+/715541 Auto-Submit: Ian Lance Taylor <[email protected]> Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Than McIntosh <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
The comment suggests that the text section is briefly writable. That is not the case. As the earlier part of the comment explains, part of the text section is mapped twice, once r-x and once rw-. It is never the case that there is writable executable memory. Change-Id: I56841e19a8a08f2515f29752536a5c8f180ac8c9 Reviewed-on: https://go-review.googlesource.com/c/go/+/715622 Auto-Submit: Ian Lance Taylor <[email protected]> Reviewed-by: Cherry Mui <[email protected]> Reviewed-by: Than McIntosh <[email protected]> Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
The linker sources in several places used SXREF to mark the first SymKind which is not allocated in memory. This is cryptic. Instead use SFirstUnallocated, following the example of the existing SFirstWritable. Change-Id: If326ad63027402699094bcc49ef860db3772f82a Reviewed-on: https://go-review.googlesource.com/c/go/+/715623 Reviewed-by: Than McIntosh <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]> Reviewed-by: Cherry Mui <[email protected]> Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
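The marker-constant pattern the commit describes can be sketched as follows. Apart from SFirstWritable and SFirstUnallocated, the kind names and their order are illustrative and are not the linker's actual list; the methods are hypothetical helpers showing how range checks read once the boundary has a descriptive name.

```go
package main

// SymKind orders symbol kinds so that ranges can be tested with comparisons.
type SymKind uint8

const (
	Sxxx SymKind = iota // invalid/unknown kind

	// Read-only kinds that are allocated in memory.
	STEXT
	SRODATA

	// Marker: this kind and everything after it (up to SFirstUnallocated)
	// is writable.
	SFirstWritable
	SNOPTRDATA
	SDATA
	SBSS

	// Marker: this kind and everything after it is not allocated in memory.
	SFirstUnallocated
	SDWARFSECT
	SHOSTOBJ
)

// IsWritable reports whether k lies in the writable, allocated range.
func (k SymKind) IsWritable() bool {
	return k >= SFirstWritable && k < SFirstUnallocated
}

// IsAllocated reports whether symbols of kind k occupy memory at run time.
func (k SymKind) IsAllocated() bool {
	return k < SFirstUnallocated
}
```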
Since Go 1.2 the section is always empty. Also remove the code looking for .gosymtab in cmd/internal/objfile. For golang#76038 Change-Id: Icd34c870ed0c6da8001e8d32305f79905ee2b066 Reviewed-on: https://go-review.googlesource.com/c/go/+/717200 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Cherry Mui <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]> Commit-Queue: Ian Lance Taylor <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
The textStart field requires a relocation, the only relocation in pclntab. And nothing uses it. So remove it. Replace it with a zero, which can itself be removed at some point in coordination with Delve. For golang#76038 Change-Id: I35675c0868c5d957bb375e40b804c516ae0300ca Reviewed-on: https://go-review.googlesource.com/c/go/+/717240 Reviewed-by: Cherry Mui <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]>
This patch adds support for the Zawrs extension on riscv64.
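Zawrs adds two instructions, WRS.NTO (wait on reservation set, no timeout) and WRS.STO (wait on reservation set, short timeout). Per the Zawrs specification, both are SYSTEM-opcode I-type words with rd, rs1 and funct3 all zero and a distinct 12-bit funct12 field. The sketch below computes only the raw machine words; it does not show the Go assembler mnemonics this patch introduces, and `encodeSystemI`, `wrsNTO` and `wrsSTO` are hypothetical names.

```go
package main

const opcodeSystem = 0b1110011 // RISC-V SYSTEM major opcode (0x73)

// encodeSystemI builds a SYSTEM-opcode I-type word with rd, funct3 and rs1
// all zero, leaving only funct12 (bits 31:20) and the opcode (bits 6:0).
func encodeSystemI(funct12 uint32) uint32 {
	return funct12<<20 | opcodeSystem
}

var (
	wrsNTO = encodeSystemI(0b0000_0000_1101) // WRS.NTO: no timeout
	wrsSTO = encodeSystemI(0b0000_0001_1101) // WRS.STO: short timeout
)
```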
Updates golang#76179
🔄 This is a mirror of upstream PR golang#76178