Skip to content

Conversation

@austinderek
Copy link

This patch add support for Zawrs extension on RISCV64


🔄 This is a mirror of upstream PR golang#76178

FiloSottile and others added 30 commits November 3, 2025 07:14
…kages

Intrinsifying things inside the module (crypto/internal/fips140/subtle)
is asking for trouble, as the import paths are rewritten by the
GOFIPS140 mechanism, and we might have to support multiple modules
in the future.

Importing crypto/subtle from inside a FIPS 140-3 module is not allowed,
and is basically asking for circular dependencies.

Instead, break off the intrinsics into their own package
(crypto/internal/constanttime), and keep the byte slice operations
in crypto/internal/fips140/subtle. crypto/subtle then becomes a thin
dispatch layer.

Change-Id: I6a6a6964cd5cb5ad06e9d1679201447f5a811da4
Reviewed-on: https://go-review.googlesource.com/c/go/+/716120
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Filippo Valsorda <[email protected]>
Reviewed-by: Jorropo <[email protected]>
Since we always can get the address of `CALL runtime.deferreturn(SB)`
from the unwinder, so it is not necessary to record the caller's pc
in the _defer struct. For the stack allocated _defer, this CL makes
the frame smaller.

Change-Id: I0fd347e4bc07cf8a9b954816323df30fc52552b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/716720
Reviewed-by: Keith Randall <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Fixes golang#73794

Change-Id: I0a57db05aacfa805213fe8278fc727e76eb8a65e
GitHub-Last-Rev: 3494d93
GitHub-Pull-Request: golang#73795
Reviewed-on: https://go-review.googlesource.com/c/go/+/674415
Reviewed-by: Sean Liao <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Michael Pratt <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
In our case, it greatly improves the performance of continuously collecting diff profiles from the net/http/pprof endpoint, such as /debug/pprof/allocs?seconds=30.

This CL is a cherry-pick of my PR upstream: google/pprof#951

Benchmark of profile Parse func:

goos: linux
goarch: amd64
pkg: github.com/google/pprof/profile
cpu: 13th Gen Intel(R) Core(TM) i7-1360P
         │ old-parse.txt │            new-parse.txt             │
         │    sec/op     │    sec/op     vs base                │
Parse-16    62.07m ± 13%   55.54m ± 13%  -10.52% (p=0.035 n=10)

         │ old-parse.txt │            new-parse.txt             │
         │     B/op      │     B/op      vs base                │
Parse-16    47.56Mi ± 0%   41.09Mi ± 0%  -13.59% (p=0.000 n=10)

         │ old-parse.txt │            new-parse.txt            │
         │   allocs/op   │  allocs/op   vs base                │
Parse-16     272.9k ± 0%   175.8k ± 0%  -35.58% (p=0.000 n=10)

Change-Id: I737ff9b9f815fdc56bc3b5743403717c4b6f07fd
GitHub-Last-Rev: a09108f
GitHub-Pull-Request: golang#76145
Reviewed-on: https://go-review.googlesource.com/c/go/+/717081
Reviewed-by: Michael Pratt <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Florian Lehner <[email protected]>
Change-Id: I26302d801732f40b1fe6b30ff69d222047bca490
Reviewed-on: https://go-review.googlesource.com/c/go/+/716740
Reviewed-by: Robert Griesemer <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Change-Id: I0ea4d15da163cec6fe2a703376ce5a6032e15484
Reviewed-on: https://go-review.googlesource.com/c/go/+/714861
Reviewed-by: Keith Randall <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Auto-Submit: Keith Randall <[email protected]>
Knowing how many times cgo is used is useful information to have in the
local telemetry database.

It also opens the door for uploading them in the future if desired.

Change-Id: Ia92b11fc489f015bbface7f28ed5a5c2871c44f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/707055
Reviewed-by: Michael Matloob <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Robert Findley <[email protected]>
Reviewed-by: Michael Matloob <[email protected]>
Reviewed-by: Florian Lehner <[email protected]>
Move global variable to a field on the State type.

Change-Id: I1edd32e1d28ce814bcd75501098ee4b22227546b
Reviewed-on: https://go-review.googlesource.com/c/go/+/716162
Reviewed-by: Michael Matloob <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Matloob <[email protected]>
[git-generate]
cd src/cmd/go/internal/modload
rf '
  mv InitWorkfile State.InitWorkfile
  mv FindGoWork State.FindGoWork
  mv WillBeEnabled State.WillBeEnabled
  mv Enabled State.Enabled
  mv inWorkspaceMode State.inWorkspaceMode
  mv HasModRoot State.HasModRoot
  mv MustHaveModRoot State.MustHaveModRoot
  mv ModFilePath State.ModFilePath
'

Change-Id: I207113868af037c9c0049f4207c3d3b4c19468bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/716602
Reviewed-by: Michael Matloob <[email protected]>
Reviewed-by: Michael Matloob <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Change-Id: I43ea575aff87a3e420477cb26d35185d03df5ccc
Reviewed-on: https://go-review.googlesource.com/c/go/+/713283
Reviewed-by: Michael Pratt <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
We've been seeing the flakes where we get a 'no errors' output on
freebsd in addition to windows and solaris. Also allow that case to
avoid flakes.

For golang#73976

Change-Id: I6a6a696445ec908b55520d8d75e7c1f867b9c092
Reviewed-on: https://go-review.googlesource.com/c/go/+/715640
Reviewed-by: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Matloob <[email protected]>
Reviewed-by: Ian Alexander <[email protected]>
Support arm64 FMOVQ with large offset in immediate which is encoded
using register offset instruction in opldrr or opstrr. This will help
allowing folding immediate into new ssa ops FMOVQload and FMOVQstore.

For example: FMOVQ F0, -20000(R0) is encoded as following:
  MOVD 3(PC), R27
  FMOVQ F0, (R0)(R27)
  RET
  ffff b1e0 # constant value

Change-Id: Ib71f92f6ff4b310bda004a440b1df41ffe164523
Reviewed-on: https://go-review.googlesource.com/c/go/+/716960
Reviewed-by: Cherry Mui <[email protected]>
Auto-Submit: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
This commit adds test coverage of path building and name constraint
verification using the suite of test data provided by Netflix's
BetterTLS project.

Since the uncompressed raw JSON test data exported by BetterTLS for
external test integrations is ~31MB we use a similar approach to the
BoGo and ACVP test integrations and fetch the BetterTLS Go module, and
run its export tool on-the-fly to generate the test data in a tempdir.

As expected, all tests pass currently and this coverage is mainly
helpful in catching regressions, especially with tricky/cursed name
constraints.

Change-Id: I23d7c24232e314aece86bcbfd133b7f02c9e71b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/717420
TryBot-Bypass: Daniel McCarney <[email protected]>
Reviewed-by: Roland Shoemaker <[email protected]>
Auto-Submit: Daniel McCarney <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
This makes the user experience better, before users would receive
an unknown godebug error message, now we explicitly mention that
it was removed and link to go.dev/doc/godebug where users can find
more information about the removal.

Additionally we keep all the removed GODEBUGs in the source, making
sure we do not reuse such GODEBUG after it is removed.

Updates golang#72111
Updates golang#75316

Change-Id: I6a6a6964cce1c100108fdba4bfba7d13cd9a893a
Reviewed-on: https://go-review.googlesource.com/c/go/+/701875
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Mateusz Poliwczak <[email protected]>
Reviewed-by: Michael Matloob <[email protected]>
Reviewed-by: Michael Matloob <[email protected]>
When an unpinned Go pointer (or a pointer to an unpinned Go pointer) is
returned from Go to C,

     1  package main
     2
     3  import (
     4          "C"
     5  )
     6
     7  //export foo
     8  func foo(CLine *C.char) string {
     9          return C.GoString(CLine)
    10  }
    11
    12
    13  func main() {
    14  }

The error message mentions the file/line of the cgo wrapper,

    panic: runtime error: cgo result is unpinned Go pointer or points to unpinned Go pointer

    goroutine 17 [running, locked to thread]:
    panic({0x798f2341a4c0?, 0xc000112000?})
            /usr/lib/go/src/runtime/panic.go:802 +0x168
    runtime.cgoCheckArg(0x798f23417e20, 0xc000066e50, 0x0?, 0x0, {0x798f233f5a62, 0x42})
            /usr/lib/go/src/runtime/cgocall.go:679 +0x35b
    runtime.cgoCheckResult({0x798f23417e20, 0xc000066e50})
            /usr/lib/go/src/runtime/cgocall.go:795 +0x4b
    _cgoexp_3c910ddb72c4_foo(0x7ffc9fa9bfa0)
            _cgo_gotypes.go:65 +0x5d
    runtime.cgocallbackg1(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0)
            /usr/lib/go/src/runtime/cgocall.go:446 +0x289
    runtime.cgocallbackg(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0)
            /usr/lib/go/src/runtime/cgocall.go:350 +0x132
    runtime.cgocallbackg(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0)
            <autogenerated>:1 +0x2b
    runtime.cgocallback(0x0, 0x0, 0x0)
            /usr/lib/go/src/runtime/asm_amd64.s:1082 +0xcd
    runtime.goexit({})
            /usr/lib/go/src/runtime/asm_amd64.s:1693 +0x1

The cgo wrapper (_cgoexp_3c910ddb72c4_foo) is located in a temporary
build artifact (_cgo_gotypes.go)

    $ go tool cgo -objdir objdir parse.go
    $ cat -n objdir/_cgo_gotypes.go | sed -n '55,70p'
        55  //go:cgo_export_dynamic foo
        56  //go:linkname _cgoexp_d48770e267d1_foo _cgoexp_d48770e267d1_foo
        57  //go:cgo_export_static _cgoexp_d48770e267d1_foo
        58  func _cgoexp_d48770e267d1_foo(a *struct {
        59                  p0 *_Ctype_char
        60                  r0 string
        61          }) {
        62          a.r0 = foo(a.p0)
        63          _cgoCheckResult(a.r0)
        64  }

The file/line of the export'ed function is expected in the error message.

Use it in error messages.

    panic: runtime error: cgo result is unpinned Go pointer or points to unpinned Go pointer

    goroutine 17 [running, locked to thread]:
    panic({0x7df72b1d8ae0?, 0x3ec8a1790030?})
            /mnt/go/src/runtime/panic.go:877 +0x16f
    runtime.cgoCheckArg(0x7df72b1d62c0, 0x3ec8a16eee50, 0x68?, 0x0, {0x7df72b1ad44c, 0x42})
            /mnt/go/src/runtime/cgocall.go:679 +0x35b
    runtime.cgoCheckResult({0x7df72b1d62c0, 0x3ec8a16eee50})
            /mnt/go/src/runtime/cgocall.go:795 +0x4b
    _cgoexp_3c910ddb72c4_foo(0x7ffca1b21020)
            /mnt/tmp/parse.go:8 +0x5d
    runtime.cgocallbackg1(0x7df72b1a4360, 0x7ffca1b21020, 0x0)
            /mnt/go/src/runtime/cgocall.go:446 +0x289
    runtime.cgocallbackg(0x7df72b1a4360, 0x7ffca1b21020, 0x0)
            /mnt/go/src/runtime/cgocall.go:350 +0x132
    runtime.cgocallbackg(0x7df72b1a4360, 0x7ffca1b21020, 0x0)
            <autogenerated>:1 +0x2b
    runtime.cgocallback(0x0, 0x0, 0x0)
            /mnt/go/src/runtime/asm_amd64.s:1101 +0xcd
    runtime.goexit({})
            /mnt/go/src/runtime/asm_amd64.s:1712 +0x1

So doing, fix typos in comments.

Link: https://web.archive.org/web/20251008114504/https://dave.cheney.net/2018/01/08/gos-hidden-pragmas
Suggested-by: Keith Randall <[email protected]>
For golang#75856

Change-Id: I0bf36d5c8c5c0c7df13b00818bc4641009058979
GitHub-Last-Rev: e65839c
GitHub-Pull-Request: golang#76118
Reviewed-on: https://go-review.googlesource.com/c/go/+/716441
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Auto-Submit: Michael Pratt <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Florian Lehner <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
The panic calls gopanic which may have write barriers, but
castogscanstatus is called from //go:nowritebarrier contexts.

The panic is dead code anyway, and appears immediately before a call to
'throw'.

Change-Id: I4a8e296b71bf002295a3aa1db4f723c305ed939a
Reviewed-on: https://go-review.googlesource.com/c/go/+/717406
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of
all goroutines, and searches for a specific line in that stack trace.

It currently sometimes fails because it encounters the goroutine its
looking for in the small window where a goroutine might be in _Grunning
while in a syscall, introduced in CL 646198. In that case, the traceback
will give up, failing to print the stack TestCopyError is expecting.

This represents a general regression, since previously runtime.Stack
could never fail to take a goroutine's stack; giving up was only
possible in fatal panic cases.

Fix this the same way we fixed goroutine profiles: allow the stack trace
to proceed if the g's syscallsp != 0. This is safe in any
stop-the-world-related context, because syscallsp won't be mutated while
the goroutine fails to acquire a P, and thus fails to fully exit the
syscall context. This also means the stack below syscallsp won't be
mutated, and thus taking a traceback is also safe.

Fixes golang#66639.

Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/716680
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Should make cmd/link/internal/ld.TestAbstractOriginSanity happier.

Change-Id: I121927d42e527ff23d996e7387066f149b11cc59
Reviewed-on: https://go-review.googlesource.com/c/go/+/717480
Reviewed-by: Cherry Mui <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
…upport

Go asm syntax:
	 VPERMIW        $0x1b, vj, vd
	XVPERMI{W,V,Q}  $0x1b, xj, xd

Equivalent platform assembler syntax:
	 vpermi.w       vd, vj, $0x1b
	xvpermi.{w,d,q} xd, xj, $0x1b

Change-Id: Ie23b2fdd09b4c93801dc804913206f1c5a496268
Reviewed-on: https://go-review.googlesource.com/c/go/+/716800
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Meidan Li <[email protected]>
Reviewed-by: sophie zhao <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
…en vector registers

Go asm syntax:
	VMOVQ     Vj, Vd
	XVMOVQ    Xj, Xd

Equivalent platform assembler syntax:
	vslli.d   vd, vj, 0x0
	xvslli.d  xd, xj, 0x0

Change-Id: Ifddc3d4d3fbaa6fee2e079bf2ebfe96a2febaa1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/716801
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Reviewed-by: Meidan Li <[email protected]>
Reviewed-by: sophie zhao <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
The exact meaning of pow10 was not defined nor tested directly.
Define it as pow10(e) returns mant, exp where mant/2^128 * 2**exp = 10^e.
This is the most natural definition but is off-by-one from what
it had been returning. Fix the off-by-one and then adjust the
call sites to stop compensating for it.

Change-Id: I9ee475854f30be4bd0d4f4d770a6b12ec68281fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/717180
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Alan Donovan <[email protected]>
Auto-Submit: Russ Cox <[email protected]>
ftoaFixed is in the next CL; this proves the tests are correct
against the current implementation, and it adds a benchmark
for comparison with the new implementation.

Change-Id: I7ac8a1f699b693ea6d11a7122b22fc70cc135af6
Reviewed-on: https://go-review.googlesource.com/c/go/+/717181
Auto-Submit: Russ Cox <[email protected]>
Reviewed-by: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
The fixed-precision ftoa algorithm is not actually
documented in the Ryū paper, and it is fairly
straightforward: multiply by a power of 10 to get
an integer that contains the digits we need.
There is also no need for separate float32 and float64
implementations.

This CL implements a new fixedFtoa, separate from Ryū.
The overall algorithm is the same, but the new code
is simpler, faster, and better documented.

Now ftoaryu.go is only about shortest-output formatting,
so if and when yet another algorithm comes along, it will
be clearer what should be replaced (all of ftoaryu.go)
and what should not (all of ftoafixed.go).

benchmark \ host                  linux-arm64    local  linux-amd64       s7  linux-386  s7:GOARCH=386
                                      vs base  vs base      vs base  vs base    vs base        vs base
AppendFloat/Decimal                    -0.18%        ~            ~   -0.68%     +0.49%         -0.79%
AppendFloat/Float                      +0.09%        ~       +1.50%   +0.84%     -0.37%         -0.69%
AppendFloat/Exp                        -0.51%        ~            ~   +1.20%     -1.27%         -1.01%
AppendFloat/NegExp                     -1.01%        ~       +3.43%   +1.35%     -2.33%              ~
AppendFloat/LongExp                    -1.22%   +0.77%            ~        ~     -1.48%              ~
AppendFloat/Big                        -2.07%        ~       -2.07%   -1.97%     -2.89%         -2.93%
AppendFloat/BinaryExp                  -0.28%   +1.06%            ~   +1.35%     -0.64%         -1.64%
AppendFloat/32Integer                       ~        ~            ~   -0.79%          ~         -0.66%
AppendFloat/32ExactFraction            -0.50%        ~       +5.69%        ~     -1.24%         +0.69%
AppendFloat/32Point                         ~   -1.19%       +2.59%   +1.03%     -1.37%         +0.80%
AppendFloat/32Exp                      -3.39%   -2.79%       -8.36%   -0.94%     -5.72%         -5.92%
AppendFloat/32NegExp                   -0.63%        ~            ~   +0.98%     -1.34%         -0.73%
AppendFloat/32Shortest                 -1.00%   +1.36%       +2.94%        ~          ~              ~
AppendFloat/32Fixed8Hard               -5.91%  -12.45%       -6.62%        ~    +18.46%        +11.61%
AppendFloat/32Fixed9Hard               -6.53%  -11.35%       -6.01%   -0.97%    -18.31%         -9.16%
AppendFloat/64Fixed1                  -13.84%  -16.90%      -13.13%  -10.71%    -24.52%        -18.94%
AppendFloat/64Fixed2                  -11.12%  -16.97%      -12.13%   -9.88%    -22.73%        -15.48%
AppendFloat/64Fixed2.5                -21.98%  -20.75%      -19.08%  -14.74%    -28.11%        -24.92%
AppendFloat/64Fixed3                  -11.53%  -16.21%      -10.75%   -7.53%    -23.11%        -15.78%
AppendFloat/64Fixed4                  -12.89%  -12.36%      -11.07%   -9.79%    -14.51%        -13.44%
AppendFloat/64Fixed5Hard              -47.62%  -38.59%      -40.83%  -37.06%    -60.51%        -55.29%
AppendFloat/64Fixed12                  -7.40%        ~       -8.56%   -4.31%    -13.82%         -8.61%
AppendFloat/64Fixed16                  -9.10%   -8.95%       -6.92%   -3.92%    -12.99%         -9.03%
AppendFloat/64Fixed12Hard              -9.14%   -5.24%       -6.23%   -4.82%    -13.58%         -8.99%
AppendFloat/64Fixed17Hard              -6.80%        ~       -4.03%   -2.84%    -19.81%        -10.27%
AppendFloat/64Fixed18Hard              -0.12%        ~            ~        ~          ~              ~
AppendFloat/64FixedF1                       ~        ~            ~        ~     -0.40%         +2.72%
AppendFloat/64FixedF2                  -0.18%        ~       -1.98%   -0.95%          ~         +1.25%
AppendFloat/64FixedF3                  -0.29%        ~            ~        ~          ~         +1.22%
AppendFloat/Slowpath64                 -1.16%        ~            ~        ~          ~         -2.16%
AppendFloat/SlowpathDenormal64         -1.09%        ~            ~   -0.88%     -0.83%              ~

host: linux-arm64
goos: linux
goarch: arm64
pkg: internal/strconv
cpu: unknown
                                 │ 14b7e09f493  │             f9bf7fcb8e2             │
                                 │    sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-8               60.35n ± 0%   60.24n ± 0%   -0.18% (p=0.000 n=20)
AppendFloat/Float-8                 88.83n ± 0%   88.91n ± 0%   +0.09% (p=0.000 n=20)
AppendFloat/Exp-8                   93.55n ± 0%   93.06n ± 0%   -0.51% (p=0.000 n=20)
AppendFloat/NegExp-8                94.01n ± 0%   93.06n ± 0%   -1.01% (p=0.000 n=20)
AppendFloat/LongExp-8              101.00n ± 0%   99.77n ± 0%   -1.22% (p=0.000 n=20)
AppendFloat/Big-8                   106.1n ± 0%   103.9n ± 0%   -2.07% (p=0.000 n=20)
AppendFloat/BinaryExp-8             47.48n ± 0%   47.35n ± 0%   -0.28% (p=0.000 n=20)
AppendFloat/32Integer-8             60.45n ± 0%   60.43n ± 0%        ~ (p=0.150 n=20)
AppendFloat/32ExactFraction-8       86.65n ± 0%   86.22n ± 0%   -0.50% (p=0.000 n=20)
AppendFloat/32Point-8               83.26n ± 0%   83.21n ± 0%        ~ (p=0.046 n=20)
AppendFloat/32Exp-8                 92.55n ± 0%   89.42n ± 0%   -3.39% (p=0.000 n=20)
AppendFloat/32NegExp-8              87.89n ± 0%   87.34n ± 0%   -0.63% (p=0.000 n=20)
AppendFloat/32Shortest-8            77.05n ± 0%   76.28n ± 0%   -1.00% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-8          55.73n ± 0%   52.44n ± 0%   -5.91% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-8          64.80n ± 0%   60.57n ± 0%   -6.53% (p=0.000 n=20)
AppendFloat/64Fixed1-8              53.72n ± 0%   46.29n ± 0%  -13.84% (p=0.000 n=20)
AppendFloat/64Fixed2-8              52.64n ± 0%   46.79n ± 0%  -11.12% (p=0.000 n=20)
AppendFloat/64Fixed2.5-8            56.01n ± 0%   43.70n ± 0%  -21.98% (p=0.000 n=20)
AppendFloat/64Fixed3-8              53.38n ± 0%   47.23n ± 0%  -11.53% (p=0.000 n=20)
AppendFloat/64Fixed4-8              50.62n ± 0%   44.10n ± 0%  -12.89% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-8          98.94n ± 0%   51.82n ± 0%  -47.62% (p=0.000 n=20)
AppendFloat/64Fixed12-8             84.70n ± 0%   78.44n ± 0%   -7.40% (p=0.000 n=20)
AppendFloat/64Fixed16-8             71.68n ± 0%   65.16n ± 0%   -9.10% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-8         68.41n ± 0%   62.16n ± 0%   -9.14% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-8         79.31n ± 0%   73.92n ± 0%   -6.80% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-8         4.290µ ± 0%   4.285µ ± 0%   -0.12% (p=0.000 n=20)
AppendFloat/64FixedF1-8             216.0n ± 0%   216.1n ± 0%        ~ (p=0.090 n=20)
AppendFloat/64FixedF2-8             228.2n ± 0%   227.8n ± 0%   -0.18% (p=0.000 n=20)
AppendFloat/64FixedF3-8             208.8n ± 0%   208.2n ± 0%   -0.29% (p=0.000 n=20)
AppendFloat/Slowpath64-8            98.56n ± 0%   97.42n ± 0%   -1.16% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-8    95.81n ± 0%   94.77n ± 0%   -1.09% (p=0.000 n=20)
geomean                             93.81n        87.87n        -6.33%

host: local
goos: darwin
cpu: Apple M3 Pro
                                  │ 14b7e09f493 │             f9bf7fcb8e2              │
                                  │   sec/op    │    sec/op     vs base                │
AppendFloat/Decimal-12              21.14n ± 0%   21.15n ±  0%        ~ (p=0.963 n=20)
AppendFloat/Float-12                32.48n ± 1%   32.43n ±  0%        ~ (p=0.358 n=20)
AppendFloat/Exp-12                  31.85n ± 0%   31.94n ±  1%        ~ (p=0.634 n=20)
AppendFloat/NegExp-12               31.75n ± 0%   32.04n ±  0%        ~ (p=0.004 n=20)
AppendFloat/LongExp-12              33.55n ± 0%   33.81n ±  0%   +0.77% (p=0.000 n=20)
AppendFloat/Big-12                  35.62n ± 1%   35.73n ±  1%        ~ (p=0.888 n=20)
AppendFloat/BinaryExp-12            19.26n ± 0%   19.46n ±  1%   +1.06% (p=0.000 n=20)
AppendFloat/32Integer-12            21.41n ± 0%   21.46n ±  1%        ~ (p=0.733 n=20)
AppendFloat/32ExactFraction-12      31.23n ± 1%   31.30n ±  1%        ~ (p=0.857 n=20)
AppendFloat/32Point-12              31.39n ± 1%   31.02n ±  0%   -1.19% (p=0.000 n=20)
AppendFloat/32Exp-12                32.42n ± 1%   31.52n ±  1%   -2.79% (p=0.000 n=20)
AppendFloat/32NegExp-12             30.66n ± 1%   30.66n ±  1%        ~ (p=0.380 n=20)
AppendFloat/32Shortest-12           26.88n ± 1%   27.25n ±  1%   +1.36% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-12         19.52n ± 0%   17.09n ±  1%  -12.45% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-12         21.55n ± 2%   19.11n ±  1%  -11.35% (p=0.000 n=20)
AppendFloat/64Fixed1-12             18.64n ± 0%   15.49n ±  0%  -16.90% (p=0.000 n=20)
AppendFloat/64Fixed2-12             18.65n ± 0%   15.49n ±  0%  -16.97% (p=0.000 n=20)
AppendFloat/64Fixed2.5-12           19.23n ± 1%   15.24n ±  0%  -20.75% (p=0.000 n=20)
AppendFloat/64Fixed3-12             18.61n ± 0%   15.59n ±  1%  -16.21% (p=0.000 n=20)
AppendFloat/64Fixed4-12             17.55n ± 1%   15.38n ±  0%  -12.36% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-12         29.27n ± 1%   17.97n ±  0%  -38.59% (p=0.000 n=20)
AppendFloat/64Fixed12-12            28.26n ± 1%   28.17n ± 10%        ~ (p=0.941 n=20)
AppendFloat/64Fixed16-12            23.56n ± 0%   21.46n ±  0%   -8.95% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-12        21.85n ± 2%   20.70n ±  1%   -5.24% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-12        26.91n ± 1%   27.10n ±  0%        ~ (p=0.059 n=20)
AppendFloat/64Fixed18Hard-12        2.197µ ± 1%   2.169µ ±  1%        ~ (p=0.013 n=20)
AppendFloat/64FixedF1-12            103.7n ± 1%   103.3n ±  0%        ~ (p=0.035 n=20)
AppendFloat/64FixedF2-12            114.8n ± 1%   114.1n ±  1%        ~ (p=0.234 n=20)
AppendFloat/64FixedF3-12            107.8n ± 1%   107.1n ±  1%        ~ (p=0.180 n=20)
AppendFloat/Slowpath64-12           32.05n ± 1%   32.00n ±  0%        ~ (p=0.952 n=20)
AppendFloat/SlowpathDenormal64-12   29.98n ± 1%   30.20n ±  0%        ~ (p=0.004 n=20)
geomean                             33.83n        31.91n         -5.68%

host: linux-amd64
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                                  │ 14b7e09f493  │             f9bf7fcb8e2              │
                                  │    sec/op    │    sec/op     vs base                │
AppendFloat/Decimal-16               64.00n ± 1%    63.67n ± 1%        ~ (p=0.784 n=20)
AppendFloat/Float-16                 95.99n ± 1%    97.42n ± 1%   +1.50% (p=0.000 n=20)
AppendFloat/Exp-16                   97.59n ± 1%    97.72n ± 1%        ~ (p=0.984 n=20)
AppendFloat/NegExp-16                97.80n ± 1%   101.15n ± 1%   +3.43% (p=0.000 n=20)
AppendFloat/LongExp-16               103.1n ± 1%    104.5n ± 1%        ~ (p=0.006 n=20)
AppendFloat/Big-16                   110.8n ± 1%    108.5n ± 1%   -2.07% (p=0.000 n=20)
AppendFloat/BinaryExp-16             47.82n ± 1%    47.33n ± 1%        ~ (p=0.007 n=20)
AppendFloat/32Integer-16             63.65n ± 1%    63.51n ± 0%        ~ (p=0.560 n=20)
AppendFloat/32ExactFraction-16       91.81n ± 1%    97.03n ± 1%   +5.69% (p=0.000 n=20)
AppendFloat/32Point-16               89.84n ± 1%    92.16n ± 1%   +2.59% (p=0.000 n=20)
AppendFloat/32Exp-16                103.80n ± 1%    95.12n ± 1%   -8.36% (p=0.000 n=20)
AppendFloat/32NegExp-16              93.70n ± 1%    94.87n ± 1%        ~ (p=0.003 n=20)
AppendFloat/32Shortest-16            83.98n ± 1%    86.45n ± 1%   +2.94% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-16          61.91n ± 1%    57.81n ± 1%   -6.62% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-16          71.08n ± 0%    66.81n ± 1%   -6.01% (p=0.000 n=20)
AppendFloat/64Fixed1-16              59.27n ± 2%    51.49n ± 1%  -13.13% (p=0.000 n=20)
AppendFloat/64Fixed2-16              57.89n ± 1%    50.87n ± 1%  -12.13% (p=0.000 n=20)
AppendFloat/64Fixed2.5-16            61.04n ± 1%    49.40n ± 1%  -19.08% (p=0.000 n=20)
AppendFloat/64Fixed3-16              58.42n ± 1%    52.14n ± 1%  -10.75% (p=0.000 n=20)
AppendFloat/64Fixed4-16              56.52n ± 1%    50.27n ± 1%  -11.07% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-16          97.79n ± 1%    57.86n ± 1%  -40.83% (p=0.000 n=20)
AppendFloat/64Fixed12-16             90.78n ± 1%    83.01n ± 1%   -8.56% (p=0.000 n=20)
AppendFloat/64Fixed16-16             76.11n ± 1%    70.84n ± 0%   -6.92% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16         73.56n ± 1%    68.98n ± 2%   -6.23% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16         83.20n ± 1%    79.85n ± 1%   -4.03% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-16         4.947µ ± 1%    4.915µ ± 1%        ~ (p=0.229 n=20)
AppendFloat/64FixedF1-16             242.4n ± 1%    239.4n ± 1%        ~ (p=0.038 n=20)
AppendFloat/64FixedF2-16             257.7n ± 2%    252.6n ± 1%   -1.98% (p=0.000 n=20)
AppendFloat/64FixedF3-16             237.5n ± 0%    237.5n ± 1%        ~ (p=0.440 n=20)
AppendFloat/Slowpath64-16            99.75n ± 1%    99.78n ± 1%        ~ (p=0.995 n=20)
AppendFloat/SlowpathDenormal64-16    97.41n ± 1%    98.20n ± 1%        ~ (p=0.006 n=20)
geomean                              100.7n         95.60n        -5.05%

host: s7
cpu: AMD Ryzen 9 7950X 16-Core Processor
                                  │ 14b7e09f493 │             f9bf7fcb8e2             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-32              22.19n ± 0%   22.04n ± 0%   -0.68% (p=0.000 n=20)
AppendFloat/Float-32                34.59n ± 0%   34.88n ± 0%   +0.84% (p=0.000 n=20)
AppendFloat/Exp-32                  34.47n ± 0%   34.88n ± 0%   +1.20% (p=0.000 n=20)
AppendFloat/NegExp-32               34.85n ± 0%   35.32n ± 0%   +1.35% (p=0.000 n=20)
AppendFloat/LongExp-32              37.23n ± 0%   37.09n ± 0%        ~ (p=0.003 n=20)
AppendFloat/Big-32                  39.27n ± 0%   38.50n ± 0%   -1.97% (p=0.000 n=20)
AppendFloat/BinaryExp-32            17.38n ± 0%   17.61n ± 0%   +1.35% (p=0.000 n=20)
AppendFloat/32Integer-32            22.26n ± 0%   22.08n ± 0%   -0.79% (p=0.000 n=20)
AppendFloat/32ExactFraction-32      32.82n ± 0%   32.91n ± 0%        ~ (p=0.018 n=20)
AppendFloat/32Point-32              32.88n ± 0%   33.22n ± 0%   +1.03% (p=0.000 n=20)
AppendFloat/32Exp-32                34.95n ± 0%   34.62n ± 0%   -0.94% (p=0.000 n=20)
AppendFloat/32NegExp-32             33.23n ± 0%   33.55n ± 0%   +0.98% (p=0.000 n=20)
AppendFloat/32Shortest-32           30.19n ± 0%   30.12n ± 0%        ~ (p=0.122 n=20)
AppendFloat/32Fixed8Hard-32         22.94n ± 0%   22.88n ± 0%        ~ (p=0.124 n=20)
AppendFloat/32Fixed9Hard-32         26.20n ± 0%   25.94n ± 1%   -0.97% (p=0.000 n=20)
AppendFloat/64Fixed1-32             21.10n ± 0%   18.84n ± 0%  -10.71% (p=0.000 n=20)
AppendFloat/64Fixed2-32             20.75n ± 0%   18.70n ± 0%   -9.88% (p=0.000 n=20)
AppendFloat/64Fixed2.5-32           21.07n ± 0%   17.96n ± 0%  -14.74% (p=0.000 n=20)
AppendFloat/64Fixed3-32             21.24n ± 0%   19.64n ± 0%   -7.53% (p=0.000 n=20)
AppendFloat/64Fixed4-32             20.63n ± 0%   18.61n ± 0%   -9.79% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-32         34.48n ± 0%   21.70n ± 0%  -37.06% (p=0.000 n=20)
AppendFloat/64Fixed12-32            32.26n ± 0%   30.87n ± 1%   -4.31% (p=0.000 n=20)
AppendFloat/64Fixed16-32            27.95n ± 0%   26.86n ± 0%   -3.92% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32        27.30n ± 0%   25.98n ± 1%   -4.82% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-32        30.80n ± 0%   29.93n ± 0%   -2.84% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-32        1.833µ ± 0%   1.831µ ± 0%        ~ (p=0.663 n=20)
AppendFloat/64FixedF1-32            83.42n ± 1%   84.00n ± 1%        ~ (p=0.003 n=20)
AppendFloat/64FixedF2-32            90.10n ± 0%   89.23n ± 1%   -0.95% (p=0.001 n=20)
AppendFloat/64FixedF3-32            84.42n ± 1%   84.39n ± 0%        ~ (p=0.878 n=20)
AppendFloat/Slowpath64-32           35.72n ± 0%   35.59n ± 0%        ~ (p=0.007 n=20)
AppendFloat/SlowpathDenormal64-32   35.36n ± 0%   35.05n ± 0%   -0.88% (p=0.000 n=20)
geomean                             36.05n        34.69n        -3.77%

host: linux-386
goarch: 386
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                                  │ 14b7e09f493 │             f9bf7fcb8e2             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-16              132.8n ± 0%   133.5n ± 0%   +0.49% (p=0.001 n=20)
AppendFloat/Float-16                242.6n ± 0%   241.7n ± 0%   -0.37% (p=0.000 n=20)
AppendFloat/Exp-16                  252.2n ± 0%   249.1n ± 0%   -1.27% (p=0.000 n=20)
AppendFloat/NegExp-16               253.6n ± 0%   247.7n ± 0%   -2.33% (p=0.000 n=20)
AppendFloat/LongExp-16              260.9n ± 0%   257.1n ± 0%   -1.48% (p=0.000 n=20)
AppendFloat/Big-16                  293.7n ± 0%   285.2n ± 0%   -2.89% (p=0.000 n=20)
AppendFloat/BinaryExp-16            89.63n ± 1%   89.06n ± 0%   -0.64% (p=0.000 n=20)
AppendFloat/32Integer-16            132.6n ± 0%   133.2n ± 0%        ~ (p=0.016 n=20)
AppendFloat/32ExactFraction-16      216.9n ± 0%   214.2n ± 0%   -1.24% (p=0.000 n=20)
AppendFloat/32Point-16              205.0n ± 0%   202.2n ± 0%   -1.37% (p=0.000 n=20)
AppendFloat/32Exp-16                250.2n ± 0%   235.9n ± 0%   -5.72% (p=0.000 n=20)
AppendFloat/32NegExp-16             213.5n ± 0%   210.6n ± 0%   -1.34% (p=0.000 n=20)
AppendFloat/32Shortest-16           198.3n ± 0%   197.8n ± 0%        ~ (p=0.147 n=20)
AppendFloat/32Fixed8Hard-16         114.9n ± 1%   136.0n ± 1%  +18.46% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-16         189.8n ± 0%   155.0n ± 1%  -18.31% (p=0.000 n=20)
AppendFloat/64Fixed1-16             175.8n ± 0%   132.7n ± 0%  -24.52% (p=0.000 n=20)
AppendFloat/64Fixed2-16             166.6n ± 0%   128.7n ± 0%  -22.73% (p=0.000 n=20)
AppendFloat/64Fixed2.5-16           176.5n ± 0%   126.8n ± 0%  -28.11% (p=0.000 n=20)
AppendFloat/64Fixed3-16             165.3n ± 0%   127.1n ± 0%  -23.11% (p=0.000 n=20)
AppendFloat/64Fixed4-16             141.3n ± 0%   120.8n ± 1%  -14.51% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-16         344.6n ± 0%   136.0n ± 0%  -60.51% (p=0.000 n=20)
AppendFloat/64Fixed12-16            184.2n ± 0%   158.7n ± 0%  -13.82% (p=0.000 n=20)
AppendFloat/64Fixed16-16            174.0n ± 0%   151.3n ± 0%  -12.99% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16        169.7n ± 0%   146.7n ± 0%  -13.58% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16        207.7n ± 0%   166.6n ± 0%  -19.81% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-16        10.66µ ± 0%   10.63µ ± 0%        ~ (p=0.030 n=20)
AppendFloat/64FixedF1-16            615.9n ± 0%   613.5n ± 0%   -0.40% (p=0.000 n=20)
AppendFloat/64FixedF2-16            846.6n ± 0%   847.4n ± 0%        ~ (p=0.551 n=20)
AppendFloat/64FixedF3-16            609.9n ± 0%   609.5n ± 0%        ~ (p=0.213 n=20)
AppendFloat/Slowpath64-16           254.1n ± 0%   252.6n ± 1%        ~ (p=0.048 n=20)
AppendFloat/SlowpathDenormal64-16   251.5n ± 0%   249.4n ± 0%   -0.83% (p=0.000 n=20)
geomean                             249.2n        225.4n        -9.54%

host: s7:GOARCH=386
cpu: AMD Ryzen 9 7950X 16-Core Processor
                                  │ 14b7e09f493 │             f9bf7fcb8e2             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-32              42.65n ± 0%   42.31n ± 0%   -0.79% (p=0.000 n=20)
AppendFloat/Float-32                71.56n ± 0%   71.06n ± 0%   -0.69% (p=0.000 n=20)
AppendFloat/Exp-32                  75.61n ± 1%   74.85n ± 1%   -1.01% (p=0.000 n=20)
AppendFloat/NegExp-32               74.36n ± 0%   74.30n ± 0%        ~ (p=0.482 n=20)
AppendFloat/LongExp-32              75.82n ± 0%   75.73n ± 0%        ~ (p=0.490 n=20)
AppendFloat/Big-32                  85.10n ± 0%   82.61n ± 0%   -2.93% (p=0.000 n=20)
AppendFloat/BinaryExp-32            33.02n ± 0%   32.48n ± 1%   -1.64% (p=0.000 n=20)
AppendFloat/32Integer-32            41.54n ± 1%   41.27n ± 1%   -0.66% (p=0.000 n=20)
AppendFloat/32ExactFraction-32      62.48n ± 0%   62.91n ± 0%   +0.69% (p=0.000 n=20)
AppendFloat/32Point-32              60.17n ± 0%   60.65n ± 0%   +0.80% (p=0.000 n=20)
AppendFloat/32Exp-32                73.34n ± 0%   68.99n ± 0%   -5.92% (p=0.000 n=20)
AppendFloat/32NegExp-32             63.29n ± 0%   62.83n ± 0%   -0.73% (p=0.000 n=20)
AppendFloat/32Shortest-32           58.97n ± 0%   59.07n ± 0%        ~ (p=0.029 n=20)
AppendFloat/32Fixed8Hard-32         37.42n ± 0%   41.76n ± 1%  +11.61% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32         55.18n ± 0%   50.13n ± 1%   -9.16% (p=0.000 n=20)
AppendFloat/64Fixed1-32             50.89n ± 1%   41.25n ± 0%  -18.94% (p=0.000 n=20)
AppendFloat/64Fixed2-32             48.33n ± 1%   40.85n ± 1%  -15.48% (p=0.000 n=20)
AppendFloat/64Fixed2.5-32           52.46n ± 0%   39.39n ± 0%  -24.92% (p=0.000 n=20)
AppendFloat/64Fixed3-32             48.28n ± 1%   40.66n ± 0%  -15.78% (p=0.000 n=20)
AppendFloat/64Fixed4-32             44.57n ± 0%   38.58n ± 0%  -13.44% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-32         96.16n ± 0%   42.99n ± 1%  -55.29% (p=0.000 n=20)
AppendFloat/64Fixed12-32            56.84n ± 0%   51.95n ± 1%   -8.61% (p=0.000 n=20)
AppendFloat/64Fixed16-32            54.23n ± 0%   49.33n ± 0%   -9.03% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32        53.47n ± 0%   48.67n ± 0%   -8.99% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-32        61.76n ± 0%   55.42n ± 1%  -10.27% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-32        3.998µ ± 1%   4.001µ ± 0%        ~ (p=0.449 n=20)
AppendFloat/64FixedF1-32            161.8n ± 0%   166.2n ± 1%   +2.72% (p=0.000 n=20)
AppendFloat/64FixedF2-32            223.4n ± 2%   226.2n ± 1%   +1.25% (p=0.000 n=20)
AppendFloat/64FixedF3-32            159.6n ± 0%   161.6n ± 1%   +1.22% (p=0.000 n=20)
AppendFloat/Slowpath64-32           76.69n ± 0%   75.03n ± 0%   -2.16% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-32   75.02n ± 0%   74.36n ± 1%        ~ (p=0.003 n=20)
geomean                             74.66n        69.39n        -7.06%

Change-Id: I9db46471a93bd2aab3c2796e563d154cb531d4cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/717182
Reviewed-by: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Russ Cox <[email protected]>
We re-slice the data being processed at the stat of each loop. If the
var that we use to calculate where to re-slice is < 0 or > the length
of the remaining data, return instead of attempting to re-slice.

Change-Id: I1d6c2b6c596feedeea8feeaace370ea73ba02c4c
Reviewed-on: https://go-review.googlesource.com/c/go/+/715260
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Roland Shoemaker <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
This lets us remove useAvg and useHmul from the division rules.
The compiler is simpler and the generated code is faster.

goos: wasip1
goarch: wasm
pkg: internal/strconv
                               │   old.txt   │               new.txt               │
                               │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal              192.8n ± 1%   194.6n ± 0%   +0.91% (p=0.000 n=10)
AppendFloat/Float                328.6n ± 0%   279.6n ± 0%  -14.93% (p=0.000 n=10)
AppendFloat/Exp                  335.6n ± 1%   289.2n ± 1%  -13.80% (p=0.000 n=10)
AppendFloat/NegExp               336.0n ± 0%   289.1n ± 1%  -13.97% (p=0.000 n=10)
AppendFloat/LongExp              332.4n ± 0%   285.2n ± 1%  -14.20% (p=0.000 n=10)
AppendFloat/Big                  348.2n ± 0%   300.1n ± 0%  -13.83% (p=0.000 n=10)
AppendFloat/BinaryExp            137.4n ± 0%   138.2n ± 0%   +0.55% (p=0.001 n=10)
AppendFloat/32Integer            193.3n ± 1%   196.5n ± 0%   +1.66% (p=0.000 n=10)
AppendFloat/32ExactFraction      283.3n ± 0%   268.9n ± 1%   -5.08% (p=0.000 n=10)
AppendFloat/32Point              279.9n ± 0%   266.5n ± 0%   -4.80% (p=0.000 n=10)
AppendFloat/32Exp                300.1n ± 0%   288.3n ± 1%   -3.90% (p=0.000 n=10)
AppendFloat/32NegExp             288.2n ± 1%   277.9n ± 1%   -3.59% (p=0.000 n=10)
AppendFloat/32Shortest           261.7n ± 0%   250.2n ± 0%   -4.39% (p=0.000 n=10)
AppendFloat/32Fixed8Hard         173.3n ± 1%   158.9n ± 1%   -8.31% (p=0.000 n=10)
AppendFloat/32Fixed9Hard         180.0n ± 0%   167.9n ± 2%   -6.70% (p=0.000 n=10)
AppendFloat/64Fixed1             167.1n ± 0%   149.6n ± 1%  -10.50% (p=0.000 n=10)
AppendFloat/64Fixed2             162.4n ± 1%   146.5n ± 0%   -9.73% (p=0.000 n=10)
AppendFloat/64Fixed2.5           165.5n ± 0%   149.4n ± 1%   -9.70% (p=0.000 n=10)
AppendFloat/64Fixed3             166.4n ± 1%   150.2n ± 0%   -9.74% (p=0.000 n=10)
AppendFloat/64Fixed4             163.7n ± 0%   149.6n ± 1%   -8.62% (p=0.000 n=10)
AppendFloat/64Fixed5Hard         182.8n ± 1%   167.1n ± 1%   -8.61% (p=0.000 n=10)
AppendFloat/64Fixed12            222.2n ± 0%   208.8n ± 0%   -6.05% (p=0.000 n=10)
AppendFloat/64Fixed16            197.6n ± 1%   181.7n ± 0%   -8.02% (p=0.000 n=10)
AppendFloat/64Fixed12Hard        194.5n ± 0%   181.0n ± 0%   -6.99% (p=0.000 n=10)
AppendFloat/64Fixed17Hard        205.1n ± 1%   191.9n ± 0%   -6.44% (p=0.000 n=10)
AppendFloat/64Fixed18Hard        6.269µ ± 0%   6.643µ ± 0%   +5.97% (p=0.000 n=10)
AppendFloat/64FixedF1            211.7n ± 1%   197.0n ± 0%   -6.95% (p=0.000 n=10)
AppendFloat/64FixedF2            189.4n ± 0%   174.2n ± 0%   -8.08% (p=0.000 n=10)
AppendFloat/64FixedF3            169.0n ± 0%   154.9n ± 0%   -8.32% (p=0.000 n=10)
AppendFloat/Slowpath64           321.2n ± 0%   274.2n ± 1%  -14.63% (p=0.000 n=10)
AppendFloat/SlowpathDenormal64   307.4n ± 1%   261.2n ± 0%  -15.03% (p=0.000 n=10)
AppendInt                        3.367µ ± 1%   3.376µ ± 0%        ~ (p=0.517 n=10)
AppendUint                       675.5n ± 0%   676.9n ± 0%        ~ (p=0.196 n=10)
AppendIntSmall                   28.13n ± 1%   28.17n ± 0%   +0.14% (p=0.015 n=10)
AppendUintVarlen/digits=1        20.70n ± 0%   20.51n ± 1%   -0.89% (p=0.018 n=10)
AppendUintVarlen/digits=2        20.43n ± 0%   20.27n ± 0%   -0.81% (p=0.001 n=10)
AppendUintVarlen/digits=3        38.48n ± 0%   37.93n ± 0%   -1.43% (p=0.000 n=10)
AppendUintVarlen/digits=4        41.10n ± 0%   38.78n ± 1%   -5.62% (p=0.000 n=10)
AppendUintVarlen/digits=5        42.25n ± 1%   42.11n ± 0%   -0.32% (p=0.041 n=10)
AppendUintVarlen/digits=6        45.40n ± 1%   43.14n ± 0%   -4.98% (p=0.000 n=10)
AppendUintVarlen/digits=7        46.81n ± 1%   46.03n ± 0%   -1.66% (p=0.000 n=10)
AppendUintVarlen/digits=8        48.88n ± 1%   46.59n ± 1%   -4.68% (p=0.000 n=10)
AppendUintVarlen/digits=9        49.94n ± 2%   49.41n ± 1%   -1.06% (p=0.000 n=10)
AppendUintVarlen/digits=10       57.28n ± 1%   56.92n ± 1%   -0.62% (p=0.045 n=10)
AppendUintVarlen/digits=11       60.09n ± 1%   58.11n ± 2%   -3.30% (p=0.000 n=10)
AppendUintVarlen/digits=12       62.22n ± 0%   61.85n ± 0%   -0.59% (p=0.000 n=10)
AppendUintVarlen/digits=13       64.94n ± 0%   62.92n ± 0%   -3.10% (p=0.000 n=10)
AppendUintVarlen/digits=14       65.42n ± 1%   65.19n ± 1%   -0.34% (p=0.005 n=10)
AppendUintVarlen/digits=15       68.17n ± 0%   66.13n ± 0%   -2.99% (p=0.000 n=10)
AppendUintVarlen/digits=16       70.21n ± 1%   70.09n ± 1%        ~ (p=0.517 n=10)
AppendUintVarlen/digits=17       72.93n ± 0%   70.49n ± 0%   -3.34% (p=0.000 n=10)
AppendUintVarlen/digits=18       73.01n ± 0%   72.75n ± 0%   -0.35% (p=0.000 n=10)
AppendUintVarlen/digits=19       79.27n ± 1%   79.49n ± 1%        ~ (p=0.671 n=10)
AppendUintVarlen/digits=20       82.18n ± 0%   80.43n ± 1%   -2.14% (p=0.000 n=10)
geomean                          143.4n        136.0n        -5.20%


Change-Id: I8245814a0259ad13cf9225f57db8e9fe3d2e4267
Reviewed-on: https://go-review.googlesource.com/c/go/+/717407
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Everyone writes papers about fast shortest-output formatting.
Eventually we also sped up fixed-length formatting %e and %g.
But we've neglected %f, which falls back to the slow general code
even for relatively trivial things like %.2f on 1.23.

This CL uses the fast path fixedFtoa for %f when possible by
estimating the number of digits needed.

benchmark \ host                  linux-arm64    local  linux-amd64       s7  linux-386  s7:GOARCH=386
                                      vs base  vs base      vs base  vs base    vs base        vs base
AppendFloat/Decimal                         ~        ~            ~   +0.30%          ~              ~
AppendFloat/Float                      -0.45%        ~       -2.20%        ~     -2.19%              ~
AppendFloat/Exp                        +0.12%        ~       +4.11%        ~          ~              ~
AppendFloat/NegExp                     +0.53%        ~            ~        ~          ~              ~
AppendFloat/LongExp                    +0.41%   -1.42%       +4.50%        ~          ~              ~
AppendFloat/Big                             ~   -1.25%       +3.69%        ~          ~              ~
AppendFloat/BinaryExp                  +0.38%   +1.68%            ~        ~     +2.65%         +0.97%
AppendFloat/32Integer                       ~        ~            ~        ~          ~              ~
AppendFloat/32ExactFraction                 ~        ~       -2.61%        ~          ~              ~
AppendFloat/32Point                    -0.41%        ~       -2.65%        ~          ~              ~
AppendFloat/32Exp                           ~        ~       +5.35%        ~     +1.44%         +0.39%
AppendFloat/32NegExp                   +0.30%        ~       +2.31%        ~          ~         +0.82%
AppendFloat/32Shortest                 +0.28%   -0.85%            ~        ~     -3.20%              ~
AppendFloat/32Fixed8Hard               -0.29%        ~            ~   -1.75%          ~         +4.30%
AppendFloat/32Fixed9Hard                    ~        ~            ~        ~          ~         +1.52%
AppendFloat/64Fixed1                   +0.61%   -2.03%            ~        ~          ~         +4.36%
AppendFloat/64Fixed2                        ~   -3.43%            ~        ~          ~         +1.03%
AppendFloat/64Fixed2.5                 +0.57%   -2.23%            ~        ~          ~         +2.66%
AppendFloat/64Fixed3                        ~   -1.64%            ~   +0.31%     +2.32%         +2.10%
AppendFloat/64Fixed4                   +0.15%   -2.11%            ~        ~     +1.48%         +1.58%
AppendFloat/64Fixed5Hard               +0.45%        ~       +1.58%        ~          ~         +1.73%
AppendFloat/64Fixed12                  -0.16%        ~       +1.63%   -1.23%     +3.93%         +2.42%
AppendFloat/64Fixed16                  -0.33%   -0.49%            ~        ~     +3.67%         +2.33%
AppendFloat/64Fixed12Hard              -0.58%        ~            ~        ~     +4.98%         +0.62%
AppendFloat/64Fixed17Hard              +0.27%   -0.94%            ~        ~     +2.07%         +1.79%
AppendFloat/64Fixed18Hard                   ~        ~            ~        ~          ~              ~
AppendFloat/64FixedF1                 -69.59%  -76.08%      -70.94%  -68.26%    -75.27%        -69.88%
AppendFloat/64FixedF2                 -76.28%  -81.82%      -76.95%  -77.34%    -83.53%        -80.04%
AppendFloat/64FixedF3                 -77.30%  -84.51%      -77.82%  -77.81%    -78.77%        -73.69%
AppendFloat/Slowpath64                      ~   -1.30%       +1.64%        ~     -2.66%         -0.44%
AppendFloat/SlowpathDenormal64         +0.11%   -1.69%            ~        ~     -2.90%              ~

host: linux-arm64
goos: linux
goarch: arm64
pkg: internal/strconv
cpu: unknown
                                 │ 1cc918cc725  │             b66c604f523              │
                                 │    sec/op    │    sec/op     vs base                │
AppendFloat/Decimal-8               60.22n ± 0%    60.21n ± 0%        ~ (p=0.416 n=20)
AppendFloat/Float-8                 88.93n ± 0%    88.53n ± 0%   -0.45% (p=0.000 n=20)
AppendFloat/Exp-8                   93.09n ± 0%    93.20n ± 0%   +0.12% (p=0.000 n=20)
AppendFloat/NegExp-8                93.06n ± 0%    93.56n ± 0%   +0.53% (p=0.000 n=20)
AppendFloat/LongExp-8               99.79n ± 0%   100.20n ± 0%   +0.41% (p=0.000 n=20)
AppendFloat/Big-8                   103.9n ± 0%    104.0n ± 0%        ~ (p=0.004 n=20)
AppendFloat/BinaryExp-8             47.34n ± 0%    47.52n ± 0%   +0.38% (p=0.000 n=20)
AppendFloat/32Integer-8             60.43n ± 0%    60.40n ± 0%        ~ (p=0.006 n=20)
AppendFloat/32ExactFraction-8       86.21n ± 0%    86.24n ± 0%        ~ (p=0.634 n=20)
AppendFloat/32Point-8               83.20n ± 0%    82.87n ± 0%   -0.41% (p=0.000 n=20)
AppendFloat/32Exp-8                 89.43n ± 0%    89.45n ± 0%        ~ (p=0.193 n=20)
AppendFloat/32NegExp-8              87.31n ± 0%    87.58n ± 0%   +0.30% (p=0.000 n=20)
AppendFloat/32Shortest-8            76.28n ± 0%    76.49n ± 0%   +0.28% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-8          52.44n ± 0%    52.29n ± 0%   -0.29% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-8          60.57n ± 0%    60.54n ± 0%        ~ (p=0.285 n=20)
AppendFloat/64Fixed1-8              46.27n ± 0%    46.55n ± 0%   +0.61% (p=0.000 n=20)
AppendFloat/64Fixed2-8              46.77n ± 0%    46.80n ± 0%        ~ (p=0.060 n=20)
AppendFloat/64Fixed2.5-8            43.70n ± 0%    43.95n ± 0%   +0.57% (p=0.000 n=20)
AppendFloat/64Fixed3-8              47.22n ± 0%    47.19n ± 0%        ~ (p=0.008 n=20)
AppendFloat/64Fixed4-8              44.07n ± 0%    44.13n ± 0%   +0.15% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-8          51.81n ± 0%    52.04n ± 0%   +0.45% (p=0.000 n=20)
AppendFloat/64Fixed12-8             78.41n ± 0%    78.29n ± 0%   -0.16% (p=0.000 n=20)
AppendFloat/64Fixed16-8             65.14n ± 0%    64.93n ± 0%   -0.33% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-8         62.12n ± 0%    61.76n ± 0%   -0.58% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-8         73.93n ± 0%    74.13n ± 0%   +0.27% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-8         4.285µ ± 0%    4.283µ ± 0%        ~ (p=0.039 n=20)
AppendFloat/64FixedF1-8            216.10n ± 0%    65.71n ± 0%  -69.59% (p=0.000 n=20)
AppendFloat/64FixedF2-8            227.70n ± 0%    54.02n ± 0%  -76.28% (p=0.000 n=20)
AppendFloat/64FixedF3-8            208.20n ± 1%    47.25n ± 0%  -77.30% (p=0.000 n=20)
AppendFloat/Slowpath64-8            97.40n ± 0%    97.45n ± 0%        ~ (p=0.018 n=20)
AppendFloat/SlowpathDenormal64-8    94.75n ± 0%    94.86n ± 0%   +0.11% (p=0.000 n=20)
geomean                             87.86n         76.99n       -12.37%

host: local
goos: darwin
cpu: Apple M3 Pro
                                  │ 1cc918cc725  │             b66c604f523             │
                                  │    sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-12               21.05n ± 1%   20.91n ± 1%        ~ (p=0.051 n=20)
AppendFloat/Float-12                 32.13n ± 0%   32.04n ± 1%        ~ (p=0.457 n=20)
AppendFloat/Exp-12                   31.84n ± 0%   31.72n ± 0%        ~ (p=0.151 n=20)
AppendFloat/NegExp-12                31.78n ± 1%   31.79n ± 1%        ~ (p=0.867 n=20)
AppendFloat/LongExp-12               33.70n ± 0%   33.22n ± 1%   -1.42% (p=0.000 n=20)
AppendFloat/Big-12                   35.52n ± 1%   35.07n ± 1%   -1.25% (p=0.000 n=20)
AppendFloat/BinaryExp-12             19.32n ± 1%   19.64n ± 0%   +1.68% (p=0.000 n=20)
AppendFloat/32Integer-12             21.32n ± 0%   21.18n ± 1%        ~ (p=0.025 n=20)
AppendFloat/32ExactFraction-12       30.88n ± 0%   31.07n ± 0%        ~ (p=0.087 n=20)
AppendFloat/32Point-12               30.88n ± 0%   30.95n ± 1%        ~ (p=0.250 n=20)
AppendFloat/32Exp-12                 31.57n ± 0%   31.67n ± 2%        ~ (p=0.126 n=20)
AppendFloat/32NegExp-12              30.50n ± 1%   30.76n ± 1%        ~ (p=0.087 n=20)
AppendFloat/32Shortest-12            27.14n ± 0%   26.91n ± 1%   -0.85% (p=0.001 n=20)
AppendFloat/32Fixed8Hard-12          17.11n ± 0%   17.08n ± 0%        ~ (p=0.027 n=20)
AppendFloat/32Fixed9Hard-12          19.16n ± 1%   19.31n ± 1%        ~ (p=0.062 n=20)
AppendFloat/64Fixed1-12              15.50n ± 0%   15.18n ± 1%   -2.03% (p=0.000 n=20)
AppendFloat/64Fixed2-12              15.46n ± 0%   14.93n ± 0%   -3.43% (p=0.000 n=20)
AppendFloat/64Fixed2.5-12            15.28n ± 0%   14.94n ± 1%   -2.23% (p=0.000 n=20)
AppendFloat/64Fixed3-12              15.58n ± 0%   15.32n ± 1%   -1.64% (p=0.000 n=20)
AppendFloat/64Fixed4-12              15.39n ± 0%   15.06n ± 1%   -2.11% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-12          18.00n ± 0%   18.07n ± 1%        ~ (p=0.011 n=20)
AppendFloat/64Fixed12-12             27.97n ± 8%   29.05n ± 3%        ~ (p=0.107 n=20)
AppendFloat/64Fixed16-12             21.48n ± 0%   21.38n ± 0%   -0.49% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-12         20.79n ± 1%   21.05n ± 2%        ~ (p=0.784 n=20)
AppendFloat/64Fixed17Hard-12         27.21n ± 1%   26.95n ± 1%   -0.94% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-12         2.166µ ± 1%   2.182µ ± 1%        ~ (p=0.031 n=20)
AppendFloat/64FixedF1-12            103.35n ± 0%   24.72n ± 0%  -76.08% (p=0.000 n=20)
AppendFloat/64FixedF2-12            114.30n ± 1%   20.78n ± 0%  -81.82% (p=0.000 n=20)
AppendFloat/64FixedF3-12            107.10n ± 0%   16.58n ± 0%  -84.51% (p=0.000 n=20)
AppendFloat/Slowpath64-12            32.01n ± 0%   31.59n ± 0%   -1.30% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-12    30.21n ± 0%   29.70n ± 0%   -1.69% (p=0.000 n=20)
geomean                              31.84n        27.00n       -15.20%

host: linux-amd64
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                                  │ 1cc918cc725  │             b66c604f523              │
                                  │    sec/op    │    sec/op     vs base                │
AppendFloat/Decimal-16               63.62n ± 1%    64.05n ± 1%        ~ (p=0.753 n=20)
AppendFloat/Float-16                 97.12n ± 1%    94.98n ± 1%   -2.20% (p=0.000 n=20)
AppendFloat/Exp-16                   98.12n ± 1%   102.15n ± 1%   +4.11% (p=0.000 n=20)
AppendFloat/NegExp-16                101.1n ± 1%    101.5n ± 1%        ~ (p=0.089 n=20)
AppendFloat/LongExp-16               104.5n ± 1%    109.2n ± 1%   +4.50% (p=0.000 n=20)
AppendFloat/Big-16                   108.5n ± 0%    112.5n ± 1%   +3.69% (p=0.000 n=20)
AppendFloat/BinaryExp-16             47.68n ± 1%    47.44n ± 1%        ~ (p=0.143 n=20)
AppendFloat/32Integer-16             63.77n ± 2%    63.45n ± 1%        ~ (p=0.015 n=20)
AppendFloat/32ExactFraction-16       97.69n ± 1%    95.14n ± 1%   -2.61% (p=0.000 n=20)
AppendFloat/32Point-16               92.17n ± 1%    89.72n ± 1%   -2.65% (p=0.000 n=20)
AppendFloat/32Exp-16                 95.63n ± 1%   100.75n ± 1%   +5.35% (p=0.000 n=20)
AppendFloat/32NegExp-16              94.53n ± 1%    96.72n ± 0%   +2.31% (p=0.000 n=20)
AppendFloat/32Shortest-16            86.43n ± 0%    86.95n ± 0%        ~ (p=0.010 n=20)
AppendFloat/32Fixed8Hard-16          57.75n ± 1%    57.95n ± 1%        ~ (p=0.098 n=20)
AppendFloat/32Fixed9Hard-16          66.56n ± 2%    66.97n ± 1%        ~ (p=0.380 n=20)
AppendFloat/64Fixed1-16              51.02n ± 1%    50.99n ± 1%        ~ (p=0.473 n=20)
AppendFloat/64Fixed2-16              50.94n ± 1%    51.01n ± 1%        ~ (p=0.136 n=20)
AppendFloat/64Fixed2.5-16            49.27n ± 1%    49.37n ± 1%        ~ (p=0.218 n=20)
AppendFloat/64Fixed3-16              51.85n ± 1%    52.55n ± 1%        ~ (p=0.045 n=20)
AppendFloat/64Fixed4-16              50.30n ± 1%    50.43n ± 1%        ~ (p=0.794 n=20)
AppendFloat/64Fixed5Hard-16          57.57n ± 1%    58.48n ± 1%   +1.58% (p=0.000 n=20)
AppendFloat/64Fixed12-16             82.67n ± 1%    84.02n ± 1%   +1.63% (p=0.000 n=20)
AppendFloat/64Fixed16-16             71.10n ± 1%    70.94n ± 1%        ~ (p=0.569 n=20)
AppendFloat/64Fixed12Hard-16         68.36n ± 1%    68.64n ± 1%        ~ (p=0.155 n=20)
AppendFloat/64Fixed17Hard-16         80.16n ± 1%    80.10n ± 1%        ~ (p=0.836 n=20)
AppendFloat/64Fixed18Hard-16         4.916µ ± 1%    4.919µ ± 1%        ~ (p=0.507 n=20)
AppendFloat/64FixedF1-16            239.75n ± 1%    69.67n ± 1%  -70.94% (p=0.000 n=20)
AppendFloat/64FixedF2-16            252.50n ± 1%    58.20n ± 1%  -76.95% (p=0.000 n=20)
AppendFloat/64FixedF3-16            238.00n ± 1%    52.79n ± 1%  -77.82% (p=0.000 n=20)
AppendFloat/Slowpath64-16            100.4n ± 1%    102.0n ± 1%   +1.64% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-16    97.92n ± 1%    98.01n ± 1%        ~ (p=0.304 n=20)
geomean                              95.58n         84.00n       -12.12%

host: s7
cpu: AMD Ryzen 9 7950X 16-Core Processor
                                  │ 1cc918cc725 │             b66c604f523             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-32              22.00n ± 0%   22.06n ± 0%   +0.30% (p=0.001 n=20)
AppendFloat/Float-32                34.83n ± 0%   34.76n ± 0%        ~ (p=0.159 n=20)
AppendFloat/Exp-32                  34.91n ± 0%   34.89n ± 0%        ~ (p=0.188 n=20)
AppendFloat/NegExp-32               35.24n ± 0%   35.32n ± 0%        ~ (p=0.026 n=20)
AppendFloat/LongExp-32              37.02n ± 0%   37.02n ± 0%        ~ (p=0.317 n=20)
AppendFloat/Big-32                  38.51n ± 0%   38.43n ± 0%        ~ (p=0.060 n=20)
AppendFloat/BinaryExp-32            17.57n ± 0%   17.59n ± 0%        ~ (p=0.278 n=20)
AppendFloat/32Integer-32            22.06n ± 0%   22.09n ± 0%        ~ (p=0.762 n=20)
AppendFloat/32ExactFraction-32      32.91n ± 0%   33.00n ± 0%        ~ (p=0.055 n=20)
AppendFloat/32Point-32              33.24n ± 0%   33.18n ± 0%        ~ (p=0.068 n=20)
AppendFloat/32Exp-32                34.50n ± 0%   34.55n ± 0%        ~ (p=0.030 n=20)
AppendFloat/32NegExp-32             33.53n ± 0%   33.61n ± 0%        ~ (p=0.045 n=20)
AppendFloat/32Shortest-32           30.10n ± 0%   30.10n ± 0%        ~ (p=0.931 n=20)
AppendFloat/32Fixed8Hard-32         22.89n ± 0%   22.49n ± 0%   -1.75% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32         25.82n ± 0%   25.75n ± 1%        ~ (p=0.143 n=20)
AppendFloat/64Fixed1-32             18.80n ± 0%   18.70n ± 0%        ~ (p=0.004 n=20)
AppendFloat/64Fixed2-32             18.64n ± 1%   18.54n ± 0%        ~ (p=0.001 n=20)
AppendFloat/64Fixed2.5-32           17.89n ± 0%   17.81n ± 0%        ~ (p=0.001 n=20)
AppendFloat/64Fixed3-32             19.62n ± 0%   19.68n ± 0%   +0.31% (p=0.000 n=20)
AppendFloat/64Fixed4-32             18.64n ± 0%   18.82n ± 0%        ~ (p=0.010 n=20)
AppendFloat/64Fixed5Hard-32         21.62n ± 0%   21.57n ± 0%        ~ (p=0.058 n=20)
AppendFloat/64Fixed12-32            30.98n ± 1%   30.61n ± 1%   -1.23% (p=0.000 n=20)
AppendFloat/64Fixed16-32            26.89n ± 0%   27.08n ± 1%        ~ (p=0.003 n=20)
AppendFloat/64Fixed12Hard-32        26.03n ± 0%   26.20n ± 1%        ~ (p=0.344 n=20)
AppendFloat/64Fixed17Hard-32        30.03n ± 1%   29.72n ± 1%        ~ (p=0.001 n=20)
AppendFloat/64Fixed18Hard-32        1.824µ ± 0%   1.825µ ± 1%        ~ (p=0.567 n=20)
AppendFloat/64FixedF1-32            83.58n ± 1%   26.52n ± 0%  -68.26% (p=0.000 n=20)
AppendFloat/64FixedF2-32            89.68n ± 1%   20.32n ± 1%  -77.34% (p=0.000 n=20)
AppendFloat/64FixedF3-32            84.84n ± 0%   18.82n ± 0%  -77.81% (p=0.000 n=20)
AppendFloat/Slowpath64-32           35.55n ± 0%   35.61n ± 0%        ~ (p=0.394 n=20)
AppendFloat/SlowpathDenormal64-32   35.03n ± 0%   35.02n ± 0%        ~ (p=0.733 n=20)
geomean                             34.67n        30.31n       -12.56%

host: linux-386
goarch: 386
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                                  │ 1cc918cc725 │             b66c604f523             │
                                  │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-16              133.6n ± 1%   130.5n ± 1%        ~ (p=0.002 n=20)
AppendFloat/Float-16                242.3n ± 1%   237.0n ± 1%   -2.19% (p=0.000 n=20)
AppendFloat/Exp-16                  249.1n ± 3%   252.5n ± 1%        ~ (p=0.005 n=20)
AppendFloat/NegExp-16               248.7n ± 3%   253.8n ± 2%        ~ (p=0.006 n=20)
AppendFloat/LongExp-16              258.4n ± 2%   253.0n ± 6%        ~ (p=0.185 n=20)
AppendFloat/Big-16                  285.6n ± 1%   279.2n ± 5%        ~ (p=0.012 n=20)
AppendFloat/BinaryExp-16            89.47n ± 1%   91.85n ± 2%   +2.65% (p=0.000 n=20)
AppendFloat/32Integer-16            133.5n ± 1%   129.9n ± 1%        ~ (p=0.004 n=20)
AppendFloat/32ExactFraction-16      213.7n ± 1%   212.2n ± 2%        ~ (p=0.071 n=20)
AppendFloat/32Point-16              202.0n ± 0%   200.4n ± 1%        ~ (p=0.223 n=20)
AppendFloat/32Exp-16                236.4n ± 1%   239.8n ± 1%   +1.44% (p=0.000 n=20)
AppendFloat/32NegExp-16             212.5n ± 1%   211.9n ± 1%        ~ (p=0.995 n=20)
AppendFloat/32Shortest-16           200.3n ± 1%   193.9n ± 1%   -3.20% (p=0.000 n=20)
AppendFloat/32Fixed8Hard-16         136.0n ± 1%   133.2n ± 4%        ~ (p=0.323 n=20)
AppendFloat/32Fixed9Hard-16         155.6n ± 1%   156.7n ± 2%        ~ (p=0.022 n=20)
AppendFloat/64Fixed1-16             132.8n ± 1%   133.0n ± 3%        ~ (p=0.199 n=20)
AppendFloat/64Fixed2-16             128.9n ± 1%   129.7n ± 3%        ~ (p=0.018 n=20)
AppendFloat/64Fixed2.5-16           127.0n ± 1%   126.5n ± 3%        ~ (p=0.825 n=20)
AppendFloat/64Fixed3-16             127.3n ± 1%   130.3n ± 4%   +2.32% (p=0.001 n=20)
AppendFloat/64Fixed4-16             121.4n ± 1%   123.2n ± 2%   +1.48% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-16         136.2n ± 1%   136.2n ± 3%        ~ (p=0.256 n=20)
AppendFloat/64Fixed12-16            159.0n ± 1%   165.2n ± 2%   +3.93% (p=0.000 n=20)
AppendFloat/64Fixed16-16            151.4n ± 0%   156.9n ± 1%   +3.67% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-16        146.5n ± 1%   153.8n ± 1%   +4.98% (p=0.000 n=20)
AppendFloat/64Fixed17Hard-16        166.3n ± 1%   169.8n ± 1%   +2.07% (p=0.001 n=20)
AppendFloat/64Fixed18Hard-16        10.59µ ± 2%   10.60µ ± 0%        ~ (p=0.499 n=20)
AppendFloat/64FixedF1-16            614.4n ± 1%   152.0n ± 1%  -75.27% (p=0.000 n=20)
AppendFloat/64FixedF2-16            845.0n ± 0%   139.1n ± 1%  -83.53% (p=0.000 n=20)
AppendFloat/64FixedF3-16            608.8n ± 1%   129.3n ± 1%  -78.77% (p=0.000 n=20)
AppendFloat/Slowpath64-16           251.7n ± 1%   245.0n ± 1%   -2.66% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-16   248.4n ± 1%   241.2n ± 1%   -2.90% (p=0.000 n=20)
geomean                             225.7n        193.8n       -14.14%

host: s7:GOARCH=386
cpu: AMD Ryzen 9 7950X 16-Core Processor
                                  │ 1cc918cc725  │             b66c604f523             │
                                  │    sec/op    │   sec/op     vs base                │
AppendFloat/Decimal-32               41.88n ± 0%   42.02n ± 1%        ~ (p=0.004 n=20)
AppendFloat/Float-32                 71.05n ± 0%   71.24n ± 0%        ~ (p=0.044 n=20)
AppendFloat/Exp-32                   74.91n ± 1%   74.80n ± 0%        ~ (p=0.433 n=20)
AppendFloat/NegExp-32                74.10n ± 0%   74.20n ± 0%        ~ (p=0.867 n=20)
AppendFloat/LongExp-32               75.73n ± 0%   75.84n ± 0%        ~ (p=0.147 n=20)
AppendFloat/Big-32                   82.47n ± 0%   82.36n ± 0%        ~ (p=0.490 n=20)
AppendFloat/BinaryExp-32             32.31n ± 1%   32.62n ± 0%   +0.97% (p=0.000 n=20)
AppendFloat/32Integer-32             41.38n ± 1%   41.40n ± 1%        ~ (p=0.106 n=20)
AppendFloat/32ExactFraction-32       62.72n ± 0%   62.92n ± 0%        ~ (p=0.009 n=20)
AppendFloat/32Point-32               60.36n ± 0%   60.33n ± 0%        ~ (p=0.050 n=20)
AppendFloat/32Exp-32                 68.97n ± 0%   69.24n ± 0%   +0.39% (p=0.000 n=20)
AppendFloat/32NegExp-32              62.63n ± 0%   63.15n ± 0%   +0.82% (p=0.000 n=20)
AppendFloat/32Shortest-32            58.76n ± 0%   58.87n ± 0%        ~ (p=0.053 n=20)
AppendFloat/32Fixed8Hard-32          41.67n ± 1%   43.46n ± 1%   +4.30% (p=0.000 n=20)
AppendFloat/32Fixed9Hard-32          49.78n ± 1%   50.53n ± 1%   +1.52% (p=0.000 n=20)
AppendFloat/64Fixed1-32              41.15n ± 0%   42.95n ± 1%   +4.36% (p=0.000 n=20)
AppendFloat/64Fixed2-32              40.83n ± 1%   41.24n ± 1%   +1.03% (p=0.000 n=20)
AppendFloat/64Fixed2.5-32            39.42n ± 0%   40.47n ± 1%   +2.66% (p=0.000 n=20)
AppendFloat/64Fixed3-32              40.73n ± 1%   41.58n ± 1%   +2.10% (p=0.000 n=20)
AppendFloat/64Fixed4-32              38.68n ± 0%   39.29n ± 0%   +1.58% (p=0.000 n=20)
AppendFloat/64Fixed5Hard-32          42.88n ± 1%   43.62n ± 1%   +1.73% (p=0.000 n=20)
AppendFloat/64Fixed12-32             51.67n ± 1%   52.92n ± 1%   +2.42% (p=0.000 n=20)
AppendFloat/64Fixed16-32             49.15n ± 0%   50.30n ± 0%   +2.33% (p=0.000 n=20)
AppendFloat/64Fixed12Hard-32         48.51n ± 0%   48.81n ± 0%   +0.62% (p=0.001 n=20)
AppendFloat/64Fixed17Hard-32         54.62n ± 1%   55.60n ± 1%   +1.79% (p=0.000 n=20)
AppendFloat/64Fixed18Hard-32         3.979µ ± 1%   3.980µ ± 1%        ~ (p=0.569 n=20)
AppendFloat/64FixedF1-32            165.90n ± 1%   49.97n ± 0%  -69.88% (p=0.000 n=20)
AppendFloat/64FixedF2-32            225.50n ± 0%   45.02n ± 1%  -80.04% (p=0.000 n=20)
AppendFloat/64FixedF3-32            160.20n ± 1%   42.16n ± 1%  -73.69% (p=0.000 n=20)
AppendFloat/Slowpath64-32            75.55n ± 0%   75.23n ± 0%   -0.44% (p=0.000 n=20)
AppendFloat/SlowpathDenormal64-32    74.84n ± 0%   75.00n ± 0%        ~ (p=0.268 n=20)
geomean                              69.22n        61.13n       -11.69%

Change-Id: I722d2e2621e74e32cb3fc34a2df5b16cc595715c
Reviewed-on: https://go-review.googlesource.com/c/go/+/717183
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Alan Donovan <[email protected]>
Now that the bootstrap compiler is 1.24, it's no longer needed.

Change-Id: I9b3d6b7176af10fbc580173d50130120b542e7f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/717060
Reviewed-by: David Chase <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Propagate "unread" across OpMoves. If the addr of this auto is only used
by an OpMove as its source arg, and the OpMove's target arg is the addr
of another auto. If the 2nd auto can be eliminated, this one can also be
eliminated.

This CL eliminates unnecessary memory copies and makes the frame smaller
in the following code snippet:

func contains(m map[string][16]int, k string) bool {
        _, ok := m[k]
        return ok
}

These are the benchmark results followed by the benchmark code:

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
                │   old.txt   │              new.txt                │
                │   sec/op    │   sec/op     vs base                │
Map1Access2Ok-8   9.582n ± 2%   9.226n ± 0%   -3.72% (p=0.000 n=20)
Map2Access2Ok-8   13.79n ± 1%   10.24n ± 1%  -25.77% (p=0.000 n=20)
Map3Access2Ok-8   68.68n ± 1%   12.65n ± 1%  -81.58% (p=0.000 n=20)

package main_test

import "testing"

var (
        m1 = map[int]int{}
        m2 = map[int][16]int{}
        m3 = map[int][256]int{}
)

func init() {
        for i := range 1000 {
                m1[i] = i
                m2[i] = [16]int{15:i}
                m3[i] = [256]int{255:i}
        }
}

func BenchmarkMap1Access2Ok(b *testing.B) {
        for i := range b.N {
                _, ok := m1[i%1000]
                if !ok {
                        b.Errorf("%d not found", i)
                }
        }
}

func BenchmarkMap2Access2Ok(b *testing.B) {
        for i := range b.N {
                _, ok := m2[i%1000]
                if !ok {
                        b.Errorf("%d not found", i)
                }
        }
}

func BenchmarkMap3Access2Ok(b *testing.B) {
        for i := range b.N {
                _, ok := m3[i%1000]
                if !ok {
                        b.Errorf("%d not found", i)
                }
        }
}

Fixes golang#75398

Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771
Reviewed-on: https://go-review.googlesource.com/c/go/+/702875
Reviewed-by: Keith Randall <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Keith Randall <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Change-Id: I9c5a8f8a031e368bda312c830dc266f5986e8b1a
GitHub-Last-Rev: 23145e8
GitHub-Pull-Request: golang#76160
Reviewed-on: https://go-review.googlesource.com/c/go/+/717341
Reviewed-by: Keith Randall <[email protected]>
Auto-Submit: Keith Randall <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
We don't need to check that the bit patterns of the constants
match, it is sufficient to just check the constant is equal to the
given value.

While we're here also change the FCLASSD rules to use a bit pattern
for the mask. I think this improves readability, particularly as
more uses of FCLASSD get added (e.g. CL 717560).

These changes should not affect codegen.

Change-Id: I92a6338dc71e6a71e04306f67d7d86016c6e9c47
Reviewed-on: https://go-review.googlesource.com/c/go/+/717580
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Keith Randall <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
@austinderek austinderek force-pushed the master branch 6 times, most recently from 0e1bd8b to c37e118 Compare November 5, 2025 06:02
@austinderek austinderek force-pushed the master branch 22 times, most recently from 9f3a108 to c37e118 Compare November 5, 2025 17:03
@austinderek austinderek closed this Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.