Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: large literal maps excessively contribute to binary size #71937

Open
tdewolff opened this issue Feb 24, 2025 · 5 comments
Open

cmd/compile: large literal maps excessively contribute to binary size #71937

tdewolff opened this issue Feb 24, 2025 · 5 comments
Labels
BugReport Issues describing a possible bug in the Go implementation. compiler/runtime Issues related to the Go compiler and/or runtime.

Comments

@tdewolff
Copy link

Go version

go version go1.24.0 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/taco/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/taco/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2465725287=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/taco/go/src/github.com/tdewolff/crm/go.mod'
GOMODCACHE='/home/taco/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/taco/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/taco/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.0'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

Compile a program with dependencies with large literal maps (global or not)

What did you see happen?

A literal map with ~2000 entries inflated the final binary size with ~525kB

What did you expect to see?

Assuming that a gob encoded version of the map is an efficient representation of the map, I assume that the binary should inflate by approximately that much. The gob encoded version of that map was 52kB, or a 1/10th of what actually happens.

What is the reason a map cannot be stored literally in the final binary instead of initializing each value separately and causing a huge (implicit) init() function? I'm sure there's a good reason, but is there room for improvement? Besides, would inserting a large number of entries "at once" be algorithmically faster than inserting one by one (thinking about parallels with priority queues / heaps)?

See #20095 for a similar issue. Also related partly to #6853. May fix or improve #19751.

See yuin/goldmark#469 (comment) for a case study.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 24, 2025
@randall77
Copy link
Contributor

Maps have a hash seed that is chosen at startup dynamically (to avoid collision attacks). So the layout of a map changes from run to run; we cannot lay it out statically.

If the large literal map has all constant entries, it should be loaded from a []struct{key;value} array. That's about as compact as we could get (without a decompression step).
If the entries are not constant, then it is initialized with code (a bunch of m[k] = v assignments), which uses substantially more binary space.

Can you post your example (or a shrunken version, say ~100 representative entries)? It would be interesting to see what case you're hitting above.

@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Feb 24, 2025
@adonovan
Copy link
Member

adonovan commented Feb 24, 2025

Even large arrays can contribute to large binary sizes. For example, this 104 entry table in the edge package adds 120 bytes of ARM code per line. 5KB of source generates 13KB of code.

The total information content of each row is a triple (index, type argument, string). The indices are worth zero bits, because they are sequential; the type argument is a choice among 55, and the strings are a choice among 47 values, so a total of 12 bits. If it wasn't important for each line to have a unique breakpoint, the compiler could reduce the entire thing to a loop over a compact table of about 200 bytes with two side tables of 55 reflect.Types and 47 strings respectively, for a total of 1.8KB.

var fieldInfos = [...]fieldInfo{
	Invalid:               {},
	ArrayType_Elt:         info[*ast.ArrayType]("Elt"),
	ArrayType_Len:         info[*ast.ArrayType]("Len"),
	AssignStmt_Lhs:        info[*ast.AssignStmt]("Lhs"),
	AssignStmt_Rhs:        info[*ast.AssignStmt]("Rhs"),
	BinaryExpr_X:          info[*ast.BinaryExpr]("X"),
...
	0x0128 00296 	L0193	PCDATA	$0, $-1
	0x0128 00296 	L0193	MOVD	$golang.org/x/tools/internal/astutil/edge..dict.info[*go/ast.AssignStmt](SB), R0
	0x0130 00304 	L0193	MOVD	$go:string."Lhs"(SB), R1
	0x0138 00312 	L0193	MOVD	$3, R2
	0x013c 00316 	L0193	CALL	golang.org/x/tools/internal/astutil/edge.info[go.shape.*uint8](SB)
	0x0140 00320 	L0193	MOVD	R0, golang.org/x/tools/internal/astutil/edge..autotmp_3-56(SP)
	0x0144 00324 	L0193	STP	(R1, R2), golang.org/x/tools/internal/astutil/edge..autotmp_3-48(SP)
	0x0148 00328 	L0193	STP	(R3, R4), golang.org/x/tools/internal/astutil/edge..autotmp_3-32(SP)
	0x014c 00332 	L0193	STP	(R5, R6), golang.org/x/tools/internal/astutil/edge..autotmp_3-16(SP)
	0x0150 00336 	L0193	PCDATA	$0, $-3
	0x0150 00336 	L0193	MOVWU	runtime.writeBarrier(SB), R3
	0x0158 00344 	L0193	PCDATA	$0, $-1
	0x0158 00344 	L0193	PCDATA	$0, $-2
	0x0158 00344 	L0193	CBZW	R3, 372
	0x015c 00348 	L0193	MOVD	$type:golang.org/x/tools/internal/astutil/edge.fieldInfo(SB), R0
	0x0164 00356 	L0193	MOVD	$golang.org/x/tools/internal/astutil/edge.fieldInfos+168(SB), R1
	0x016c 00364 	L0193	MOVD	$golang.org/x/tools/internal/astutil/edge..autotmp_3-56(SP), R2
	0x0170 00368 	L0193	CALL	runtime.wbMove(SB)
	0x0174 00372 	L0193	LDP	golang.org/x/tools/internal/astutil/edge..autotmp_3-56(SP), (R3, R4)
	0x0178 00376 	L0193	LDP	golang.org/x/tools/internal/astutil/edge..autotmp_3-40(SP), (R5, R6)
	0x017c 00380 	L0193	LDP	golang.org/x/tools/internal/astutil/edge..autotmp_3-24(SP), (R7, R8)
	0x0180 00384 	L0193	STP	(R3, R4), golang.org/x/tools/internal/astutil/edge.fieldInfos+168(SB)
	0x018c 00396 	L0193	STP	(R5, R6), golang.org/x/tools/internal/astutil/edge.fieldInfos+184(SB)
	0x0198 00408 	L0193	STP	(R7, R8), golang.org/x/tools/internal/astutil/edge.fieldInfos+200(SB)
	0x01a4 00420 	L0193	MOVD	golang.org/x/tools/internal/astutil/edge..autotmp_3-8(SP), R3
	0x01a8 00424 	L0193	MOVD	R3, golang.org/x/tools/internal/astutil/edge.fieldInfos+216(SB)
...

@randall77
Copy link
Contributor

Yes, if each entry (in a map, or a slice) involves a function call to get the entry's contents, then it is going to be a lot of generated code.
(In Alan's example, it would be more compact to make a slice of (idx,func,string) triples (which are constants and the compiler can put it in a data section) and put a loop in an init that calls the function with the string argument and writes the result to the entry. I don't think the compiler can do that transformation automatically.)

@adonovan
Copy link
Member

I don't think the compiler can do that transformation automatically.

Yeah, I meant a fantasy compiler could implement a loop "un-unrolling" optimization when it sees a sequence of statements each of the form table[i] = f(k1, k2), and build the table of k1 and k2. Of course, you would lose the ability to step through the sequence, and the backtraces would suffer too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BugReport Issues describing a possible bug in the Go implementation. compiler/runtime Issues related to the Go compiler and/or runtime.
Projects
None yet
Development

No branches or pull requests

5 participants