Skip to content

Compile time+memory regression between 1.49.0 and 1.50.0 #84873

Open
@olix0r

Description

@olix0r

Our compile times and memory footprint regressed substantially between Rust 1.49.0 and 1.50.0.

This regression is a hard blocker for us to be able to upgrade our project from 1.49.0. As this is a severe issue for us, please let us know if there's anything we can do to provide more diagnostics, test changes, etc.

This may be related to other recent issues that describe similar behavior:

Code

This regression is observed on recent versions of the Linkerd proxy. It's known that the proxy can manifest large type signatures, so builds disable debug symbols by default

This regression is not obvious with other, smaller projects that we maintain, so I'm unable to provide a smaller repro.

Version it worked on

With Rust 1.49.0, the binary compiles in a little over two minutes, using a little over 1GB of memory:

cargo clean && /usr/bin/time -v cargo +1.49.0 build -p linkerd2-proxy

...
    Finished dev [unoptimized] target(s) in 2m 12s
        Command being timed: "cargo +1.49.0 build -p linkerd2-proxy"
        User time (seconds): 326.00
        System time (seconds): 14.55
        Percent of CPU this job got: 256%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:12.78
...
        Maximum resident set size (kbytes): 1308088
...
        Exit status: 0

Version with regression

Using Rust 1.50.0, rustc runs for about 40 minutes before exhausting the system's memory:

cargo clean && /usr/bin/time -v cargo +1.50.0 build -p linkerd2-proxy

  process didn't exit successfully: `rustc --crate-name linkerd2_proxy --edition=2018 linkerd2-proxy/src/main.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type bin --emit=dep-info,link -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="multicore"' --cfg 'feature="num_cpus"' -C metadata=c2caa01c24c285f7 -C extra-filename=-c2caa01c24c285f7 --out-dir /home/ver/b/linkerd2-proxy/target/debug/deps -C incremental=/home/ver/b/linkerd2-proxy/target/debug/incremental -L dependency=/home/ver/b/linkerd2-proxy/target/debug/deps --extern futures=/home/ver/b/linkerd2-proxy/target/debug/deps/libfutures-01f6240a9e4201d3.rlib --extern linkerd_app=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_app-dd18fcabeef913ae.rlib --extern linkerd_signal=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_signal-14263c92fcb0fc0b.rlib --extern num_cpus=/home/ver/b/linkerd2-proxy/target/debug/deps/libnum_cpus-74ee144d51bc047e.rlib --extern tokio=/home/ver/b/linkerd2-proxy/target/debug/deps/libtokio-de277ade05efd960.rlib --extern tracing=/home/ver/b/linkerd2-proxy/target/debug/deps/libtracing-03fd6826b817dabe.rlib -L native=/home/ver/b/linkerd2-proxy/target/debug/build/ring-a8b802a4aa425398/out` (signal: 9, SIGKILL: kill)
Command exited with non-zero status 101
	Command being timed: "cargo +1.50.0 build -p linkerd2-proxy"
	User time (seconds): 2986.18
	System time (seconds): 101.73
	Percent of CPU this job got: 133%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 38:29.28
...
	Maximum resident set size (kbytes): 62953524
...
	Exit status: 101

rustc --version --verbose:

rustc 1.50.0 (cb75ad5db 2021-02-10)
binary: rustc
commit-hash: cb75ad5db02783e8b0222fee363c5f63f7e2cf5b
commit-date: 2021-02-10
host: x86_64-unknown-linux-gnu
release: 1.50.0

We see similar behavior with more recent versions of Rust as well, including 1.51.0 and nightly (cargo 1.53.0-nightly (0ed318d18 2021-04-23)) as well.

cc @hawkw, who can provide some more details. I believe we've tested with lto=off without any changes in behavior.

@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged

Activity

added
C-bugCategory: This is a bug.
regression-untriagedUntriaged performance or correctness regression.
on May 3, 2021
added
I-prioritizeIssue: Indicates that prioritization has been requested for this issue.
regression-from-stable-to-stablePerformance or correctness regression from one stable version to another.
and removed
I-prioritizeIssue: Indicates that prioritization has been requested for this issue.
regression-untriagedUntriaged performance or correctness regression.
on May 3, 2021
jyn514

jyn514 commented on May 3, 2021

@jyn514
Member

@olix0r can you paste the output of cargo rustc -- -Z time-passes? It should have some output before it hangs.

added
I-compilememIssue: Problems and improvements with respect to memory usage during compilation.
I-compiletimeIssue: Problems and improvements with respect to compile times.
on May 3, 2021
jyn514

jyn514 commented on May 3, 2021

@jyn514
Member

If you can come up with a smaller example than "linkerd" that would also be helpful, but it will be more difficult than just running time-passes.

added
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on May 3, 2021
olix0r

olix0r commented on May 3, 2021

@olix0r
Author
error: could not compile `linkerd2-proxy`

Caused by:
  process didn't exit successfully: `rustc --crate-name linkerd2_proxy --edition=2018 linkerd2-proxy/src/main.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type bin --emit=dep-info,link -C embed-bitcode=no -Z time-passes --cfg 'feature="default"' --cfg 'feature="multicore"' --cfg 'feature="num_cpus"' -C metadata=4b636c9abf0080e5 -C extra-filename=-4b636c9abf0080e5 --out-dir /home/ver/b/linkerd2-proxy/target/debug/deps -C incremental=/home/ver/b/linkerd2-proxy/target/debug/incremental -L dependency=/home/ver/b/linkerd2-proxy/target/debug/deps --extern futures=/home/ver/b/linkerd2-proxy/target/debug/deps/libfutures-ba905e7fd290ac0d.rlib --extern linkerd_app=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_app-413b736e1b748ce1.rlib --extern linkerd_signal=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_signal-e33579c6c7162bd5.rlib --extern num_cpus=/home/ver/b/linkerd2-proxy/target/debug/deps/libnum_cpus-372d7745cb8ee5a2.rlib --extern tokio=/home/ver/b/linkerd2-proxy/target/debug/deps/libtokio-6439a3cdc075b537.rlib --extern tracing=/home/ver/b/linkerd2-proxy/target/debug/deps/libtracing-9cda1e0af1393c64.rlib -L native=/home/ver/b/linkerd2-proxy/target/debug/build/ring-de36c98917ce89f3/out` (signal: 9, SIGKILL: kill)
Command exited with non-zero status 101
	Command being timed: "cargo +nightly rustc -- -Z time-passes"
	User time (seconds): 1210.51
	System time (seconds): 102.65
	Percent of CPU this job got: 128%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 16:58.61
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 62990612
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 5127
	Minor (reclaiming a frame) page faults: 99760499
	Voluntary context switches: 38121
	Involuntary context switches: 30180
	Swaps: 0
	File system inputs: 595200
	File system outputs: 2482104
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 101
time-passes
time:   0.000; rss:   55MB ->   55MB (   +0MB)	parse_crate
time:   0.000; rss:   55MB ->   55MB (   +0MB)	attributes_injection
time:   0.000; rss:   55MB ->   55MB (   +0MB)	incr_comp_prepare_session_directory
time:   0.000; rss:   55MB ->   55MB (   +0MB)	incr_comp_garbage_collect_session_directories
time:   0.000; rss:   55MB ->   58MB (   +2MB)	recursion_limit
time:   0.000; rss:   58MB ->   58MB (   +0MB)	plugin_loading
time:   0.000; rss:   58MB ->   58MB (   +0MB)	plugin_registration
time:   0.000; rss:   58MB ->   58MB (   +0MB)	crate_injection
time:   0.063; rss:   58MB ->  139MB (  +81MB)	expand_crate
time:   0.000; rss:  139MB ->  139MB (   +0MB)	check_unused_macros
time:   0.063; rss:   58MB ->  139MB (  +81MB)	macro_expand_crate
time:   0.000; rss:  139MB ->  139MB (   +0MB)	maybe_building_test_harness
time:   0.000; rss:  139MB ->  139MB (   +0MB)	AST_validation
time:   0.000; rss:  139MB ->  139MB (   +0MB)	maybe_create_a_macro_crate
time:   0.000; rss:  139MB ->  139MB (   +0MB)	finalize_imports
time:   0.000; rss:  139MB ->  139MB (   +0MB)	finalize_macro_resolutions
time:   0.002; rss:  139MB ->  142MB (   +3MB)	late_resolve_crate
time:   0.000; rss:  142MB ->  142MB (   +0MB)	resolve_check_unused
time:   0.000; rss:  142MB ->  142MB (   +0MB)	resolve_report_errors
time:   0.000; rss:  142MB ->  142MB (   +0MB)	resolve_postprocess
time:   0.003; rss:  139MB ->  142MB (   +3MB)	resolve_crate
time:   0.000; rss:  142MB ->  142MB (   +0MB)	complete_gated_feature_checking
time:   0.066; rss:   58MB ->  142MB (  +84MB)	configure_and_expand
time:   0.000; rss:  142MB ->  142MB (   +0MB)	prepare_outputs
time:   0.000; rss:  142MB ->  142MB (   +0MB)	blocked_on_dep_graph_loading
time:   0.002; rss:  142MB ->  144MB (   +3MB)	hir_lowering
time:   0.000; rss:  144MB ->  144MB (   +0MB)	early_lint_checks
time:   0.000; rss:  144MB ->  144MB (   +0MB)	setup_global_ctxt
time:   0.000; rss:  144MB ->  144MB (   +0MB)	create_global_ctxt
time:   0.000; rss:  148MB ->  148MB (   +0MB)	looking_for_entry_point
time:   0.000; rss:  148MB ->  148MB (   +0MB)	looking_for_plugin_registrar
time:   0.000; rss:  148MB ->  148MB (   +0MB)	looking_for_derive_registrar
time:   0.002; rss:  148MB ->  151MB (   +3MB)	misc_checking_1
time:   0.001; rss:  151MB ->  155MB (   +4MB)	type_collecting
time:   0.000; rss:  155MB ->  155MB (   +0MB)	impl_wf_inference
time:   0.000; rss:  155MB ->  155MB (   +0MB)	unsafety_checking
time:   0.000; rss:  155MB ->  155MB (   +0MB)	orphan_checking
time:   0.000; rss:  155MB ->  155MB (   +0MB)	coherence_checking
time:   0.002; rss:  155MB ->  159MB (   +4MB)	wf_checking
time:   0.026; rss:  159MB ->  197MB (  +38MB)	item_types_checking
time:   0.074; rss:  197MB ->  228MB (  +31MB)	item_bodies_checking
time:   0.102; rss:  151MB ->  228MB (  +77MB)	type_check_crate
time:   0.002; rss:  228MB ->  229MB (   +1MB)	match_checking
time:   0.001; rss:  229MB ->  230MB (   +1MB)	liveness_and_intrinsic_checking
time:   0.004; rss:  228MB ->  230MB (   +2MB)	misc_checking_2
time:   0.052; rss:  230MB ->  248MB (  +18MB)	MIR_borrow_checking
time:   0.001; rss:  248MB ->  248MB (   +0MB)	MIR_effect_checking
time:   0.000; rss:  248MB ->  248MB (   +0MB)	layout_testing
time:   0.000; rss:  248MB ->  248MB (   +0MB)	death_checking
time:   0.000; rss:  248MB ->  248MB (   +0MB)	unused_lib_feature_checking
time:   0.005; rss:  248MB ->  248MB (   +0MB)	crate_lints
time:   0.001; rss:  248MB ->  248MB (   +0MB)	module_lints
time:   0.006; rss:  248MB ->  248MB (   +0MB)	lint_checking
time:   0.002; rss:  248MB ->  249MB (   +1MB)	privacy_checking_modules
time:   0.009; rss:  248MB ->  249MB (   +1MB)	misc_checking_3
time:   0.000; rss:  249MB ->  249MB (   +0MB)	monomorphization_collector_root_collections
time:  63.851; rss:  249MB ->  878MB ( +629MB)	monomorphization_collector_graph_walk
time: 385.658; rss:  878MB -> 34473MB (+33594MB)	partition_and_assert_distinct_symbols
time:   0.000; rss: 34462MB -> 34462MB (   +0MB)	write_allocator_module
time:   0.000; rss: 34462MB -> 34462MB (   +0MB)	find_cgu_reuse
time:   0.000; rss: 34475MB -> 34475MB (   +0MB)	LLVM_module_optimize_function_passes(23w1nf295xb6m3dz)
time:   0.000; rss: 34481MB -> 34481MB (   +0MB)	LLVM_module_optimize_function_passes(3wiubnia116xo13k)
time:   0.000; rss: 34481MB -> 34481MB (   +0MB)	LLVM_module_optimize_module_passes(3wiubnia116xo13k)
time:   0.027; rss: 34475MB -> 34485MB (  +10MB)	LLVM_module_optimize_module_passes(23w1nf295xb6m3dz)
time:   0.000; rss: 35405MB -> 35405MB (   +0MB)	LLVM_module_optimize_function_passes(6nf2a3p991xv6ss)
time:   0.000; rss: 35405MB -> 35405MB (   +0MB)	LLVM_module_optimize_function_passes(qnwxkbovt0otq4i)
time:   0.000; rss: 35405MB -> 35405MB (   +0MB)	LLVM_module_optimize_module_passes(qnwxkbovt0otq4i)
time:   0.028; rss: 35405MB -> 35411MB (   +6MB)	LLVM_module_optimize_module_passes(6nf2a3p991xv6ss)
time:   0.000; rss: 35420MB -> 35420MB (   +0MB)	LLVM_module_optimize_function_passes(3l9ye29vj5xd30gg)
time:   0.022; rss: 35420MB -> 35433MB (  +13MB)	LLVM_module_optimize_module_passes(3l9ye29vj5xd30gg)
time:   0.000; rss: 45480MB -> 45480MB (   +0MB)	LLVM_module_optimize_function_passes(47zf6aqohnqlwwkn)
time:   0.000; rss: 45480MB -> 45480MB (   +0MB)	LLVM_module_optimize_module_passes(47zf6aqohnqlwwkn)
time:   0.001; rss: 61521MB -> 61522MB (   +0MB)	LLVM_module_optimize_function_passes(2shzcrj12b6yuzi2)
time:   0.027; rss: 61522MB -> 61529MB (   +7MB)	LLVM_module_optimize_module_passes(2shzcrj12b6yuzi2)
time:   0.000; rss: 62206MB -> 62206MB (   +0MB)	LLVM_module_optimize_function_passes(22god1f9uaf09skc)
time:   0.000; rss: 62206MB -> 62206MB (   +0MB)	LLVM_module_optimize_module_passes(22god1f9uaf09skc)
time:   0.000; rss: 62231MB -> 62231MB (   +0MB)	LLVM_module_optimize_function_passes(4k1c83m91mx8hb0j)
time:   0.000; rss: 62246MB -> 62246MB (   +0MB)	LLVM_module_optimize_function_passes(5p5qsa0r7t1m74p)
time:   0.000; rss: 62246MB -> 62247MB (   +1MB)	LLVM_module_optimize_module_passes(5p5qsa0r7t1m74p)
time:   0.017; rss: 62231MB -> 62248MB (  +17MB)	LLVM_module_optimize_module_passes(4k1c83m91mx8hb0j)
time:   0.001; rss: 62089MB -> 62089MB (   +0MB)	LLVM_module_optimize_function_passes(47yf53g8fyfg1095)
time:   0.029; rss: 62090MB -> 62101MB (  +11MB)	LLVM_module_optimize_module_passes(47yf53g8fyfg1095)
time:   0.000; rss: 62109MB -> 62109MB (   +0MB)	LLVM_module_optimize_function_passes(3io72qv49rwp317)
time:   0.000; rss: 62109MB -> 62109MB (   +0MB)	LLVM_module_optimize_module_passes(3io72qv49rwp317)
time:   0.000; rss: 62547MB -> 62547MB (   +1MB)	LLVM_module_optimize_function_passes(1s1n98zkzp6om5cq)
time:   0.004; rss: 62547MB -> 62558MB (  +11MB)	LLVM_module_optimize_module_passes(1s1n98zkzp6om5cq)
time:   0.000; rss: 62640MB -> 62640MB (   +0MB)	LLVM_module_optimize_function_passes(3qs0oaz67heipbqh)
time:   0.000; rss: 62640MB -> 62640MB (   +0MB)	LLVM_module_optimize_module_passes(3qs0oaz67heipbqh)
olix0r

olix0r commented on May 3, 2021

@olix0r
Author

We're going to try to see if we can avoid this with boxing, which may help us identify a smaller repro, but this may take some time...

jyn514

jyn514 commented on May 3, 2021

@jyn514
Member

Wow, that is a lot of memory in LLVM and a lot of time in partition_and_assert_distinct_symbols. I wouldn't expect that to be so expensive:

let (codegen_units, _) = tcx.sess.time("partition_and_assert_distinct_symbols", || {
sync::join(
|| {
&*tcx.arena.alloc_from_iter(partition(
tcx,
&mut items.iter().cloned(),
tcx.sess.codegen_units(),
&inlining_map,
))
},
|| assert_symbols_are_distinct(tcx, items.iter()),
)
});

Not sure who to ask about that - maybe @wesleywiser has ideas what's going on?

28 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

C-bugCategory: This is a bug.I-compilememIssue: Problems and improvements with respect to memory usage during compilation.I-compiletimeIssue: Problems and improvements with respect to compile times.P-highHigh priorityT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.regression-from-stable-to-stablePerformance or correctness regression from one stable version to another.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @pnkfelix@olix0r@lqd@wesleywiser@audunska

      Issue actions

        Compile time+memory regression between 1.49.0 and 1.50.0 · Issue #84873 · rust-lang/rust