trace-core: Dispatchers unset themselves #1033

hawkw · 2019-04-04T00:20:41Z

This branch changes dispatcher::get_default to unset the thread's
current dispatcher while the reference to it is held by the closure.
This prevents infinite loops if the subscriber calls code paths which
emit events or construct spans.

Note that this also means that nested calls to get_default inside of a
get_default closure will recieve a None dispatcher rather than the
"actual" dispatcher. However, it was necessary to unset the default in
get_default rather than in dispatch methods such as Dispatch::enter,
as when those functions are called, the current state has already been
borrowed.

This introduces a little more overhead into span creation and events.

Before:


     Running target/release/deps/no_subscriber-09f0b39a00181a97

running 5 tests
test bench_1_atomic_load              ... bench:           0 ns/iter (+/- 0)
test bench_costly_field_no_subscriber ... bench:           2 ns/iter (+/- 0)
test bench_log_no_logger              ... bench:           0 ns/iter (+/- 0)
test bench_no_span_no_subscriber      ... bench:           0 ns/iter (+/- 0)
test bench_span_no_subscriber         ... bench:           2 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out

     Running target/release/deps/subscriber-992535c5d1c348ad

running 5 tests
test enter_span              ... bench:           3 ns/iter (+/- 0)
test span_no_fields          ... bench:          38 ns/iter (+/- 8)
test span_repeatedly         ... bench:       4,560 ns/iter (+/- 663)
test span_with_fields        ... bench:          49 ns/iter (+/- 17)
test span_with_fields_record ... bench:         363 ns/iter (+/- 71)

After:

     Running target/release/deps/no_subscriber-09f0b39a00181a97

running 5 tests
test bench_1_atomic_load              ... bench:           0 ns/iter (+/- 0)
test bench_costly_field_no_subscriber ... bench:           2 ns/iter (+/- 0)
test bench_log_no_logger              ... bench:           0 ns/iter (+/- 0)
test bench_no_span_no_subscriber      ... bench:           0 ns/iter (+/- 0)
test bench_span_no_subscriber         ... bench:           2 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out

     Running target/release/deps/subscriber-992535c5d1c348ad

running 5 tests
test enter_span              ... bench:           3 ns/iter (+/- 0)
test span_no_fields          ... bench:          48 ns/iter (+/- 9)
test span_repeatedly         ... bench:       5,274 ns/iter (+/- 596)
test span_with_fields        ... bench:          55 ns/iter (+/- 11)
test span_with_fields_record ... bench:         346 ns/iter (+/- 71)

Closes #1032

Signed-off-by: Eliza Weisman [email protected]

carllerche · 2019-04-04T05:01:20Z

Looking good 👍 I only skimmed it so far. I will review it in closer detail tomorrow.

Have you considered flagging the dispatcher as entered instead of swapping out then back? I wonder if that would make an impact in the benches you run.

Roughly, Dispatcher would have a bool entered. If that flag is set, then it does not forward any events to the subscriber.

hawkw · 2019-04-04T21:22:29Z

@carllerche We still need to store that bool as part of the thread-local state, rather than on Dispatch, since Dispatch needs to be Sync (precluding the use of a Cell) as it's a field on Span (which we are now asserting is Sync).

Using a flag stored in the thread-local state rather than replacing the dispatcher doesn't seem to result in a big performance improvement:


running 5 tests
test enter_span              ... bench:           3 ns/iter (+/- 0)
test span_no_fields          ... bench:          47 ns/iter (+/- 6)
test span_repeatedly         ... bench:       5,063 ns/iter (+/- 957)
test span_with_fields        ... bench:          57 ns/iter (+/- 6)
test span_with_fields_record ... bench:         361 ns/iter (+/- 37)

However, it does mean that calls to dispatcher::get_default can be nested, as long as calls to the functions on Dispatch that need to be guarded are not also nested. This at least lets us avoid a potentially surprising edge case...

tokio-trace/tokio-trace-core/src/dispatcher.rs

carllerche · 2019-04-09T19:35:00Z

@hawkw looking at this, the version w/ the bool flag needs to add a lot of conditionals where the original one did not... I do wonder what made the original (Cell and swap) slower.

hawkw · 2019-04-10T19:31:44Z

@carllerche personally, I would prefer to move forward with the bool flag rather than Cell and swap, because it means that nested calls to get_default don't behave surprisingly --- in the bool flag approach, the dispatcher is only disabled when we're actually inside it.

hawkw · 2019-04-15T18:14:14Z

@carllerche commit b0c6b34 moves the check back into dispatcher::get_default, rather than guarding individual functions on the dispatcher.

This means that we don't have to access the TLS var twice, and we don't have to access it at all when entering a span handle that already exists. This pretty much entirely negates the performance hit caused by adding the check.

hawkw · 2019-04-15T18:16:03Z

Looks like the build failure is due to Azure Pipelines flaking out again; I restarted it.

hawkw · 2019-04-15T18:46:17Z

Benchmark results from b0c6b34:

Before:

test enter_span              ... bench:           3 ns/iter (+/- 0)
test span_no_fields          ... bench:          51 ns/iter (+/- 12)
test span_repeatedly         ... bench:       5,073 ns/iter (+/- 1,528)
test span_with_fields        ... bench:          56 ns/iter (+/- 49)
test span_with_fields_record ... bench:         363 ns/iter (+/- 61)

After:

test enter_span              ... bench:           3 ns/iter (+/- 0)
test span_no_fields          ... bench:          35 ns/iter (+/- 12)
test span_repeatedly         ... bench:       4,165 ns/iter (+/- 298)
test span_with_fields        ... bench:          48 ns/iter (+/- 12)
test span_with_fields_record ... bench:         363 ns/iter (+/- 91)

Note that the "after" benchmarks are pretty much in line with the perf on master before adding the check. :D

hawkw · 2019-04-15T18:48:38Z

FreeBSD CI is also failing due to an unrelated (and possibly transient?) failure in the threadpool tests:

failures:
---- blocking_thread_does_not_take_over_shutdown_worker_thread stdout ----
thread 'blocking_thread_does_not_take_over_shutdown_worker_thread' panicked at 'called `Result::unwrap()` on an `Err` value: RecvError', src/libcore/result.rs:997:5
failures:
    blocking_thread_does_not_take_over_shutdown_worker_thread
test result: FAILED. 6 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

I restarted it.

Signed-off-by: Eliza Weisman <[email protected]>

@carllerche

This reduces the performance overhead of this change significantly. The benchmarks are once again on par with `master`: ``` Running target/release/deps/no_subscriber-1afb0f83feae37f5 running 5 tests test bench_1_atomic_load ... bench: 0 ns/iter (+/- 0) test bench_costly_field_no_subscriber ... bench: 1 ns/iter (+/- 0) test bench_log_no_logger ... bench: 0 ns/iter (+/- 0) test bench_no_span_no_subscriber ... bench: 0 ns/iter (+/- 0) test bench_span_no_subscriber ... bench: 2 ns/iter (+/- 0) test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out Running target/release/deps/subscriber-7e8d6a743535dbee running 5 tests test enter_span ... bench: 3 ns/iter (+/- 0) test span_no_fields ... bench: 34 ns/iter (+/- 4) test span_repeatedly ... bench: 3,868 ns/iter (+/- 380) test span_with_fields ... bench: 45 ns/iter (+/- 7) test span_with_fields_record ... bench: 333 ns/iter (+/- 48) test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out ```` Thanks @carllerche for suggesting this! Signed-off-by: Eliza Weisman <[email protected]>

Signed-off-by: Eliza Weisman <[email protected]>

Rather than guarding every dispatcher function with an entered check, this commit moves the check back into `get_default`. This negates most of the performance hit caused by the entered check. Note that it may be possible to bypass the check by sneaking a span handle into the subscriber, but AFAICT this is enough work that it would have to be done intentionally. The purpose of this check is to guard against *accidental* infinite loops caused by trace instrumentation in code a subscriber calls into, so I'm much less concerned about preventing deliberate misuse. Before: ``` test enter_span ... bench: 3 ns/iter (+/- 0) test span_no_fields ... bench: 51 ns/iter (+/- 12) test span_repeatedly ... bench: 5,073 ns/iter (+/- 1,528) test span_with_fields ... bench: 56 ns/iter (+/- 49) test span_with_fields_record ... bench: 363 ns/iter (+/- 61) ``` After: ``` test enter_span ... bench: 3 ns/iter (+/- 0) test span_no_fields ... bench: 35 ns/iter (+/- 12) test span_repeatedly ... bench: 4,165 ns/iter (+/- 298) test span_with_fields ... bench: 48 ns/iter (+/- 12) test span_with_fields_record ... bench: 363 ns/iter (+/- 91) ``` Signed-off-by: Eliza Weisman <[email protected]>

This branch changes `dispatcher::get_default` to unset the thread's current dispatcher while the reference to it is held by the closure. This prevents infinite loops if the subscriber calls code paths which emit events or construct spans. Note that this also means that nested calls to `get_default` inside of a `get_default` closure will receive a `None` dispatcher rather than the "actual" dispatcher. However, it was necessary to unset the default in `get_default` rather than in dispatch methods such as `Dispatch::enter`, as when those functions are called, the current state has already been borrowed. Before: ``` test enter_span ... bench: 3 ns/iter (+/- 0) test span_no_fields ... bench: 51 ns/iter (+/- 12) test span_repeatedly ... bench: 5,073 ns/iter (+/- 1,528) test span_with_fields ... bench: 56 ns/iter (+/- 49) test span_with_fields_record ... bench: 363 ns/iter (+/- 61) ``` After: ``` test enter_span ... bench: 3 ns/iter (+/- 0) test span_no_fields ... bench: 35 ns/iter (+/- 12) test span_repeatedly ... bench: 4,165 ns/iter (+/- 298) test span_with_fields ... bench: 48 ns/iter (+/- 12) test span_with_fields_record ... bench: 363 ns/iter (+/- 91) ``` Closes #1032 Signed-off-by: Eliza Weisman <[email protected]>

@samscott89

The `tracing-fmt` subscriber currently fails to increment a span's reference count when a child of that span is created, leading to a situation where child spans can be incorrectly missing their parents. This is due to an interaction between the changes in PR #31 and the changes in tokio-rs/tokio#1033. PR #31 changed the `FmtSubscriber` to round-trip calls to `clone_span` and `drop_span` through the dispatcher, so that when a `FmtSubscriber` is wrapped by another subscriber, the whole stack would see the ref-count updates. tokio-rs/tokio#1033 changed the dispatcher to unset itself when servicing a dispatch call, so as to avoid infinite loops from events within functions called by the subscriber. In combination, this means any ref-count changes from within the dispatcher (i.e. when `new_span` constructs a new span that clones its parent's ID) are lost. This branch changes `FmtSubscriber` back to calling ref-count methods on itself directly. Although this means that it will no longer call the ref-counting methods on a wrapper subscriber, the `Layer` type added in PR #136 will soon obviate the need to write such subscribers, and as far as I can tell, they are not currently being used in production. I've also added some assertions that should help catch situations where the `FmtSubscriber`'s span store is corrupted. Previously, we would silently ignore missing spans, in case a method was called in a `drop` impl (i.e., when exiting a span or dropping an ID), to avoid double panics. I've changed these methods to check if the thread is panicking, and only panic if we are running in debug mode and not already panicking. Thanks to @samscott89 for finding this bug! Signed-off-by: Eliza Weisman <[email protected]>

hawkw added C-bug Category: This is a bug. tokio-trace labels Apr 4, 2019

hawkw requested a review from carllerche April 4, 2019 00:20

carllerche reviewed Apr 9, 2019

View reviewed changes

tokio-trace/tokio-trace-core/src/dispatcher.rs Outdated Show resolved Hide resolved

hawkw requested a review from carllerche April 10, 2019 19:29

hawkw mentioned this pull request Apr 10, 2019

trace-core: prepare for 0.2 release #1047

Merged

hawkw force-pushed the eliza/unset-dispatch branch from 0986338 to b0c6b34 Compare April 15, 2019 18:12

carllerche approved these changes Apr 15, 2019

View reviewed changes

hawkw added 11 commits April 16, 2019 11:13

trace-core: Dispatchers unset themselves

becacca

Signed-off-by: Eliza Weisman <[email protected]>

trace: Replace RefCell with Cell

e9bb74a

Signed-off-by: Eliza Weisman <[email protected]>

trace-core: Add test that dispatch doesn't infinite loop

3c79939

Signed-off-by: Eliza Weisman <[email protected]>

trace-core: Make test for infinite loops panic rather than hanging

bc18c4b

Signed-off-by: Eliza Weisman <[email protected]>

trace-core: Add test for spans too

22f1b44

Signed-off-by: Eliza Weisman <[email protected]>

trace-core: Clean up code, add comments

c6d3a49

Signed-off-by: Eliza Weisman <[email protected]>

rustfmt

9088410

update after rebase

6b857e2

trace: Test event macros in the subscriber

39431d8

Signed-off-by: Eliza Weisman <[email protected]>

hawkw force-pushed the eliza/unset-dispatch branch from b0c6b34 to 84481e8 Compare April 16, 2019 18:13

hawkw merged commit 4bfa4ff into master Apr 16, 2019

hawkw added a commit that referenced this pull request Apr 16, 2019

Add #1033 to the changelog

b26043f

carllerche deleted the eliza/unset-dispatch branch April 22, 2019 21:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

trace-core: Dispatchers unset themselves #1033

trace-core: Dispatchers unset themselves #1033

Uh oh!

hawkw commented Apr 4, 2019

Uh oh!

carllerche commented Apr 4, 2019

Uh oh!

hawkw commented Apr 4, 2019

Uh oh!

Uh oh!

carllerche commented Apr 9, 2019

Uh oh!

hawkw commented Apr 10, 2019

Uh oh!

hawkw commented Apr 15, 2019

Uh oh!

hawkw commented Apr 15, 2019

Uh oh!

hawkw commented Apr 15, 2019

Uh oh!

hawkw commented Apr 15, 2019

Uh oh!

Uh oh!

Uh oh!

trace-core: Dispatchers unset themselves #1033

trace-core: Dispatchers unset themselves #1033

Uh oh!

Conversation

hawkw commented Apr 4, 2019

Uh oh!

carllerche commented Apr 4, 2019

Uh oh!

hawkw commented Apr 4, 2019

Uh oh!

Uh oh!

carllerche commented Apr 9, 2019

Uh oh!

hawkw commented Apr 10, 2019

Uh oh!

hawkw commented Apr 15, 2019

Uh oh!

hawkw commented Apr 15, 2019

Uh oh!

hawkw commented Apr 15, 2019

Uh oh!

hawkw commented Apr 15, 2019

Uh oh!

Uh oh!