-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove is_shutdown flag from processors. And fix logger::emit() to check for the flag before emit. #2462
base: main
Are you sure you want to change the base?
Remove is_shutdown flag from processors. And fix logger::emit() to check for the flag before emit. #2462
Conversation
@@ -268,6 +268,14 @@ impl opentelemetry::logs::Logger for Logger { | |||
|
|||
/// Emit a `LogRecord`. | |||
fn emit(&self, mut record: Self::LogRecord) { | |||
if self.provider.inner.is_shutdown.load(Ordering::Relaxed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with this is, this can affect throughput due to the contention introduced here. (Logs so far has no contention when using etw/user_events)....
Can you check stress test before and after?
I am unsure of a solution. Maybe don't check shutdown anywhere except in stdout like non-prod processors, and rely on mechanisms like export client/ etw etc. returning errors..
what harm can it cause 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, it doesn’t make sense to introduce the contention in the hot path, even if the contention is at atomic level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are only reading the atomic variable and the variable's value does not change for the most part of application lifetime, it most likely should not have any visible effect on the throughput.
We should be able to confirm that with the stress test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the effect of uncontested atomic would be negligible enough to be noticed over stress test. It is slightly more than normal bool-check, but much less than the perf associated in case of contention. I added the benchmark over logger:emit() in this PR, which shows the latency of 1-2ns:
main branch
:
logger_emit time: [37.916 ns 37.977 ns 38.077 ns]
change: [-0.2733% -0.0648% +0.1328%] (p = 0.58 > 0.05)
No change in performance detected.
PR branch
:
logger_emit time: [38.941 ns 39.027 ns 39.172 ns]
change: [+2.6756% +2.9861% +3.3292%] (p = 0.00 < 0.05)
Performance has regressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure of a solution. Maybe don't check shutdown anywhere except in stdout like non-prod processors, and rely on mechanisms like export client/ etw etc. returning errors..
what harm can it cause
The custom exporter which can be connected to reentrant and simple processor need to handle the shutdown properly in that case. As of now, etw and user_events don't do anything, so they will continue to emit even after shutdown invoked by user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding this "if check" in Logger::emit()
is not only a more efficient short-circuiting operation but it also simplifies things. It lets us offer a more robust guarantee:
"SDK would not do any wasteful computation once its shutdown (regardless of what processors/exporters are used)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think it simplifies things. How do we determine the best place to do the is_shutdown check? Logger::emit() is one place.. how about the processor? how about the exporter? or even at the appender itself?
"SDK would not do any wasteful computation once its shutdown (regardless of what processors/exporters are used)"
I don't know if we need to make such a statement? If user want to achieve that, they can drop the providers. (even then, we need to redesign something in Metrics to make it truly no-ops(#2442)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think it simplifies things.
The simplification I'm referring to is the effort required to achieving the expected shutdown behavior. Why call emit()
on each processor, if the provider is shutdown. With having the "if check" in Loggger::emit()
we effectively make it a no-op SDK, when the provider is shutdown.
How do we determine the best place to do the is_shutdown check? Logger::emit() is one place.. how about the processor? how about the exporter? or even at the appender itself?
Since the shutdown is an SDK concept, it would only make sense to have the check in an SDK component. Appenders do not depend on SDK, so they are out of question. Maybe we can update the LoggerProvider
trait to add another method is_shutdown
which the appenders can use to avoid creating log records altogether. That's a different discussion though as it involves updating the public trait from API.
With the current state of code, among the SDK components: Logger
, Processor
, and Exporter
, I think the best place to check for shutdown would be Logger
as that's the first SDK component that interacts with the appender. By checking it in the Logger
, we can avoid making calls to processors altogether.
I don't know if we need to make such a statement?
We should have some stance about shutdown behavior. It could either be
- The effectiveness of the shutdown depends on the processors/exporters used
- Or that shutdown works regardless, of the configured pipeline.
I think it's nice to have # 2 and having the "if check" is a very easy way to achieve that. I don't see any disadvantage to doing that.
If user want to achieve that, they can drop the providers.
I'm not sure if it's as straightforward. Since the provider is essentially an Arc<LoggerProviderInner>
, the inner value would not be dropped until there are no more references to it. The appender layer keeps a clone of the provider which then moved to the tracing_sbuscriber::registry()
. So simply dropping the provider might not help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have some stance about shutdown behavior. It could either be
The effectiveness of the shutdown depends on the processors/exporters used
Or that shutdown works regardless, of the configured pipeline.
I think it's nice to have # 2 and having the "if check" is a very easy way to achieve that. I don't see any disadvantage to doing that.
Thanks, these are good discussions.
I am also inclined with #2, as it ensures consistent shutdown behavior regardless of pipeline configuration or internal component details. Relying on assumptions, such as the batch processor closing its channel during shutdown, creates unnecessary dependencies that could break with future changes. An explicit provider/logger level check provides a more reliable solution, more importantly since cost of adding it is small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simplification I'm referring to is the effort required to achieving the expected shutdown behavior. Why call emit() on each processor, if the provider is shutdown. With having the "if check" in Loggger::emit() we effectively make it a no-op SDK, when the provider is shutdown.
Without further redesign/refactoring, this won't make it a no-op sdk. There is non-trivial amount of work done even before emit() is invoked, to prepare LogRecord, and that'll continue even after shutdown. Yes, we save some work, but not quite no-op.
We should have some stance about shutdown behavior. It could either be
1 The effectiveness of the shutdown depends on the processors/exporters used
2 Or that shutdown works regardless, of the configured pipeline.
So far, we have been mostly sticking with 1 - we expect components to behave as expected. Example: We expect ObservableCallbacks to return in reasonable time, and if they don't return, Metrics SDK never recovers. Similarly we expect Exporters to return in reasonable time.
Not a strict comparison, but if we want to make a stance that Otel SDK works predictable, irrespective of the components plugged in, then it need a lot of redesign.
If we generally expect other components to behave properly, then why not continue rely on them to deal with shutdown too?
Another interesting thing we can do here (Similar to what Otel .NET does for its console exporter). If we make shutdown a pure exporter problem, we can have Stdout exporter continue to work even after shutdown is invoked, but can do something in the Stdout exporter to let user know that they have incorrect lifecycle management. IMHO, this is better than short-circuiting at up-stream sdk.(Of course internal logs can help, but unless users take care of avoiding self-logging, it can cause stack overflow due to cyclic loop)
I'm not sure if it's as straightforward. Since the provider is essentially an Arc, the inner value would not be dropped until there are no more references to it. The appender layer keeps a clone of the provider which then moved to the tracing_sbuscriber::registry(). So simply dropping the provider might not help.
Excellent point! This is something we need to have some guidance. (Even more challenging is the Metric case referred earlier).
If user decided to setup a tracing::subscriber globally (via init()), then they won't be able to really drop anything, unless tracing::subscriber provides a way. In other words, user is signing up to a global subscriber, so they should not expect to have the ability to drop things. tracing has alternate solutions to set a subscriber for a scope only, and they can drop the scope which in-turn can drop the logger_provider.
(#1660 demonstrates this issue already)
@@ -201,15 +187,6 @@ impl Debug for BatchLogProcessor { | |||
|
|||
impl LogProcessor for BatchLogProcessor { | |||
fn emit(&self, record: &mut LogRecord, instrumentation: &InstrumentationScope) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if an emit is called after shutdown, even if we don't do is_shutdown check, the channel would be already closed, so the error it returned is good enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like
- Don't check for is_shutdown.
- Just export as usual.
- Since the channel is closed, it'll error out.
- Log that error.
No contention/perf cost for normal path. If logs are still emitted after shutdown, it clearly indicates some issue with user managing the lifetimes.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2462 +/- ##
=====================================
Coverage 77.2% 77.2%
=====================================
Files 124 124
Lines 23009 22976 -33
=====================================
- Hits 17765 17745 -20
+ Misses 5244 5231 -13 ☔ View full report in Codecov by Sentry. |
Changes
To discuss the change suggested be @cijothomas here - #2381 (comment)
Merge requirement checklist
CHANGELOG.md
files updated for non-trivial, user-facing changes