
Add optional async API for non-lossy non-blocking writes #3238

Open

Amberley-Sz opened this issue Mar 20, 2025 · 4 comments

Comments


Amberley-Sz commented Mar 20, 2025

Feature Request

Optional Async API for Non-lossy Non-blocking Writes

Crates

tracing-appender

Motivation

Currently, the NonBlocking writer's non-lossy mode blocks when the channel is full. Users have two sub-optimal choices:

  1. Use lossy mode and potentially lose messages
  2. Use non-lossy mode and accept blocking behavior

We want to provide a third option for users who:

  • Already use Tokio in their application
  • Are willing to add Tokio as a dependency to gain non-blocking behavior

Users who want to avoid async runtime dependencies and are ok with the blocking behavior can stick to the current flow.

Proposal

Add an optional Tokio-based async API behind a feature flag while maintaining the existing synchronous implementation:

  1. Add a minimal Tokio dependency for async channel support:
    In Cargo.toml, add:
[dependencies]
tokio = { version = "1", optional = true, features = ["sync"], default-features = false }

[features]
default = []
async = ["tokio"]
  2. Add new async methods to NonBlocking (a rough sketch follows below):
  • Implement AsyncWrite trait behind the "async" feature flag
  • Keep existing std::io::Write implementation unchanged
  • Allow applications to explicitly opt-in to async behavior when desired
  • Add separate Tokio async channel for async operations when async enabled
  • Accept runtime handle via builder to control where async worker runs

Sample usage:

use std::io::Write;

// Existing sync behavior remains unchanged
let (mut writer, guard) = NonBlocking::new(std::fs::File::create("app.log")?);
writer.write(b"sync write")?; // Uses std::io::Write

// New async API available when feature enabled
#[cfg(feature = "async")]
writer.write_async(b"async write").await?; // Uses AsyncWrite
  3. Add documentation warnings to the existing implementation, something like:
/// # Warning
/// In non-lossy mode, this implementation will block when the channel is full.
/// For truly non-blocking behavior, enable the "async" feature and use [`AsyncNonBlocking`].
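
To make the shape of this concrete, here is a rough, hypothetical sketch of the opt-in async path, using the AsyncNonBlocking / write_async names from above. None of this exists in tracing-appender today; the hard-coded log file is only for brevity, and additional tokio features (rt, fs, io-util) would be needed beyond sync. The point is that a bounded tokio::sync::mpsc channel lets an async caller wait for capacity (non-lossy) without blocking the thread.

// Hypothetical sketch only -- none of these names exist in tracing-appender today.
use tokio::io::AsyncWriteExt;
use tokio::sync::mpsc;

pub struct AsyncNonBlocking {
    // Bounded channel: send().await applies backpressure instead of dropping records.
    tx: mpsc::Sender<Vec<u8>>,
}

impl AsyncNonBlocking {
    /// Spawn a worker on the provided runtime handle and return the writer half.
    pub fn new(handle: &tokio::runtime::Handle, capacity: usize) -> Self {
        let (tx, mut rx) = mpsc::channel::<Vec<u8>>(capacity);
        handle.spawn(async move {
            // Illustrative destination; the real worker would own the caller's writer.
            let mut file = tokio::fs::File::create("app.log").await.expect("open log file");
            while let Some(buf) = rx.recv().await {
                let _ = file.write_all(&buf).await;
            }
        });
        Self { tx }
    }

    /// Non-lossy and non-blocking: waits for channel capacity without blocking the thread.
    pub async fn write_async(&self, buf: &[u8]) -> std::io::Result<()> {
        self.tx
            .send(buf.to_vec())
            .await
            .map_err(|_| std::io::Error::new(std::io::ErrorKind::BrokenPipe, "worker is gone"))
    }
}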

hawkw commented Mar 20, 2025

I don't think what you're describing here makes sense.

How does just switching the NonBlocking writer to use Tokio's MPSC channel make it both non-blocking and non-lossy? The Tokio MPSC is "non-lossy" because the caller may wait asynchronously when the channel is full (by .awaiting the mpsc::Sender::send call). This is not "blocking" in the sense of blocking the entire thread, but it does require the sending task to wait. This is only possible when the sender is in an async context.

Synchronous (non-async fn) callers cannot await the mpsc::Sender::send future; they must instead use the fallible mpsc::Sender::try_send, which returns an error when the channel is full --- i.e., it's lossy for synchronous callers. And the tracing::Subscriber methods in which we would send a message to the channel are synchronous functions --- they cannot be made into async fns without a breaking change that makes all of tracing's APIs async (i.e., recording an event or creating/entering/exiting a span would be operations that you have to .await). We're not going to do this, as it would make tracing unusable in non-async code, including non-async functions in projects that do use an async runtime such as tokio.
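
For illustration only (plain tokio, not tracing-appender code), the distinction looks like this:

use tokio::sync::mpsc;

// An async caller can wait for capacity: the task suspends, the thread is not blocked,
// and nothing is dropped.
async fn async_caller(tx: mpsc::Sender<Vec<u8>>) {
    let _ = tx.send(b"record".to_vec()).await;
}

// A sync caller (like a Subscriber method) cannot .await, so it must use try_send,
// which fails immediately when the channel is full -- i.e., it is lossy.
fn sync_caller(tx: &mpsc::Sender<Vec<u8>>) {
    if let Err(mpsc::error::TrySendError::Full(record)) = tx.try_send(b"record".to_vec()) {
        let _ = record; // the record must be dropped or handled out of band
    }
}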

So, unfortunately, I don't think this is something that we could do.


jlizen commented Mar 20, 2025

Thanks for the eyes @hawkw !

I work with Amberley/@rcoh, so I'm jumping in to share some context.

To be clear - she and I are glad to spend the time to make any such contributions and to make sure they are groomed to reduce overhead for you and other maintainers. And if you don't have time to discuss / review this right now, that's completely understood! Mostly just wanted to float the idea to get feedback :)

Intended use case

> We're not going to do this, as it would make tracing unusable in non-async code, including non-async functions in projects that do use an async runtime such as tokio.

The use case this is targeting is less about interfacing with the tracing APIs, which can't be async. It's more about cases where somebody is using tracing_appender as a generic 'flush to file or other destination' library and is calling the NonBlocking writer's APIs directly.

I see this use case frequently in my company outside of logging contexts, where other sorts of records are being written to disk. That also tends to be when the non-lossy mode is used, since you generally want to drop log records under backpressure, but other records might be more critical.
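
For context, a minimal example of that standalone pattern with today's API (the file name and record format are illustrative):

use std::io::Write;
use tracing_appender::non_blocking::NonBlockingBuilder;

fn main() -> std::io::Result<()> {
    let file = std::fs::File::create("records.ndjson")?; // illustrative destination
    // Non-lossy: write() blocks the calling thread when the channel is full.
    let (mut writer, _guard) = NonBlockingBuilder::default().lossy(false).finish(file);
    writer.write_all(b"{\"record\":1}\n")?;
    // _guard is dropped here, flushing any buffered records before exit.
    Ok(())
}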

How it would be used

The usage mode would need to be a net-new API (presumably AsyncWrite-trait-related, and presumably behind a feature flag) that can be called instead of the std::io::Write one, that uses the Tokio channel and expects an async context. It would need to be called directly, since all of the integration points today expect sync APIs / std::io::Write.

Meaning, the default behavior of NonBlocking wouldn't change; this is an opt-in behavior used directly by calling applications when you'd prefer a non-lossy writer that yields instead of blocking. (Presumably it would also be possible to use it in lossy mode, given the single struct in the current implementation, but that's less useful.)

I personally have projects that would use this, where I currently use non-lossy mode to flush non-log records in an async context that is important but runs in the background relative to the request/response flow. When my application is under heavy load, I am nervous about backpressure on those tasks degrading end-to-end latency, but I can't afford to drop records, so I accept that risk for now.

So, this API would be useful for me, and I imagine for others. We could clearly document the cases where the non-lossy NonBlocking writer will block, what API e.g. a tokio feature flag would unlock, what the tradeoff is (an extra tokio/sync dependency and an extra tokio::mpsc channel), and what the limitations are (i.e., it doesn't interop with the tracing ecosystem and needs to be called directly by the end application).

Is this crate the right place for this?

Even though tracing-appender sits under the tracing umbrella, it is the widely used 'off the shelf' option for flushing to disk or other I/O today. We could certainly instead write this sort of code in a different library that is more focused on non-logging/visibility use cases.

But it just felt like that would fragment the ecosystem and not be very useful for other consumers of tracing-appender who are less likely to migrate but might still share this use case.

Clearly we shouldn't do anything that adds overhead / dependencies to the existing usage or otherwise worsens the tracing integration. But this seems like an additive change if it is opt-in under a feature flag?

Future extensions (that do interop with tracing)

I'd love to see something along the lines of another usage mode for the LOSSY appender: a generic sync on-full callback that allows the caller to inject custom metrics, keep some sort of dead-letter queue, send the message to a context where it can be handled with e.g. an async sleep, or otherwise. That seems like it would play more nicely with tracing, and could also be done additively? (A rough sketch follows below.)
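
Purely to illustrate the shape of that hook (nothing below exists in tracing-appender today; the std channel is a stand-in for the real worker queue):

use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

// Hypothetical sketch: a lossy send path that invokes a caller-supplied callback instead
// of silently dropping the record, so the caller can count it, log it, or queue it elsewhere.
fn send_or_callback(
    tx: &SyncSender<Vec<u8>>,
    record: Vec<u8>,
    on_full: &(dyn Fn(&[u8]) + Send + Sync),
) {
    if let Err(TrySendError::Full(dropped)) = tx.try_send(record) {
        on_full(&dropped);
    }
}

fn main() {
    let (tx, _rx) = sync_channel::<Vec<u8>>(0); // zero capacity so the demo always hits "full"
    send_or_callback(&tx, b"record".to_vec(), &|dropped: &[u8]| {
        eprintln!("dropped {} bytes due to backpressure", dropped.len());
    });
}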


hawkw commented Mar 20, 2025

> The use case this is targeting is less about interfacing with the tracing APIs, which can't be async. It's more about cases where somebody is using tracing_appender as a generic 'flush to file or other destination' library.
>
> I see this frequently in my company outside of logging contexts, where there are some other sorts of records being written to disk. That also tends to be when the non-lossy one is used, since log records you generally want to drop in the case of backpressure, but other records might be more critical.

Oh, I see, I hadn't realized people were using just the non-blocking writer part of tracing-appender on its own, outside of a tracing subscriber. That's surprising! In that case, I can imagine a use case for this, although I'm not convinced that it's best served by tracing-appender rather than a standalone library...


jlizen commented Mar 20, 2025

> I'm not convinced that it's best served by tracing-appender rather than a standalone library...

Yeah, this is really the key question. We kind of went in circles on it.

At the end of the day we felt that tracing-appender is written flexibly enough that it works just fine for this use case. My anecdotal experience is that it is already widely used that way in some places - but my view is limited.

If we were going to go the standalone library route, it would probably start as a fork of this package. That was a yellow flag to me: it might be nicer to figure out how to play nicely with the existing API rather than fragment.

That way we can continue to contribute to tracing-appender, and some of those contributions probably WOULD be relevant to the logging use case as well. A good example would be better telemetry or callbacks when lossy appenders drop logs, which would probably be useful both for generic records and for tracing records.

It also would mean that we are keeping an eye on the PR/issue queue and could try to help with triage or other maintenance overhead (if that is useful to you).

Of course, we'd defer to you and the other maintainers on whether any of this makes sense!
