Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to integrate tower retry with tonic because http::Request is not Clone #733

Open
vitarb opened this issue Jul 30, 2021 · 12 comments
Open
Assignees
Labels
A-tonic C-enhancement Category: New feature or request

Comments

@vitarb
Copy link

vitarb commented Jul 30, 2021

Hello community, I'm looking for some guidance.

I'm trying to integrate retry layer from tower into my tonic setup in order to enable gRPC retries for some common retryable errors.

My initial setup is almost identical to this example with retry policy like this one.

The issue I'm facing is that in order to work correctly, one must implement tower's policy trait including clone_request function. If you don't implement it and return None (as documentation suggests in the case when Clone is not implemented for the request) then whole retry logic will not be executed. Now the problem is that under the hood tonic uses http::Request, which is NOT Clone.

One one hand I see reasoning behind why http::Request should not be clone, but on the other it makes whole retry mechanism unusable.

Is there any work-around for this issue or are there any improvements we can make in tonic or tower in order to eliminate it in the first place (maybe put request behind an Arc?)?

@davidpdrsn
Copy link
Member

Making the requests themselves Clone is hard due to streaming.

There are tricks you can pull like buffering the http_body::Body as it's being generated and then "replay" it when retrying. That is what linkerd-proxy does. It does introduce quite a bit of complexity though and isn't something we can build into tonic.

It's instead recommended to handle retries at a higher level in your application than at the tower level. Exactly how to do that depends on your app.

@vitarb
Copy link
Author

vitarb commented Jul 30, 2021

Thanks, I was hoping that we could stay consistent across the ecosystem and be able to put retry functionality into the middleware layer (which tower is), like for example in go which allows adding an interceptor that handles retry logic.
Can client generator be customizable so that non-streaming APIs can have this type of retry integration?

@davidpdrsn
Copy link
Member

Can client generator be customizable so that non-streaming APIs can have this type of retry integration?

No I don't think so. http2 is always streaming whether you want to or not. In gRPC, unary requests are streams that happen to only contain one message.

@vitarb
Copy link
Author

vitarb commented Aug 1, 2021

I agree, having to clone http request sounds like a big hustle.
Can you explain why is it a bad idea to store request behind an Arc, making it clonable without having to clone it?

@davidpdrsn
Copy link
Member

davidpdrsn commented Aug 1, 2021

Can you explain why is it a bad idea to store request behind an Arc, making it clonable without having to clone it?

Prost requires owned buffers for encoding. There has been some discussion about supporting Arcs but don't anyone has worked on it yet.

It's also hard to say how supporting Request and Arc<Request> would impact tonics design.

@vitarb
Copy link
Author

vitarb commented Aug 2, 2021

Got it, should we keep this conversation open and try to find a potential long term solution, or would you prefer to shelve it?
Personally I think that aiming for feature parity with other languages, such as go, which has retry interceptors, should be important.

@davidpdrsn
Copy link
Member

davidpdrsn commented Aug 2, 2021

I would love for tonic to support tower::retry out of the box. I just don't know how to make it work 😞 But we should keep the issue open.

EDIT: I wasn't trying to dismiss the idea of retrys, I was just trying to state why we haven't supported it thus far.

For reference linkerd-proxy's implementation of a cloneable body type is here https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/http-retry/src/lib.rs. I don't know enough about the details to know if we could copy/paste that into tonic. Maybe @hawkw has more insight.

@LucioFranco
Copy link
Member

Yeah, I would like this and I have some ideas on how to accomplish it. Maybe it will make it into the next release but not sure. Ill try to be optimistic :D

@LucioFranco LucioFranco added A-tonic C-enhancement Category: New feature or request labels Oct 13, 2021
@LucioFranco LucioFranco added this to the 0.6 milestone Oct 13, 2021
@LucioFranco LucioFranco self-assigned this Oct 13, 2021
@LucioFranco LucioFranco modified the milestones: 0.6, 0.7 Oct 20, 2021
abbec pushed a commit to goodbyekansas/firm that referenced this issue Dec 12, 2021
All endpoints that would require an interactive login will now return
the gRPC status code for `unauthenticated`. This is a signal to the
client that they can try to call `login` if they have an interactive
context (think UI, CLI). `login` will generate a stream of login
commands (currently only supports Browser to signal to the client to
open a browser).

The interactive login will currently do as many oidc logins as needed to
cover all auth scopes.

Whats left:

I started implementing a tower layer in Bendini (`src/auth.rs`) that
checks for `unauthenticated` and in that case tries to perform an
interactive login process. However, it turns out not to be as nice as I
had hoped. After logging in, we would want to retry the initial request
but the request is not cloneable and therefore this seems
impossible. See hyperium/tonic#733. This is
also the source of the current compilation error that I left in there as
reference.

We should probably do something like David Pedersen suggests and handle
it on a "higher level". Not sure how to make that nice though as the
nice thing of what was started here was that it was 100% automatic and
invisible to higher layers and putting it higher up makes it dependent
on generated types and functions, which tower layers are not since they
run after/before protobuf encoding/decoding respectively. Potentially we
could use some sort of `Deref` implementation where the client is
wrapped in a `Retry` type?

Implement the `wait_for_approval` endpoint in avery.
@LucioFranco LucioFranco removed this from the 0.7 milestone Feb 22, 2022
@fmassot
Copy link

fmassot commented Sep 30, 2022

I'm interested in the retry feature for our project as we heavily use tonic there (and thanks a ton for what you do :)).
@LucioFranco what did you have in mind exactly? We may give it a try so it could be nice to be aware of a good design.

@LucioFranco
Copy link
Member

There is a tracking issue in tower here tower-rs/tower#682

@stan-irl
Copy link

im trying to write some auth middleware that listens for UNAUTHENTICATED and refreshes the token then retries the request. this is also not possible while a request cant be cloned which is unfortunate because its a pretty common usecase.

Is there a quick any dirty way to clone a request even if its not performant and doesnt support stream? I've been trying for hours to clone the body as bytes but I cant get anything going:

async fn clone_request(req: &Request<BoxBody>) -> Request<BoxBody> {
    let bytes = hyper::body::to_bytes(req.body_mut()).await.unwrap();
    let body = hyper::Body::from(bytes);
    let box_body = BoxBody::new(body);
    Request::builder()
        .method(req.method())
        .uri(req.uri())
        .version(req.version())
        .body(box_body)
        .unwrap()
}

throws error:

error[E0271]: type mismatch resolving `<Body as HttpBody>::Error == Status`
  --> src/components/grpc/src/middleware/auth.rs:88:33
   |
88 |     let box_body = BoxBody::new(body);
   |                    ------------ ^^^^ expected `Status`, found `Error`
   |                    |
   |                    required by a bound introduced by this call
   |
note: required by a bound in `http_body::combinators::box_body::UnsyncBoxBody::<D, E>::new`
  --> ~/.cargo/registry/src/github.com-1ecc6299db9ec823/http-body-0.4.5/src/combinators/box_body.rs:82:27
   |
82 |         B: Body<Data = D, Error = E> + Send + 'static,
   |                           ^^^^^^^^^ required by this bound in `UnsyncBoxBody::<D, E>::new`


daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue Feb 5, 2024
### Description

Implements grpc fallback provider logic for cosmos

- initially tried implementing the fallback provider deprioritization
logic at middleware level like in the EVM. The difference between ethers
and cosmrs is that in the latter, middleware can only live at the
transport layer (`tower` crate level).
- based on this github
[issue](hyperium/tonic#733), that actually
doesn't look possible, because the http::Request type isn't `Clone` so
it can't be submitted to multiple providers
- ended up implementing the fallback provider at the application layer,
by keeping an array of grpc channels
- There is now a `call` method in `hyperlane_core::FallbackProvider`
which I'm actually really happy with. This method handles the
fallbackprovider-specific logic by taking in an async closure, running
it on each provider, and iterating providers if the closure call fails.
In `grpc.rs` you can see how this is slightly verbose but I think it's
quite manageable. The only part that bugs me is having to duplicate
`Pin::from(Box::from(future))`, but that's need afaict because the
regular closure returns an anonymous type
- adds `grpcUrls` and `customGrpcUrls` config items
- tests the cosmos fallback provider e2e

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

- Fixes: hyperlane-xyz/issues#998

### Backward compatibility

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

### Testing

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
ltyu pushed a commit to ltyu/hyperlane-monorepo that referenced this issue Mar 13, 2024
### Description

Implements grpc fallback provider logic for cosmos

- initially tried implementing the fallback provider deprioritization
logic at middleware level like in the EVM. The difference between ethers
and cosmrs is that in the latter, middleware can only live at the
transport layer (`tower` crate level).
- based on this github
[issue](hyperium/tonic#733), that actually
doesn't look possible, because the http::Request type isn't `Clone` so
it can't be submitted to multiple providers
- ended up implementing the fallback provider at the application layer,
by keeping an array of grpc channels
- There is now a `call` method in `hyperlane_core::FallbackProvider`
which I'm actually really happy with. This method handles the
fallbackprovider-specific logic by taking in an async closure, running
it on each provider, and iterating providers if the closure call fails.
In `grpc.rs` you can see how this is slightly verbose but I think it's
quite manageable. The only part that bugs me is having to duplicate
`Pin::from(Box::from(future))`, but that's need afaict because the
regular closure returns an anonymous type
- adds `grpcUrls` and `customGrpcUrls` config items
- tests the cosmos fallback provider e2e

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

- Fixes: hyperlane-xyz/issues#998

### Backward compatibility

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

### Testing

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
@fraillt
Copy link

fraillt commented Aug 21, 2024

Since this issue was created 1 thing has changed. In http crate, version "1" request object become cloneable. So this fixes half of the problem.
However, I don't think that it's possible to make tonic request cloneable, because by design it uses http2 which is streaming protocol. So no one should expect a tonic request to be cloneable.
However, I think retry middleware in tower crate is a right place to solve this issue.
I have create PR to retry middleware in order to make tonic requests work with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tonic C-enhancement Category: New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants