Create a custom thrift protocol to ensure client/server compatibility #5398
base: main
Conversation
Created this in draft since, for now, it only addresses the task outlined in the ticket. I'm not sure if any of the other proposed tasks should be added here or as follow-on PRs.
public static class AccumuloProtocol extends TCompactProtocol {

  private static final int MAGIC_NUMBER = 0x41434355; // "ACCU" in ASCII
  private static final byte PROTOCOL_VERSION = 1;
Should probably add a comment to AccumuloDataVersion.CURRENT_VERSION that references this and mentions that changing CURRENT_VERSION may also require changing this.
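The suggested cross-reference might read something like this (hypothetical wording, not the PR's actual javadoc):

// Hypothetical note on AccumuloDataVersion.CURRENT_VERSION; the wording
// and the referenced constant location are illustrative only.
/**
 * Note: when changing CURRENT_VERSION, check whether
 * AccumuloProtocol.PROTOCOL_VERSION (the RPC header version added in
 * #5398) also needs to be bumped.
 */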
I was wondering about that... whether the protocol version should be independent, so we can version the serialization format of the protocol header.
We will probably want to include the AccumuloDataVersion.CURRENT_VERSION in the protocol header directly, though. We could also include the actual Accumulo release version in here, too (major.minor.patch; e.g. 4.0.0). I think to verify, we'd probably want to ensure the same major and same minor version, if we want to be even more restrictive than the data version. We really shouldn't have Accumulo servers or clients/servers talking across major.minor versions (different patch releases are okay, which is important for rolling upgrades).
I left it separate for now, just in case we wanted to have multiple protocol versions able to talk to each other, but I'm not sure about that.
The data version and protocol version definitely seem related, but I'm not sure how closely tied we want those versions to be.
I think the protocol version can be independently numbered, not connected to the data version. I think it's sufficient to check the Accumulo version (Constants.VERSION) major.minor. We don't need to use the data version in here at all. The protocol version is the version of the RPC protocol... the data version is the version of the data persisted in ZK and HDFS... and the Accumulo version is the overall version of the software. Only the RPC protocol version and the Accumulo version are relevant here.
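As a rough sketch of what that major.minor check could look like in the protocol — the method names, header layout, and sameMajorMinor helper here are illustrative assumptions, not the PR's actual code:

// Sketch only: a header write/validate pair enforcing the checks discussed
// above. In Accumulo the version string would come from Constants.VERSION.
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TCompactProtocol;
import org.apache.thrift.transport.TTransport;

public class VersionedProtocolSketch extends TCompactProtocol {

  private static final int MAGIC_NUMBER = 0x41434355; // "ACCU" in ASCII
  private static final byte PROTOCOL_VERSION = 1;

  private final String accumuloVersion; // e.g. "4.0.0"

  public VersionedProtocolSketch(TTransport transport, String accumuloVersion) {
    super(transport);
    this.accumuloVersion = accumuloVersion;
  }

  /** Client side: prepend the header before the first message. */
  void writeHeader() throws TException {
    writeI32(MAGIC_NUMBER);
    writeByte(PROTOCOL_VERSION);
    writeString(accumuloVersion);
  }

  /** Server side: read the header and reject incompatible peers. */
  void validateHeader() throws TException {
    if (readI32() != MAGIC_NUMBER) {
      throw new TException("Unexpected magic number; not an Accumulo peer");
    }
    if (readByte() != PROTOCOL_VERSION) {
      throw new TException("Incompatible RPC protocol version");
    }
    String remote = readString();
    if (!sameMajorMinor(remote, accumuloVersion)) {
      throw new TException("Incompatible Accumulo version: " + remote);
    }
  }

  /** Patch releases may differ; major.minor must match. */
  private static boolean sameMajorMinor(String a, String b) {
    String[] va = a.split("\\."), vb = b.split("\\.");
    return va[0].equals(vb[0]) && va[1].equals(vb[1]);
  }
}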
I'm not sure if we actually need to distinguish between client and server for the spans.
Marking this as ready for review. I tried to address the following two suggestions from the ticket:

There are a lot of changed files in this PR since I had to update the thrift code. There are also a lot of places where the only change is the removal of the …
@kevinrr888 your suggestions should all be addressed as of 29c1234. In that commit I…
The ServerContext and/or ClientContext should be able to provide the InstanceId, and the protocol factory could be constructed with it. I'm not sure how much code that touches. I can try to take a look if you hit a wall here.
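A minimal sketch of that wiring, assuming hypothetical factory methods and an accessor like context.getInstanceId() (these names are guesses, not the PR's verified API):

// Sketch only: a protocol factory carrying the InstanceId so the protocol
// can include it in the header (client) or validate it (server).
import org.apache.accumulo.core.data.InstanceId;
import org.apache.thrift.protocol.TCompactProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.protocol.TProtocolFactory;
import org.apache.thrift.transport.TTransport;

public class AccumuloProtocolFactorySketch implements TProtocolFactory {

  private static final long serialVersionUID = 1L;

  private final InstanceId instanceId;
  private final boolean isClient;

  private AccumuloProtocolFactorySketch(InstanceId instanceId, boolean isClient) {
    this.instanceId = instanceId;
    this.isClient = isClient;
  }

  /** e.g. clientFactory(context.getInstanceId()) — accessor name assumed */
  public static AccumuloProtocolFactorySketch clientFactory(InstanceId instanceId) {
    return new AccumuloProtocolFactorySketch(instanceId, true);
  }

  public static AccumuloProtocolFactorySketch serverFactory(InstanceId instanceId) {
    return new AccumuloProtocolFactorySketch(instanceId, false);
  }

  @Override
  public TProtocol getProtocol(TTransport transport) {
    // The real protocol would write (client) or check (server) a header
    // containing instanceId, rejecting connections from other instances.
    return new TCompactProtocol(transport);
  }
}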
Okay, I added the instance ID validation in 769d38f. I had to reach pretty far up the chain to get the instanceId, so there were quite a few files changed, but all of those are just parameter changes.
Can you provide an example of the OpenTelemetry tracing output (screenshot or log sequences or something)? I'd be curious what they look like when a client performs a traced operation over the new RPC protocol.
@DomGarguilo and @ctubbsii, curious what the status of this PR is. Would be nice to get this merged in if possible.
Ready to be merged unless anyone else has blockers here.
I think this is probably ready to be merged, but I'd like to go over the visual with @DomGarguilo to make sure I understand what I'm looking at and everything is working as it should. Sorry for holding this up. I think it can be merged next week. @DomGarguilo could you give me a call next week to go over it first, if you have time?
I am actually not sure that traces are being propagated correctly. I'm looking into things, but I don't think this should be merged yet.
After debugging, the issue is fixed via a683811, and I can see spans from client -> server in Jaeger. The issue was that we were not closing the scope or span on readMessageEnd. In the screenshot below, in a shell I ran … This demonstrates that cross-component tracing is working.
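The shape of that fix, as described (the field names and surrounding class are illustrative, not the commit's actual code):

// Sketch of the a683811 fix: close the scope and span when the message
// read completes, instead of leaving them open.
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.context.Scope;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TCompactProtocol;
import org.apache.thrift.transport.TTransport;

class TracingProtocolSketch extends TCompactProtocol {

  private Span span;   // opened in readMessageBegin (not shown)
  private Scope scope; // makes the span current on this thread

  TracingProtocolSketch(TTransport transport) {
    super(transport);
  }

  @Override
  public void readMessageEnd() throws TException {
    super.readMessageEnd();
    // Before the fix, these were never closed, which left client and
    // server spans disconnected in Jaeger.
    if (scope != null) {
      scope.close();
      scope = null;
    }
    if (span != null) {
      span.end();
      span = null;
    }
  }
}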
Looking at the code again today, after that fix for the readMessageEnd...
I think the stuff with the version checks is all good, and we should definitely merge that in soon. And I really wanted the changes to the tracing stuff too, because I was really hoping it would allow us to simplify a lot of the Thrift code.
However, I'm starting to wonder if pushing the tracing stuff into this is actually feasible. When I looked at it again today, I think I'm starting to realize that readMessageEnd might finish before the actual action that the RPC was requesting is executed. If that's the case, then the span is already closed, and any subsequent spans on the server side are going to be disconnected. That seems to match what I'm seeing in the second screenshot, where we only see the handleRpcMessage span, but no child spans for it on the server.
The current code creates a span that wraps the entire execution of the RPC call, but I think in the protocol, it's only wrapping the message parsing, not the execution. I am not too familiar with Thrift internals, but I suspect a subsequent writeMessage is done on the server side to send a message back to the client with the results of the execution. If so, we really want to close the handleRpcMessage span at that time, and not before.
It may still be possible to salvage these tracing changes, and close the span at the end of the RPC execution inside the protocol factory. However, if that's not possible, we may have to abandon the thrift tracing changes, and continue doing that the way it is currently done, so it wraps the whole RPC operation, and only keep the portions that do the version/compatibility checks.
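If it is salvageable, the change might be as simple as making the close side-dependent: close on readMessageEnd for the client (reply fully read) but on writeMessageEnd for the server (reply fully written). A sketch under those assumptions, with illustrative names:

// Sketch only: end the span at different points depending on which side
// of the RPC this protocol instance is on.
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.context.Scope;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TCompactProtocol;
import org.apache.thrift.transport.TTransport;

class SideAwareTracingSketch extends TCompactProtocol {

  private final boolean isClient;
  private Span span;   // opened in writeMessageBegin (client) or
                       // readMessageBegin (server); not shown
  private Scope scope;

  SideAwareTracingSketch(TTransport transport, boolean isClient) {
    super(transport);
    this.isClient = isClient;
  }

  @Override
  public void readMessageEnd() throws TException {
    super.readMessageEnd();
    // Client: the reply has been read, so the RPC is complete.
    // Server: the handler has not run yet, so keep the span open.
    if (isClient) {
      closeSpan();
    }
  }

  @Override
  public void writeMessageEnd() throws TException {
    super.writeMessageEnd();
    // Server: the reply has been written, so the span now covers
    // parse + execute + reply.
    if (!isClient) {
      closeSpan();
    }
  }

  private void closeSpan() {
    if (scope != null) {
      scope.close();
      scope = null;
    }
    if (span != null) {
      span.end();
      span = null;
    }
  }
}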
Addresses #5390