internal/xds: move the LDS and RDS watchers to dependency manager #8651
| Codecov Report❌ Patch coverage is  Additional details and impacted files@@            Coverage Diff             @@
##           master    #8651      +/-   ##
==========================================
+ Coverage   81.21%   82.97%   +1.76%     
==========================================
  Files         416      421       +5     
  Lines       41002    32464    -8538     
==========================================
- Hits        33298    26936    -6362     
+ Misses       6226     4129    -2097     
+ Partials     1478     1399      -79     
 | 
| The tests are failing. Is this ready for review? | 
| // XDSConfig holds the complete and resolved xDS resource configuration | ||
| // including LDS, RDS, CDS and endpoints. | 
| // XDSConfig holds the complete and resolved xDS resource configuration | |
| // including LDS, RDS, CDS and endpoints. | |
| // XDSConfig holds the complete gRPC client-side xDS configuration | |
| // containing all necessary resources. | 
| // including LDS, RDS, CDS and endpoints. | ||
| type XDSConfig struct { | ||
| // Listener is the listener resource update | ||
| Listener ListenerUpdate | 
We are moving the ResourceChanged methods on the resource watchers to accept a pointer to the update struct (instead of accepting the update by value). So, I think it would make sense for us to store them as pointers here as well.
See: #8652
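A minimal sketch of what storing pointers in the config could look like. The `ListenerUpdate` and `RouteConfigUpdate` types below are simplified stand-ins with hypothetical fields, and `newXDSConfig` is a hypothetical helper, not something from this PR:

```go
package main

import "fmt"

// ListenerUpdate and RouteConfigUpdate are simplified stand-ins for the
// xdsresource types; the fields shown here are hypothetical.
type ListenerUpdate struct{ RouteConfigName string }
type RouteConfigUpdate struct{ VirtualHostNames []string }

// XDSConfig keeps pointers to the updates instead of copies of them.
type XDSConfig struct {
	Listener    *ListenerUpdate
	RouteConfig *RouteConfigUpdate
}

// newXDSConfig is a hypothetical helper that assembles a config from the
// pointers a watcher callback would receive.
func newXDSConfig(l *ListenerUpdate, r *RouteConfigUpdate) *XDSConfig {
	return &XDSConfig{Listener: l, RouteConfig: r}
}

func main() {
	lis := &ListenerUpdate{RouteConfigName: "route-a"}
	cfg := newXDSConfig(lis, &RouteConfigUpdate{})
	// The config refers to the struct the watcher delivered; nothing is copied.
	fmt.Println(cfg.Listener == lis)
}
```

Because the pointer is shared, anything holding the config has to treat the pointed-to structs as read-only.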
| // XDSConfig holds the complete and resolved xDS resource configuration | ||
| // including LDS, RDS, CDS and endpoints. | ||
| type XDSConfig struct { | ||
| // Listener is the listener resource update | 
Nit: Let's try to consistently use the word configuration or config instead of update in these docstrings.
So, maybe something like:
// Listener holds the listener configuration.
| // RouteConfig is the route configuration resource update. It will be | ||
| // populated even if RouteConfig is inlined into the Listener resource. | 
| // RouteConfig is the route configuration resource update. It will be | |
| // populated even if RouteConfig is inlined into the Listener resource. | |
| // RouteConfig holds the route configuration. It will be | |
| // populated even if the route configuration was inlined into the Listener resource. | 
| // VirtualHost is the virtual host from the route configuration matched with | ||
| // dataplane authority . | 
Maybe?
| // VirtualHost is the virtual host from the route configuration matched with | |
| // dataplane authority . | |
| // VirtualHost selected from the route configuration whose domain field | |
| // offers the best match against the provided dataplane authority. | 
| // Clusters maps the cluster name with the ClusterResult which will have | ||
| // either the cluster configuration or error. It will have an error status | ||
| // if either | ||
| // | ||
| // (a) there was an error and we did not already have a valid resource or | ||
| // | ||
| // (b) the resource does not exist. | 
How about making it much simpler and leaving more of the documentation to the individual structs.
// Clusters is a map from cluster name to its configuration.
| Clusters map[string]*ClusterResult | ||
| } | ||
|  | ||
| // ClusterResult contains either a cluster's configuration or an error. | 
Something like?
// ClusterResult contains a cluster's configuration when we receive a 
// valid resource from the management server. It contains an error when:
// - we receive an invalid resource from the management server and
//   we did not already have a valid resource or
// - the cluster resource does not exist on the management server
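To illustrate the either-config-or-error contract described in that docstring, here is a rough sketch of how a consumer might filter a `Clusters` map. The `Config` field name and the trimmed-down `ClusterConfig` are assumptions; only `Err` appears in the quoted diff:

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ClusterConfig is a simplified stand-in; the real struct has more fields.
type ClusterConfig struct{ Name string }

// ClusterResult holds either a configuration or an error. The Config field
// name is an assumption made for this sketch.
type ClusterResult struct {
	Config *ClusterConfig
	Err    error
}

// usableClusters returns the sorted names of clusters that carry a valid
// configuration and no error.
func usableClusters(clusters map[string]*ClusterResult) []string {
	var names []string
	for name, res := range clusters {
		if res.Err == nil && res.Config != nil {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	clusters := map[string]*ClusterResult{
		"good":    {Config: &ClusterConfig{Name: "good"}},
		"missing": {Err: errors.New("cluster resource does not exist")},
	}
	fmt.Println(usableClusters(clusters))
}
```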
| Err error | ||
| } | ||
|  | ||
| // ClusterConfig contains cluster configuration for a single cluster. | 
Nit: ClusterConfig contains configuration for a single cluster.
| // ClusterConfig contains cluster configuration for a single cluster. | ||
| type ClusterConfig struct { | ||
| Cluster ClusterUpdate // Cluster configuration. Always present. | ||
| EndpointConfig EndpointConfig // Endpoint configuration for leaf clusters which will of type EDS or DNS. | 
I think just "Endpoint configuration for leaf clusters" should suffice.
| AggregateConfig AggregateConfig // List of children for aggregate clusters. | ||
| } | ||
|  | ||
| // AggregateConfig contains a list of leaf cluster names. | 
This need not technically be all leaf clusters. Aggregate clusters can have children that are aggregate clusters as well.
Right!
| LeafClusters []string | ||
| } | ||
|  | ||
| // EndpointConfig contains resolved endpoints for a leaf cluster either from DNS | 
Technically, this contains more than just resolved endpoints, at least for the EDS case. So, maybe the comment can be more generic.
// EndpointConfig contains configuration corresponding to the endpoints in a cluster.
And we should also clarify that only one of the three fields can be populated at any given point in time.
Or is it the case that ResolutionNote can have a non-nil error even when one of EDSUpdate or DNSEndpoints is set? If so, we need to clarify that.
Yes, ResolutionNote will also be set when we have an ambient error along with the old endpoints.
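A small sketch of the case discussed above, where previously received endpoints and a resolution note coexist. The field names come from this thread, but the field types are placeholders chosen for the example:

```go
package main

import (
	"errors"
	"fmt"
)

// EndpointConfig is a simplified stand-in. The []string field types are
// placeholders; the real fields hold richer xDS and resolver types.
type EndpointConfig struct {
	EDSUpdate      []string // endpoints from an EDS resource, if any
	DNSEndpoints   []string // endpoints from DNS resolution, if any
	ResolutionNote error    // may be non-nil even when endpoints are present
}

// endpoints returns whichever endpoint source is populated.
func (c EndpointConfig) endpoints() []string {
	if c.EDSUpdate != nil {
		return c.EDSUpdate
	}
	return c.DNSEndpoints
}

func main() {
	// An ambient error arrives after a good EDS update: the old endpoints
	// are kept and the error is recorded alongside them.
	cfg := EndpointConfig{
		EDSUpdate:      []string{"10.0.0.1:8080"},
		ResolutionNote: errors.New("ambient error from management server"),
	}
	fmt.Println(cfg.endpoints(), cfg.ResolutionNote)
}
```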
| // including LDS, RDS, CDS and EDS and sends update once we have all the | ||
| // resources and sends an error when we get error in listener or route | ||
| // resources. | ||
| func New(listenername, dataplaneAuthority string, xdsClient xdsclient.XDSClient, watcher ConfigWatcher) *DependencyManager { | 
s/listenername/listenerName
| // resources and sends an error when we get error in listener or route | ||
| // resources. | ||
| func New(listenername, dataplaneAuthority string, xdsClient xdsclient.XDSClient, watcher ConfigWatcher) *DependencyManager { | ||
| // Builds the dependency manager and starts the listener watch. | 
Nit: nix this comment; it is obvious that the struct is being created here, and the listener watch is not actually started here.
| // ConfigWatcher is notified of the XDSConfig resource updates and errors that | ||
| // are received by the xDS client from the management server. It only receives a | ||
| // XDSConfig update after all the xds resources have been received. | 
How about:
// ConfigWatcher is the interface for consumers of aggregated xDS configuration
// from the DependencyManager. The only consumer of this configuration is
// currently the xDS resolver.
| // ConfigWatcher is notified of the XDSConfig resource updates and errors that | ||
| // are received by the xDS client from the management server. It only receives a | ||
| // XDSConfig update after all the xds resources have been received. | ||
| type ConfigWatcher interface { | 
Please consider moving this to the top of the file so that the methods of the DependencyManager stay together and are not mixed in with another type's definition.
|  | ||
| func (m *DependencyManager) maybeSendUpdate() { | ||
| if m.logger.V(2) { | ||
| m.logger.Infof("Sending update to watcher: Listener: %v, RouteConfig: %v", pretty.ToJSON(m.currentListenerUpdate), pretty.ToJSON(m.currentRouteConfig)) | 
We've had many performance problems with using pretty.ToJSON for printing structs. I would recommend using %+v or some other native formatting directive instead.
Another thing to consider is whether the xDS resolver also outputs this log. If so, we don't want the same information repeated twice.
Yeah, I think adding this in the resolver looks better.
I haven't looked at the tests yet. But I guess these comments will give you enough to make progress.
| type ConfigWatcher interface { | ||
| // OnUpdate is invoked by the dependency manager to provide a new, | ||
| // validated xDS configuration to the watcher. | ||
| OnUpdate(xdsresource.XDSConfig) | 
Given that we are changing the resource watcher APIs to accept pointers to resource update structs, it would make more sense to store pointers to them in the XDSConfig struct as well.
And continuing in that same vein, we could pass a pointer to the XDSConfig struct here. Also, it would make sense to document that the watcher must not modify the XDSConfig it receives and should treat it as read-only.
| // OnError is invoked when an error is received in listener or route | ||
| // resource. This includes cases where: | ||
| // - The listener or route resource watcher reports a resource error. | ||
| // - The received listener resource is a socket listener, not an API listener - TODO : This is not yet implemented, tracked here #8114 | 
You could generalize this and specify that any resource validations performed at the DependencyManager that fail, also lead to OnError being invoked on the watcher.
That is not necessarily true, since cluster errors are stored separately in a struct and endpoint errors are stored in the resolution note. Only errors in the listener and route resources are sent using the OnError function.
| OnError(error) | ||
| } | ||
|  | ||
| func (m *DependencyManager) maybeSendUpdate() { | 
Why is this called maybeSendUpdate? Under what conditions will it not send an update? Can this be captured in its docstring?
We want to check whether the whole cluster tree is resolved. If even one leaf endpoint is missing, we might not send the update, and the checks for that will go in this function. Also, C++ and Java both use the same name, so it got stuck in my head... Let me know if we should change it.
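The "maybe" behaviour described above could be sketched roughly as follows. The `manager` type and its fields are hypothetical, and the cluster and endpoint bookkeeping is not part of this PR; a counter stands in for the real watcher callback:

```go
package main

import "fmt"

// manager is a hypothetical, stripped-down dependency manager.
type manager struct {
	haveListener    bool
	haveRouteConfig bool
	pendingClusters int // clusters or leaf endpoints still unresolved
	updatesSent     int // stands in for calls to watcher.OnUpdate
}

// maybeSendUpdate notifies the watcher only once every dependency has been
// resolved; otherwise it does nothing. It reports whether an update was sent.
func (m *manager) maybeSendUpdate() bool {
	if !m.haveListener || !m.haveRouteConfig || m.pendingClusters > 0 {
		return false
	}
	m.updatesSent++
	return true
}

func main() {
	m := &manager{haveListener: true, haveRouteConfig: true, pendingClusters: 1}
	fmt.Println(m.maybeSendUpdate()) // one leaf is still missing
	m.pendingClusters = 0
	fmt.Println(m.maybeSendUpdate()) // everything is resolved
}
```

Spelling out the skip conditions like this in the docstring would answer the naming question directly.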
| // Only executed in the context of a serializer callback. | ||
| func (m *DependencyManager) onListenerResourceUpdate(update *xdsresource.ListenerUpdate) { | ||
| if m.logger.V(2) { | ||
| m.logger.Infof("Received update for Listener resource %q: %v", m.ldsResourceName, pretty.ToJSON(update)) | 
For all these usages of pretty.ToJSON, please consider switching them to native formatting directives. Experiment with a few of them like %v, %+v and %#v, see which one provides the best output, and use that.
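For reference, a quick comparison of the standard `fmt` verbs on a struct value. The `listenerUpdate` type and its fields are made up for the example:

```go
package main

import "fmt"

// listenerUpdate is a made-up struct used only to compare verb output.
type listenerUpdate struct {
	RouteConfigName string
	MaxStreamSecs   int
}

// formatAll renders u with the three struct-friendly fmt verbs.
func formatAll(u listenerUpdate) (plain, withFields, goSyntax string) {
	return fmt.Sprintf("%v", u), fmt.Sprintf("%+v", u), fmt.Sprintf("%#v", u)
}

func main() {
	u := listenerUpdate{RouteConfigName: "route-a", MaxStreamSecs: 30}
	plain, withFields, goSyntax := formatAll(u)
	fmt.Println(plain)      // {route-a 30}
	fmt.Println(withFields) // {RouteConfigName:route-a MaxStreamSecs:30}
	fmt.Println(goSyntax)   // main.listenerUpdate{RouteConfigName:"route-a", MaxStreamSecs:30}
}
```

`%+v` is usually the most readable in logs because it includes field names without the type noise of `%#v`.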
| } | ||
|  | ||
| func (m *DependencyManager) applyRouteConfigUpdate(update xdsresource.RouteConfigUpdate) { | ||
| matchVh := xdsresource.FindBestMatchingVirtualHost(m.dataplaneAuthority, update.VirtualHosts) | 
s/matchVh/matchVH to comply with Go initialisms.
| // Only executed in the context of a serializer callback. | ||
| func (m *DependencyManager) onListenerResourceError(err error) { | ||
| if m.logger.V(2) { | ||
| m.logger.Infof("Received resource error for Listener resource %q: %v", m.ldsResourceName, err) | 
I believe we have some code in the xDS client to ensure that the returned errors contain the xDS node ID. Could you please ensure that that property still holds. Thanks.
| m.rdsResourceName = "" | ||
| if m.routeConfigWatcher != nil { | ||
| m.routeConfigWatcher.stop() | ||
| m.routeConfigWatcher = nil | ||
| } | 
Don't we have to set the matching virtual host to nil here?
It would be nice if we have a method to do all the cleanup when a listener resource error or a listener resource update invalidates the previously received route config. I see similar code in onListenerResourceError, but that one sets the matching virtual host to nil as well.
We don't set the virtual host to nil because we are going to update the virtual host in the function below using the inline route resource that we get. We set it to nil in `onListenerResourceError` because we want to invalidate the route resource. Here we are just updating it and cancelling the watcher, since we get the resource inline.
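One way to reconcile the two cleanup paths discussed in this thread is a pair of small helpers. This is only a sketch under assumed names: `stopRouteWatch`, `invalidateRouteConfig`, and the simplified types are hypothetical, while the field names mirror the quoted diff:

```go
package main

import "fmt"

// routeWatcher is a simplified stand-in for the route config watcher.
type routeWatcher struct{ stopped bool }

func (w *routeWatcher) stop() { w.stopped = true }

// manager carries only the fields relevant to route cleanup.
type manager struct {
	rdsResourceName    string
	routeConfigWatcher *routeWatcher
	currentVirtualHost *string
}

// stopRouteWatch cancels the RDS watch, if any, and forgets the resource
// name. The update path uses this when the route config arrives inline.
func (m *manager) stopRouteWatch() {
	m.rdsResourceName = ""
	if m.routeConfigWatcher != nil {
		m.routeConfigWatcher.stop()
		m.routeConfigWatcher = nil
	}
}

// invalidateRouteConfig is for the error path: it also drops the matched
// virtual host, because the previously received route config is now invalid.
func (m *manager) invalidateRouteConfig() {
	m.stopRouteWatch()
	m.currentVirtualHost = nil
}

func main() {
	vh := "example.com"
	w := &routeWatcher{}
	m := &manager{rdsResourceName: "route-a", routeConfigWatcher: w, currentVirtualHost: &vh}
	m.invalidateRouteConfig()
	fmt.Println(w.stopped, m.rdsResourceName == "", m.currentVirtualHost == nil)
}
```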
| m.rdsResourceName = "" | ||
| m.currentVirtualHost = nil | ||
| m.routeConfigWatcher = nil | ||
| m.watcher.OnError(status.Errorf(codes.Unavailable, "Listener resource error : %v", err)) | 
Again, if the watcher is going to be given status errors, we need to document that clearly along with what status codes are returned when. And why?
Removed. We do not need status code. I looked at C++ code and got confused between absl status codes and gRPC status codes.
| if m.rdsResourceName != resourceName { | ||
| // Drop updates from canceled watchers. | ||
| return | ||
| } | 
I'm guessing we are going to need code like this for cluster and endpoint watchers as well. Can we make this part of the watcher instead?
Sure! I have changed the listener and route watchers. Let me know if it looks good.
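A rough sketch of moving the "drop updates from canceled watchers" check into the watcher itself. All names here are hypothetical, and the plain `bool` flag assumes that `stop` and the resource callbacks all run on the same callback serializer; without that assumption the flag would need synchronization:

```go
package main

import "fmt"

// routeConfigWatcher is a hypothetical watcher that filters its own updates.
type routeConfigWatcher struct {
	resourceName string
	stopped      bool
	onUpdate     func(resourceName string) // callback into the manager
}

// stop marks the watcher as canceled. Cancelling the underlying xDS watch
// is omitted from this sketch.
func (w *routeConfigWatcher) stop() { w.stopped = true }

// resourceChanged forwards the update unless the watcher has been stopped,
// so the manager no longer has to compare resource names.
func (w *routeConfigWatcher) resourceChanged() {
	if w.stopped {
		return // drop updates from canceled watchers
	}
	w.onUpdate(w.resourceName)
}

func main() {
	delivered := 0
	w := &routeConfigWatcher{
		resourceName: "route-a",
		onUpdate:     func(string) { delivered++ },
	}
	w.resourceChanged() // forwarded
	w.stop()
	w.resourceChanged() // dropped
	fmt.Println(delivered)
}
```

The same pattern would carry over to cluster and endpoint watchers without extra checks in the manager.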
| //If update is not for the current watcher | ||
| if m.rdsResourceName != resourceName { | ||
| return | ||
| } | 
Same here.
|  | ||
| // RouteConfig is the route configuration resource update. It will be | ||
| // populated even if RouteConfig is inlined into the Listener resource. | ||
| RouteConfig RouteConfigUpdate | 
Will change the other resources to pointers after #8652 is merged.
|  | ||
| func (l *listenerWatcher) stop() { | ||
| l.cancel() | ||
| l.parent.logger.Infof("Canceling watch on Listener resource %q", l.resourceName) | 
Changed, but I wanted to know where the log should be printed unconditionally and where we should put a V(2) check. I thought shutdown and cancel should be logged by default because it's useful information.
| serializer *grpcsync.CallbackSerializer | ||
| serializerCancel context.CancelFunc | 
| m.rdsResourceName = "" | ||
| m.currentVirtualHost = nil | ||
| m.routeConfigWatcher = nil | ||
| m.watcher.OnError(fmt.Errorf("listener resource error : %v", err)) | 
Should we annotate the errors passed to other components too, or just make sure they are annotated with the node ID when they are actually printed?
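If the errors are annotated before being passed on, wrapping with `%w` keeps the original error inspectable. A minimal sketch, where `annotateWithNodeID` and the message format are hypothetical and not the xDS client's actual annotation:

```go
package main

import (
	"errors"
	"fmt"
)

// errListenerNotFound is a stand-in for an error reported by a watcher.
var errListenerNotFound = errors.New("listener resource not found")

// annotateWithNodeID wraps err with the node ID. The message format is an
// assumption made for this sketch.
func annotateWithNodeID(nodeID string, err error) error {
	return fmt.Errorf("[xDS node id: %s]: %w", nodeID, err)
}

func main() {
	err := annotateWithNodeID("node-1", errListenerNotFound)
	fmt.Println(err)
	// Wrapping with %w keeps the original error reachable via errors.Is.
	fmt.Println(errors.Is(err, errListenerNotFound))
}
```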
This PR moves the LDS and RDS watchers to the dependency manager without changing the current functionality or behaviour. This is part of the implementation of gRFC A74.
RELEASE NOTES: None