Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(RFC): open core module system v1 #1063

Draft
wants to merge 49 commits into
base: main
Choose a base branch
from

Conversation

StarpTech
Copy link
Contributor

Motivation and Context

TODO

@jensneuse
Copy link
Member

jensneuse commented Aug 11, 2024

A few goals I'd like to see covered:

  • load my own config from my own config provider
  • modify any config after load but before using it for a new mix
  • allow the user to customize the transport (we're currently wrapping the default transport, let the user bring their own)
  • hook into events like normalization, validation, parse (before, after)
  • allow the user to define additional validation rules/steps
  • Ideally, calling hooks is built in a way that automatically generates meaningful traces so that we're able to understand if and how much time the Router spends in a hook
  • it should be easy to create a custom module / plugin as a go module which can then be imported and configured easily, making it simple to re-use and share custom extensions
  • it would be nice if each module had it's own namespace (mandatory), like a prefix, so that the module can easily extend the existing Router config.yaml with it's own configuration without conflicts
  • it should be possible to define custom authentication hooks
  • it should be possible to define custom authorization hooks
  • it should be possible to define custom rate limit hooks

Additional comments:
Should we call it Gateway or Router? We talk a lot about Router, should we stick to this convention?

I'll look more into the RFC and give more specific feedback later/tomorrow.

rfc/custom-modules-v1.md Outdated Show resolved Hide resolved
rfc/custom-modules-v1.md Outdated Show resolved Hide resolved
@StarpTech StarpTech changed the title chore(RFC): custom module system v1 chore(RFC): open core module system v1 Aug 17, 2024
@github-actions github-actions bot removed the router label Aug 19, 2024
- GraphQLOperationHooks: Provides hooks for parsed, normalized, and planned GraphQL operations.
- `GraphQLOperationParseHook`: Called when an operation is parsed.
- `GraphQLOperationNormalizeHook`: Called when an operation is normalized.
- `GraphQLOperationPlanHook`: Called when an operation is planned.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will user be able to use GraphQLOperationPlanHook for monitoring the planCache or a separate GraphqlOperationPlanCacheHook is needed? It could be useful for monitoring the cache hit/miss and maybe even modify the caching policy based on their needs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a possible way. The current direction is to provide additional information in the context e.g. if a plan was fetched from the cache could be on the context of the PostGraphQLOperationPlanHook

type RouterRequestHook interface {
// OnRouterRequest is called when a request is made to the router and after all GraphQL information is available
// Returning an error will result in a GraphQL error being returned to the client.
OnRouterRequest(ctx *core.RouterRequest, err error) error
Copy link

@kaialang kaialang Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For custom metrics, what's the plan for users to measure end-to-end latency of a request? Maybe something like the following? Do we have any concerns on context propagation throughout the entire request lifecycle? In case of early exit or errors, OnRouterResponse never get called, we lose those latency metrics.
func (m *MyTallyModule) OnRouterRequest(req *core.RouterRequest, err error) error {
req.Context = context.WithValue(req.Context, startTimeKey, time.Now())
return nil
}

func (m *MyTallyModule) OnRouterResponse(res *core.RouterResponse, err error) error {
startTime, ok := res.Context.Value(startTimeKey).(time.Time)
if !ok {
return nil
}

latency := time.Since(startTime)
timer := m.TallyScope.Timer("request_latency")
timer.Record(latency)

return nil

}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think about it, my solution has the assumption of OnRouterRequest is "PreRouterRequest". Without knowing exact point on OnRouteRequest, this may or may not work.

Schema *graphql.Schema
}

type Operation struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we are using Content() from OperationContext. How to get it in the new version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This RFC tries to approach the overall picture of the system. Everything what was possible in the old system must be possible in the new system just in a different way. The original and normalized query will be available on the operation object as well.

type SubgraphRequestHook interface {
// OnSubgraphRequest is called when a subgraph request is made
// Returning an error will result in a GraphQL error being returned to the client.
OnSubgraphRequest(ctx *core.SubgraphRequest, err error) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to use request *http.Request? In the current version we make headers propagation ourselves without using router configs, we also add custom headers.

The same applies to subgraph responses.

Plus to avoid getting a race we put a mutex in the context and use it when changing req/resp


type RouterErrorHook interface {
// OnError is called when an error occurs during the router lifecycle
OnRouterError(err error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to use it yet. I would like to be able to replace the status code in the subgraph responses, to decide which are good and which are not, which should be retrained and which should not. And then modify the router's final response after gluing and downloading everything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface will have to be driven to your error type somehow? Or how to change it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that what I described will be done in OnRouterResponse https://github.com/wundergraph/cosmo/pull/1063/files#r1778364936? And this method will be triggered after OnRouterResponse?

// OnMetric is called when a metric is recorded
// Returning an error will result in a telemetry error
OnMetric(metric *metric.Metric) error
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see OnSpan() and OnMetric() can be useful in some cases. But adding attributes can be achieved by adding keyValue pairs in request context (please correct me if I'm wrong).

This is a different level of abstraction. I would strongly prefer a more flexible hook system, eg, using Pre/Post, that allows to metric/tracing plugins. We can still provide default Prometheus, OTel metrics/tracings by building default plugins on top of the hook system.

Copy link

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 11, 2024
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Sep 25, 2024
@StarpTech
Copy link
Contributor Author

Closed accidentally from bot


// Rewrite errors in the response
if len(res.Response.Errors) > 0 {
for _, err := range res.Response.Errors {
Copy link
Contributor

@flymedllva flymedllva Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.1 We are interested in changing the message, as I understand we will now have this access.
1.2 It is not clear how we will work with untyped extensions, through fastjson? It's important for us not to change custom extensions that we didn't expect.
2. We would also like to understand which subgraph the error belongs to, so that we can change it with this logic in mind. Let's say the status code returned a subgraph
3. We want to do ctx.Set, so that we can get the modified errors in other methods (for example, for logging).

Request *core.GraphQLRequest
// The parsed, normaliazed and planned operation with all the information like name, variables, type, document representation,
// client name version, uploaded files, plan, normalization, persisted operation etc.
Operation *core.Operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere in here I wanted to be able to get

	Id string
	Name string
	Url *url.URL
	UrlString string
}

So that this information can be used when changing the error.

Copy link

This PR was marked stale due to lack of activity. It will be closed in 14 days.

As of today, customers can extend the router with custom modules. These modules can be used to change the behavior of the router, add custom logic, or integrate with other systems. The current module system has several limitations that we want to address with this RFC:

- The current module system is not native to GraphQL. It is based on HTTP middleware and does not provide a GraphQL-specific API.
- The current module system is inconsistent and hard to use. It does not provide a clear API for developers to intercept and modify GraphQL requests and responses.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

Great stuff here. One thing that I didn't see is the possibility to hook into the EDFS lifecycle.
Would it be possible with the Custom Module System to intercept subscription requests to subgraphs and inject a custom transport, similar to the Kafka or NATS that you already provide.

I would be particularly interested in adding Redis PubSub support to EDFS.

Thx,
Goran

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ghorak-happening, that's a great idea. The goal of this RFC is to propose an architecture that is flexible enough to add those extensions later. The first version will focus on the fundamentals. I'm going to add your requirements to the list so as not to forget them.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great @StarpTech, thank you!

@github-actions github-actions bot removed the Stale label Oct 23, 2024
@github-actions github-actions bot added the Stale label Nov 14, 2024
@StarpTech StarpTech removed the Stale label Nov 14, 2024
@wundergraph wundergraph deleted a comment from github-actions bot Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants