Replies: 6 comments 18 replies
---
For early execution there are some precautions the implementing server will need to take. Example:

```graphql
query {
  foo {
    nonNullableFieldThatErrors
    ... @defer {
      bar
    }
  }
}
```

If a server implements "early execution" it may begin executing the deferred fragment before `nonNullableFieldThatErrors` raises its field error. When that error bubbles up and nulls out `foo`, a naive implementation could send:

```json
[
  {
    "data": { "foo": null },
    "hasNext": true
  },
  {
    "incremental": [
      {
        // WRONG! invalid path, ["foo"] is null in previous response
        "path": ["foo"],
        "data": { "bar": "BAR" }
      }
    ]
  }
]
```

(This was discussed in #45.)

There's non-obvious and non-trivial logic needed to cancel/filter these invalid results due to error bubbling with "early execution". If a server does "deferred execution" this logic is a no-op because it's not possible for this scenario to be encountered.

My current thinking is that we should write the spec with "early execution" if that's what we think most implementations will end up doing. In that case we can capture the logic needed for handling error bubbling in spec algorithms. "Deferred execution" would still be allowed due to the conformance clause, and those implementations can simply skip the filtering functions. If we only specify "deferred execution" we leave open the possibility of implementors missing this non-obvious and non-trivial logic and creating buggy implementations.
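As a rough sketch of what that cancel/filter step might involve (illustrative TypeScript only -- not the spec algorithm and not graphql-js code; all names here are hypothetical):

```typescript
type IncrementalResult = { path: Array<string | number>; data: unknown };

// Returns true if `path` still points at a non-null location in `data`.
// After error bubbling, an ancestor may have been nulled out, in which
// case any deferred result underneath it must be discarded.
function pathStillExists(data: unknown, path: Array<string | number>): boolean {
  let current: any = data;
  for (const segment of path) {
    if (current === null || current === undefined) return false;
    current = current[segment];
  }
  return current !== null && current !== undefined;
}

// Drop early-executed deferred results whose target location was nulled
// out of the initial response by error bubbling.
function filterInvalidResults(
  initialData: unknown,
  pending: IncrementalResult[],
): IncrementalResult[] {
  return pending.filter((r) => pathStillExists(initialData, r.path));
}
```

In the example above, `filterInvalidResults({ foo: null }, pending)` would drop the payload at path `["foo"]` instead of sending it; under deferred execution the filter never removes anything, which is the no-op mentioned above.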
---
This is a great write-up of a potentially serious pitfall with early execution (or what I've been calling semi-concurrent execution).

A solution hinted at above would be to open up separate connections to whatever resource is shared between the initial and deferred fields to avoid contention. Naively, this could be done for each deferred payload, whereas for a more complex solution, we could have a generic pool that ingests the overall priority of the current field (c.f. generic-pool) -- which would need to be passed down to resolvers, and to the underlying business logic that resolvers call! In a limiting case, perhaps the pool could have a maximum of 2 connections: one for initial results, one for all deferred results. Just as setting up dataloaders on the …

However, the above fails on the theoretical level to solve the issue. At some point, we will inevitably "lose control" over the concurrency of the end-resources we are calling. In particular, those end-resources may call out to other, even lower-level resources and either be unaware of the need to manage this concurrency with multiple connections and priority queues, or, perhaps more likely (?), run into a resource which cannot physically or otherwise multiplex. At that point, work for deferred fields which has begun semi-concurrently will have forced its way into a queue prior to later-scheduled work for the initial result (or lower-order defers) and will begin blocking, with ripple effects all the way up to our nicely managed higher-level resources.

Will all or any of the above be a common failure mode in practice? I would love to hear from real-world users of @defer.
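A minimal sketch of the limiting two-connection idea (illustrative only -- `PriorityGate` is hypothetical and is not generic-pool's actual API):

```typescript
// A tiny two-tier gate over a fixed set of connections: waiting work for
// the initial result is always served before waiting deferred work.
class PriorityGate<T> {
  // queues[0] holds initial-result waiters, queues[1] holds deferred waiters
  private queues: Array<Array<(conn: T) => void>> = [[], []];
  private free: T[];

  constructor(connections: T[]) {
    this.free = [...connections];
  }

  acquire(priority: 0 | 1): Promise<T> {
    const conn = this.free.pop();
    if (conn !== undefined) return Promise.resolve(conn);
    return new Promise<T>((resolve) => this.queues[priority].push(resolve));
  }

  release(conn: T): void {
    // Always drain initial-result waiters before deferred waiters.
    const next = this.queues[0].shift() ?? this.queues[1].shift();
    if (next !== undefined) next(conn);
    else this.free.push(conn);
  }
}
```

Each resolver would acquire with priority 0 or 1 depending on whether it is executing for the initial payload or a deferred one -- which is exactly the plumbing problem described above: that priority has to travel with the request all the way down into the business logic.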
---
@benjie Great write-up! That said, can we reframe the question a bit? From the point of view of the particular field, there are three main stages:
So there are three options on how we can define what
@benjie As I understand your initial comment:
---
My current thinking:

Question: Should the incremental delivery execution algorithm included within the spec demonstrate handling of early execution?

Pros:

Cons:

My verdict: I think most implementors would include the option to at least SOMETIMES enable early execution, and so I think the con is limited.

A thought: another option to mitigate the main con is to specify both algorithms within the spec, early execution and delayed execution, with one in the main text and the other in an appendix.
---
Also sharing some practical evidence from Meta's usage of @defer that early execution of deferred fields is needed -- what we found is that frequent incremental updates from the server lead to more battery consumption, more stalls on video playback, and reduced scroll performance and touch responsiveness (most likely due to increased sync and async updates: consistency updates, re-renders, re-mounts, etc.). There is also some (although very limited) evidence to suggest that frequent flushes (i.e. breaking responses into too many pieces) result in memory fragmentation on the client side, hence increasing the chances of foreground app deaths (FADs)/app crashes. So I agree with @yaacovCR that implementors should be allowed the option to enable early execution, as long as they can tell the client in the response that they've fulfilled @defer eagerly, so that the client won't wait for a deferred portion that will never arrive.
---
FYI for those interested: a follow-on PR at graphql/graphql-js#3895 does exactly that, performing in-place inlining of all of the "executed-early-and-completed" children of a given incremental result as that result is sent. We do indeed have to keep track of the tree of all results in progress (although we were tracking all of those anyway).
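To illustrate the shape this produces (a hypothetical response, not taken from the PR): in a variant of the earlier example where nothing errors, if the deferred `bar` has already completed by the time the initial payload is flushed, it can be inlined directly and delivered as a single complete result, so the client never sees a pending defer:

```json
{
  "data": { "foo": { "bar": "BAR" } }
}
```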
---
Hi folks, I can't remember if I'd raised this anywhere else (and failed to track it down) so am starting a new discussion for it.
When executing a selection set, I see that we have three options for handling deferred fields within the selection set:

1. start executing them straight away, alongside the non-deferred fields;
2. start executing them very soon after (once the non-deferred fields have been initiated);
3. do not start executing them until the previous (or initial) layer has completed.
To me (1) and (2) are essentially the same thing, so I'll group them together as "early execution", and number (3) I'll call "deferred execution". In deferred execution, execution is split into layers at defer boundaries, and the next layer of defers do not start executing until the previous (or initial) layer is complete. In early execution, execution starts straight away (or very soon), but the results are not awaited until the previous layer has been sent.
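For concreteness, here is a hypothetical query annotated with the layers this implies (the field names are made up):

```graphql
query {
  a            # layer 0: the initial result
  ... @defer {
    b          # layer 1
    ... @defer {
      c        # layer 2: under deferred execution, does not begin
    }          # executing until layer 1 is complete
  }
}
```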
Why allow early execution?
It seems fairly obvious: the earlier we start executing something, the sooner it can finish... right?
Michael laid out some really good points with early testing of stream/defer, indicating that early testers of the technology found that it was not useful to them if it deferred execution of the `@defer`'d fields, because it significantly increased the latency of the entire request being completed. (We should keep in mind this datapoint was with the early version of stream/defer that had significant result duplication.)

Note that even in option (3), early execution is allowed, it's simply not specified - the observable result according to an external viewer should appear the same as if execution had been deferred.
Why not specify early execution?
Resolvers contain arbitrary logic, with arbitrary interactions. We cannot know what a resolver will try to do, nor can we figure out whether it's likely to hold up other resolvers or not.
Imagine you have GraphQL resolvers and a query like this:
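(The example is reconstructed here for illustration; the field names `cheap` and `superExpensive` and the `db` on context come from the text below, but the exact shapes and resolver bodies are assumptions.)

```graphql
query {
  cheap
  ... @defer {
    superExpensive
  }
}
```

```typescript
// Hypothetical resolvers; both issue queries over the single shared
// database connection available as `context.db`.
const resolvers = {
  Query: {
    cheap: (_source: unknown, _args: unknown, context: any) =>
      context.db.query("select cheap_value()"), // ~10ms
    superExpensive: (_source: unknown, _args: unknown, context: any) =>
      context.db.query("select super_expensive_value()"), // ~5 seconds
  },
};
```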
Note that our resolvers use the `db` on context, which is a connection to our database. Imagine our database uses a standard query-response protocol (no multiplexing). When executing this request with (1), the result might be the following queries being issued:
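(A reconstructed illustration; the SQL is hypothetical.)

```sql
select super_expensive_value(); -- deferred field's query, dispatched first; ~5 seconds
select cheap_value();           -- initial field's query, queued behind it on the
                                -- single non-multiplexed connection
```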
With (2), it might be:
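(Again a reconstructed illustration; with concurrent dispatch either ordering can occur, and this shows the unlucky one.)

```sql
select super_expensive_value(); -- deferred field's query wins the race for the connection
select cheap_value();           -- dispatched moments later, still stuck behind ~5s of work
```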
Either way, the result of the `cheap` field cannot be returned until the `superExpensive` field has completed - so the entire response would be delayed 5 seconds, and then the `superExpensive` and `cheap` fields would arrive at basically the same time - defer has served no purpose.

With (3), however, the execution would look like:
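(A reconstructed illustration.)

```sql
select cheap_value();           -- ~10ms
-- initial payload flushed: {"data": {"cheap": ...}, "hasNext": true}
select super_expensive_value(); -- only dispatched now; ~5 seconds
-- incremental payload flushed with the superExpensive result, "hasNext": false
```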
Note that this is what a user would expect - the result of `cheap` comes near-instantly to the user, and then `superExpensive` arrives 5 seconds later after it has been calculated.

Trade-offs of deferred execution
The main trade-off of deferred execution is that it can concretely increase the execution time. If you think of execution as a Gantt chart, deferred execution would make the critical path longer - it would add together the latency of each "defer layer" of execution, not allowing for concurrency (overlapping) between parent and child defers.
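As a toy illustration of that (numbers are made up): suppose the initial layer needs 300ms of resolver work and a deferred child layer needs 200ms.

```
early execution:    total ≈ max(300ms, 200ms) = 300ms  (the layers overlap)
deferred execution: total ≈ 300ms + 200ms     = 500ms  (the layers serialize)
```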
What I think we should do
Given we do not control or really have any say over the content of resolvers in a user's schema, I don't think we can safely specify that early execution should always be used - the result would be that many users' schemas fall into this trap where defer turns out to be useless noise.
Specifying both early execution and deferred execution would make the spec significantly more complex, and I am personally of the belief that adding both to the specification is undesirable.
What I propose is that we specify deferred execution only, and that we add a non-normative note to the spec indicating that should the implementer want to, they may start execution of deferred fields early, but they should be aware of the above issue. This non-normative note would be enabled by this part of the spec:
Other issues
Early execution of deferred fields within mutations (i.e. on `Mutation` type fields) could have unpredictable results (no-one should be using these anyway, but people do, and some even give conference talks on it 😬).