Tests contain non-zero timers #548

ricea · 2016-10-20T07:08:29Z

Short timers make tests flaky. From the Testharness.js API Documentation:

Note that timeouts generally need to be a few seconds long in order to produce stable results in all test environments.

Long timeouts slow down all developers by making the the tests take longer to run. They also waste resources on automatic testing infrastructure.

Mostly timers are being used to ensure that the implementation never takes some action that it shouldn't. For example (from writable-streams/start.js):

promise_test(() => {
  const ws = recordingWritableStream({
    start() {
      return Promise.reject();
    }
  });

  // Wait and verify that write or close aren't called.
  return delay(100)
      .then(() => assert_array_equals(ws.events, [], 'write and close should not be called'));
}, 'underlying sink\'s write or close should not be invoked if the promise returned by start is rejected');

This test never[1] flakes on a correct implementation. Suppose hypothetically that there existed an implementation which called close() here after a asynchronously performing an action which takes an arbitrary amount of time. Then this test would be flaky on that implementation: it would fail when the action took <100ms, and pass otherwise.

I propose that we make the simplifying assumption that no reasonable implementation will do the wrong thing after a delay. We can then disregard the above hypothetical implementation as being unreasonable, and so consider its false positive on the above test to be acceptable.

I furthermore propose that we consider two times around the event loop to be long enough to see whether an implementation is going to do the wrong thing or not. My reasoning is that the implementation may run a callback after the user code, which then posts another async callback which finally does the wrong thing.

I think we should have a function flushAsyncEvents() which is equivalent to delay(0).then(delay(0)). The above test would then look like

promise_test(() => {
  const ws = recordingWritableStream({
    start() {
      return Promise.reject();
    }
  });

  // Verify that write or close aren't called.
  return flushAsyncEvents()
      .then(() => assert_array_equals(ws.events, [], 'write and close should not be called'));
}, 'underlying sink\'s write or close should not be invoked if the promise returned by start is rejected');

[1] Test frameworks usually have a hard timeout, which means that even passing tests will fail occasionally. The longer a test takes, the more likely it is to hit the hard timeout. This is seen in practice on the Blink layout tests.

The text was updated successfully, but these errors were encountered:

domenic · 2016-10-20T14:33:03Z

This sounds good to me. Minor correction: delay(0).then(() => delay(0)).

It doesn't take care of some remaining cases where we're actually using the timers for sequencing, but we can have a separate discussion about those.

Add a util function flushAsyncEvents() which spins the event loop to ensure that the implementation doesn't do something it shouldn't asynchronously. Use this function to replace trivial uses of delay(100) and similar to improve test determinism. This change relates to whatwg#548. Tests which use multiple delays to express ordering requirements have not been changed. Those tests require individual analysis.

Add a util function flushAsyncEvents() which spins the event loop to ensure that the implementation doesn't do something it shouldn't asynchronously. Use this function to replace trivial uses of delay(100) and similar to improve test determinism. This change relates to #548. Some non-zero delays were just there to trigger asynchronous behaviour. Replace these with delay(0). Tests which use multiple delays to express ordering requirements have not been changed. Those tests require individual analysis.

Test 'Piping to a WritableStream that does not consume the writes fast enough exerts backpressure on the ReadableStream' in streams/piping/flow-control.js is flaky on Chromium memory sanitizer bots due to its use of timers. Change to use an interleaved step design. Each step does not start until the previous one has completed. Relevant to whatwg/streams#548.

ricea · 2017-11-10T08:15:31Z

Some notes from recently porting some tests to not use non-zero delays:

If there is only one delay() in the test, then it's probably only being used to test async behaviour, and can just be replaced by delay(0). If there are negative assertions (assertions that something has not happened) in the test then you may need to use flushAsyncEvents().

Even many tests that use delay(n) for sequencing can be easily rewritten without non-zero delays. Here's four examples, only one of which results in a non-deterministic ordering:

// Example (1)
delay(10).then(a);
delay(20).then(b);

// Example (2)
delay(10).then(a);
someOtherEvent.then(() => delay(20)).then(b);

// Example (3)
someOtherEvent.then(() => delay(10)).then(a);
delay(20).then(b);

// Example (4)
delay(10).then(a);
delay(10).then(b);

In examples (1), (2) and (4), a happens deterministically before b, and so they can be rewritten using chained promises. In example (3) the order may be nondeterministic.

In the general case, tests with non-zero delays implicitly test the ordering of events and should be rewritten to explicitly test ordering instead. Usually the recording stream test helpers in streams/resources/recording-streams.js are the right tools for this.

The tricky part is identifying those implicitly tested ordering expectations. It's easy to rewrite the test so it looks the same and passes, but accidentally reduce the power of the test in identifying incorrect implementations.

Some tests for TransformStream used the delay() function with non-zero delays. Mostly this was done to make things happen asynchronously in a particular order, and has been replaced with equivalent code using promise chains. The test "TransformStream flush is called after all queued writes finish, once the writable is closed" was passing accidentally as it was using test() where it should have been using promise_test(). In addition to removing non-zero delays from the test, also fix it to use promise_test() and remove the backpressure that was causing transform() to never be called. See whatwg/streams#548 for background on this clean-up.

ricea mentioned this issue Nov 9, 2016

Remove trivial non-zero delays in wpts #605

Merged

ricea mentioned this issue Apr 24, 2017

Deflake ReadableStream piping flow control test web-platform-tests/wpt#5662

Merged

ricea mentioned this issue Nov 10, 2017

Fix all non-zero delays in transform stream tests web-platform-tests/wpt#8147

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tests contain non-zero timers #548

Tests contain non-zero timers #548

ricea commented Oct 20, 2016

domenic commented Oct 20, 2016

ricea commented Nov 10, 2017

Tests contain non-zero timers #548

Tests contain non-zero timers #548

Comments

ricea commented Oct 20, 2016

domenic commented Oct 20, 2016

ricea commented Nov 10, 2017