Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(subscriber): add initial integration tests #452

Merged
merged 21 commits into from
Sep 6, 2023
Merged

Conversation

hds
Copy link
Collaborator

@hds hds commented Jul 18, 2023

The console-subscriber crate has no integration tests. There are some
unit tests, but without very high coverage of features.

Recently, we've found or fixed a few errors which probably could have
been caught by a medium level of integration testing.

However, testing console-subscriber isn't straight forward. It is
effectively a tracing subscriber (or layer) on one end, and a gRPC
server on the other end.

This change adds enough of a testing framework to write some initial
integration tests. It is the first step towards closing #450.

Each test comprises 2 parts:

  • One or more "expcted tasks"
  • A future which will be driven to completion on a dedicated Tokio runtime.

Behind the scenes, a console subscriber layer is created and it's server
part is connected to a duplex stream. The client of the duplex stream
then records incoming updates and reconstructs "actual tasks". The layer
itself is set as the default subscriber for the duration of block_on
which is used to drive the provided future to completioin.

The expected tasks have a set of "matches", which is how we find the
actual task that we want to validate against. Currently, the only value
we match on is the task's name.

The expected tasks also have a set of expectations. These are other
fields on the actual task which are validated once a matching task is
found. Currently, the two fields which can have expectations set on them
are the wakes and self_wakes fields.

So, to construct an expected task, which will match a task with the name
"my-task" and then validate that the matched task gets woken once, the
code would be:

ExpectedTask::default()
    .match_name("my-task")
    .expect_wakes(1);

A future which passes this test could be:

async {
    task::Builder::new()
        .name("my-task")
        .spawn(async {
            tokio::time::sleep(std::time::Duration::ZERO).await
        })
}

The full test would then look like:

fn wakes_once() {
    let expected_task = ExpectedTask::default()
        .match_name("my-task")
        .expect_wakes(1);

    let future = async {
        task::Builder::new()
            .name("my-task")
            .spawn(async {
                tokio::time::sleep(std::time::Duration::ZERO).await
            })
    };

    assert_task(expected_task, future);
}

The PR depends on 2 others:

This change contains some initial tests for wakes and self wakes which
would have caught the error fixed in #430. Additionally there are tests
for the functionality of the testing framework itself.

@hds hds requested a review from a team as a code owner July 18, 2023 15:39
@hds hds force-pushed the hds/subscriber-tests branch 2 times, most recently from aac5675 to b68ef76 Compare July 18, 2023 15:40
The `console-subscriber` crate has no integration tests. There are some
unit tests, but without very high coverage of features.

Recently, we've found or fixed a few errors which probably could have
been caught by a medium level of integration testing.

However, testing `console-subscriber` isn't straight forward. It is
effectively a tracing subscriber (or layer) on one end, and a gRPC
server on the other end.

This change adds enough of a testing framework to write some initial
integration tests. It is the first step towards closing #450.

Each test comprises 2 parts:
- One or more "expcted tasks"
- A future which will be driven to completion on a dedicated Tokio runtime.

Behind the scenes, a console subscriber layer is created and it's server
part is connected to a duplex stream. The client of the duplex stream
then records incoming updates and reconstructs "actual tasks". The layer
itself is set as the default subscriber for the duration of `block_on`
which is used to drive the provided future to completioin.

The expected tasks have a set of "matches", which is how we find the
actual task that we want to validate against. Currently, the only value
we match on is the task's name.

The expected tasks also have a set of expectations. These are other
fields on the actual task which are validated once a matching task is
found. Currently, the two fields which can have expectations set on them
are the `wakes` and `self_wakes` fields.

So, to construct an expected task, which will match a task with the name
`"my-task"` and then validate that the matched task gets woken once, the
code would be:

```rust
ExpectedTask::default()
    .match_name("my-task")
    .expect_wakes(1);
```

A future which passes this test could be:

```rust
async {
    task::Builder::new()
        .name("my-task")
        .spawn(async {
            tokio::time::sleep(std::time::Duration::ZERO).await
        })
}
```

The full test would then look like:

```rust
fn wakes_once() {
    let expected_task = ExpectedTask::default()
        .match_name("my-task")
        .expect_wakes(1);

    let future = async {
        task::Builder::new()
            .name("my-task")
            .spawn(async {
                tokio::time::sleep(std::time::Duration::ZERO).await
            })
    };

    assert_task(expected_task, future);
}
```

The PR depends on 2 others:
 - #447 which fixes an error in the logic that determines whether a task
   is retained in the aggregator or not.
 - #451 which exposes the server parts and is necessary to allow us to
   connect the instrument server and client via a duplex channel.

This change contains some initial tests for wakes and self wakes which
would have caught the error fixed in #430. Additionally there are tests
for the functionality of the testing framework itself.
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, this looks great, thanks for working on this! i left a bunch of relatively minor comments, but overall, I'd be happy to merge this change!

console-subscriber/tests/support/state.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/state.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/state.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/state.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/state.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/task.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/task.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/task.rs Show resolved Hide resolved
console-subscriber/tests/support/task.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/task.rs Outdated Show resolved Hide resolved
@hds
Copy link
Collaborator Author

hds commented Aug 1, 2023

overall, this looks great, thanks for working on this! i left a bunch of relatively minor comments, but overall, I'd be happy to merge this change!

@hawkw Thanks for the review!

Unfortunately, I just realised today that the CI is telling me that I've got a race condition (and I really thought I'd removed all of them) which I'm not seeing locally and is also behaving differently on different platforms - some platforms fail, some platform actually hang forever it seems. So this PR is going to need a bit of work to find and fix that issue.

After the test ends, we were waiting for a single further update before
evaluating the actual tasks (vs. the expected tasks). Now we wait for 2
updates.
@hawkw
Copy link
Member

hawkw commented Aug 23, 2023

@hds is this branch ready for a review now?

@hds
Copy link
Collaborator Author

hds commented Aug 23, 2023

@hawkw unfortunately not yet. I've still got a problem with the test run on CI when executing on Ubuntu. It hangs because the message that the test has been run never gets to the instrumentation client.

Work has been busy lately, so getting to the bottom of this problem has been slow going.

@hawkw
Copy link
Member

hawkw commented Aug 23, 2023

@hds okay, cool, thanks for letting me know!

hds and others added 9 commits August 24, 2023 13:38
This commit has the same changeset as the original commit.
Rather than relying on all the tasks becoming visible N update
iterations after the test ends, we spawn a signal task which we then
look for. Once the test has completed (which will almost certainly
happen first) and the signal task has been read, we finish parsing the
current update and then finish immediately.
The one in the instrumentation client.
@hds
Copy link
Collaborator Author

hds commented Sep 5, 2023

@hawkw This one is ready for re-review when you have a moment. Thanks!

@hawkw
Copy link
Member

hawkw commented Sep 5, 2023

@hawkw This one is ready for re-review when you have a moment. Thanks!

awesome, thanks for all the time you've spent on this!

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, this looks good to me! i left a bunch of small suggestions, but none of them are major blockers.

console-subscriber/tests/framework.rs Show resolved Hide resolved
console-subscriber/Cargo.toml Outdated Show resolved Hide resolved
console-subscriber/tests/support/mod.rs Show resolved Hide resolved
Comment on lines +41 to +47
pub(crate) fn assert_tasks<Fut>(expected_tasks: Vec<ExpectedTask>, future: Fut)
where
Fut: Future + Send + 'static,
Fut::Output: Send + 'static,
{
run_test(expected_tasks, future)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit, take it or leave it: is there a reason this is a whole additional function, rather than just being a re-export of run_test? we could

pub use subscriber::run_test as assert_tasks;

if we want it to be named assert_tasks (but also, we could just name the original function that...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was more a case of putting the "internal public" functions together to make the documentation clearer. Otherwise we'd have "public" docs here and on the run_test function. Not sure what the best practice is in this case to be honest.

console-subscriber/tests/support/state.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/subscriber.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/subscriber.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/subscriber.rs Outdated Show resolved Hide resolved
console-subscriber/tests/support/task.rs Outdated Show resolved Hide resolved
@hds
Copy link
Collaborator Author

hds commented Sep 6, 2023

overall, this looks good to me! i left a bunch of small suggestions, but none of them are major blockers.

@hawkw Thank you so much for all these suggestions! I really appreciate the effort, the PR is much better for them.

@hds hds merged commit 90ae016 into main Sep 6, 2023
11 checks passed
@hds hds deleted the hds/subscriber-tests branch September 6, 2023 13:42
hawkw added a commit that referenced this pull request Sep 29, 2023
The `console-subscriber` crate has no integration tests. There are some
unit tests, but without very high coverage of features.

Recently, we've found or fixed a few errors which probably could have
been caught by a medium level of integration testing.

However, testing `console-subscriber` isn't straight forward. It is
effectively a tracing subscriber (or layer) on one end, and a gRPC
server on the other end.

This change adds enough of a testing framework to write some initial
integration tests. It is the first step towards closing #450.

Each test comprises 2 parts:
- One or more "expected tasks"
- A future which will be driven to completion on a dedicated Tokio runtime.

Behind the scenes, a console subscriber layer is created and its server
part is connected to a duplex stream. The client of the duplex stream
then records incoming updates and reconstructs "actual tasks". The layer
itself is set as the default subscriber for the duration of `block_on`
which is used to drive the provided future to completioin.

The expected tasks have a set of "matches", which is how we find the
actual task that we want to validate against. Currently, the only value
we match on is the task's name.

The expected tasks also have a set of "expectations". These are other
fields on the actual task which are validated once a matching task is
found. Currently, the two fields which can have expectations set on them
are `wakes` and `self_wakes`.

So, to construct an expected task, which will match a task with the name
`"my-task"` and then validate that the matched task gets woken once, the
code would be:

```rust
ExpectedTask::default()
    .match_name("my-task")
    .expect_wakes(1);
```

A future which passes this test could be:

```rust
async {
    task::Builder::new()
        .name("my-task")
        .spawn(async {
            tokio::time::sleep(std::time::Duration::ZERO).await
        })
}
```

The full test would then look like:

```rust
fn wakes_once() {
    let expected_task = ExpectedTask::default()
        .match_name("my-task")
        .expect_wakes(1);

    let future = async {
        task::Builder::new()
            .name("my-task")
            .spawn(async {
                tokio::time::sleep(std::time::Duration::ZERO).await
            })
    };

    assert_task(expected_task, future);
}
```

The PR depends on 2 others:
 - #447 which fixes an error in the logic that determines whether a task
   is retained in the aggregator or not.
 - #451 which exposes the server parts and is necessary to allow us to
   connect the instrument server and client via a duplex channel.

This change contains some initial tests for wakes and self wakes which
would have caught the error fixed in #430. Additionally there are tests
for the functionality of the testing framework itself.

Co-authored-by: Eliza Weisman <[email protected]>
hds added a commit that referenced this pull request Oct 11, 2023
A flakiness problem has been discovered with the `console-subscriber`
integration tests introduced in #452. Issue #473 is tracking the issue.

It has been observed that we only "miss" the wake operation event when
it comes from `yield_now()`, but not when it comes from a task that
yielded due to `sleep`, even when the duration is zero. it is likely
that this is due to nature of the underlying race condition.

This change removes all the calls to `yield_now()` from the `framework`
tests, except those where we wish to actually test self wakes.
hds added a commit that referenced this pull request Oct 25, 2023
A flakiness problem has been discovered with the `console-subscriber`
integration tests introduced in #452. Issue #473 is tracking the issue.

It has been observed that we only "miss" the wake operation event when
it comes from `yield_now()`, but not when it comes from a task that
yielded due to `sleep`, even when the duration is zero. it is likely
that this is due to nature of the underlying race condition.

This change removes all the calls to `yield_now()` from the `framework`
tests, except those where we wish to actually test self wakes.
Additionally, all the sleeps have been moved out into a separate function
which describes why we're using `sleep` instead of `yield_now` when
either of them would be sufficient.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants