
Intermittent race in forwarding state during autoscale #3117

@slfritchie


Is this a bug, feature request, or feedback?

Bug

What is the current behavior?

Intermittent race (?) in forwarding state to new workers during autoscale grow and shrink events. This regression was most likely introduced by recent changes to the resilience-related barrier protocols.

What is the expected behavior?

No race.

What OS and version of Wallaroo are you using?

Ubuntu Bionic/18.04 LTS + Wallaroo @ commit 35d2038

Steps to reproduce?

See README.md in tarball at http://wallaroolabs-dev.s3.amazonaws.com/scott/count.tar.gz. Instructions include options for building & running a demonstration test via a VM or Docker.

The relevant test involves the following steps:

  • Start a 1-worker Wallaroo cluster with a simple-ish stateful pipeline app in Python.
  • Use tail to watch the output of the app's sinks.
  • Grow the cluster slowly up to 10 workers. While growing, send the same 3 lines of input to the app's source.
  • Watch the output from tail, which shows the number of previous times each input key has been seen.

Occasionally, the count displays anomalies, e.g., the count drops to 0 or jumps up by more than 1. For example, the output at https://gist.github.com/slfritchie/00af23a28fbe427610f00f097cd46fd5 shows anomalies at lines 64, 85, 127, 170, and 191.

At line 205, all of the counters are reset, then the cluster is slowly shrunk down to 1 worker. Anomalies continue; see lines 233, 277, and 321.
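
The anomalies described above (a count dropping to 0 or jumping by more than 1) could be spotted mechanically rather than by eye. The following is a minimal sketch, not part of the test in the tarball. It assumes the sink output has already been parsed into `(key, count)` pairs in arrival order; the function name `find_count_anomalies` and that parsed format are hypothetical, since the actual output format is defined by the app in the tarball.

```python
from typing import Dict, Iterable, List, Tuple


def find_count_anomalies(
    observations: Iterable[Tuple[str, int]],
) -> List[Tuple[int, str, int, int]]:
    """Return (line_number, key, expected, actual) for each unexpected count.

    Assumption: each key's count should increase by exactly 1 every time
    the key appears. The first observation of a key is taken as a baseline.
    """
    last_seen: Dict[str, int] = {}
    anomalies: List[Tuple[int, str, int, int]] = []
    for line_number, (key, count) in enumerate(observations, start=1):
        if key in last_seen:
            expected = last_seen[key] + 1
            if count != expected:
                anomalies.append((line_number, key, expected, count))
        # Always resynchronize on the observed value so that one anomaly
        # is reported once instead of cascading to every later line.
        last_seen[key] = count
    return anomalies


# Hypothetical sample data: "a" drops back to 0, "b" jumps by 2.
sample = [("a", 0), ("b", 0), ("a", 1), ("b", 1), ("a", 0), ("b", 3)]
anomalies = find_count_anomalies(sample)
for line_number, key, expected, actual in anomalies:
    print(f"line {line_number}: key={key} expected={expected} actual={actual}")
```

A deliberate counter reset, such as the one at line 205 of the gist, would also be flagged by this check, so the observations before and after a reset would need to be checked separately.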
