@teskje
Contributor

@teskje teskje commented Nov 13, 2025

This PR introduces `Consensus`/`Blob` implementations that both forward commands over a `turmoil::net::TcpStream` to a server task that's a thin wrapper around the `Mem*` implementations. This supports simulating persist in turmoil tests, which can now crash the consensus/blob servers and introduce network faults in persist communication.
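
For readers unfamiliar with the pattern, here is a minimal sketch of the client/server split described above. It is not the code from this PR: `std::net` and a thread stand in for `turmoil::net::TcpStream` and the async server task, and `MemStore`, `RemoteStore`, `serve_one`, `spawn_server`, and the line-based `SET`/`GET` wire format are all hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Hypothetical stand-in for a `Mem*`-style in-memory implementation.
#[derive(Default)]
struct MemStore {
    data: HashMap<String, String>,
}

impl MemStore {
    /// Decodes one command line and applies it to the in-memory state.
    fn handle(&mut self, line: &str) -> String {
        let mut parts = line.splitn(3, ' ');
        match (parts.next(), parts.next(), parts.next()) {
            (Some("SET"), Some(key), Some(value)) => {
                self.data.insert(key.to_string(), value.to_string());
                "OK".to_string()
            }
            (Some("GET"), Some(key), None) => match self.data.get(key) {
                Some(value) => format!("VALUE {value}"),
                None => "NONE".to_string(),
            },
            _ => "ERR".to_string(),
        }
    }
}

/// Server side: a thin wrapper that serves a single connection.
/// All state lives here, so "crashing" the server loses it.
fn serve_one(listener: TcpListener) -> io::Result<()> {
    let (stream, _) = listener.accept()?;
    let mut store = MemStore::default();
    let mut writer = stream.try_clone()?;
    for line in BufReader::new(stream).lines() {
        let reply = store.handle(&line?);
        writeln!(writer, "{reply}")?;
    }
    Ok(())
}

/// Client side: holds no data of its own and forwards every call.
struct RemoteStore {
    reader: BufReader<TcpStream>,
    writer: TcpStream,
}

impl RemoteStore {
    fn connect(addr: SocketAddr) -> io::Result<Self> {
        let writer = TcpStream::connect(addr)?;
        let reader = BufReader::new(writer.try_clone()?);
        Ok(Self { reader, writer })
    }

    /// Sends one command line and waits for the one-line reply.
    fn call(&mut self, command: &str) -> io::Result<String> {
        writeln!(self.writer, "{command}")?;
        let mut reply = String::new();
        self.reader.read_line(&mut reply)?;
        Ok(reply.trim_end().to_string())
    }

    fn set(&mut self, key: &str, value: &str) -> io::Result<()> {
        let reply = self.call(&format!("SET {key} {value}"))?;
        if reply == "OK" {
            Ok(())
        } else {
            Err(io::Error::new(io::ErrorKind::Other, reply))
        }
    }

    fn get(&mut self, key: &str) -> io::Result<Option<String>> {
        let reply = self.call(&format!("GET {key}"))?;
        Ok(reply.strip_prefix("VALUE ").map(str::to_string))
    }
}

/// Binds to an ephemeral local port and runs the server on its own thread.
fn spawn_server() -> io::Result<(SocketAddr, thread::JoinHandle<io::Result<()>>)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    Ok((addr, thread::spawn(move || serve_one(listener))))
}
```

Because every operation crosses the stream, a simulator that owns the network can delay, drop, or sever it, and killing the server task discards the in-memory state. That is the property the description relies on.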

The new implementations are gated behind a "turmoil" feature.
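
As a rough illustration of what feature gating looks like in Rust, here is a sketch assuming a `turmoil` feature is declared in the crate's `Cargo.toml`. The function names are hypothetical and do not come from this PR.

```rust
/// Always compiled: stand-in for the existing in-memory backend.
pub fn mem_backend() -> &'static str {
    "mem"
}

/// Only compiled when the crate is built with the "turmoil" feature enabled.
#[cfg(feature = "turmoil")]
pub fn turmoil_backend() -> &'static str {
    "turmoil"
}

/// Lists the backends that were compiled into this build.
pub fn compiled_backends() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut backends = vec![mem_backend()];
    // This statement disappears entirely when the feature is off.
    #[cfg(feature = "turmoil")]
    backends.push(turmoil_backend());
    backends
}
```

Builds without the feature never compile the gated items, so default builds do not pay for the simulation-only code or its dependency.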

Motivation

  • This PR adds a feature that has not yet been specified.

Support simulating persist in turmoil-based tests.

Tips for reviewer

See #34110 for this used in action.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje marked this pull request as ready for review November 13, 2025 15:59
@teskje teskje requested a review from a team as a code owner November 13, 2025 15:59
@teskje teskje requested a review from bkirwi November 13, 2025 15:59
Contributor

@bkirwi bkirwi left a comment

I was a little surprised that turmoil doesn't give us a nice way to have workers interleave / interact outside these managed tcp streams, but that does seem like the right way to model it here, and the implementations themselves are pretty clean! Thanks for keeping the feature well-segregated.

@teskje
Contributor Author

teskje commented Nov 14, 2025

> I was a little surprised that turmoil doesn't give us a nice way to have workers interleave / interact outside these managed tcp streams

Yeah, turmoil's main thing is network simulation, and it expects that nodes only communicate through the turmoil `TcpStream`s. Though I think there is nothing fundamental that prevents it from exploring more node interleavings (interleaving tasks within a single node is hard because you'd need to reach into tokio to control that). There is already `enable_random_order`, and if that turns out to be insufficient, we can consider adding ways to pause nodes explicitly or randomly.
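
To make the idea of exploring node interleavings concrete, here is a small std-only sketch of seeded random-order scheduling. It illustrates the general technique only and is not turmoil's implementation of `enable_random_order`. `SplitMix64` and `schedule` are hypothetical helpers, and the node names are placeholders.

```rust
/// Tiny deterministic PRNG (SplitMix64) so a run can be replayed from its seed.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Returns the order in which `nodes` would be stepped on each of `ticks`
/// simulation ticks, reshuffling the order on every tick.
fn schedule(nodes: &[&str], ticks: usize, seed: u64) -> Vec<Vec<String>> {
    let mut rng = SplitMix64(seed);
    (0..ticks)
        .map(|_| {
            let mut order: Vec<String> = nodes.iter().map(|n| n.to_string()).collect();
            // Fisher-Yates shuffle driven by the seeded PRNG.
            for i in (1..order.len()).rev() {
                let j = (rng.next() % (i as u64 + 1)) as usize;
                order.swap(i, j);
            }
            order
        })
        .collect()
}
```

The same seed always yields the same schedule, so a failing interleaving can be replayed, while different seeds visit different node orders. Pausing nodes explicitly or randomly, as floated above, would be a further extension of the same idea.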

@teskje
Contributor Author

teskje commented Nov 14, 2025

TFTR!

@teskje teskje merged commit 5866f76 into MaterializeInc:main Nov 14, 2025
129 checks passed
@teskje teskje deleted the persist-turmoil branch November 14, 2025 12:07