Skip to content

Conversation

@maddeleine
Copy link
Contributor

@maddeleine maddeleine commented Nov 25, 2025

Goal

Fix an issue in our async callback code that breaks when multiple handshake messages are sent in the same record. The root cause of this is essentially that we've never needed to exit the negotiate loop with handshake data still waiting to be read. Now that we have new async callbacks that trigger in between reading handshake messages, we are exiting the loop in between handshake messages and the unread handshake messages are being wiped.

Why

Our async callback code that triggers on reading messages will cause the handshake to fail if multiple handshake messages messages are sent in the same record. There are three issues that you run into:

  1. Our s2n_connection_apply_error_blinding is supposed to do nothing if we fail with an async blocked error. Instead it wipes the conn->in stuffer. This wipes the remaining TLS messages in the stuffer if there are more, leading to a handshake failure.
  2. The handle_retry_state function also wipes the conn->in stuffer after the message handler succeeds. This means that again, the remaining TLS messages in the stuffer will be wiped, and the handshake will fail.
  3. After fixing both of the above issues, you run into a third issue where the data in the conn->in stuffer is presumed to be application data, instead of handshake messages.

How

Now the handle_retry_state function will process the remaining messages in the conn->in stuffer before going back to the normal negotiate loop.

Callouts

I think its probably not good that our code has duplicate handshake read/write logic for our async handling. Presumably there should be a way to get the async code to follow the normal negotiate read/write path. That would be a bigger refactor, and it's hard to justify that big of a code change to fix this bug. My fix does move us closer to more code sharing between these two paths though.

Testing

Includes a unit test to show that the situation which failed now succeeds. I removed the #[ignore] tag on the integ tests from #5638 which proves that my change fixes the issue.

Related

release summary: The TLS handshake now succeeds when the async cert callback is configured and peers sent multiple TLS handshake messages per record.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

if (s2n_connection_is_quic_enabled(conn)) {
record_type = TLS_HANDSHAKE;
uint8_t message_type = 0;
POSIX_GUARD_RESULT(s2n_quic_read_handshake_message(conn, &message_type));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, something I discovered while doing this refactor is that the message_type that gets parsed in this function is never used? It's essentially overwritten later in s2n_read_full_handshake_message. So that's weird...

@maddeleine maddeleine requested a review from jmayclin November 26, 2025 22:45
/* Just to get access to advance_message */
#include "tls/s2n_handshake_io.c"

struct test_data {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe s2n_multi_message_test_data?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants