Description
Version
v22.13.1
Platform
Darwin 24.3.0 (macOS, arm64)
Subsystem
http2, tls
What steps will reproduce the bug?
Reproducible test case (requires sudo for firewall manipulation)
We created a test that simulates a network "black hole" using macOS pf firewall:
// test-zombie-blackhole.js
// Run with: sudo node test-zombie-blackhole.js
// Expects key.pem and cert.pem in the current working directory.
const http2 = require('http2')
const { spawn, execSync } = require('child_process')
const PORT = 8444
async function main() {
  if (process.getuid() !== 0) {
    console.log('Run with: sudo node test-zombie-blackhole.js')
    process.exit(1)
  }
  // Start server as child process
  const server = spawn('node', ['-e', `
    const http2 = require('http2')
    const fs = require('fs')
    const server = http2.createSecureServer({
      key: fs.readFileSync('./key.pem'),
      cert: fs.readFileSync('./cert.pem'),
    })
    server.on('stream', (s, h) => {
      s.respond({ ':status': 200 })
      s.end('ok')
    })
    server.listen(${PORT}, () => console.log('Server ready'))
    setInterval(() => {}, 10000)
  `], { stdio: 'inherit' })
  await new Promise(r => setTimeout(r, 1000))
  // Connect client
  const session = http2.connect(`https://localhost:${PORT}`, { rejectUnauthorized: false })
  session.on('error', e => console.log('error event:', e.message))
  session.on('close', () => console.log('close event'))
  await new Promise(r => session.on('connect', r))
  console.log('Connected')
  // Make initial request to establish session
  await new Promise((resolve, reject) => {
    const s = session.request({ ':path': '/init' })
    s.on('end', resolve)
    s.on('error', reject)
    s.resume() // drain the response body so that 'end' can fire
    s.end()
  })
  // Block traffic with firewall (black hole - no RST/FIN, packets just disappear)
  execSync(`echo "block drop quick proto tcp from any to 127.0.0.1 port ${PORT}" | pfctl -a zombie_test -f -`)
  execSync('pfctl -e 2>/dev/null || true')
  console.log('Firewall blocking traffic')
  // Wait - session should remain "healthy" looking
  await new Promise(r => setTimeout(r, 5000))
  console.log('Session state:', {
    closed: session.closed,
    destroyed: session.destroyed,
    queueSize: session.state?.outboundQueueSize
  })
  // Attempt write to trigger crash: reach past the public session.socket
  // proxy (which rejects write()) to the underlying TLS socket
  const syms = Object.getOwnPropertySymbols(session)
  const sockSym = syms.find(s => s.toString().includes('socket'))
  const tls = sockSym ? session[sockSym] : session.socket
  console.log('Writing to socket...')
  tls.write('test') // CRASHES HERE
  // Cleanup (won't reach here)
  execSync('pfctl -a zombie_test -F all')
  server.kill()
}
main()
What happens
- Server and client establish HTTP/2 connection over TLS
- Firewall rule creates a "black hole" (packets dropped, no RST/FIN sent)
- Session continues to report closed: false, destroyed: false
- No error or close events fire
- Writing to the TLS socket triggers an assertion failure crash
Original discovery
This was originally discovered in a long-running production process (~2 days) where the TCP socket entered CLOSED state at the OS level (visible via lsof) but Node.js never received the close event.
How often does it reproduce? Is there a required condition?
100% reproducible with the firewall-based test above.
In production, it's intermittent and requires:
- Long-running HTTP/2 session
- Network event that causes packet loss without proper TCP RST/FIN (NAT timeout, network partition, etc.)
What is the expected behavior? Why is that the expected behavior?
- Close events should propagate - When the OS-level TCP socket enters CLOSED state, this should propagate up through the TLS and HTTP/2 layers, setting session.closed = true and emitting the appropriate events
- Write should fail gracefully - Even if the zombie state occurs, calling .write() should return an error via callback, not crash with an assertion failure
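For comparison, the callback contract asked for here is what a plain stream.Writable already provides once it has been destroyed. The sketch below only illustrates the expected shape of the failure; the sink and the writeAfterDestroy helper are hypothetical and do not exercise the TLS or HTTP/2 code paths:

```javascript
'use strict'
const { Writable } = require('node:stream')

// Hypothetical helper: destroys a no-op sink, then writes to it and
// resolves with whatever error the write callback receives.
function writeAfterDestroy() {
  return new Promise((resolve) => {
    const sink = new Writable({
      write(chunk, encoding, callback) { callback() }
    })
    sink.destroy()
    // The write is rejected through the callback instead of aborting the process.
    sink.write('test', (err) => resolve(err))
  })
}
```

Here the callback receives an error whose code is ERR_STREAM_DESTROYED, and the process keeps running. We would expect a write to the zombie TLS socket to fail in a comparable way.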
What do you see instead?
- Close event is lost - All layers above TCP continue to report healthy status
- Writes queue up - session.state.outboundQueueSize grows indefinitely (we observed 2815 queued frames)
- Crash on write - Calling .write() crashes the process with an assertion failure
Detailed debugging evidence
We attached Chrome DevTools inspector to the running process and gathered the following:
OS level (via lsof -p <pid>):
node <pid> 2128u IPv4 ... TCP ...:62689->...:https (CLOSED)
The file descriptor 2128 is in CLOSED state at the OS level.
Node level (via inspector):
sessionInfo.session.closed // false - thinks it's open!
sessionInfo.session.destroyed // false
sessionInfo.session.connecting // false
// TLS socket also reports healthy:
socket.destroyed // false
socket.readable // true
socket.writable // true
socket.readyState // 'open'
// But the outbound queue is stuck:
session.state.outboundQueueSize // 2815 frames queued!
sessionInfo.pendingRejects.size // 3 requests waiting
// The TLS socket's underlying TCP handle IS the CLOSED fd:
socket._handle._parent.fd // 2128 (the CLOSED socket!)
Ping test (callback never fires):
sessionInfo.session.ping((err, duration) => console.log(err, duration))
// Returns true (ping "sent") but callback NEVER executes
Fresh connection to same host works fine:
require('http2').connect('https://same-host.com').ping((e,d) => console.log(e,d))
// -> connected!
// -> ping: null 19.748583 (success, 20ms latency)
This proves the server is reachable; only the cached zombie session is broken.
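Because the ping callback never fires on the zombie session, a caller that wants to use ping as a liveness probe has to supply its own deadline. A minimal sketch of such a watchdog follows; pingWithTimeout and the timeout value are hypothetical names for illustration, not part of the http2 API:

```javascript
'use strict'

// Hypothetical helper: resolves with the round-trip time reported by
// session.ping(), or rejects if no acknowledgement arrives in time.
function pingWithTimeout(session, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`PING not acknowledged within ${timeoutMs} ms`))
    }, timeoutMs)
    const finish = (err, duration) => {
      clearTimeout(timer)
      if (err) reject(err)
      else resolve(duration)
    }
    try {
      // ping() returns false when the PING could not be sent at all.
      if (session.ping(finish) === false) {
        finish(new Error('PING could not be sent'))
      }
    } catch (err) {
      // ping() throws synchronously on an already-destroyed session.
      finish(err)
    }
  })
}
```

A rejection from pingWithTimeout(session, 5000) would be a signal to drop the cached session and open a fresh connection. Whether tearing down the zombie session is itself safe in this state is something we have not verified.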
Crash output
We've observed two different assertion failures depending on the scenario:
Crash 1: From reproducible test (firewall black hole)
# node[7869]: virtual void node::http2::Http2Session::OnStreamAfterWrite(node::WriteWrap *, int) at ../src/node_http2.cc:1741
# Assertion failed: is_write_in_progress()
----- Native stack trace -----
1: 0x1043e8d1c node::Assert(node::AssertionInfo const&)
2: 0x10601d89c node::http2::Http2Session::OnStreamAfterWrite(node::WriteWrap*, int) (.cold.1)
3: 0x10441a9a4 node::http2::Http2Session::ClearOutgoing(int)
4: 0x1044f8520 node::WriteWrap::OnDone(int)
5: 0x1044f8848 node::StreamReq::Done(int, char const*)
6: 0x104576f2c node::crypto::TLSWrap::InvokeQueued(int, char const*)
7: 0x104578c88 node::crypto::TLSWrap::OnStreamAfterWrite(node::WriteWrap*, int)
...
Crash 2: From production debugging (long-running zombie session)
# node[71801]: virtual int node::crypto::TLSWrap::DoWrite(node::WriteWrap *, uv_buf_t *, size_t, uv_stream_t *) at ../src/crypto/crypto_tls.cc:1033
# Assertion failed: !current_write_
----- Native stack trace -----
1: 0x102978d1c node::Assert(node::AssertionInfo const&)
2: 0x1045de9bc node::crypto::TLSWrap::DoWrite(node::WriteWrap*, uv_buf_t*, unsigned long, uv_stream_s*) (.cold.8)
3: 0x102b09e24 node::crypto::TLSWrap::DoWrite(node::WriteWrap*, uv_buf_t*, unsigned long, uv_stream_s*)
4: 0x102a85198 node::StreamBase::Write(uv_buf_t*, unsigned long, uv_stream_s*, v8::Local<v8::Object>, bool)
5: 0x102a89288 int node::StreamBase::WriteString<(node::encoding)1>(v8::FunctionCallbackInfo<v8::Value> const&)
...
----- JavaScript stack trace -----
1: handleWriteReq (node:internal/stream_base_commons:62:21)
2: writeGeneric (node:internal/stream_base_commons:148:15)
3: Socket._writeGeneric (node:net:971:11)
4: Socket._write (node:net:983:8)
5: writeOrBuffer (node:internal/streams/writable:572:12)
6: _write (node:internal/streams/writable:501:10)
7: Writable.write (node:internal/streams/writable:510:10)
Both crashes indicate internal state corruption in the TLS/HTTP2 layers when the underlying connection is silently broken.
Relationship to existing issues
This appears related to but distinct from previously fixed issues:
- #49307 (HTTP/2 segfault if underlying socket is unexpectedly closed) - Fixed via PR #49327 (Fix a segfault by ensuring TLS Sockets are closed if the underlying stream closes). That fix handles the case where the socket is destroyed, but our scenario involves a socket in CLOSED state at the OS level where no close/error event ever propagated to Node.js.
- #30896 (TLS assertion error in DoWrite) - Same assertion failure, !current_write_, attributed to memory exhaustion. Our case is not memory-related; it's a zombie session with a silently closed TCP connection.
- PR #18987 (Handle writes after SSL destroy more gracefully) - This fix handles writes after SSL is destroyed, but in our case the SSL layer doesn't know it should be destroyed.
The key difference in our scenario: the OS socket is CLOSED but Node.js never received the close event, leaving the TLS and HTTP/2 layers in an inconsistent state where they believe the connection is healthy.
Additional information
The zombie session persisted for an extended period (potentially hours) before we discovered it via debugging. All requests to the affected origin silently failed (queued but never sent), while requests to other origins continued to work normally.
The session.state.outboundQueueSize growing while bytesWritten remains static is a clear indicator of this zombie state, but there's no documented way to detect this condition - all public APIs (session.closed, socket.writable, etc.) report the connection as healthy.
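As a stopgap, the indicator described above can be approximated by sampling both counters periodically. This is a rough sketch under two assumptions: that bytesWritten is read from session.socket (the report does not pin down which object it came from), and that two samples far enough apart are meaningful. sampleSession and looksStalled are hypothetical helper names, and the heuristic can misfire on a connection that is merely slow:

```javascript
'use strict'

// Hypothetical helper: take one sample of the two counters.
function sampleSession(session) {
  return {
    outboundQueueSize: session.state?.outboundQueueSize ?? 0,
    bytesWritten: session.socket?.bytesWritten ?? 0
  }
}

// Hypothetical heuristic: frames are queued, the queue has not shrunk,
// and no bytes reached the socket between the two samples.
function looksStalled(prev, curr) {
  return curr.outboundQueueSize > 0 &&
    curr.outboundQueueSize >= prev.outboundQueueSize &&
    curr.bytesWritten === prev.bytesWritten
}
```

A caller could keep the previous sample per session, take a new one on a timer (for example every few seconds), and treat several consecutive stalled results as grounds for replacing the session.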