[pull] main from danny-avila:main #104
Merged
pull[bot] merged 4 commits into innFactory:main from danny-avila:main on Apr 9, 2026
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
fix: Merge Custom Endpoints by Name Instead of Replacing Entire Array (#12586)

* fix: Merge Custom Endpoints by Name Instead of Replacing Entire Array

  The DB base config's `endpoints.custom` array was wholesale-replacing the YAML-derived array, causing YAML endpoint additions to be silently lost after the first admin-panel save. Add path-aware array merging to `deepMerge` so keyed arrays (matched by `name`) are merged item-by-item instead of replaced — see the sketch after this commit list.

* fix: Harden mergeArrayByKey — deduplicate, sanitize, and prevent mutation

  - Remove the redundant sourceOrder array; iterate Map.keys() instead, preventing duplicate entries when the source contains repeated names.
  - Sanitize override-only appended items through deepMerge to enforce UNSAFE_KEYS prototype-pollution protection on all code paths.
  - Shallow-copy unmatched base items so mutations cannot leak back.
  - Add a post-OVERRIDE_KEY_MAP remapping note to the ARRAY_MERGE_KEYS JSDoc.
  - Add tests: duplicate source names, base mutation safety, multi-priority sequential merging of the same custom endpoint.

* fix: Add inline comments and test for keyless source items

  - Document the keyless-item drop behavior in mergeArrayByKey with an inline comment and a matching test case.
  - Add a last-write-wins comment to the deduplication test assertion.
  - Clarify path semantics in the mergeArrayByKey target-iteration comment.

* fix: Relocate keyless-item comment and test target-side preservation

  - Move the keyless-item comment to the if-guard where the skip happens and clarify that target-side keyless items are preserved, not dropped.
  - Add a test verifying that base items without a name field are kept in the output.
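A minimal TypeScript sketch of the semantics these commits describe, not the repository's actual code: the real mergeArrayByKey recurses through deepMerge (with UNSAFE_KEYS sanitization), which a shallow spread merely stands in for here, and the `Keyed` type is an assumption.

```ts
type Keyed = Record<string, unknown>;

/**
 * Merge `source` items into `target` by a key field (`name` for
 * endpoints.custom): matched items merge field-by-field, unmatched
 * target items are shallow-copied, keyless source items are dropped,
 * and source-only items are appended in source order.
 */
function mergeArrayByKey(target: Keyed[], source: Keyed[], key = 'name'): Keyed[] {
  // A Map dedupes repeated source keys: last write wins.
  const byKey = new Map<unknown, Keyed>();
  for (const item of source) {
    if (item[key] == null) {
      continue; // keyless SOURCE items are skipped here
    }
    byKey.set(item[key], item);
  }

  const merged: Keyed[] = [];
  for (const item of target) {
    const override = item[key] != null ? byKey.get(item[key]) : undefined;
    if (override !== undefined) {
      // The real code routes this through deepMerge (UNSAFE_KEYS-sanitized).
      merged.push({ ...item, ...override });
      byKey.delete(item[key]);
    } else {
      // Shallow copy prevents mutation leaking back into the base config;
      // note that keyless TARGET items are preserved, not dropped.
      merged.push({ ...item });
    }
  }
  for (const item of byKey.values()) {
    merged.push({ ...item }); // override-only items appended last
  }
  return merged;
}
```

With this shape, a YAML-defined endpoint and a DB-saved override of the same `name` produce one merged entry, while YAML-only additions survive an admin-panel save instead of being replaced away.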
fix: atomize Redis event sequence counters for multi-replica deployments (#12578)

* fix: atomize Redis event sequence counters for multi-replica deployments

  Replace the in-memory sequenceCounters Map with a shared atomic Redis counter (INCR via Lua script) so all replicas share an authoritative sequence source. Make syncReorderBuffer async to read the current sequence from Redis instead of defaulting to 0 on non-publishing replicas.

  Closes #12575

* fix: harden emitChunk error handling and syncReorderBuffer race safety

  Wrap getNextSequence inside emitChunk's try/catch so a transient Redis failure is logged and swallowed, preserving the non-fatal chunk-emission contract — a sketch of that contract follows. In syncReorderBuffer, replace pending.clear() with a selective discard of stale entries (seq < currentSeq) followed by flushPendingMessages, preventing loss of chunks that arrived during the async GET window.
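A minimal sketch of the shared counter and the emission contract, assuming an ioredis client. The key formats and the sequences-start-at-zero convention are assumptions (consistent with the seq examples in later commits), and the plain-INCR form shown is where the series eventually lands; the early commits use a Lua EVAL script instead.

```ts
import type { Redis } from 'ioredis';

// Assumed key formats for this sketch; the real names aren't in the commits.
const seqKey = (streamId: string) => `stream:seq:${streamId}`;
const chanKey = (streamId: string) => `stream:chan:${streamId}`;

/** Atomically claim the next sequence number, shared by all replicas. */
async function getNextSequence(redis: Redis, streamId: string): Promise<number> {
  const val = await redis.incr(seqKey(streamId));
  return val - 1; // INCR returns 1 on first call, so sequences start at 0
}

/**
 * Chunk emission is non-fatal by contract: getNextSequence sits INSIDE the
 * try/catch, so a transient Redis failure is logged and swallowed rather
 * than crashing the stream (done/error, by contrast, propagate).
 */
async function emitChunk(redis: Redis, streamId: string, text: string): Promise<void> {
  try {
    const seq = await getNextSequence(redis, streamId);
    await redis.publish(chanKey(streamId), JSON.stringify({ seq, text }));
  } catch (err) {
    console.error('[emitChunk] failed (non-fatal):', err);
  }
}
```

Because every replica claims sequence numbers from the same Redis key, a late subscriber on a different replica can read the counter and align its reorder buffer instead of assuming the stream starts at 0.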
* fix: address review findings — harden error paths, validate types, DRY tests

  - Wrap syncReorderBuffer in try/catch in GenerationJobManager.subscribe() so a Redis GET failure degrades gracefully instead of crashing SSE reconnect.
  - Validate the eval return type in getNextSequence; throw on an unexpected type instead of silently computing NaN/-1 and dropping all messages.
  - Add a NaN guard to parseInt in syncReorderBuffer for corrupted Redis values.
  - Consolidate two streams loops in destroy() into a single pass.
  - Extract createMockPublisher to the shared helpers/publisher.ts (DRY).
  - Add tests for eval failure and unexpected eval return type in emitChunk.
  - Document the performance tradeoff (2x RTT per emit) and the TTL rationale.

* test: add cross-replica integration tests for sequence desync fix (#12575)

  Three real-Redis tests that reproduce the exact multi-replica failure:

  - A late subscriber on a different replica syncs to the shared counter and receives chunks immediately (no 500ms force-flush).
  - Multiple subscribe/unsubscribe cycles across replicas maintain correct sequence alignment on every reconnect.
  - The shared counter key is cleaned up when the stream is destroyed.

* fix: close syncReorderBuffer race, stop destroy() from nuking shared keys

  - Replace the selective prune with Math.min(currentSeq, minPendingSeq) so chunks that arrive in pending during the async GET window are preserved instead of incorrectly pruned. Since unsubscribe() already clears pending, any entries present at sync time are live messages from the current subscription and must not be discarded.
  - Remove DEL from destroy() — the sequence key is shared across replicas, and a shutting-down instance must not nuke it for active publishers. Keys expire naturally via their 1-hour TTL; cleanup() handles single-stream lifecycle teardown.
  - Add a race-condition test: pause GET, inject a message into pending during the window, resolve GET, and assert the chunk is delivered.
  - Add emitDone/emitError eval-failure tests to cover the asymmetric error contract (chunk swallows; done/error propagates).
  - Add an explicit MockPublisher return type to helpers/publisher.ts.

* fix: context-aware syncReorderBuffer to prevent duplicate delivery, silence test logs

  The Math.min(currentSeq, minPending) fix from the prior commit correctly handles the cross-replica race (a chunk arriving during the async GET), but causes duplicate delivery in same-replica mode: onAbort subscribes to the Redis channel during createJob, so chunks published before subscribe() may arrive via pub/sub AND earlyEventBuffer. The Math.min logic then flushes the pub/sub copy as a "live" message.

  Fix: add a clearPending parameter to syncReorderBuffer. The manager passes true when earlyEventBuffer was replayed (same-replica: pending entries are duplicates → clear them) and false/undefined when it was not (cross-replica: pending entries are live → preserve via Math.min).

  - Silence the winston logger in stream test files via logger.silent = true, following the existing pattern in data-schemas/prompt.spec.ts.
  - Remove the sequence-key DEL from destroy() — the key is shared across replicas, and a shutting-down instance must not nuke the counter for active publishers. Keys expire via TTL; cleanup() handles stream teardown.

* fix: share publisher client in cross-replica tests for Redis Cluster compat

  The cross-replica tests created publisherB via (ioredisClient as Redis).duplicate(), which produces a plain Redis connection to a single node. In Redis Cluster, GET/EVAL on this client can't follow MOVED redirects, so syncReorderBuffer reads null → nextSeq=0 → chunks are buffered.

  Fix: share ioredisClient as the publisher for both replicas. It is the correct Cluster-aware client and handles slot routing automatically. Only subscriber connections need to be separate (a pub/sub requirement).

* fix: increase cross-replica test timeouts and use polling for CI cluster

  Replace fixed 300ms delivery waits with polling (50ms intervals, 2s max) to handle Redis Cluster's cross-node pub/sub broadcast latency in CI — see the polling sketch below. Increase the subscription-activation wait from 100ms to 500ms to match the pattern used by the existing same-instance tests.
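A sketch of the polling pattern the tests switch to; `waitFor` is a hypothetical helper name, not necessarily what the test files call it.

```ts
/**
 * Poll `predicate` every 50ms for up to 2s instead of sleeping a fixed
 * 300ms: cluster pub/sub fan-out is async across nodes, so fixed delays
 * are flaky in CI while a bounded poll stays deterministic.
 */
async function waitFor(
  predicate: () => boolean,
  timeoutMs = 2000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!predicate()) {
    if (Date.now() > deadline) {
      throw new Error('waitFor: condition not met within timeout');
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Usage in a delivery test would look like `await waitFor(() => received.length === expectedCount);` in place of a fixed `setTimeout` wait.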
* chore: silence winston logs in remaining stream test files

  Add logger.silent = true to the GenerationJobManager, RedisJobStore, and collectedUsage test files, matching the pattern already applied to the RedisEventTransport and reconnect-reorder-desync tests.

* fix: remove flaky cluster pub/sub tests, fix log suppression for resetModules

  Remove two cross-replica transport-level integration tests that are inherently non-deterministic in Redis Cluster: cluster pub/sub fan-out is async across nodes, so pre-subscribe PUBLISHes can arrive at the subscriber's node after SUBSCRIBE takes effect, causing random +/- 1 message counts. The core logic is already covered by deterministic unit tests (mock publisher) and GenerationJobManager integration tests (end-to-end with earlyEventBuffer). The cleanup test is retained.

  Switch log suppression from logger.silent (which doesn't survive jest.resetModules) to jest.spyOn(console, 'log').mockImplementation() for files that use resetModules (GenerationJobManager, RedisJobStore, collectedUsage). The logger.silent approach remains for files that don't use resetModules (RedisEventTransport, reconnect-reorder-desync).

* fix: replace Lua EVAL+TTL with plain INCR, preserve live chunks during same-replica sync

  P1: syncReorderBuffer with clearPending=true was unconditionally clearing pending, dropping NEW chunks (seq >= currentSeq) from the ongoing generation that arrived via pub/sub during the async GET. Fix: selectively prune only entries with seq < currentSeq (duplicates of earlyEventBuffer) and flush the remaining live entries.

  P2: The 1-hour TTL on sequence keys could expire mid-stream during long quiet periods (e.g., slow tool calls), restarting the counter from zero and causing subscribers to silently drop all subsequent messages. Fix: remove the Lua EVAL+EXPIRE script entirely and use plain Redis INCR — no TTL. Keys are cleaned up explicitly by cleanup()/resetSequence() when streams end. Orphaned keys from crashed processes are a few bytes each, negligible compared to the production risk of TTL expiry.

  - Update the mock publisher helper: eval → incr.
  - Remove the "unexpected eval type" tests (not applicable to incr).
  - Update the remaining eval error tests to reference incr.

* fix: re-arm flush timeout after syncReorderBuffer when gaps remain

  syncReorderBuffer clears flushTimeout before processing, but if pending still has gaps after flushPendingMessages, no timeout is re-armed. Without new messages arriving to trigger scheduleFlushTimeout, those buffered entries sit indefinitely — stalling the stream after reconnect.

* chore: address final review — fix stale docs, rename clearPending, tighten tests

  Fix all 10 findings from the final review pass:

  - F1: Fix the stale TTL comment in destroy() — no TTL exists after the EVAL removal.
  - F2: Fix the stale EVAL reference in the emitChunk JSDoc → INCR + PUBLISH.
  - F3: Fix the syncReorderBuffer JSDoc — it described the old broken behavior, not the current selective prune.
  - F4: Rename clearPending → pruneStaleEntries for clarity at call sites.
  - F5: Add a publish-not-called assertion to the INCR failure test.
  - F6: Replace Math.min(...spread) with an explicit loop to avoid a heap allocation.
  - F7: Remove resetSequence from the IEventTransport interface (no external callers; implementation detail only).
  - F8: Consolidate the cleanup/resetSequence overlap — cleanup() owns the DEL; resetReorderBuffer() (now private) handles buffer state only.
  - F9: Fix the import order in the reconnect test (value before type imports).
  - F10: Export the MockPublisher interface from helpers/publisher.ts.

* chore: fix stale resetSequence references and hadBufferReplay JSDoc

  Two comments referenced resetSequence(), which was removed in the F7/F8 refactor (now the private resetReorderBuffer(), which doesn't DEL the key). Only cleanup() deletes the Redis key. Also update the hadBufferReplay JSDoc to say "prune stale entries" instead of the pre-refactor "clear pending".

* chore: grammar

* fix: use earlyReplayCount as prune cutoff instead of Redis counter

  The boolean pruneStaleEntries flag used currentSeq (from the Redis GET) as the prune threshold, but INCR can advance the counter past a live chunk's seq during the GET window. Example: earlyEventBuffer held seqs 0-4, generation emits seq 5 during GET, INCR advances the counter to 6, GET returns 6, and the prune condition 5 < 6 deletes the live chunk.

  Fix: pass the earlyEventBuffer replay count (5) as the prune cutoff. Entries with seq < earlyReplayCount are true duplicates; entries at or above it are live regardless of what the Redis counter reads. After pruning, the unified Math.min(currentSeq, minPending) logic handles both the same-replica and cross-replica paths correctly — see the sketch below.

  Add a targeted test that exercises this exact race: it pauses GET, emits seq 5 during the window, resolves GET with counter=6, and asserts seq 5 is preserved (it would have been dropped with the old boolean approach).

* fix: pass earlyReplayCount for skipBufferReplay path to prune pub/sub duplicates

  When skipBufferReplay is true (the resume scenario), earlyEventBuffer events are delivered via the resume sync payload, not replayed directly. But earlyReplayCount was left at 0, so syncReorderBuffer treated all pending entries as live — meaning delayed pub/sub copies of those buffered events could be delivered again, duplicating content.

  Fix: capture earlyEventBuffer.length before the skip/replay branch so the count is always passed to syncReorderBuffer regardless of delivery method. Seqs 0..earlyReplayCount-1 are pruned as duplicates whether they were replayed or delivered via the sync payload.
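A sketch of the prune-cutoff logic as it stands at this point in the series (before the Math.max monotonicity guard added below); `resyncFrontier` and the `Chunk` shape are illustrative names, not the actual syncReorderBuffer signature.

```ts
type Chunk = { seq: number; text: string };

/**
 * Re-sync after the async Redis GET. `earlyReplayCount` is how many
 * earlyEventBuffer events were already delivered; entries below that
 * cutoff are true duplicates, everything at or above it is live.
 */
function resyncFrontier(
  pending: Map<number, Chunk>,
  currentSeq: number, // authoritative counter value read from Redis
  earlyReplayCount: number,
): number {
  // Prune only duplicates of the early buffer. A live chunk (seq >=
  // cutoff) survives even when INCR has already advanced the counter
  // past its seq during the GET window.
  for (const seq of [...pending.keys()]) {
    if (seq < earlyReplayCount) {
      pending.delete(seq);
    }
  }
  // New delivery frontier: never skip past the earliest live pending
  // entry. An explicit loop rather than Math.min(...spread) avoids a
  // per-call heap allocation, mirroring review finding F6.
  let minPending = Infinity;
  for (const seq of pending.keys()) {
    if (seq < minPending) {
      minPending = seq;
    }
  }
  return Math.min(currentSeq, minPending);
}
```

On the worked example from the commit message: with earlyReplayCount=5, the live chunk seq 5 survives the prune (5 < 5 is false), and the frontier becomes Math.min(6, 5) = 5, so the chunk is delivered rather than dropped.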
* fix: keep nextSeq monotonic in syncReorderBuffer after async GET

  handleOrderedChunk can deliver in-order messages and advance nextSeq during the async GET window. If those messages leave pending empty, the unconditional nextSeq = currentSeq could regress nextSeq below its already-advanced value, reopening a delivered gap and causing subsequent messages to be buffered until force-flush. Fix: wrap both nextSeq assignments with Math.max(nextSeq, ...) so syncReorderBuffer never moves the delivery frontier backward.

* fix: cap nextSeq at earlyReplayCount, add 24h safety TTL, add post-GET race test

  - Finding 1 (CRITICAL): When earlyReplayCount > 0 and pending is empty, nextSeq was set to currentSeq — but INCR can advance the counter past a live chunk whose pub/sub copy hasn't arrived yet. Cap at earlyReplayCount instead (what was actually delivered), so in-flight chunks are not skipped. Adds a test for the message-arrives-AFTER-GET-resolves scenario.
  - Finding 3: Add a 24-hour safety-net TTL, set once on the first INCR only (val === 1) and never refreshed. This caps orphan lifetime from crashed processes without risking mid-stream counter resets — see the sketch below.
  - Finding 6: Replace setTimeout(100) in the cleanup test with a polling loop.
  - Finding 9: Fix the variadic DEL mock and add an expire mock for the new TTL.
  - Revert the P2 fix (earlyReplayCount for the skipBufferReplay path) — when skipBufferReplay is true, the resume sync payload delivers everything up to currentSeq, so syncReorderBuffer should trust the Redis counter as the frontier, not the buffer length.

* chore: log expire failures consistently with other fire-and-forget errors
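Revising the earlier counter sketch with the one-time safety TTL from Finding 3 and the fire-and-forget expire logging from the last commit; the key format remains an assumption.

```ts
import type { Redis } from 'ioredis';

// Assumed key format for this sketch; the real key name isn't in the commits.
const seqKey = (streamId: string) => `stream:seq:${streamId}`;

async function getNextSequence(redis: Redis, streamId: string): Promise<number> {
  const val = await redis.incr(seqKey(streamId));
  if (val === 1) {
    // Safety-net TTL: set once on the very first INCR and never refreshed,
    // so a long-lived stream can't have its counter reset mid-generation.
    // It only caps the lifetime of keys orphaned by crashed processes;
    // cleanup() still deletes the key explicitly when the stream ends.
    redis.expire(seqKey(streamId), 24 * 60 * 60).catch((err) => {
      // Fire-and-forget: logged consistently with other non-fatal errors.
      console.error('[getNextSequence] expire failed:', err);
    });
  }
  return val - 1; // sequences start at 0, per the seq examples above
}
```

Setting the TTL only when `val === 1` is what distinguishes this from the reverted EVAL+EXPIRE design: the expiry is never refreshed, so it can never slide forward and expire mid-stream during a quiet period.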
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )