
fix: make Frame.lookup() iterative to prevent StackOverflowError #107

Closed

fdelbrayelle wants to merge 1 commit into dashjoin:main from fdelbrayelle:fix/frame-lookup-iterative

Conversation

@fdelbrayelle

Problem

Jsonata$Frame.lookup() walks the parent scope chain via tail recursion:

Object lookup(String name) {
    if (bindings.containsKey(name)) return bindings.get(name);
    else if (parent != null) return parent.lookup(name);
    return null;
}

Java does not optimize tail calls. On JVMs with smaller default thread stack sizes (e.g. Windows workers, containers with -Xss256k), deeply nested JSONata expressions overflow the stack. The StackOverflowError is a JVM Error, not an Exception — it cannot be safely caught and recovered from, and it crashes the entire worker thread.
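A minimal, self-contained sketch of the failure mode. `Frame` here is a simplified stand-in for `Jsonata$Frame`, not the real class, and catching `StackOverflowError` is done only to observe the failure in a controlled demo — it is not safe in production code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Jsonata$Frame — illustration only.
public class RecursiveLookupDemo {
    static class Frame {
        final Map<String, Object> bindings = new HashMap<>();
        final Frame parent;
        Frame(Frame parent) { this.parent = parent; }

        // The recursive lookup pattern quoted above: one JVM frame per scope level.
        Object lookup(String name) {
            if (bindings.containsKey(name)) return bindings.get(name);
            else if (parent != null) return parent.lookup(name);
            return null;
        }
    }

    // Builds a scope chain of the given depth with the binding at the root.
    static Frame deepChain(int depth) {
        Frame f = new Frame(null);
        f.bindings.put("x", 42);
        for (int i = 0; i < depth; i++) f = new Frame(f);
        return f;
    }

    // Catching StackOverflowError is unsafe in general; done here only so the
    // demo can report the outcome instead of crashing the thread.
    static boolean overflows(int depth) {
        try {
            deepChain(depth).lookup("x");
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("depth 10 overflows: " + overflows(10));
        System.out.println("depth 1000000 overflows: " + overflows(1_000_000));
    }
}
```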

Fix

Replace with an iterative loop. Identical semantics, zero recursion:

Object lookup(String name) {
    for (Frame f = this; f != null; f = f.parent) {
        if (f.bindings.containsKey(name)) return f.bindings.get(name);
    }
    return null;
}
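With the same simplified stand-in (again hypothetical, not the actual `Jsonata$Frame`), the iterative version resolves a million-deep chain without touching JVM stack depth:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Jsonata$Frame with the iterative lookup.
public class IterativeLookupDemo {
    static class Frame {
        final Map<String, Object> bindings = new HashMap<>();
        final Frame parent;
        Frame(Frame parent) { this.parent = parent; }

        // Iterative traversal: constant stack usage regardless of chain depth.
        Object lookup(String name) {
            for (Frame f = this; f != null; f = f.parent) {
                if (f.bindings.containsKey(name)) return f.bindings.get(name);
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Frame f = new Frame(null);
        f.bindings.put("x", 42);
        for (int i = 0; i < 1_000_000; i++) f = new Frame(f);
        // A million-deep scope chain resolves with constant stack usage.
        System.out.println(f.lookup("x")); // prints 42
    }
}
```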

Impact

Replacing the recursive parent.lookup(name) call with an iterative loop over the scope chain leaves semantics and traversal order unchanged, and eliminates all stack risk from scope-chain lookup regardless of nesting depth.

Fixes worker crashes reported in kestra-io/plugin-transform#79.
@aeberhart
Contributor

Thanks for the PR, looks good. Can you provide an example of a JVM setup + expression that reproduces this issue? Thanks!

fdelbrayelle added a commit to kestra-io/plugin-transform that referenced this pull request Apr 30, 2026
Tests now assert isInstanceOf(StackOverflowError.class) — current behavior
without the upstream fix. Rename constant to XSS_256K and add Javadoc linking
it to -Xss256k so the stack constraint is self-documenting.
Comment marks where assertions flip once dashjoin/jsonata-java#107 ships.
@fdelbrayelle
Author

Hi @aeberhart 👋
We don't have the customer's original data from their flow because it's sensitive, but I'm able to reproduce the issue. I opened a PR so you can check.

fdelbrayelle added a commit to kestra-io/plugin-transform that referenced this pull request Apr 30, 2026
Tests document the current bug state: StackOverflowError is expected.
Will flip to assertThatNoException once dashjoin/jsonata-java#107 ships.
@aeberhart
Contributor

Hi @fdelbrayelle, I ran some tests with the expression from your repo:

($f := function($n) { $n > 0 ? $f($n - 1) + 1 : 0 }; $f(...))

Your change does not really make a difference because the number of nested frames is independent of the recursion depth. Eventually, you get a StackOverflowError at a more or less random code position.

In your engine, you might want to do this:

  @Test
  public void testStack() {
    var e = Jsonata.jsonata("($f := function($n) { $n > 0 ? $f($n - 1) + 1 : 0 }; $f(820))");
    Frame f = e.createFrame();
    f.setRuntimeBounds(Long.MAX_VALUE, 100);
    System.out.println(e.evaluate(null, f));
  }

Then, you get a JException from com.dashjoin.jsonata.Timebox.checkRunnaway().
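The Timebox internals aren't shown in this thread. As a rough sketch of the pattern — the class and method names below are assumptions for illustration, not the actual com.dashjoin.jsonata.Timebox code — a depth/time guard might look like this:

```java
// Hypothetical sketch of a recursion/time guard, loosely modeled on the
// setRuntimeBounds(timeoutMs, maxDepth) behavior described above.
// NOT the actual com.dashjoin.jsonata.Timebox implementation.
public class RuntimeBounds {
    private final long deadlineMillis;
    private final int maxDepth;
    private int depth;

    public RuntimeBounds(long timeoutMillis, int maxDepth) {
        this.deadlineMillis = System.currentTimeMillis() + timeoutMillis;
        this.maxDepth = maxDepth;
    }

    // Called on evaluator entry; throws a recoverable exception before the
    // JVM stack itself can overflow.
    public void enter() {
        depth++;
        if (depth > maxDepth)
            throw new IllegalStateException(
                "call depth limit of " + maxDepth + " exceeded");
        if (System.currentTimeMillis() > deadlineMillis)
            throw new IllegalStateException("expression evaluation timeout");
    }

    // Called on evaluator exit to unwind the logical depth counter.
    public void exit() { depth--; }

    public static void main(String[] args) {
        RuntimeBounds b = new RuntimeBounds(Long.MAX_VALUE, 100);
        try {
            for (int i = 0; i < 1000; i++) b.enter(); // simulate deep recursion
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```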

@fdelbrayelle
Author

fdelbrayelle commented Apr 30, 2026

In plugin-transform, we're already following this pattern, with a default maxDepth of 200 (see here):

    protected JsonNode evaluateExpression(RunContext runContext, JsonNode jsonNode) {
        try {
            var timeoutInMilli = runContext.render(getTimeout()).as(Duration.class)
                .map(Duration::toMillis)
                .orElse(Long.MAX_VALUE);
            var rMaxDepth = runContext.render(getMaxDepth()).as(Integer.class).orElseThrow();

            var data = MAPPER.convertValue(jsonNode, Object.class);
            var frame = this.parsedExpression.createFrame();
            frame.setRuntimeBounds(timeoutInMilli, rMaxDepth);

            var result = this.parsedExpression.evaluate(data, frame);
            if (result == null) {
                return NullNode.getInstance();
            }
            return MAPPER.valueToTree(result);
        } catch (JException | IllegalVariableEvaluationException e) {
            throw new RuntimeException("Failed to evaluate expression", e);
        }
    }

Frame.lookup() is called inside the recursive evaluator. With maxDepth=N and a scope chain of depth D, each level of evaluation recursion adds D lookup frames on top of its own frame. Stack usage grows as O(N × D), not O(N). Making lookup() iterative collapses that D multiplier to 1 regardless of scope nesting.

setRuntimeBounds limits evaluation recursion depth, but it fires after the frames are pushed. It can't prevent overflow if the stack is already exhausted by the time checkRunnaway() is reached. The iterative lookup() reduces the per-level footprint, giving the bounds check room to fire cleanly.

As evidence: Kestra already calls frame.setRuntimeBounds(timeout, 200), and that didn't prevent the crash. The overflow happened inside the lookup() traversal while evaluating a moderately nested expression on a container with a reduced stack size. That's exactly the scenario this PR fixes.

The change is zero-behavior-change and zero-risk, and it removes one class of StackOverflowError entirely. It doesn't conflict with `setRuntimeBounds`; it makes it more reliable.

@aeberhart
Contributor

Hi @fdelbrayelle,

running

($f := function($n) { $n > 0 ? $f($n - 1) + 1 : 0 }; $f(1999))

this is the stack trace:

java.lang.StackOverflowError
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata$Frame.lookup(Jsonata.java:94)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata$Frame.lookup(Jsonata.java:97)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata$Frame.lookup(Jsonata.java:97)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata._evaluate(Jsonata.java:155)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluate(Jsonata.java:138)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluateFunction(Jsonata.java:1601)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata._evaluate(Jsonata.java:202)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluate(Jsonata.java:138)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluateBinary(Jsonata.java:565)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata._evaluate(Jsonata.java:166)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluate(Jsonata.java:138)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluateCondition(Jsonata.java:1247)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata._evaluate(Jsonata.java:190)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluate(Jsonata.java:138)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.applyProcedure(Jsonata.java:1920)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.applyInner(Jsonata.java:1722)
...

In the example, D=4 and N is some large number. In total, lookup is called N×D times, but the recursion depth is O(N+D). You can see this in the trace above: the recursive call (evaluate) is repeated N times, with 4 calls to lookup "on top".
Of course, a loop is marginally faster than recursion, but since D is very small, we won't merge the change.

Please let us know if we're missing something.
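The N×D-calls-versus-O(N+D)-depth distinction can be made concrete with a toy model (illustrative only — this is not the jsonata-java evaluator):

```java
// Toy model of the analysis above: an "evaluator" that recurses N times and
// performs a D-deep recursive "lookup" at each level. Counts total lookup
// calls versus the maximum simultaneous stack depth.
public class DepthModel {
    static int depth, maxDepth;
    static long lookupCalls;

    // Simulates walking a D-deep parent chain; frames pop before returning.
    static void lookup(int remaining) {
        depth++; maxDepth = Math.max(maxDepth, depth);
        lookupCalls++;
        if (remaining > 0) lookup(remaining - 1);
        depth--;
    }

    // Simulates evaluator recursion: each level does one D-deep lookup,
    // then recurses. The lookup frames are gone before the next level starts.
    static void evaluate(int n, int d) {
        depth++; maxDepth = Math.max(maxDepth, depth);
        lookup(d - 1);                 // D lookup frames "on top" of this level
        if (n > 0) evaluate(n - 1, d); // the N term dominates the depth
        depth--;
    }

    public static void main(String[] args) {
        evaluate(2000, 4);
        System.out.println("total lookup calls (~ N*D): " + lookupCalls);
        System.out.println("max stack depth (~ N+D):    " + maxDepth);
    }
}
```

The lookup frames unwind before each deeper evaluator level begins, so only the D frames of the innermost lookup ever coexist with the N evaluator frames — which is why flattening lookup shaves only D off the peak depth.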

fdelbrayelle added a commit to kestra-io/plugin-transform that referenced this pull request May 2, 2026
Each JSONata recursion level pushes ~8 JVM frames. On 256 KB worker
stacks (~300 usable frames), the old default maxDepth=200 allowed
200 × 8 = 1600 frames before the bounds check fired — far past overflow.

Two-layer fix:
1. Lower default maxDepth 200 → 50 (50 × 8 = 400 frames, safe on 256 KB).
   setRuntimeBounds fires at depth 50, throwing JException cleanly.
2. Run evaluate() on a dedicated thread with an explicit 4 MB stack.
   Worker thread stack size (e.g. 256 KB on Windows) can no longer
   constrain the evaluator. If a user sets a very high maxDepth and
   triggers StackOverflowError anyway, it is caught as Throwable inside
   the throwaway eval thread; the worker thread gets a clean
   RuntimeException instead of crashing.

Update regression tests:
- Parametrized test: maxDepth=50/200/500/1000 all produce JException,
  never StackOverflowError, even on -Xss512k JVM.
- Isolation test: maxDepth=50000 with $f(49999) overflows the eval
  thread but worker receives RuntimeException(cause=StackOverflowError).

Closes dashjoin/jsonata-java#107 dependency — fix is now self-contained
in plugin-transform without requiring upstream changes.
@fdelbrayelle
Author

You're right, thank you for the analysis and the stack trace.

The recursion depth is O(N+D), and since D is small, making lookup() iterative doesn't prevent the overflow. I was wrong about the multiplier.

We've fixed it on our side in plugin-transform instead: evaluation now runs on a dedicated thread with an explicit 4 MB stack, and the default maxDepth has been lowered to 50. The worker thread is fully isolated from any StackOverflowError in the evaluator.

So in the end, no upstream changes are needed here. Closing this PR.
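The mitigation described here — a throwaway eval thread with an explicit 4 MB stack that traps Throwable — might be sketched roughly as follows. Names and structure are illustrative, not the actual plugin-transform code:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of running evaluation on a dedicated thread with an explicit stack
// size so the caller can neither overflow nor crash. Illustration only.
public class IsolatedEval {
    private static final long EVAL_STACK_BYTES = 4L * 1024 * 1024;

    interface Eval<T> { T run() throws Exception; }

    static <T> T evaluateIsolated(Eval<T> eval) {
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Throwable> error = new AtomicReference<>();
        // Thread(group, target, name, stackSize): stackSize is a hint some
        // platforms ignore, but HotSpot honors it on common ones.
        Thread t = new Thread(null, () -> {
            try {
                result.set(eval.run());
            } catch (Throwable e) { // deliberately broad: must trap StackOverflowError
                error.set(e);
            }
        }, "jsonata-eval", EVAL_STACK_BYTES);
        t.setDaemon(true);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while evaluating", e);
        }
        if (error.get() != null)
            throw new RuntimeException("Failed to evaluate expression", error.get());
        return result.get();
    }

    static int blowStack() { return blowStack() + 1; } // forces StackOverflowError

    public static void main(String[] args) {
        System.out.println(evaluateIsolated(() -> 21 + 21)); // prints 42
        try {
            evaluateIsolated(IsolatedEval::blowStack);
        } catch (RuntimeException e) {
            // Caller sees a clean RuntimeException; the SOE died on the eval thread.
            System.out.println("caught: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```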

@fdelbrayelle fdelbrayelle deleted the fix/frame-lookup-iterative branch May 2, 2026 12:51
@aeberhart
Contributor

Thanks for confirming and of course for using the lib!

fdelbrayelle added a commit to kestra-io/plugin-transform that referenced this pull request May 15, 2026
… thread isolation + lower default maxDepth (#80)

* test(jsonata): add StackOverflow regression tests for small JVM stacks

Reproduces dashjoin/jsonata-java#107.
Frame.lookup() recurses once per scope frame — on 256 KB (Windows) or
512 KB (Linux default) thread stacks, a 999/1999-deep recursive JSONata
expression overflows before maxDepth fires. Tests use explicit Thread
stack sizes so the scenario is reproducible on any platform.

Passes once jsonata-java ships the iterative Frame.lookup() fix.

* test(jsonata): refine StackOverflow regression tests — shared 256k stack helper

Both Windows and Linux tests now use SMALL_STACK_BYTES = 256 KB via
Thread(stackSize) so no -Xss JVM flag is needed in build config.
Extract runOnSmallStack() helper to avoid duplication.

* test(jsonata): assert StackOverflowError and name constant XSS_256K

Tests now assert isInstanceOf(StackOverflowError.class) — current behavior
without the upstream fix. Rename constant to XSS_256K and add Javadoc linking
it to -Xss256k so the stack constraint is self-documenting.
Comment marks where assertions flip once dashjoin/jsonata-java#107 ships.

* test(jsonata): use -Xss256k JVM flag instead of Thread(stackSize)

Thread(stackSize) is advisory — JVM ignores it on Linux. Add -Xss256k to
test JVM args in plugin-transform-json/build.gradle via afterEvaluate so
it appends after the root build's jvmArgs= Allure assignment. Simplify
tests to call task.run() directly and assert StackOverflowError with
assertThatThrownBy.

* test(jsonata): fix -Xss256k not applied — root build used jvmArgs= (replace)

afterEvaluate in the subproject was silently wiped because the root Allure
subprojects{} block used jvmArgs=[...] (full assignment). Change root to
jvmArgs append; subproject test{jvmArgs '-Xss256k'} then safely extends it
without ordering ambiguity.

* test(jsonata): drop -Xss256k, increase depth to crash default Linux stack

-Xss256k is silently ignored by OpenJDK on Linux 64-bit (enforces a higher
minimum). Instead use depth=4999/9999 which overflows the default 512 KB-1 MB
thread stack without any JVM flag (~150 bytes/frame × 9999 ≈ 1.5 MB needed).

* test(jsonata): use -Xss512k + depth=49999 to guarantee stack overflow in CI

Linux default thread stack is ~8 MB (OS RLIMIT_STACK), far too large to
reproduce the crash without -Xss. Pin test JVM to -Xss512k (above HotSpot
minimum, safe for Kestra framework). Use depth=49999 so 49999 JVM frames
× minimum 16 bytes/frame = 800 KB > 512k — overflow guaranteed regardless
of JIT optimization level.

* test(jsonata): fix expression — use non-tail-recursive form to prevent TCO

jsonata-java TCO-optimises tail-recursive calls into a loop (O(1) stack),
so $f($n-1) never caused a stack overflow. Adding + 0 after the recursive
call puts it in non-tail position, forcing each frame to stay live and
producing the expected StackOverflowError on -Xss512k.

* test(jsonata): revert to assertThatThrownBy — lib not upgraded yet

Tests document the current bug state: StackOverflowError is expected.
Will flip to assertThatNoException once dashjoin/jsonata-java#107 ships.

* fix(jsonata): isolate eval on 4MB thread + lower default maxDepth to 50

Each JSONata recursion level pushes ~8 JVM frames. On 256 KB worker
stacks (~300 usable frames), the old default maxDepth=200 allowed
200 × 8 = 1600 frames before the bounds check fired — far past overflow.

Two-layer fix:
1. Lower default maxDepth 200 → 50 (50 × 8 = 400 frames, safe on 256 KB).
   setRuntimeBounds fires at depth 50, throwing JException cleanly.
2. Run evaluate() on a dedicated thread with an explicit 4 MB stack.
   Worker thread stack size (e.g. 256 KB on Windows) can no longer
   constrain the evaluator. If a user sets a very high maxDepth and
   triggers StackOverflowError anyway, it is caught as Throwable inside
   the throwaway eval thread; the worker thread gets a clean
   RuntimeException instead of crashing.

Update regression tests:
- Parametrized test: maxDepth=50/200/500/1000 all produce JException,
  never StackOverflowError, even on -Xss512k JVM.
- Isolation test: maxDepth=50000 with $f(49999) overflows the eval
  thread but worker receives RuntimeException(cause=StackOverflowError).

Closes dashjoin/jsonata-java#107 dependency — fix is now self-contained
in plugin-transform without requiring upstream changes.

* refactor(jsonata): replace per-call Thread with per-instance executor; document Throwable catch

Addresses two review comments from Malay on PR #80:

1. Thread-per-record regression (TransformItems): evaluateExpression() previously spawned a
   new 4 MB-stack Thread per call. Inside TransformItems.run(), calls are inside a Flux.map
   so this allocated one thread per record. Fixed by replacing new Thread(...) with a
   lazy-initialized single-threaded ExecutorService (daemon, 4 MB ThreadFactory) stored per
   Transform instance. TransformItems and TransformValue now shut it down in finally at the end
   of run(), so the executor lives for exactly one task run.

2. catch (Throwable): kept intentionally broad with an explanatory comment. The eval thread is
   a throwaway sandbox — narrowing to Exception | StackOverflowError would let other Errors
   (OOM, InternalError) escape silently via the UncaughtExceptionHandler, leaving both
   resultRef and errorRef null and returning null from evaluateExpression instead of failing.

New tests:
- shouldReuseEvalThreadAcrossRecords: asserts zero jsonata-eval threads alive after run()
- shouldContinueWorkingAfterStackOverflowError: asserts a clean second run after a SOE

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(jsonata): restore maxDepth default to 1000 to avoid breaking change

The 4 MB eval thread is the actual fix for the StackOverflowError on
Windows workers. Lowering maxDepth had no
effect on the crash, which was driven by Frame.lookup() scope-chain
depth, not user-defined function recursion. maxDepth=50 was an
unnecessary breaking change.

* test(jsonata): add regression test for large-dataset Frame.lookup() StackOverflowError

Reproduces a production crash: 5,000 LDAP-like
records with a 15-field flat-lookup expression (ternary, $string, $join on a
multi-value array) run on the -Xss512k JVM. The expression mirrors an actual
production flow. Without the 4 MB eval thread this crashes; with it the test passes.

* fix(jsonata): join eval thread in shutdown to eliminate termination race

awaitTermination() returns when all tasks are done but the thread itself
may still be in its exit sequence. Thread.getAllStackTraces() in the test
caught it in that window, causing a flaky assertion.

Track the thread reference in the factory and join() it (1 s timeout)
after awaitTermination so shutdownEvalExecutor() guarantees the thread
is fully dead before returning.

* fix(jsonata): use join() without timeout — thread is already done

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
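The per-instance executor refactor described in the commit messages above might be sketched as follows (illustrative only; names are assumptions, not the actual plugin-transform code):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of a per-instance, single-threaded eval executor: one daemon thread
// with a 4 MB stack, reused across records, shut down after the task run.
public class EvalExecutorDemo {
    private static final long EVAL_STACK_BYTES = 4L * 1024 * 1024;

    private final ExecutorService evalExecutor = Executors.newSingleThreadExecutor(r -> {
        // ThreadFactory with an explicit stack size; daemon so it never blocks JVM exit.
        Thread t = new Thread(null, r, "jsonata-eval", EVAL_STACK_BYTES);
        t.setDaemon(true);
        return t;
    });

    public <T> T evaluate(Callable<T> expr) {
        try {
            // FutureTask traps Throwable, so even a StackOverflowError on the
            // eval thread surfaces here as an ExecutionException cause.
            return evalExecutor.submit(expr).get();
        } catch (ExecutionException e) {
            throw new RuntimeException("Failed to evaluate expression", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while evaluating", e);
        }
    }

    // Called in finally at the end of run(), so the executor lives for one task run.
    public void shutdown() {
        evalExecutor.shutdown();
        try {
            evalExecutor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        EvalExecutorDemo demo = new EvalExecutorDemo();
        try {
            // Many records reuse the same eval thread instead of one thread per record.
            for (int i = 0; i < 3; i++) System.out.println(demo.evaluate(() -> "ok"));
        } finally {
            demo.shutdown();
        }
    }
}
```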