@thevilledev
Contributor

NOTE: work in progress - evaluating whether this could also work for the < and <= operators. This is a draft PR, opened mainly to see benchmarks of the bytecode overhead.

Motivation

In #897 count comparisons like count(users, .active) >= 1 were optimised by utilising the any builtin. Expressions like count(users, .active) > 100 currently iterate through the entire array even when the 101st match is found early. For large arrays where the threshold is reached quickly, this wastes resources (both CPU and memory).

This optimization enables early termination: once the count reaches the required threshold, the loop exits immediately. It is a bytecode-level approach to optimizing count comparisons that avoids introducing new language builtins (and bloating the stdlib in the process).

Changes

There's now a new Threshold field on the BuiltinNode AST node. It carries information from the optimizer phase to the compiler phase. The new countThreshold optimizer detects count comparison patterns and calculates the threshold:

  • count(arr, pred) > N -> threshold = N + 1 (need more than N matches)
  • count(arr, pred) >= N -> threshold = N (need at least N matches)

Modified the compiler's count builtin handler to emit early-termination bytecode when a threshold is set.
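
To make the threshold rule and the early exit concrete, here is a minimal sketch in plain Go. It is not the code from this PR: thresholdFor and countReaches are hypothetical names, the loop returns a bool instead of emitting bytecode, and treating a threshold of zero or less as trivially satisfied is an assumption of the sketch.

```go
package main

import "fmt"

// thresholdFor maps a comparison operator and its constant N to the number
// of matches that decides the comparison. ok is false for operators this
// sketch does not cover. (Hypothetical helper, not the PR's optimizer.)
func thresholdFor(op string, n int) (threshold int, ok bool) {
	switch op {
	case ">":
		return n + 1, true // need more than N matches
	case ">=":
		return n, true // need at least N matches
	}
	return 0, false
}

// countReaches reports whether at least threshold elements satisfy pred.
// It stops at the element that brings the count up to the threshold.
func countReaches[T any](items []T, pred func(T) bool, threshold int) bool {
	if threshold <= 0 {
		return true // assumption: zero required matches is always satisfied
	}
	count := 0
	for _, item := range items {
		if pred(item) {
			count++
			if count >= threshold { // checked after each increment
				return true // early exit
			}
		}
	}
	return false
}

func main() {
	nums := []int{1, 2, 3, 4, 5, 6}
	even := func(v int) bool { return v%2 == 0 }

	thr, _ := thresholdFor(">", 2) // models count(nums, even) > 2
	fmt.Println(thr, countReaches(nums, even, thr)) // prints "3 true"
}
```

With three even numbers in the slice, the loop stops at the last element here; on a longer slice it would stop as soon as the third even number is seen.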

Benchmark run:

go test ./optimizer/... -bench='BenchmarkCountThreshold' -run=^$ -benchmem -count=10

Results against master:

cpu: Apple M1 Pro
                                │  master.out   │               fix.out               │
                                │    sec/op     │   sec/op     vs base                │
CountThresholdEarlyMatch-8        385.72µ ±  1%   16.71µ ± 3%  -95.67% (p=0.000 n=10)
CountThresholdGteEarlyMatch-8     397.12µ ± 16%   14.99µ ± 4%  -96.22% (p=0.000 n=10)
CountThresholdNoEarlyExit-8        354.2µ ±  4%   361.4µ ± 7%        ~ (p=0.075 n=10)
CountThresholdLargeEarlyMatch-8   391.11µ ±  1%   81.33µ ± 6%  -79.21% (p=0.000 n=10)
geomean                            381.7µ         52.09µ       -86.35%

                                │  master.out   │               fix.out                │
                                │     B/op      │     B/op      vs base                │
CountThresholdEarlyMatch-8        158.26Ki ± 0%   80.92Ki ± 0%  -48.87% (p=0.000 n=10)
CountThresholdGteEarlyMatch-8     158.26Ki ± 0%   80.52Ki ± 0%  -49.12% (p=0.000 n=10)
CountThresholdNoEarlyExit-8        158.3Ki ± 0%   158.3Ki ± 0%        ~ (p=0.628 n=10)
CountThresholdLargeEarlyMatch-8    158.3Ki ± 0%   101.6Ki ± 0%  -35.81% (p=0.000 n=10)
geomean                            158.3Ki        101.2Ki       -36.08%

                                │  master.out   │                fix.out                │
                                │   allocs/op   │  allocs/op   vs base                  │
CountThresholdEarlyMatch-8         10006.0 ± 0%    106.0 ± 0%  -98.94% (p=0.000 n=10)
CountThresholdGteEarlyMatch-8     10006.00 ± 0%    55.00 ± 0%  -99.45% (p=0.000 n=10)
CountThresholdNoEarlyExit-8         10.01k ± 0%   10.01k ± 0%        ~ (p=1.000 n=10) ¹
CountThresholdLargeEarlyMatch-8    10.006k ± 0%   2.751k ± 0%  -72.51% (p=0.000 n=10)
geomean                             10.01k         632.9       -93.67%
¹ all samples are equal

Further comments

  • This follows the same pattern as BuiltinNode.Map, which the filterMap optimizer uses to exchange information between the compiler and the optimizer phases.
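  • We add some bytecode overhead - essentially 4 extra opcodes when a threshold is set.
  • The countAny optimizer remains in use for > 0 and >= 1 scenarios. It runs before this new countThreshold optimizer.
  • The complexity was previously O(n), where n is the array length. With this change it is O(k), where k is the position of the element at which the count reaches the threshold. If the threshold is never reached, the loop still scans the whole array, which is consistent with the unchanged CountThresholdNoEarlyExit results.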
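
The O(n) versus O(k) difference can be illustrated by counting how many elements a loop inspects. The sketch below is an illustration under assumptions, not the benchmark code from this PR: visited is a hypothetical helper, the input is a made-up slice of booleans standing in for predicate results, and a threshold of zero or less models the unoptimized loop.

```go
package main

import "fmt"

// visited returns how many elements a count loop inspects before stopping.
// A threshold <= 0 models the unoptimized loop with no early exit.
func visited(matches []bool, threshold int) int {
	count, seen := 0, 0
	for _, m := range matches {
		seen++
		if m {
			count++
			if threshold > 0 && count >= threshold {
				break // threshold reached: stop scanning
			}
		}
	}
	return seen
}

func main() {
	// Made-up data: 10,000 elements, only the first 150 match.
	data := make([]bool, 10000)
	for i := 0; i < 150; i++ {
		data[i] = true
	}

	fmt.Println(visited(data, 0))   // full scan: prints 10000
	fmt.Println(visited(data, 101)) // models count(...) > 100: prints 101
	fmt.Println(visited(data, 200)) // threshold never reached: prints 10000
}
```

The last call shows the worst case: when fewer elements match than the threshold requires, the early-exit loop does the same amount of scanning as the original one.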

Optimize count(arr, pred) > N and count(arr, pred) >= N patterns by
adding a threshold check inside the count loop. When the count reaches
the threshold, the loop exits early instead of scanning the entire array.

This is implemented via a new Threshold field on BuiltinNode that the
optimizer sets when detecting these patterns. The compiler then emits
bytecode that checks the count against the threshold after each increment
and jumps out of the loop when reached.

Signed-off-by: Ville Vesilehto <[email protected]>