POC clickhouse implementation#214

Open
shirkevich wants to merge 5 commits into roostorg:main from shirkevich:codex-poc-clickhouse-implementation

@shirkevich shirkevich commented Apr 9, 2026

Summary

References #51 and #188.

This PR introduces a pluggable event-query backend abstraction and a ClickHouse reference implementation alongside the existing Druid path.

The main change is a backend-neutral filter IR and shared event-query contract. Druid now renders through the IR, and ClickHouse implements the same API shape for scan, timeseries, approximate group count, top-N, and CSV export.
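As a rough illustration of the idea (not the code in this PR), a backend-neutral filter IR can be a small tree that each backend renders on its own. The sketch below assumes hypothetical node types `Equals` and `And`, a hypothetical `features` JSON string column on the ClickHouse side, and string-valued features only. The Druid output uses the native `selector` and `and` filter types; the ClickHouse output uses `JSONHas` and `JSONExtractString` with `{name:Type}` query parameters.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Equals:
    """Leaf node: a named feature must equal a string value (hypothetical IR node)."""
    feature: str
    value: str


@dataclass(frozen=True)
class And:
    """Conjunction of one or more child filters (hypothetical IR node)."""
    children: tuple["Filter", ...]

    def __post_init__(self) -> None:
        if not self.children:
            raise ValueError("And requires at least one child filter")


Filter = Union[Equals, And]


def render_druid(node: Filter) -> dict:
    """Render the IR as a Druid native JSON filter."""
    if isinstance(node, Equals):
        return {"type": "selector", "dimension": node.feature, "value": node.value}
    if isinstance(node, And):
        return {"type": "and", "fields": [render_druid(c) for c in node.children]}
    raise TypeError(f"unsupported filter node: {node!r}")


def render_clickhouse(node: Filter, params: dict[str, str]) -> str:
    """Render the IR as a ClickHouse WHERE fragment.

    Values are never inlined; they are collected in `params` and referenced
    through {name:String} placeholders. The `features` column is an assumption.
    """
    if isinstance(node, Equals):
        key, val = f"p{len(params)}", f"p{len(params) + 1}"
        params[key] = node.feature
        params[val] = node.value
        # JSONExtractString returns '' for a missing key, so guard with JSONHas
        # to keep a missing feature from matching an empty-string filter.
        return (
            f"(JSONHas(features, {{{key}:String}}) AND "
            f"JSONExtractString(features, {{{key}:String}}) = {{{val}:String}})"
        )
    if isinstance(node, And):
        return "(" + " AND ".join(render_clickhouse(c, params) for c in node.children) + ")"
    raise TypeError(f"unsupported filter node: {node!r}")
```

The same tree, for example `And((Equals("country", "US"), Equals("verified", "true")))`, can then be handed to either renderer, which is what keeps the event API surface independent of the backend.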

Why

This makes backend work explicit instead of Druid-specific by default.

It also improves local development and backend evaluation:

  • ClickHouse dev mode is materially smaller than Druid in local Docker footprint
  • ClickHouse reaches a queryable local stack faster in local testing
  • the backend boundary is now clear enough to add other providers without rewriting the event API surface
  • on macOS/Apple Silicon, the local Druid integration stack still pays for linux/amd64 emulation, while ClickHouse does not

What changed

  • add backend-neutral filter IR and AST -> IR translation
  • keep Druid compatibility by rendering Druid filters from the IR
  • add EventQueryBackend abstraction and plugin hook
  • add built-in ClickHouse backend, client holder, and feature mapper
  • add ClickHouse local compose overlay and isolated ClickHouse integration test stack
  • add isolated Druid integration test stack for backend-specific regression coverage
  • split fast default tests from backend-specific integration suites
  • add request-path validation for backend query errors so invalid feature references return 400
  • keep execution-result storage explicit and separate from the event-query backend contract
  • fix ClickHouse typed-feature comparisons so missing JSON keys do not match default-value filters
  • fix ClickHouse scan pagination so repeated timestamps page stably with an action_id tie-breaker
  • reject non-zero ClickHouse top-N precision instead of silently returning different semantics
  • add ClickHouse and Druid integration coverage for explicit-false vs missing-feature behavior
  • wait for Druid supervisor readiness before Druid integration tests publish data
  • preserve the shared Postgres DB during Druid integration runs so the Druid metadata store is not torn down mid-stack
  • disable Datadog instrumentation telemetry in the test runner to avoid noisy shutdown exceptions
  • make Postgres schema creation stable across repeated make test runs in the same Docker test stack
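The scan pagination fix in the list above can be pictured as keyset pagination over a compound sort key. The following is a minimal in-memory sketch, assuming ascending order and hypothetical `Event`, `Cursor`, and `scan_page` names; the real backend may sort newest-first and would express the same comparison in SQL.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    timestamp: int   # many events may share the same timestamp
    action_id: int   # assumed unique, so it can break timestamp ties


Cursor = tuple[int, int]  # (timestamp, action_id) of the last row already returned


def scan_page(
    events: list[Event], limit: int, after: Optional[Cursor] = None
) -> tuple[list[Event], Optional[Cursor]]:
    """Return one page plus the cursor for the next page (None when exhausted).

    Paging on timestamp alone can skip or repeat rows that share a timestamp.
    Ordering and filtering on (timestamp, action_id) makes the order total.
    A SQL backend could express this roughly as:
      WHERE (timestamp, action_id) > (ts, id) ORDER BY timestamp, action_id LIMIT n
    """
    ordered = sorted(events, key=lambda e: (e.timestamp, e.action_id))
    if after is not None:
        ordered = [e for e in ordered if (e.timestamp, e.action_id) > after]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (page[-1].timestamp, page[-1].action_id) if has_more else None
    return page, next_cursor
```

Calling `scan_page` repeatedly with the returned cursor walks every event exactly once, even when a page boundary falls in the middle of a run of identical timestamps.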

Validation

  • make clickhouse-test
  • make druid-test PYTEST_ARGS='osprey_worker/src/osprey/worker/ui_api/osprey/views/tests/test_events_druid_integration.py'
  • make test
  • make test again without reset

Local timing notes

ClickHouse remains materially faster and simpler to iterate on locally than the Druid stack.

The cold-start timing picture on local macOS/Apple Silicon is still uneven enough that I do not want to overstate a single Druid number in the PR body. The more important point for this POC is that the backend-specific suites stay isolated behind dedicated make targets, while the default make test path remains fast and repeatable.

Rationale

This is a POC for a pluggable backend model, with ClickHouse as the first non-Druid reference implementation.

The immediate value is simpler local setup and a more practical path for production installations that do not want to carry a full Druid stack just to support the event query surface.
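To make the pluggable model a little more concrete, here is a small sketch of a shared contract with one operation (top-N) and a request-path wrapper that turns backend query errors into a 400. Everything named here (`BackendQueryError`, the `top_n` signature, `InMemoryBackend`, `handle_top_n`) is hypothetical and only illustrates the shape; it is not the interface added in this PR.

```python
from abc import ABC, abstractmethod
from collections import Counter


class BackendQueryError(ValueError):
    """Hypothetical error for queries a backend cannot or will not execute."""


class EventQueryBackend(ABC):
    """Hypothetical shared contract; only top-N is shown."""

    @abstractmethod
    def top_n(self, dimension: str, limit: int, precision: float = 0.0) -> list[tuple[str, int]]:
        ...


class InMemoryBackend(EventQueryBackend):
    """Stand-in backend that computes exact counts over a list of rows."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self._rows = rows

    def top_n(self, dimension: str, limit: int, precision: float = 0.0) -> list[tuple[str, int]]:
        # Refuse options this backend cannot honor rather than silently
        # returning results with different semantics.
        if precision != 0:
            raise BackendQueryError("non-zero top-N precision is not supported")
        if not any(dimension in row for row in self._rows):
            raise BackendQueryError(f"unknown feature: {dimension}")
        counts = Counter(row[dimension] for row in self._rows if dimension in row)
        return counts.most_common(limit)


def handle_top_n(
    backend: EventQueryBackend, dimension: str, limit: int, precision: float = 0.0
) -> tuple[int, object]:
    """Map backend query errors to a 400 instead of letting them surface as a 500."""
    try:
        return 200, backend.top_n(dimension, limit, precision)
    except BackendQueryError as exc:
        return 400, {"error": str(exc)}
```

The design choice illustrated is that callers depend only on the abstract contract, and a backend that cannot match the requested semantics says so explicitly.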

Follow-ups / known gaps

  • backend instance caching is still a follow-up; backend selection currently constructs a backend per request
  • ClickHouse still participates only as the event-query backend; full execution-result storage remains a separate contract
  • the Druid metadata DB should eventually be isolated from the app test DB instead of relying on the Druid-specific preserve behavior
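For the first follow-up, one possible shape for backend instance caching is a small registry that builds each backend once and reuses it across requests. This is only a sketch under assumptions: `BackendRegistry`, `register`, and `get` are hypothetical names, and the real fix may instead hang off the existing plugin hook.

```python
import threading
from typing import Any, Callable


class BackendRegistry:
    """Hypothetical registry that constructs each named backend at most once."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[[], Any]] = {}
        self._instances: dict[str, Any] = {}
        self._lock = threading.Lock()

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        """Record how to build a backend; nothing is constructed yet."""
        with self._lock:
            self._factories[name] = factory

    def get(self, name: str) -> Any:
        """Return the cached backend, building it on first use."""
        with self._lock:
            if name not in self._instances:
                if name not in self._factories:
                    raise KeyError(f"no event-query backend registered as {name!r}")
                self._instances[name] = self._factories[name]()
            return self._instances[name]
```

The lock keeps two concurrent first requests from constructing two clients; whether that matters depends on how expensive backend construction actually is.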

@shirkevich shirkevich marked this pull request as ready for review April 10, 2026 14:35
@cassidyjames (Member) commented

@shirkevich thanks so much for your work on this!

We discussed this PR a bit in today's working group meeting #256. The consensus was that so far this looks a bit too ClickHouse-specific, and overall the series of tasks for making things pluggable should probably be broken down a bit more into smaller chunks/more iterative PRs. I know we have a bunch of related issues opened:

It'd probably be most helpful next to chat with other contributors like @haileyok to determine the specific steps/how things can be broken down into iterative PRs. That's probably best done on #51, in the #osprey Discord channel, or in the next working group (two weeks from today).
