Architecture Specification: On‑Demand Solana Account Streaming via Geyser → Kafka → ksqlDB → gRPC

This is the revised version of the above, following the discussion in our sync.

1. Overview

This document specifies a production‑ready architecture for real‑time and historical‑aware Solana account update streaming using an on‑demand tracking model. Instead of streaming all validator account updates into Kafka, the Geyser plugin tracks and ingests only the accounts that clients explicitly subscribe to. The gRPC service orchestrates this tracking and relies solely on ksqlDB as the authoritative state store for initial snapshots; there is no gRPC‑level in‑memory cache. Key design decisions:
- On‑demand tracking: the Geyser plugin starts with an empty track list and adds pubkeys only when clients subscribe to them.
- ksqlDB is the single authoritative store for latest account state; the gRPC service keeps no in‑memory cache.
- Tracking never stops on unsubscribe, so the tracked set only grows.
2. Architecture Diagram

```mermaid
sequenceDiagram
    participant C as Client
    participant S as gRPC Service (Rust)
    participant KSQL as ksqlDB
    participant K as Kafka (account‑updates)
    participant GP as Geyser Plugin
    participant V as Validator (AccountsDB + RPC)
    Note over GP: Initially tracks no accounts
    C->>S: Subscribe(pubkeys)
    S->>S: Add pubkeys to client filter
    rect rgba(60, 100, 160, 0.3)
    Note right of S: 2.a — Initial state (known accounts)
    S->>KSQL: Pull query (pubkeys)
    KSQL-->>S: Last seen state (subset found)
    S->>C: Stream initial updates (from ksqlDB)
    end
    rect rgba(60, 140, 80, 0.3)
    Note right of S: 2.b — Request tracking (unknown accounts)
    S->>GP: Control: start tracking (missing pubkeys)
    GP->>GP: Add pubkeys to track list
    GP->>V: RPC getMultipleAccounts(missing pubkeys)
    V-->>GP: Current account state
    GP->>K: Produce initial state snapshots
    K->>KSQL: STREAM → TABLE materialization
    K->>S: Consumer receives initial snapshots
    S->>C: Stream initial updates (from Kafka)
    end
    rect rgba(180, 130, 50, 0.3)
    Note right of K: 3 — Continuous live updates
    V->>GP: AccountsDB write (tracked pubkey)
    GP->>K: Produce update
    K->>KSQL: TABLE update (latest by key)
    K->>S: Consumer receives update
    S->>C: Relay update (matches client filter)
    end
    rect rgba(160, 60, 60, 0.3)
    Note right of C: 4 — Unsubscribe
    C->>S: Unsubscribe(pubkeys)
    S->>S: Remove pubkeys from client filter
    Note over GP: Continues tracking — no stop signal
    Note over KSQL: Keeps latest state for future reuse
    end
```
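The subscription split in steps 2.a/2.b above — serve known pubkeys from ksqlDB, request tracking for the rest — can be sketched as a simple set partition. This is an illustrative sketch only: the function name `partition_pubkeys` is hypothetical, and the `known` set stands in for the result of a ksqlDB pull query.

```rust
use std::collections::HashSet;

/// Split a Subscribe request into pubkeys whose state ksqlDB already has
/// (serve a snapshot immediately, step 2.a) and pubkeys the Geyser plugin
/// must start tracking first (step 2.b).
fn partition_pubkeys<'a>(
    requested: &'a [String],
    known: &HashSet<String>,
) -> (Vec<&'a String>, Vec<&'a String>) {
    // `partition` keeps order: known pubkeys left, unknown pubkeys right.
    requested.iter().partition(|pk| known.contains(*pk))
}

fn main() {
    // "A" and "B" were tracked before; "C" has never been subscribed to.
    let known: HashSet<String> = ["A", "B"].iter().map(|s| s.to_string()).collect();
    let requested: Vec<String> = ["A", "C"].iter().map(|s| s.to_string()).collect();
    let (serve_from_ksqldb, start_tracking) = partition_pubkeys(&requested, &known);
    assert_eq!(serve_from_ksqldb, vec![&"A".to_string()]);
    assert_eq!(start_tracking, vec![&"C".to_string()]);
}
```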
3. End‑to‑End Flow
Result: All accounts ever subscribed to remain tracked end‑to‑end (Geyser → Kafka → ksqlDB), but only updates for currently‑subscribed pubkeys are relayed to clients. This is vastly smaller than the entire Solana account space.

4. Components and Responsibilities

Geyser Plugin (
| Area | Concern | Mitigation |
|---|---|---|
| Validator impact | Geyser runs in‑process; slow brokers or serialization could stall the validator | Efficient serialization, lz4/zstd compression, tuned linger.ms/batch.size; reduced impact since only tracked accounts are serialized |
| Cold‑start latency | First‑ever subscription for an account includes control call + getMultipleAccounts + Kafka round‑trip | Batch getMultipleAccounts calls; provide immediate feedback to client that tracking has started; accept higher latency for first request |
| ksqlDB query load | Burst subscriptions stress pull queries | Connection pooling in gRPC service; tune RocksDB block cache; small read timeouts + retries |
| Kafka consumer throughput | Deserializing all tracked‑account updates | Efficient Rust Protobuf (prost); volume limited to tracked accounts, not global state |
| Hot‑account fan‑out | Popular accounts concentrate load on a single partition/instance | Per‑client bounded buffers, shared message references (avoid per‑subscriber copies) |
| Cumulative tracking growth | Tracked set grows monotonically (never stop tracking) | Still orders of magnitude smaller than full Solana state; use Kafka compaction + ksqlDB TTLs if necessary; consider archival exports for very old/unused keys |
| Control plane reliability | If tracking request fails, account is never tracked | Idempotent retries in the control channel; persist control ops logs for auditing |
| RPC rate limits | getMultipleAccounts calls from Geyser could hit limits | Use same‑node RPC (internal validator access); batch calls; concurrency limits |
| End‑to‑end lag | Cumulative latency across pipeline stages | Track validator→Kafka, broker E2E, consumer lag, gRPC send queue; set SLO‑based alerts |
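The hot‑account fan‑out mitigation above (per‑client bounded buffers) can be sketched with a bounded channel: when a slow client's queue is full, the update is dropped for that client rather than blocking the shared consumer loop. This is an assumed design using std's `sync_channel`, not the service's actual code.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Try to enqueue an update for one client; drop it (and count the drop)
/// if the client's bounded buffer is full.
fn relay(tx: &SyncSender<u64>, update: u64, dropped: &mut u32) {
    match tx.try_send(update) {
        Ok(()) => {}
        Err(TrySendError::Full(_)) => *dropped += 1, // drop rather than block
        Err(TrySendError::Disconnected(_)) => {}     // client went away
    }
}

fn main() {
    // Capacity 2 stands in for the per-client queue limit.
    let (tx, rx): (SyncSender<u64>, Receiver<u64>) = sync_channel(2);
    let mut dropped = 0;
    for slot in 0..5 {
        relay(&tx, slot, &mut dropped);
    }
    assert_eq!(dropped, 3); // only 2 of 5 updates fit in the bounded buffer
    assert_eq!(rx.try_iter().collect::<Vec<_>>(), vec![0, 1]);
}
```

The dropped-message counter maps directly onto the "dropped messages" metric listed in the observability section.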
Performance Goals:
- Sub‑millisecond ingestion latency from validator to Kafka (for tracked accounts).
- < 10 ms latency for initial state lookups via ksqlDB pull queries.
- Scalability to hundreds of thousands of tracked accounts and thousands of concurrent gRPC clients.
8. Scale‑out and Failure Handling
- Horizontal scaling: gRPC service instances are stateless; join a shared Kafka consumer group to split partitions. Key‑partitioning ensures per‑key ordering end‑to‑end.
- Kafka broker unavailable: Producer/consumer retries; service may momentarily stall; clients retain connection and receive updates when lag catches up.
- ksqlDB unavailable: Service cannot serve initial snapshots; returns a transient error prompting client retry. Live streaming continues.
- Geyser control endpoint unavailable: Tracking requests are retried with backoff; idempotent design ensures no duplicate tracking on recovery.
- Service instance crash: Clients reconnect to another instance; consumer group rebalances; router rebuilds live interest from active connections.
- Validator restart: Geyser plugin must restore its track list from persistent storage and resume tracking (see §11).
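The "retried with backoff; idempotent design" behavior for the Geyser control endpoint can be sketched as follows. This is a toy model under stated assumptions: the Geyser track list is modeled as a `HashSet` (so re-sending the same pubkey on retry is a no-op), and `attempt_fails` simulates transient control-plane errors.

```rust
use std::collections::HashSet;
use std::thread::sleep;
use std::time::Duration;

/// Send a "start tracking" control request with exponential backoff.
/// Inserting into a set is idempotent, so a retry after a half-applied
/// request cannot create duplicate tracking.
fn track_with_retry(
    tracked: &mut HashSet<String>,
    pubkey: &str,
    mut attempt_fails: u32,
) -> bool {
    let mut backoff = Duration::from_millis(1);
    for _ in 0..5 {
        if attempt_fails > 0 {
            attempt_fails -= 1; // simulated transient control-plane error
            sleep(backoff);
            backoff *= 2; // exponential backoff between attempts
            continue;
        }
        tracked.insert(pubkey.to_string()); // idempotent insert
        return true;
    }
    false // give up after bounded retries
}

fn main() {
    let mut tracked = HashSet::new();
    assert!(track_with_retry(&mut tracked, "Pubkey11111", 2)); // succeeds on 3rd try
    assert!(track_with_retry(&mut tracked, "Pubkey11111", 0)); // retry is a no-op
    assert_eq!(tracked.len(), 1);
}
```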
9. Security and Access
- Network policies: Restrict Geyser control endpoint to gRPC service; TLS mutual auth where possible.
- Kafka ACLs: Produce from Geyser only; consume from gRPC service only.
- ksqlDB auth: Scoped to pull queries from gRPC service.
- gRPC: TLS for client connections; authentication and authorization as required.
10. Observability and Monitoring
- Metrics:
- Producer (Geyser): records/s, batch size, compression ratio, retries, error rates, tracked account count.
- Kafka: broker throughput, ISR health, partition under‑replication, end‑to‑end lag.
- ksqlDB: pull query latency (p50/p99), RocksDB hit rate, processing threads.
- gRPC service: subscribe/unsubscribe rate, active streams, per‑client queue depth, send latency, dropped messages, ksqlDB query rate, control request rate/success.
- Tracing: Propagate trace context from gRPC request → control call → Geyser produce → Kafka consume → client send. Include pubkey as attribute (OpenTelemetry).
- Logging: Structured logs with pubkey, slot, partition, offset, consumer group, client id. Sample at high volume.
- Alerting: SLOs for initial snapshot latency and live update freshness. Alerts on consumer lag, ksqlDB p95 pull latency, staleness (no update for hot pubkey), control errors, error spikes.
11. Open Questions / Next Steps
- Control plane transport: Define the Geyser control API surface — dedicated gRPC port on Geyser, or a "control" Kafka topic? Include auth, idempotency, and batching semantics.
- Track list persistence: How does the Geyser plugin persist its track list across validator restarts? Options: local file, Kafka compacted topic, or external store.
- Schema finalization: Confirm exact Protobuf/Avro fields, encodings, and Schema Registry compatibility rules for the chosen Geyser plugin distribution.
- ksqlDB cleanup: Define TTL or archival policy for accounts not updated in a long period.
- SLO definition: Define SLOs for initial snapshot latency, steady‑state update freshness, and max accepted consumer lag; feed into partition count and infrastructure sizing.
- Load testing: Prototype a minimal pipeline, load‑test with synthetic account updates, and validate ksqlDB pull query latencies under burst subscribe patterns.
- Compacted topic: Decide whether to maintain a dedicated compacted `account-latest` topic alongside the source stream, or rely on compaction on the single `account-updates` topic.
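One of the track-list persistence options listed above is a local file. A minimal sketch of that option, under assumptions not decided in this document (one pubkey per line, path chosen by the operator):

```rust
use std::collections::BTreeSet;
use std::fs;

/// Persist the track list: one pubkey per line.
fn save(path: &str, tracked: &BTreeSet<String>) -> std::io::Result<()> {
    let body: String = tracked.iter().map(|k| format!("{k}\n")).collect();
    fs::write(path, body)
}

/// Reload the track list on validator restart.
fn load(path: &str) -> std::io::Result<BTreeSet<String>> {
    Ok(fs::read_to_string(path)?
        .lines()
        .map(|l| l.to_string())
        .collect())
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("track_list.txt");
    let path = path.to_str().unwrap().to_string();
    let mut tracked = BTreeSet::new();
    tracked.insert("PubkeyA".to_string());
    tracked.insert("PubkeyB".to_string());
    save(&path, &tracked)?;
    let restored = load(&path)?;
    assert_eq!(tracked, restored); // survives a simulated restart
    fs::remove_file(&path)?;
    Ok(())
}
```

A Kafka compacted topic or external store would replace `save`/`load` with produce/consume against a keyed topic; the restart contract (full track list recoverable) stays the same.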
12. Alternatives Considered
| Alternative | Why Rejected |
|---|---|
| Track all accounts (original approach) | Volume of updates for the entire Solana account space overwhelms Kafka/ksqlDB without massive horizontal scaling; 99%+ write amplification for updates never consumed by any client. |
| In‑memory cache in gRPC service (Moka) | Shared state between gRPC instances is hard to keep consistent; adds memory pressure and cache coherence complexity. ksqlDB centralizes "latest state" logic and simplifies the Rust service to near‑stateless. |
| Stop tracking on unsubscribe | Requires distributed reference counting across gRPC service instances to know if any client still needs the account. Adds coordination complexity and risks thrashing when subscription churn is high. |
| yellowstone‑grpc‑kafka (gRPC → Kafka bridge) | Adds an extra hop and an additional service vs. writing directly from the Geyser plugin in‑process. |
| Direct JSON‑RPC to clients (no Kafka/ksqlDB) | Hard to scale fan‑out and replay; lacks durable buffering; rejected for reliability. |
| Redis/DB as cache instead of ksqlDB | Adds new infrastructure and custom compaction/materialization logic; ksqlDB is purpose‑built for latest‑by‑key materialization over Kafka streams. |
| Kafka Streams inside the service vs. external ksqlDB | Tighter control but increases service complexity. ksqlDB centralizes materialization and enables reuse across multiple services. |
13. Config Reference
| Category | Parameters |
|---|---|
| Kafka | topic name, bootstrap servers, group id, TLS, acks=all, enable.idempotence=true, compression.type=zstd, linger.ms, batch.size, max.poll.records, fetch.max.bytes, partitions 64–128 (load‑test), cleanup.policy=compact, retention.ms |
| ksqlDB | URL, timeouts, processing.guarantee=exactly_once_v2, num.stream.threads, RocksDB block cache size, pull query timeouts |
| gRPC Service | listen address, TLS, per‑client queue limits, consumer max.poll.records, fetch.max.bytes, ksqlDB connection pool size, control request retry policy |
| Geyser Plugin | Kafka bootstrap servers, topic name, serialization format, compression, control endpoint address/port, track list persistence path |
A bit overly verbose, with almost as many questions as design decisions. But it's a good starting point. Here's a list of concerns/opinions:
This might introduce a few problems:
Needs clarification: how exactly will that be done?
This will require periodic reboots of the validator to reset the accumulated accounts list, since people absolutely will use us as a proxy (even with proxies). We can probably keep track of a few million accounts, but that needs research to confirm it won't slow down the plugin/validator too much.
If Kafka compaction is used (and it most likely will be), then some sort of overwrite priority needs to be defined so that only the latest state is kept.
If direct queries are used, we need a way to make sure we get the data atomically, i.e. not some accounts read at one slot and others at another.

```protobuf
message AccountUpdate {
  string pubkey = 1;        // better to use a custom struct of 4 u64s and type-cast for efficiency; otherwise it's quite cumbersome to encode and decode (which can never fail) back and forth
  uint64 slot = 2;
  uint64 write_version = 3;
  bytes data = 4;
  uint64 lamports = 5;
  string owner = 6;
  bool executable = 7;
  uint64 rent_epoch = 8;
  string tx_signature = 9;  // not sure why we need that
  bool is_deletion = 10;    // slot == 0 is already an indicator
}
```
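The 4×u64 suggestion for the pubkey can be sketched as a round-trip from the raw 32 bytes. The limb order here (little-endian) is an assumption for illustration; the real design would pick one convention and stick to it.

```rust
/// Pack 32 raw pubkey bytes into four u64 limbs.
fn to_limbs(bytes: &[u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for (i, chunk) in bytes.chunks_exact(8).enumerate() {
        limbs[i] = u64::from_le_bytes(chunk.try_into().unwrap());
    }
    limbs
}

/// Unpack the four limbs back into the raw 32 bytes.
fn to_bytes(limbs: &[u64; 4]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (i, limb) in limbs.iter().enumerate() {
        bytes[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_le_bytes());
    }
    bytes
}

fn main() {
    let original: [u8; 32] = core::array::from_fn(|i| i as u8);
    let limbs = to_limbs(&original);
    let round_trip = to_bytes(&limbs);
    // Unlike base58 string decoding, this conversion can never fail.
    assert_eq!(original, round_trip);
}
```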
Deserialization will be required at all stages for filtering and querying purposes, and Protobuf is a very cumbersome format to work with. For filtering purposes I would consider adding the keys to the header data, so routing can be done without parsing the payload.
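The header-routing idea can be sketched as follows: the router matches on a message header and never touches the Protobuf payload. The header name `"pubkey"` and the `Message` shape are assumptions for illustration, not a defined wire format.

```rust
use std::collections::HashMap;

/// A Kafka-like message: headers plus an opaque payload.
struct Message {
    headers: HashMap<String, Vec<u8>>,
    payload: Vec<u8>, // opaque Protobuf bytes, never parsed here
}

/// Decide whether to relay a message to a client using only the header,
/// avoiding payload deserialization on the hot path.
fn route(msg: &Message, subscribed: &[&str]) -> bool {
    msg.headers
        .get("pubkey")
        .map(|v| subscribed.iter().any(|s| s.as_bytes() == v.as_slice()))
        .unwrap_or(false)
}

fn main() {
    let mut headers = HashMap::new();
    headers.insert("pubkey".to_string(), b"PubkeyA".to_vec());
    let msg = Message { headers, payload: vec![0xde, 0xad] };
    assert!(route(&msg, &["PubkeyA", "PubkeyB"]));
    assert!(!route(&msg, &["PubkeyC"]));
}
```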
Sounds like an AI hallucination: there's no need for that number of partitions, or partitioning in general, in this case. Even with no account filtering, 4 partitions would more than suffice.
Have you researched whether ksqlDB can automatically parse the incoming Protobuf stream?
Architecture Specification: Solana Account Streaming via Geyser → Kafka → ksqlDB → Rust gRPC
1. Overview
This document specifies a production-ready architecture for real-time and historical-aware Solana account update streaming. A Geyser plugin ingests account updates directly from the validator into Kafka. ksqlDB materializes latest account state as a queryable table. A custom Rust gRPC service seeds each client subscription with the latest state snapshot, then streams live updates. The design covers schema and partitioning, delivery semantics and ordering, observability, performance, and alternatives considered.
2. Architecture Diagram
```mermaid
flowchart TD
    subgraph Solana
        V[Validator] --> GP[Geyser Plugin accountsdb-plugin-kafka]
    end
    GP -->|keyed messages by pubkey| K[Kafka Topic: account-updates]
    subgraph Streaming_DB
        K -->|STREAM -> TABLE materialization| KSQL[ksqlDB]
        KSQL --> DB[RocksDB State Store]
    end
    subgraph Service
        S[gRPC Service, Rust]
        MC[In-Memory Cache, Moka]
    end
    C[Clients] -->|subscribe pubkeys| S
    S -->|1. pull query, latest by pubkey| KSQL
    S -->|2. cache lookup/store| MC
    K -->|3. consumer group, filtered stream| S
    S -->|4. initial snapshot + live updates| C
```

Account Update Flow

- `account-updates` Kafka topic, keyed by pubkey.
- `SubscribeAccounts` request with one or more pubkeys.

3. Components
Geyser Plugin (`solana-accountsdb-plugin-kafka`)

Kafka (Broker + Schema Registry)

ksqlDB

- Latest-by-key state (`LATEST_BY_OFFSET` semantics).

Rust gRPC Service

- Built on `tonic` (gRPC) and `rust-rdkafka` (Kafka consumer).
- `SubscribeAccounts(stream SubscribeRequest) returns (stream AccountUpdate)`: client sends pubkeys to add/remove interest; server responds with initial snapshots followed by live updates.
- Modules:
  - `api`: Protobuf definitions and tonic server.
  - `kafka`: consumer setup, partition assignment, message deserialization.
  - `router`: subscription registry, per-pubkey fan-out with backpressure.
  - `snapshot`: ksqlDB pull-query client, response parsing, retries, cache integration.
  - `cache`: TTL-based last-state store keyed by pubkey (Moka, 2–5 min configurable).
- Key crates: `tonic`, `tokio`, `rust-rdkafka`, `prost`, `moka`, `reqwest` (ksqlDB REST), `opentelemetry`.

4. Data Schema and Serialization
Format: Protobuf (recommended) or Avro; register schemas in Schema Registry for evolution and compatibility.
Account Update Schema:
Key: `pubkey` (32-byte binary preferred for efficiency; include base58 in value for readability).

Headers (lightweight routing/ops): `cluster`, `validator_id`, `commitment`, `compression`, `content-type`, `schema-id`.

Compression: Enable `lz4` or `zstd` at the producer level.

Mapping: Use the same Protobuf definition across Kafka payloads, ksqlDB table columns (via Schema Registry), and gRPC response streams to minimize serialization overhead.
5. Kafka Topic Configuration and Retention
- Topic `account-updates` with `cleanup.policy=compact` to keep at least the last update per key, preventing unbounded growth while acting as a recovery source.
- Tune `min.cleanable.dirty.ratio` and `segment.ms` appropriately.
- Set `retention.ms` to a reasonable window (e.g., 7–30 days) in addition to compaction.
- Optional two-topic layout:
  - `account-updates`: append-only, time-retained, for analytics/historical replay.
  - `account-latest`: compacted, latest-by-key, fed by stream/table materialization.

6. ksqlDB Materialization

- Stream over `account-updates` with `KEY=pubkey` and the chosen value format.
- Latest-by-key table (`LATEST_BY_OFFSET` semantics).
- Pull queries: `SELECT * FROM account_state_table WHERE pubkey = '...';`
- `processing.guarantee=exactly_once_v2`.
- Tune `ksql.streams.num.stream.threads`, cache sizes, RocksDB settings for lookup performance.
- Value format (`KAFKA`/`JSON`/`PROTOBUF`).

7. Delivery Semantics and Ordering

- Idempotent producer (`enable.idempotence=true`), `acks=all`, appropriate retries with backoff — exactly-once within a partition from the producer's perspective.
- Ordering/dedup key: `(pubkey, slot, write_version)`.

8. Performance Considerations and Bottlenecks

- Producer: `lz4`/`zstd` compression, tuned `linger.ms`/`batch.size`.
- Consumer: efficient Rust Protobuf (`prost`); pre-filtering where possible.

Performance Goals:
9. Scale-out and Failure Handling
10. Observability and Monitoring
11. Capacity Planning
12. Alternatives Considered
yellowstone-grpc-kafka (gRPC → Kafka bridge)
Not preferred. Adds an extra hop and an additional service to manage vs. writing directly from the Geyser plugin.
Seeding initial state without ksqlDB
Fewer components but requires replay/scan or bespoke compacted-topic management; ksqlDB provides mature materialization and low-latency pull queries out of the box.
Kafka Streams inside the service vs. external ksqlDB
Tighter control and possibly lower latency, but increases service complexity. ksqlDB centralizes materialization and enables reuse across multiple services.
Alternative brokers (e.g., Redpanda)
Could reduce ops complexity while keeping Kafka API compatibility; ksqlDB remains compatible only when run as a separate service. Keep standard Kafka unless a strong operational reason emerges.
13. Open Questions / Next Steps
- Decide whether to maintain a dedicated compacted `account-latest` topic alongside the source stream.

14. Config Knobs (Reference)
- Kafka: `linger.ms`, `batch.size`, `acks`, `compression.type`, `max.poll.records`, `fetch.max.bytes`
- ksqlDB: `processing.guarantee`, `num.stream.threads`, RocksDB cache size