From e1d02c6f17b778d8b2726eb4b35e574956b95fe9 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Tue, 28 Apr 2026 17:02:10 +0300 Subject: [PATCH 01/21] docs: Adds AGENTS.md and llms.txt for AI agent guidance --- AGENTS.md | 219 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ llms.txt | 54 ++++++++++++++ 2 files changed, 273 insertions(+) create mode 100644 AGENTS.md create mode 100644 llms.txt diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..74265dde --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,219 @@ +# AGENTS.md + +Operating rules for AI coding agents (Claude Code, Cursor, Aider, Copilot Chat, Codex CLI) working in any project that consumes [Elders.Cronus](https://github.com/Elders/Cronus). This file curates facts that already live in the canonical [`docs/`](docs/) tree and the Cronus knowledge base — read it first, follow the pointers when you need depth. + +## What Cronus is + +Cronus is a .NET DDD/CQRS/Event Sourcing framework maintained by Elders OSS. It targets `net8.0;net9.0` (see `src/Elders.Cronus/Elders.Cronus.csproj`). Domain code is organised around aggregates that emit events, application services that handle commands, and projections / sagas / ports / triggers / gateways that react to events through RabbitMQ-backed transport with Cassandra-backed event store and projections. + +## Hard rules — the things that will break the build or mislead you + +These are not style preferences. Violating one of them either fails to compile, fails at runtime, or silently corrupts the wire contract. + +1. **All publishing is async.** `IPublisher` exposes `PublishAsync(...)` only. There is no sync `Publish(...)`. The lint pattern `sync-publish` flags any `publisher.Publish(` call. + + ```csharp + // RIGHT + await publisher.PublishAsync(command); + + // WRONG — does not exist + publisher.Publish(command); + ``` + +2. 
**Every `IMessage` needs a stable contract id and ordered members.** Every `ICommand`, `IEvent`, `IPublicEvent`, `ISignal` (and every projection state, value record persisted in a snapshot, etc.) must carry `[DataContract(Namespace = BC.Name, Name = "")]` plus `[DataMember(Order = N)]` on every persisted property. The GUID is the wire identity — once a message is in production, **never** change it. + + ```csharp + [DataContract(Name = "728fc4e7-628b-4962-bd68-97c98aa05694")] + public class TaskCreated : IEvent + { + TaskCreated() { } + + public TaskCreated(TaskId id, string name, DateTimeOffset timestamp) + { + Id = id; Name = name; Timestamp = timestamp; + } + + [DataMember(Order = 1)] public TaskId Id { get; private set; } + [DataMember(Order = 2)] public string Name { get; private set; } + [DataMember(Order = 3)] public DateTimeOffset Timestamp { get; private set; } + } + ``` + +3. **Aggregate IDs use the non-generic `AggregateRootId(string tenant, string arName, string id)`.** The constructor order is `(tenant, arName, id)`. The generic `AggregateRootId` form is commented out in source and is not available. Do not invent `AggregateUrn`, `IUrn`, or `StringTenantId` — those are removed types. + + ```csharp + [DataContract(Name = "d5e50e1f-5886-4608-9361-9fe0eb440a6b")] + public class TaskId : AggregateRootId + { + TaskId() { } + public TaskId(string tenant, string id) : base(tenant, "task", id) { } + } + ``` + +4. **Aggregate roots inherit `AggregateRoot` and mutate state only through events.** Keep a `private` parameterless constructor for replay, validate invariants in public methods, and call `Apply(new SomeEvent(...))` to record changes. State is folded by `public void When(TEvent e)` handlers on the `AggregateRootState` class. The aggregate must remain **synchronous** — no I/O, no `async`. See [`docs/cronus-framework/domain-modeling/aggregate.md`](docs/cronus-framework/domain-modeling/aggregate.md). + +5. 
**Cross-aggregate flow goes through a Saga (process manager).** Aggregates do not subscribe to other aggregates' events, do not call other aggregates, and do not load anything via the repository. If you need to react to one aggregate's event by issuing a command on another, that is a saga's job. See [`docs/cronus-framework/domain-modeling/handlers/sagas.md`](docs/cronus-framework/domain-modeling/handlers/sagas.md). + +## Pick the right handler + +Six handler kinds. Pick by **intent**, not by capability. Verbatim from [`docs/cronus-framework/domain-modeling/handlers/README.md`](docs/cronus-framework/domain-modeling/handlers/README.md): + +| Handler | Reacts to | Produces | Side effects allowed? | +| --- | --- | --- | --- | +| Application Service | `ICommand` | New events on a single aggregate | No — should not perform I/O outside the event store | +| Projection | `IEvent` | A read model (snapshot or external store) | No — must not publish commands or events | +| Saga | `IEvent`, `IScheduledMessage` | New `ICommand` messages; scheduled timeouts | No business-facing side effects — coordinate aggregates | +| Port | `IEvent` | New `ICommand` messages | Yes — the classic "send email", "call external API" place | +| Trigger | `IEvent`, `ISignal` | Anything — typically starts a job or a downstream workflow | Yes | +| Gateway | `IEvent` | New `ICommand` messages, with tracked infrastructure state | Yes — owns metadata required by an external system | + +Rule of thumb: + +- One command mutates one aggregate → **Application Service**. +- Read model from events → **Projection**. +- Coordinate several aggregates → **Saga**. +- React to an event with one outbound side effect (email, HTTP call) → **Port**. +- Kick off a job or long-running workflow → **Trigger**. +- Port that needs persistent infra state (push tokens, badges) → **Gateway**. + +## Pick the right message type + +Every message implements `IMessage` and carries a `Timestamp`. 
Source: [`docs/cronus-framework/domain-modeling/messages/README.md`](docs/cronus-framework/domain-modeling/messages/README.md). + +| Message type | Intent | Consumed by | +| --- | --- | --- | +| `ICommand` | Request a business change. May be rejected by the aggregate. Imperative name (`CreateTask`). | Application Service | +| `IEvent` | Record a fact already committed inside this bounded context. Past-tense name (`TaskCreated`). | Projection, Saga, Port, Trigger, Gateway | +| `IPublicEvent` | Announce a change to the outside world (published language). Carries the originating `Tenant`. | Subscribers in other bounded contexts | +| `ISignal` | Trigger arbitrary side-effects (heartbeats, rebuilds, process pings). | Trigger | + +Publishing is always async: + +```csharp +Task PublishAsync(TMessage message, Dictionary headers = null); +Task PublishAsync(TMessage message, DateTime publishAt, Dictionary headers = null); +Task PublishAsync(TMessage message, TimeSpan publishAfter, Dictionary headers = null); +``` + +`false` from `PublishAsync` means the transport rejected the message — surface it. + +## Banned legacy types + +These names exist in old samples, blog posts, and Cronus v6/v7 code. They are gone in current Cronus and the lint database flags every one of them. Never propose or generate them: + +- **`ValueObject`** — base class is removed. Use C# `record` types with `[DataContract(Name = "")]`. KB concept: `value-object-record-pattern`. +- **`IUrn`**, **`AggregateUrn`** — removed; use `AggregateRootId` and `AggregateRootId.TryParse`. +- **`StringTenantId`** — removed; tenant is a `string` segment of `AggregateRootId`. +- **`AggregateRootId`** (generic form) — commented out in source. Use the non-generic `AggregateRootId(tenant, arName, id)`. +- **`IAggregateRootId`** — removed; the lint pattern is `fake-iaggregaterootid-generic`. +- **`AggregateRootApplicationService`** — old base class. Use `ApplicationService`. 
+- **`Cronus.Persistence.Git-*`**, **`Cronus.Persistence.MSSQL`**, **`Cronus.Serialization.Proteus`** — legacy persistence/serialization satellites. Don't add them. The current stack is `Cronus.Persistence.Cassandra`, `Cronus.Projections.Cassandra`, `Cronus.Transport.RabbitMQ`, `Cronus.Serialization.NewtonsoftJson`. +- **Sync `IPublisher.Publish(...)`** — removed; use `PublishAsync`. +- **Sync `IProjectionReader.Get(...)`** — removed; use the async `GetAsync`. +- **Sync `repository.Save(...)`** — removed; use `SaveAsync`. +- **`repository.TryLoad(id, out var ar)`** — removed; use `await repository.LoadAsync(id)` and inspect the `ReadResult` (`IsSuccess`, `NotFound`, `HasError`). +- **`public void Handle(TMessage m)` on a handler** — removed; handlers are `Task HandleAsync(TMessage m)`. + +If the user asks for one of these, tell them it has been removed and offer the modern equivalent. + +## `AddCronus(configuration)` setup + +`services.AddCronus(configuration)` is the single entry point — it registers core services, scans your assemblies for handlers, and binds options. The required configuration keys are: + +- `Cronus:BoundedContext` — alphanumeric/underscore name of the service. Validates against `^\b([\w\d_]+$)`. +- `Cronus:Tenants` — non-empty string array; same character set per element. +- `Cronus:Persistence:Cassandra:ConnectionString` — Cassandra event-store connection. +- `Cronus:Projections:Cassandra:ConnectionString` — Cassandra projections-store connection. +- `Cronus:Transport:RabbitMQ:*` — Server, Port, VHost, Username, Password. + +Every key, type and default lives in [`docs/cronus-framework/configuration.md`](docs/cronus-framework/configuration.md). When in doubt, read it — do not invent option names. 
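The required keys above are ordinary .NET configuration values, so they can also be supplied as environment variables. That is also how the Aspire wiring described below passes them to `AddCronus`. Below is a minimal sketch for a local shell session. It assumes the host is built with a default builder such as `WebApplication.CreateBuilder`, which includes the environment-variable configuration provider. Every value is a placeholder. The Cassandra connection strings assume the DataStax C# driver's `Contact Points=...;Port=...` syntax. Confirm exact key names in `configuration.md` before relying on them.

```shell
# Placeholder values for local development; none of these are Cronus defaults.
# The double underscore stands in for ":", e.g. Cronus__BoundedContext -> Cronus:BoundedContext.
export Cronus__BoundedContext=taskmanager      # letters, digits and "_" only
export Cronus__Tenants__0=elders               # array element Cronus:Tenants:0
# Assumed DataStax-driver connection-string syntax; adjust host, port and keyspace.
export Cronus__Persistence__Cassandra__ConnectionString="Contact Points=localhost;Port=9042;Default Keyspace=taskmanager_es"
export Cronus__Projections__Cassandra__ConnectionString="Contact Points=localhost;Port=9042;Default Keyspace=taskmanager_projections"
export Cronus__Transport__RabbitMQ__Server=localhost
export Cronus__Transport__RabbitMQ__Port=5672  # RabbitMQ's default AMQP port
export Cronus__Transport__RabbitMQ__VHost=/
export Cronus__Transport__RabbitMQ__Username=guest
export Cronus__Transport__RabbitMQ__Password=guest
```

.NET maps the double underscore in a variable name to the `:` separator. As a result, `Cronus__Tenants__0` binds to `Cronus:Tenants:0`, the first element of the tenants array. A second tenant would go in `Cronus__Tenants__1`.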
+ +Common process-split flags (default `true`): + +- `Cronus:ApplicationServicesEnabled` +- `Cronus:ProjectionsEnabled` +- `Cronus:SagasEnabled` +- `Cronus:PortsEnabled` +- `Cronus:GatewaysEnabled` +- `Cronus:TriggersEnabled` + +For an API process that only publishes commands, turn all six off and let a separate worker host run them. + +For local dev with [.NET Aspire](https://learn.microsoft.com/dotnet/aspire/), the AppHost spins up Cassandra/RabbitMQ/Redis/Consul/Elasticsearch and injects connection details as `Cronus__*` environment variables that `AddCronus` picks up automatically — see KB concept `aspire-cronus-wiring`. + +## `Bootstraps` enum and `[CronusStartup]` + +One-time host startup work goes in `ICronusStartup` / `ICronusTenantStartup` and is ordered by `[CronusStartup(Bootstraps.X)]`: + +| Phase | Value | Use it for | +| --- | ---: | --- | +| `Environment` | `0` | Process-wide switches, logger setup | +| `ExternalResource` | `10` | Provision DB keyspaces, broker exchanges | +| `Configuration` | `20` | Finalise options | +| `Aggregates` | `30` | One-time work for aggregates | +| `Ports` | `40` | One-time work for ports | +| `Sagas` | `50` | One-time work for sagas | +| `EventStoreIndices` | `55` | Register per-tenant event-store indices | +| `Projections` | `60` | One-time work for projections | +| `Gateways` | `70` | One-time work for gateways | +| `Runtime` | `1000` | Default — anything else | + +Pick the smallest phase number that satisfies your ordering. Startups must be safe to run repeatedly. The attribute does **nothing** on a discovery — do not put it there. Source: [`docs/cronus-framework/extensibility/startup-attribute.md`](docs/cronus-framework/extensibility/startup-attribute.md). + +## Knowledge base for grounded answers + +A curated KB of Cronus symbols, doc snippets, lint rules and concepts lives at `E:/Projects/ai stuff/Cronus-AIed/`. 
Use the CLI before generating non-trivial Cronus code: + +```bash +CRONUS_KB_TRACE=1 python "E:/Projects/ai stuff/Cronus-AIed/benchmark/scripts/knowledge-lookup.py" \ + {concepts,symbols,docs,snippets,lint} ... +``` + +Always set `CRONUS_KB_TRACE=1` so usage telemetry accumulates. + +The 15 curated concepts (run `concepts` with no args to list them) include `aspire-cronus-wiring`, `tenant-flow`, `multi-process-topology`, `discoveries-pattern`, `port-external-io`, `saga-external-service`, `trigger-signal-rpc`, `public-event-federation`, `aggregate-with-entities`, `value-object-record-pattern`, `contract-versioning`, `dual-store-projections`, `projection-versioning`, `atomic-action-locking`, `approval-workflow`. + +Before writing a saga / projection / port / etc., ground the shape: + +```bash +CRONUS_KB_TRACE=1 python ".../knowledge-lookup.py" concepts --id --expand +``` + +Other useful sub-commands: + +- `symbols --kind aggregate-root --canonical` — production-quality real symbols of a given kind. +- `docs --topic ""` — scope to specific docs. +- `lint --file ` — run the lint patterns against a file you just wrote. +- `lint --list` — see the patterns (the `legacy` patterns map to the **Banned legacy types** section above). + +## Verification before claiming done + +Cronus tasks are not done until both of these are green from a clean checkout: + +```shell +dotnet build +dotnet test +``` + +Lint your output too: + +```shell +CRONUS_KB_TRACE=1 python ".../knowledge-lookup.py" lint --file path/to/your/file.cs +``` + +If the lint reports any `legacy` or `bug` finding, fix it before declaring success. "I think this works" is not a status. 
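The build, the tests and the lint step can be chained into one shell function so that none of them is skipped. This is a hypothetical helper, not something shipped with Cronus or the KB. `verify_cronus_change` and `KB_LOOKUP` are illustrative names, and the KB path is the one this file gives. The lint step only halts the run if `knowledge-lookup.py lint` exits non-zero on a finding, which is an assumption. If it does not, read its output for `legacy` or `bug` entries.

```shell
# Hypothetical helper: adjust KB_LOOKUP to wherever the KB lives on your machine.
KB_LOOKUP="E:/Projects/ai stuff/Cronus-AIed/benchmark/scripts/knowledge-lookup.py"

# Usage: verify_cronus_change path/to/File1.cs [path/to/File2.cs ...]
# Returns non-zero as soon as the build, the tests, or a lint run fails.
verify_cronus_change() {
  dotnet build || return 1
  dotnet test || return 1
  for file in "$@"; do
    # Keep the KB trace switched on so usage telemetry accumulates.
    CRONUS_KB_TRACE=1 python "$KB_LOOKUP" lint --file "$file" || return 1
  done
}
```

Pass it every `.cs` file you touched. With no arguments, the function runs only the build and the tests.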
+ +## Pointers + +Canonical doc pages on this branch: + +- [Bounded context](docs/cronus-framework/domain-modeling/bounded-context.md) +- [Aggregate](docs/cronus-framework/domain-modeling/aggregate.md) +- [Aggregate IDs](docs/cronus-framework/domain-modeling/ids.md) +- [Messages overview](docs/cronus-framework/domain-modeling/messages/README.md) — commands, events, public events, signals +- [Handlers overview](docs/cronus-framework/domain-modeling/handlers/README.md) — application services, projections, sagas, ports, triggers, gateways +- [Configuration](docs/cronus-framework/configuration.md) — every `Cronus:*` key +- [Discoveries](docs/cronus-framework/extensibility/discoveries.md) and [`[CronusStartup]`](docs/cronus-framework/extensibility/startup-attribute.md) +- [Workflows](docs/cronus-framework/workflows.md) — message-processing pipeline +- [Indices](docs/cronus-framework/indices.md) — event-store secondary indices +- [Quick start: setup](docs/getting-started/quick-start/setup.md) and [persist first event](docs/getting-started/quick-start/persist-first-event.md) diff --git a/llms.txt b/llms.txt new file mode 100644 index 00000000..20ef2878 --- /dev/null +++ b/llms.txt @@ -0,0 +1,54 @@ +# Cronus + +> Cronus is a lightweight .NET DDD/CQRS/Event Sourcing framework by Elders OSS. Domain code is organised around aggregates that emit events, application services that handle commands, and projections / sagas / ports / triggers / gateways that react to events. Targets `net8.0;net9.0`. Default stack is RabbitMQ transport plus Cassandra event store and projections. + +This file points crawlers and LLMs at the highest-signal pages in the Cronus documentation tree on this repo. Every link is relative to the repo root and verified against the `docs-update-2026-04-23` branch. + +## Concepts + +- [Bounded context](docs/cronus-framework/domain-modeling/bounded-context.md): The explicit boundary inside which a model is consistent. 
In Cronus this collapses to the single load-bearing config value `Cronus:BoundedContext`. +- [Aggregate](docs/cronus-framework/domain-modeling/aggregate.md): Cluster of domain objects treated as one consistency boundary; event-sourced through `AggregateRoot` + `AggregateRootState`. +- [IDs](docs/cronus-framework/domain-modeling/ids.md): URN-based identity. Every aggregate root, entity, projection and id-carrying message is a URN-shaped value object. +- [Multitenancy](docs/cronus-framework/domain-modeling/multitenancy.md): Tenant is a first-class dimension of every message; declared via `Cronus:Tenants`. +- [Messages](docs/cronus-framework/domain-modeling/messages/README.md): Four kinds — `ICommand`, `IEvent`, `IPublicEvent`, `ISignal` — all under `IMessage`. +- [Commands](docs/cronus-framework/domain-modeling/messages/commands.md): Imperative request to mutate one aggregate. +- [Events](docs/cronus-framework/domain-modeling/messages/events.md): Past-tense fact emitted from an aggregate via `Apply(IEvent)`. +- [Public events](docs/cronus-framework/domain-modeling/messages/public-events.md): Published-language event that crosses bounded contexts; carries the originating tenant. +- [Signals](docs/cronus-framework/domain-modeling/messages/signals.md): Heartbeats, rebuilds, process pings — arbitrary side-effect triggers. +- [Handlers](docs/cronus-framework/domain-modeling/handlers/README.md): Six handler kinds, picked by intent — Application Service, Projection, Saga, Port, Trigger, Gateway. +- [Application services](docs/cronus-framework/domain-modeling/handlers/application-services.md): Write-side entry point; loads an aggregate, calls a method, saves the events. +- [Projections](docs/cronus-framework/domain-modeling/handlers/projections.md): Derived read model built by handling events — event-sourced or external-store. 
+- [Sagas](docs/cronus-framework/domain-modeling/handlers/sagas.md): Process manager that watches the event stream and publishes new commands; supports scheduled timeouts. +- [Ports](docs/cronus-framework/domain-modeling/handlers/ports.md): Where an event meets the outside world — emails, third-party APIs, follow-up commands. +- [Triggers](docs/cronus-framework/domain-modeling/handlers/triggers.md): Starts something new (a job, a workflow) in response to an event or signal. +- [Gateways](docs/cronus-framework/domain-modeling/handlers/gateways.md): A port that remembers infra state the external system needs (push tokens, badges). +- [Published language](docs/cronus-framework/domain-modeling/published-language.md): The vocabulary a bounded context exposes to others. +- [Value object](docs/cronus-framework/domain-modeling/value-object.md): Modern Cronus uses C# `record` types with `[DataContract]`; the `ValueObject` base is gone. + +## Getting started + +- [Setup](docs/getting-started/quick-start/setup.md): Provisions a two-process TaskManager skeleton — API + worker — with Cronus, Cassandra and RabbitMQ. +- [Persist first event](docs/getting-started/quick-start/persist-first-event.md): Models IDs, command, event, aggregate, application service and an API controller; ends with `TaskCreated` in Cassandra. +- [Explore projections](docs/getting-started/quick-start/explore-projections.md): Adds a `TaskProjection` event-sourced read model and exposes it through the API. + +## Reference + +- [Configuration](docs/cronus-framework/configuration.md): Every `Cronus:*` configuration key — bounded context, tenants, persistence, transport, consumer toggles. +- [Discoveries](docs/cronus-framework/extensibility/discoveries.md): How Cronus assembles itself from a core plus pluggable satellites at boot via `DiscoveryBase`. 
+- [`[CronusStartup]` and boot phases](docs/cronus-framework/extensibility/startup-attribute.md): The `Bootstraps` enum (`Environment`, `ExternalResource`, `Configuration`, `Aggregates`, `Ports`, `Sagas`, `EventStoreIndices`, `Projections`, `Gateways`, `Runtime`) and `ICronusStartup` ordering. +- [Fault handling](docs/cronus-framework/extensibility/fault-handling.md): Strategies for retries, dead-letter routing and error surfaces. +- [Observability](docs/cronus-framework/extensibility/observability.md): Logging, activity tracing and metrics hooks Cronus exposes. +- [Workflows](docs/cronus-framework/workflows.md): The message-processing pipeline — middleware-style wrappers around handler invocation. +- [Indices](docs/cronus-framework/indices.md): Event-store secondary indices Cronus maintains for non-id-shaped queries. +- [Event store](docs/cronus-framework/event-store/README.md): The append-only store and its replay/migration story. +- [Messaging](docs/cronus-framework/messaging/README.md): Transport, routing and delivery semantics. +- [Serialization](docs/cronus-framework/messaging/serialization.md): The `[DataContract(Name = "")]` wire contract that ties past, present and future versions of every message together. +- [Projections subsystem](docs/cronus-framework/projections/README.md): Lifecycle, snapshots and versioning of projections. +- [Cluster](docs/cronus-framework/cluster.md): How multiple Cronus hosts coordinate. +- [Jobs](docs/cronus-framework/jobs.md): Long-running background work model. +- [Unit testing](docs/cronus-framework/unit-testing.md): Patterns for testing aggregates, sagas and projections. + +## Optional + +- [AGENTS.md](AGENTS.md): Hard rules, banned legacy types, handler/message decision tables and KB-grounding guidance for AI coding agents working in Cronus codebases. +- [README](README.md): Project intro and links to the public Cronus documentation site. 
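`llms.txt` states that every link is relative to the repository root and has been verified. One way to re-check this after docs are moved is a small POSIX shell function. This is an illustrative sketch, not part of the patch, and `check_llms_links` is a hypothetical name. It only checks that each relative target exists on disk: in-page anchors are ignored and external URLs are skipped. It does not check whether the target actually covers what the link text promises.

```shell
# Hypothetical helper: prints "missing: <path>" for every relative link target in
# an llms.txt-style index that does not exist, and returns 1 if any are missing.
check_llms_links() {
  llms_index="${1:-llms.txt}"
  # Extract "](target)" fragments, then strip the "](" prefix and ")" suffix.
  grep -oE '][(][^)]+[)]' "$llms_index" | sed 's/^](//; s/)$//' | {
    missing=0
    while IFS= read -r link_target; do
      case "$link_target" in
        http://*|https://*) continue ;;    # external URLs are not checked
      esac
      link_file="${link_target%%#*}"       # drop an in-page anchor such as "#section"
      if [ ! -e "$link_file" ]; then
        echo "missing: $link_file"
        missing=1
      fi
    done
    exit "$missing"                        # becomes the pipeline's, and the function's, status
  }
}
```

Run it from the repository root, for example `check_llms_links llms.txt`. A non-zero status means at least one `missing:` line was printed.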
From 3825640cff5819268c2bcd318ebcf44430a2f56f Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Tue, 28 Apr 2026 17:02:11 +0300 Subject: [PATCH 02/21] chore: Removes FUNDING.yml --- .github/FUNDING.yml | 9 --------- 1 file changed, 9 deletions(-) delete mode 100644 .github/FUNDING.yml diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml deleted file mode 100644 index a2811809..00000000 --- a/.github/FUNDING.yml +++ /dev/null @@ -1,9 +0,0 @@ -# These are supported funding model platforms - -github: mynkow -patreon: elders_cronus -open_collective: # Replace with a single Open Collective username -ko_fi: # Replace with a single Ko-fi username -tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel -community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry -custom: # Replace with a single custom sponsorship URL From d80c9a1e8cf83bcb9e133dda06909c9484f6428a Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Tue, 28 Apr 2026 17:02:11 +0300 Subject: [PATCH 03/21] chore: Targets net10 and refreshes Microsoft.* packages and copyright --- .../Elders.Cronus.Performance.csproj | 2 +- .../Elders.Cronus.Tests.csproj | 2 +- src/Elders.Cronus/Elders.Cronus.csproj | 16 ++++++++-------- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/src/Elders.Cronus.Performance/Elders.Cronus.Performance.csproj b/src/Elders.Cronus.Performance/Elders.Cronus.Performance.csproj index d8bfe457..7987db95 100644 --- a/src/Elders.Cronus.Performance/Elders.Cronus.Performance.csproj +++ b/src/Elders.Cronus.Performance/Elders.Cronus.Performance.csproj @@ -2,7 +2,7 @@ Exe - net8.0;net9.0 + net10.0 enable diff --git a/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj b/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj index 55f6b5f6..046d19f9 100644 --- a/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj +++ b/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj @@ -1,7 +1,7 @@  Library - net8.0;net9.0 + net10.0 true 
Elders.Cronus diff --git a/src/Elders.Cronus/Elders.Cronus.csproj b/src/Elders.Cronus/Elders.Cronus.csproj index 0a8f7e32..8b65e19e 100644 --- a/src/Elders.Cronus/Elders.Cronus.csproj +++ b/src/Elders.Cronus/Elders.Cronus.csproj @@ -1,7 +1,7 @@  Library - net8.0;net9.0 + net10.0 @@ -15,7 +15,7 @@ Cronus is a lightweight framework for building event driven systems with DDD/CQRS in mind. CQRS DDD ES Event store sourcing cronus - Copyright © 1Software/EldersOSS 2013-2025 + Copyright © Elders LTD 2026 Apache-2.0 true @@ -35,12 +35,12 @@ - - - - - - + + + + + + From 18f3389d55d4147885323ced9d7f15892e719a90 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Tue, 28 Apr 2026 17:19:28 +0300 Subject: [PATCH 04/21] chore: Bumps direct dependencies - CommunityToolkit.HighPerformance 8.4.0 -> 8.4.2 - Cronus.DomainModeling 11.0.0 -> 11.0.2 - Machine.Specifications 1.1.2 -> 1.1.3 --- src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj | 2 +- src/Elders.Cronus/Elders.Cronus.csproj | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj b/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj index 046d19f9..780906c3 100644 --- a/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj +++ b/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj @@ -6,7 +6,7 @@ Elders.Cronus - + diff --git a/src/Elders.Cronus/Elders.Cronus.csproj b/src/Elders.Cronus/Elders.Cronus.csproj index 8b65e19e..ad904f82 100644 --- a/src/Elders.Cronus/Elders.Cronus.csproj +++ b/src/Elders.Cronus/Elders.Cronus.csproj @@ -34,8 +34,8 @@ - - + + From bbb87266cb965f4cc23138ed9bfaa10ab078c1b2 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Tue, 28 Apr 2026 17:37:33 +0300 Subject: [PATCH 05/21] docs: Rewrites and refines Cronus framework documentation - Rewrites ~25 framework doc pages to match Cronus 11.x (handlers, messages, extensibility, event store, projections, configuration, workflows, quick-start) - Refines per audit: rewrites stale snapshots.md as honest 
stub, fixes ports.md DI sample, corrects workflows.md logging-scope key constants, dash-form anchors, ~70 cross-tree links rewritten as GitHub URLs - Cleans up orphan trees (docs/message-handlers/), folds duplicate signals.md into messages/signals.md, redirects stale events.md, adds async-IPublisher caveat to fault-handling.md, fixes TaskId ctor in versioning.md --- docs/SUMMARY.md | 11 +- docs/cronus-framework/cluster.md | 5 +- docs/cronus-framework/cluster/README.md | 70 ++- docs/cronus-framework/cluster/jobs.md | 75 ++- docs/cronus-framework/concepts/README.md | 7 + docs/cronus-framework/concepts/cqrs.md | 39 +- docs/cronus-framework/concepts/es.md | 81 ++- docs/cronus-framework/configuration.md | 555 ++++++++++++++---- docs/cronus-framework/domain-modeling.md | 3 + .../domain-modeling/README.md | 18 + .../domain-modeling/aggregate.md | 154 +++-- .../domain-modeling/bounded-context.md | 129 +++- .../domain-modeling/commands.md | 58 +- .../domain-modeling/entity.md | 104 +++- .../domain-modeling/events.md | 42 +- .../domain-modeling/handlers/README.md | 62 ++ .../handlers/application-services.md | 109 +++- .../domain-modeling/handlers/gateways.md | 68 ++- .../domain-modeling/handlers/ports.md | 101 +++- .../domain-modeling/handlers/projections.md | 213 +++---- .../domain-modeling/handlers/sagas.md | 144 +++-- .../domain-modeling/handlers/triggers.md | 84 ++- docs/cronus-framework/domain-modeling/ids.md | 168 +++++- .../domain-modeling/messages/README.md | 65 ++ .../messages/application-services.md | 59 +- .../domain-modeling/messages/commands.md | 108 ++-- .../domain-modeling/messages/events.md | 73 ++- .../domain-modeling/messages/public-events.md | 89 ++- .../domain-modeling/messages/signals.md | 145 ++++- .../domain-modeling/multitenancy.md | 171 ++++++ .../domain-modeling/published-language.md | 138 ++++- .../domain-modeling/signals.md | 2 - .../domain-modeling/value-object.md | 102 +++- docs/cronus-framework/event-store/README.md | 75 ++- 
.../event-store/eventstore-player.md | 52 ++ .../event-store/migrations/README.md | 77 ++- .../event-store/migrations/copy-eventstore.md | 124 ++-- docs/cronus-framework/extensibility/README.md | 27 + .../extensibility/discoveries.md | 169 ++++++ .../extensibility/fault-handling.md | 152 +++++ .../extensibility/observability.md | 169 ++++++ .../extensibility/startup-attribute.md | 121 ++++ docs/cronus-framework/indices.md | 82 ++- docs/cronus-framework/jobs.md | 144 ++++- docs/cronus-framework/messaging/README.md | 76 +++ .../messaging/serialization.md | 77 ++- docs/cronus-framework/projections/README.md | 17 + .../cronus-framework/projections/snapshots.md | 14 + .../projections/versioning.md | 65 ++ docs/cronus-framework/unit-testing.md | 102 +++- docs/cronus-framework/workflows.md | 98 +++- docs/getting-started/quick-start/README.md | 35 +- .../quick-start/explore-projections.md | 207 +++---- .../quick-start/persist-first-event.md | 234 ++++---- docs/getting-started/quick-start/setup.md | 247 +++++--- docs/message-handlers/application-services.md | 53 -- docs/message-handlers/gateways.md | 18 - docs/message-handlers/ports.md | 22 - docs/message-handlers/projections.md | 30 - docs/message-handlers/sagas.md | 22 - docs/message-handlers/triggers.md | 2 - 61 files changed, 4522 insertions(+), 1241 deletions(-) delete mode 100644 docs/cronus-framework/domain-modeling/signals.md create mode 100644 docs/cronus-framework/extensibility/README.md create mode 100644 docs/cronus-framework/extensibility/discoveries.md create mode 100644 docs/cronus-framework/extensibility/fault-handling.md create mode 100644 docs/cronus-framework/extensibility/observability.md create mode 100644 docs/cronus-framework/extensibility/startup-attribute.md create mode 100644 docs/cronus-framework/projections/README.md create mode 100644 docs/cronus-framework/projections/snapshots.md create mode 100644 docs/cronus-framework/projections/versioning.md delete mode 100644 
docs/message-handlers/application-services.md delete mode 100644 docs/message-handlers/gateways.md delete mode 100644 docs/message-handlers/ports.md delete mode 100644 docs/message-handlers/projections.md delete mode 100644 docs/message-handlers/sagas.md delete mode 100644 docs/message-handlers/triggers.md diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 2393623b..8accb58f 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -39,11 +39,20 @@ * [EventStore Player](cronus-framework/event-store/eventstore-player.md) * [Migrations](cronus-framework/event-store/migrations/README.md) * [Copy EventStore](cronus-framework/event-store/migrations/copy-eventstore.md) +* [Projections](cronus-framework/projections/README.md) + * [Versioning](cronus-framework/projections/versioning.md) + * [Snapshots](cronus-framework/projections/snapshots.md) * [Workflows](cronus-framework/workflows.md) * [Indices](cronus-framework/indices.md) * [Jobs](cronus-framework/jobs.md) -* [Cluster](cronus-framework/cluster.md) +* [Cluster](cronus-framework/cluster/README.md) + * [Jobs](cronus-framework/cluster/jobs.md) * [Messaging](cronus-framework/messaging/README.md) * [Serialization](cronus-framework/messaging/serialization.md) * [Configuration](cronus-framework/configuration.md) +* [Extensibility](cronus-framework/extensibility/README.md) + * [Discoveries](cronus-framework/extensibility/discoveries.md) + * [Startup Attribute](cronus-framework/extensibility/startup-attribute.md) + * [Fault Handling](cronus-framework/extensibility/fault-handling.md) + * [Observability](cronus-framework/extensibility/observability.md) * [Unit testing](cronus-framework/unit-testing.md) diff --git a/docs/cronus-framework/cluster.md b/docs/cronus-framework/cluster.md index 1bab9bd3..1676492e 100644 --- a/docs/cronus-framework/cluster.md +++ b/docs/cronus-framework/cluster.md @@ -1,4 +1,5 @@ # Cluster -[https://github.com/Elders/Cronus/issues/279](https://github.com/Elders/Cronus/issues/279) - +{% content-ref 
url="cluster/README.md" %} +[cluster/README.md](cluster/README.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/cluster/README.md b/docs/cronus-framework/cluster/README.md index 02482fbe..6b10dc04 100644 --- a/docs/cronus-framework/cluster/README.md +++ b/docs/cronus-framework/cluster/README.md @@ -1,4 +1,72 @@ # Cluster -TODO +A Cronus service is normally deployed as multiple identical hosts behind a load balancer: the same binary runs on several machines for throughput and fault tolerance. For the message-handling path this works trivially — RabbitMQ delivers each message to exactly one of the workers and there is nothing else to coordinate. For the _long-running, singleton_ work (projection rebuilds, index rebuilds, migration sweeps, replays) the cluster has to agree on one host doing the work at a time, and it has to agree on where that work is _up to_ if the host dies in the middle of it. The cluster subsystem exists to provide that coordination. +## What clustering does + +Concretely, the cluster gives you: + +* **Leader election / singleton execution.** A job declares a `Name` and the cluster guarantees that only one host is actively advancing the job with that name. If that host dies, another host picks up from where the cluster last saw progress. +* **Shared job state.** A job's [`IJobData`](../../../src/Elders.Cronus/Cluster/Job/IJobData.cs) is synced to the cluster through [`IClusterOperations.PingAsync`](../../../src/Elders.Cronus/Cluster/Job/IClusterOperations.cs). When a new host starts running a job, it pings the cluster first and uses whatever state comes back. +* **Cancellation.** Each running job is registered under its name in a `JobManager` (see [`ICronusJobRunner`](../../../src/Elders.Cronus/Cluster/Job/ICronusJobRunner.cs)). Jobs can be cancelled by name from any host. +* **Health signals.** Every host emits a `HeartbeatSignal` on a configurable interval so the cluster can see which hosts are live. 
The interval is `Cronus:Heartbeat:IntervalInSeconds` (default `5`, validated `5..3600`) — see [Configuration](../configuration.md#cronus-heartbeat-intervalinseconds). + +## The contracts + +The cluster exposes two related contracts under [`Cronus/src/Elders.Cronus/Cluster/Job/`](../../../src/Elders.Cronus/Cluster/Job/): + +```csharp +public interface IClusterOperations +{ + Task<TData> PingAsync<TData>(CancellationToken cancellationToken = default) where TData : class, new(); + Task<TData> PingAsync<TData>(TData data, CancellationToken cancellationToken = default) where TData : class, new(); +} + +public interface ICronusJobRunner : IDisposable +{ + Task ExecuteAsync(ICronusJob job, CancellationToken cancellationToken = default); + JobManager JobManager { get; } +} +``` + +`IClusterOperations` is the _"talk to the cluster"_ side: jobs call `PingAsync(Data)` to publish progress and `PingAsync()` to fetch the last-known state. `ICronusJobRunner` is the _"execute work on this host"_ side: it registers the cancellation source under the job name and invokes the job's `RunAsync`. + +Most hosts extend [`AbstractCronusJobRunner`](../../../src/Elders.Cronus/Cluster/Job/ICronusJobRunner.cs) rather than implementing the runner from scratch; the abstract base class does the cancellation bookkeeping for you. + +## In-memory default + +Out of the box Cronus wires the cluster contracts to in-memory implementations that perform no cross-host coordination: + +* [`InMemoryCronusJobRunner`](../../../src/Elders.Cronus/Cluster/Job/InMemory/InMemoryCronusJobRunner.cs) — executes the job on the current host with no external coordination. +* [`NoClusterOperations`](../../../src/Elders.Cronus/Cluster/Job/InMemory/NoClusterOperations.cs) — `PingAsync()` returns `default`; `PingAsync(data)` returns the data it was given. + +This is what you want during local development and single-host deployments. It is also what [`JobDiscovery`](../../../src/Elders.Cronus/Cluster/Job/JobDiscovery.cs) wires by default. 
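To make the no-op semantics concrete, here is a minimal sketch of an operations class that behaves the way the `NoClusterOperations` bullet describes. It assumes a hypothetical `ISketchClusterOperations` interface that mirrors the shape of the two `PingAsync` overloads of `IClusterOperations`; it is not the Cronus source, and all type names are illustrative.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for IClusterOperations; not a Cronus type.
public interface ISketchClusterOperations
{
    Task<TData> PingAsync<TData>(CancellationToken cancellationToken = default) where TData : class, new();
    Task<TData> PingAsync<TData>(TData data, CancellationToken cancellationToken = default) where TData : class, new();
}

// Behaves like the documented NoClusterOperations: there is no shared state at all.
public sealed class SketchNoClusterOperations : ISketchClusterOperations
{
    // Nothing was ever stored, so there is no last-known state to hand back.
    public Task<TData> PingAsync<TData>(CancellationToken cancellationToken = default) where TData : class, new()
        => Task.FromResult<TData>(default!);

    // Publishing progress goes nowhere; the caller simply gets its own data back.
    public Task<TData> PingAsync<TData>(TData data, CancellationToken cancellationToken = default) where TData : class, new()
        => Task.FromResult(data);
}
```

Because nothing is stored, a second host asking for a job's state would also get `default` and start from its initial data. That is why this pair is only safe when a single host runs the job.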
+ +## Consul-backed cluster + +The production runner is the satellite package [`Cronus.Cluster.Consul`](https://github.com/Elders/Cronus.Cluster.Consul). It replaces the in-memory runner and operations with a Consul-backed pair: job state lives in the Consul key-value store, leader election uses Consul session locks, and the heartbeat writes into Consul's health checks. + +The Consul package is configured through the standard Cronus configuration surface — see [Configuration](../configuration.md) for the `cronus:cluster:*` options (connection string, acl token, namespace). When the Consul runner is registered it replaces the in-memory one in DI so existing jobs run unchanged. + +## Sub-topics + +{% content-ref url="jobs.md" %} +[jobs.md](jobs.md) +{% endcontent-ref %} + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **must** run a real cluster runner (Consul-backed) in production; the in-memory one is only safe for a single host +* you **should** keep job names stable across releases so the cluster can correlate progress across versions +* you **should** ping the cluster after every meaningful increment so another host can resume work +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **must not** run duplicate instances of the same singleton job without the cluster; each one will think it is alone +* you **should not** put large payloads into `IJobData`; the cluster key-value store is not a document database +{% endhint %} diff --git a/docs/cronus-framework/cluster/jobs.md b/docs/cronus-framework/cluster/jobs.md index b35ad34a..50128b7e 100644 --- a/docs/cronus-framework/cluster/jobs.md +++ b/docs/cronus-framework/cluster/jobs.md @@ -1,2 +1,75 @@ -# Jobs +# Jobs in the cluster +The generic [Jobs](../jobs.md) page describes what a `CronusJob` is and how you write one. 
This page is about the part that matters only once more than one host is running: how the cluster picks who advances a job, how progress is shared, and what that means for the jobs Cronus ships with. + +## Singleton semantics + +The foundational guarantee the cluster makes is _"at most one host is actively advancing the job with this `Name`"_. This is the property that lets a rebuild run across many machines without double-writing projection rows, and that lets a migration copy events without duplicating them at the destination. + +Implementations achieve it with different primitives. The in-memory [`InMemoryCronusJobRunner`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/InMemory/InMemoryCronusJobRunner.cs) trivially satisfies the rule because there is only one host. The Consul-backed [`Cronus.Cluster.Consul`](https://github.com/Elders/Cronus.Cluster.Consul) runner relies on Consul session locks: acquiring the session for a given job name is what gives a host the right to advance it. If the session expires — because the host died or stopped pinging — Consul releases the lock and another host can take over. + +The fact that jobs are keyed by `Name` has consequences. Cronus tries hard to produce deterministic, tenant-scoped names; see, for example, how the message-counter rebuild factory builds its name: + +```csharp +job.Name = $"urn:{boundedContext.Name}:{contextAccessor.CronusContext.Tenant}:{job.Name}"; +``` + +— from [`RebuildIndex_MessageCounter_JobFactory`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/RebuildIndex_MessageCounter_Job.cs). Every job factory you meet constructs the name from the bounded context, the tenant and a stable suffix, so the cluster can correlate runs across restarts and across hosts. 
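Because the cluster keys everything by name, the name should be a pure function of stable inputs. The following is a minimal sketch that assumes the same `urn:{boundedContext}:{tenant}:{suffix}` shape as the factory line quoted above. `SketchJobNames.Build` and the `rebuild_message_counter` suffix are hypothetical and are not part of Cronus.

```csharp
using System;

public static class SketchJobNames
{
    // Builds the cluster key for a job. The inputs deliberately exclude timestamps,
    // GUIDs and host names: the same job must map to the same key on every host and
    // after every restart, or the cluster cannot correlate progress or enforce the
    // singleton rule.
    public static string Build(string boundedContext, string tenant, string suffix)
    {
        if (string.IsNullOrWhiteSpace(boundedContext)) throw new ArgumentException("Bounded context is required.", nameof(boundedContext));
        if (string.IsNullOrWhiteSpace(tenant)) throw new ArgumentException("Tenant is required.", nameof(tenant));
        if (string.IsNullOrWhiteSpace(suffix)) throw new ArgumentException("Suffix is required.", nameof(suffix));
        return $"urn:{boundedContext}:{tenant}:{suffix}";
    }
}
```

For example, `SketchJobNames.Build("billing", "tenant1", "rebuild_message_counter")` yields `urn:billing:tenant1:rebuild_message_counter` on every host. The same job for `tenant2` gets a different key, so the two tenants can run in parallel.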
+ +## State sharing + +A job's progress is an instance of `IJobData` (see [`IJobData.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/IJobData.cs) for the minimal contract — `IsCompleted` and `Timestamp`). Real jobs carry more: pagination tokens, counters, the projection version being rebuilt, the event-type id being replayed. + +[`CronusJob.SyncInitialStateAsync`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/CronusJob.cs) is the sync point. When a host is about to run a job, it first calls: + +```csharp +Data = await cluster.PingAsync<TData>(cancellationToken).ConfigureAwait(false); +if (Data is null) Data = BuildInitialData(); +Data = DataOverride(Data); +``` + +If the cluster has state for this job, the host uses that state. If it does not, the host falls back to the initial data built by the factory — which typically seeds things like the timestamp when the work was first requested. + +After that, every meaningful step of the work is followed by `Data = await cluster.PingAsync(Data)`. That call does two things: it publishes the current `Data` as the cluster's authoritative state, and it returns whatever the cluster has (which may be the same or may have been overwritten concurrently, though in practice only the singleton lock-holder writes). + +## Leader election + +Leader election is not exposed as a first-class API — it is implicit in `ExecuteAsync`. If the runner decides this host is not currently the leader for a given job name, `ExecuteAsync` will either refuse to advance the work or it will wait for the leader to release the lock. The concrete behaviour is backend-specific (Consul returns fast if the session is taken; the in-memory runner always proceeds). 
+ +If you need to _cancel_ the work from another host, the cluster's mechanism for that is the `JobManager` — call `JobManager.CancelAsync(jobName)` on any host, and the cancellation token propagates through to the runner that currently holds the lock. + +## Work distribution in shipped jobs + +The jobs Cronus ships use the cluster in subtly different ways: + +* [`RebuildIndex_MessageCounter_Job`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/RebuildIndex_MessageCounter_Job.cs) — a singleton sweep over every registered event type. It pings the cluster every 5 seconds (in a background loop) so the cluster sees the host is alive while the sweep runs. +* [`RebuildProjection_Job`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Rebuilding/RebuildProjection_Job.cs) — a singleton per projection version. Progress tokens per event type live in `Data.EventTypePaging` and are pinged to the cluster through `NotifyProgressAsync` inside the `PlayerOperator`. +* [`ReplayPublicEvents_Job`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Players/ReplayPublicEvents_Job.cs) — a singleton per `{recipient bounded context, recipient handlers, source event type}` triple. Different recipients can replay in parallel because their job names differ. + +All three share the same pattern: a short `SyncInitialStateAsync`, a loop that does one unit of work at a time, a `PingAsync(Data)` after each unit, and an `IsCompleted = true; PingAsync(Data)` at the end. 
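The shared pattern can be sketched against an in-memory stand-in for the cluster. The types `SketchCluster`, `SketchJobData` and `SketchPagedJob` are hypothetical and are not the Cronus `CronusJob` base class. The sketch only reproduces the control flow described above: sync the initial state, do one unit of work per iteration, ping after each unit, and finish with `IsCompleted = true`. It deliberately ignores locking, so it demonstrates resumption, not mutual exclusion.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical job progress: a page cursor plus the completion flag.
public sealed class SketchJobData
{
    public int NextPage { get; set; }
    public bool IsCompleted { get; set; }
    public DateTimeOffset Timestamp { get; set; }
}

// Hypothetical shared store keyed by job name, standing in for a real cluster backend.
public sealed class SketchCluster
{
    private readonly ConcurrentDictionary<string, SketchJobData> state = new();

    // "Fetch" ping: last-known state, or null if the cluster has never seen this job.
    public Task<SketchJobData?> PingAsync(string jobName) =>
        Task.FromResult(state.TryGetValue(jobName, out var data) ? Clone(data) : null);

    // "Publish" ping: store the progress as authoritative and return what the cluster now holds.
    public Task<SketchJobData> PingAsync(string jobName, SketchJobData data)
    {
        state[jobName] = Clone(data);
        return Task.FromResult(Clone(data));
    }

    private static SketchJobData Clone(SketchJobData d) =>
        new() { NextPage = d.NextPage, IsCompleted = d.IsCompleted, Timestamp = d.Timestamp };
}

public sealed class SketchPagedJob
{
    private readonly SketchCluster cluster;
    private readonly string name;
    private readonly int totalPages;
    public List<int> ProcessedPages { get; } = new();

    public SketchPagedJob(SketchCluster cluster, string name, int totalPages)
    {
        this.cluster = cluster; this.name = name; this.totalPages = totalPages;
    }

    // maxUnits lets a caller stop early to simulate a host dying mid-job.
    public async Task RunAsync(int maxUnits = int.MaxValue, CancellationToken ct = default)
    {
        // 1. Sync point: resume from the cluster's state, or seed initial data.
        var data = await cluster.PingAsync(name) ?? new SketchJobData { Timestamp = DateTimeOffset.UtcNow };
        if (data.IsCompleted) return;

        for (int units = 0; data.NextPage < totalPages && units < maxUnits; units++)
        {
            ct.ThrowIfCancellationRequested();
            ProcessedPages.Add(data.NextPage);            // 2. One unit of work.
            data.NextPage++;
            data = await cluster.PingAsync(name, data);   // 3. Publish progress after each unit.
        }

        if (data.NextPage >= totalPages)
        {
            data.IsCompleted = true;                      // 4. Final ping marks the job done.
            await cluster.PingAsync(name, data);
        }
    }
}
```

If host A stops after two pages, host B syncs from the cluster and continues at page 2 instead of starting over. In a real cluster the same store would also need the singleton lock so that A and B never advance the job at the same time.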
+ +## Related pages + +{% content-ref url="README.md" %} +[README.md](README.md) +{% endcontent-ref %} + +{% content-ref url="../jobs.md" %} +[jobs.md](../jobs.md) +{% endcontent-ref %} + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **should** keep job names deterministic and versionless; append a revision only when the semantics of the job itself change +* you **should** ping the cluster before any expensive operation so the singleton lock is kept alive +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **must not** rely on a job running on "the same host every time"; the cluster may move it without notice +* you **must not** use `JobManager.CancelAsync` as a regular flow-control mechanism; reserve it for operator intervention +{% endhint %} diff --git a/docs/cronus-framework/concepts/README.md b/docs/cronus-framework/concepts/README.md index e834417d..22dcc5c7 100644 --- a/docs/cronus-framework/concepts/README.md +++ b/docs/cronus-framework/concepts/README.md @@ -1,2 +1,9 @@ # Concepts +Cronus is a framework for building services that are honest about the way their business works. Three ideas are present in every Cronus service and are worth reading about before the rest of the documentation: + +* [**Domain-Driven Design**](ddd.md) — is what the services are _about_. Cronus assumes you have already decided what your core domain is, worked out the ubiquitous language with domain experts, and settled on the boundaries of each bounded context. Without that, the rest of the framework is a set of answers to questions you have not asked yet. +* [**Event Sourcing**](es.md) — is how Cronus remembers what happened. Aggregates emit events when they change; events are appended to an immutable log; any current-state representation (the aggregate itself, a projection, an audit report) is derived from that log by replaying events. The log is the source of truth and everything else is disposable. 
+* [**CQRS**](cqrs.md) — is how Cronus separates writes from reads. The write side is a stack of aggregates, application services and an event store tuned for appends. The read side is a stack of projections tuned for the queries each UI actually makes. Commands and queries live on different types, often on different hosts, scaled independently. + +You can build a Cronus service without being deeply fluent in all three — but you cannot build a _good_ one. The rest of this documentation assumes the vocabulary of these three pages. diff --git a/docs/cronus-framework/concepts/cqrs.md b/docs/cronus-framework/concepts/cqrs.md index b83e822c..47831775 100644 --- a/docs/cronus-framework/concepts/cqrs.md +++ b/docs/cronus-framework/concepts/cqrs.md @@ -4,7 +4,44 @@ description: CQRS # Command Query Responsibility Segregation -{% embed url="https://github.com/Elders/Cronus/issues/274" %} +## What CQRS is in Cronus +CQRS is a structural decision: the write side of a service and the read side of a service are different programs. They have different types, different stores, different scaling characteristics, and — often — different hosts. They share the business domain and the event log; they do not share their runtime. +In Cronus the split is explicit: +* **Writes** happen against an [Aggregate](../domain-modeling/aggregate.md). A command arrives, an application service loads the aggregate from the [Event Store](../event-store/README.md), calls a method, and saves the resulting events. The write side is tuned for correctness and for append-only throughput. +* **Reads** happen against a [Projection](../domain-modeling/handlers/projections.md). Projections subscribe to events, update their own store, and are queried through `IProjectionReader`. The read side is tuned for query shape — each projection exists to serve one or a few UI or API reads cheaply. + +Neither side knows about the other at the type level. 
An application service never references a projection type; a projection never issues a command. Between them sit the [Event Store](../event-store/README.md) and the [Messaging](../messaging/README.md) layer. + +## Commands and queries differ at the wire + +The division shows up in the message taxonomy. Every command in Cronus is an `ICommand` — a strongly-typed instance with a `[DataContract]` attribute, routed through `IPublisher`, delivered through the application-services consumer, handled by a method on an application service that loads an aggregate, mutates it and calls `SaveAsync`. See the [commands reference](../domain-modeling/messages/commands.md). + +Queries, by contrast, do not travel on the bus at all. A query is a `GetAsync<T>(IBlobId)` call against `IProjectionReader`, returning a `ReadResult<T>` — an in-process function call. The UI layer or the API controller injects the reader and asks; no routing, no broker, no retry policy, no "eventually delivered". The read side is synchronous from the caller's point of view and the latency is bounded by how fast the projection store can answer. + +That distinction is load-bearing. Commands have to be durable, at-least-once, and ordered per aggregate — they are business intent. Queries have to be cheap and right-now — they are user experience. Treating them as the same kind of thing would compromise both. + +## Independent scaling and evolution + +Because the two sides are runtime-separate, they scale independently. A read-heavy service might run three projection hosts and one application-services host; a write-heavy migration might run a single projection host and ten application-services hosts. The [`Cronus:*Enabled`](../configuration.md#cronus-applicationservicesenabled) flags are exactly the knobs: turn `ApplicationServicesEnabled` off on a host and it stops handling commands, so it can serve as a projections host; turn `ProjectionsEnabled` off and it stops updating projections, so it can serve as a command processor. + +Evolution is similarly decoupled. 
A new projection can be added without touching the aggregate — you add the projection type, deploy, let it catch up from the event store. A new command can be added without touching existing projections — the new event it produces only affects projections that subscribe to it. This is the operational payoff of CQRS, and it is the reason Cronus makes the split structural rather than stylistic. + +## Independence from event sourcing + +CQRS does not require event sourcing and event sourcing does not require CQRS, but in a Cronus service they always appear together. The event log is what lets the read side be rebuildable — a new projection version replays events through its updated handler; see [Projections / Versioning](../projections/versioning.md). Without the event log, a CQRS service would need a separate write-to-read synchronisation mechanism, and without the CQRS split, the event log would be trying to serve both the latest-state query and the append-only write path at once. Cronus picks the combination that makes the most of both. + +## What each side owns + +A useful way to think about it: + +* The write side owns the _rules_. Can this performer be added to this concert? The aggregate knows. Can this order be cancelled after shipping? The aggregate knows. Rules are enforced by `AggregateRoot` methods that either produce an event or throw. +* The read side owns the _shape_. What does the list of upcoming concerts look like to a mobile client? A projection. What does the ops dashboard show? A projection. The read side is free to denormalise, aggregate across multiple aggregates, and expose exactly the shape the consumer wants. + +Rules are written once and do not change shape with the UI. Shapes change often and do not change the rules. That is why keeping them in separate pipelines pays. 
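Here is a deliberately tiny, framework-free sketch of the split, assuming hypothetical types throughout. `SketchConcertCommands` owns the rules and only appends events, `SketchUpcomingConcertsView` owns the shape and only folds events, and the query is an ordinary method call. In a Cronus service the write side would be an aggregate loaded by an application service and the read side a projection fed by the messaging layer; here a shared `List` stands in for the event store.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical events; the shared list passed to the write side plays the event store.
public abstract record SketchConcertEvent(string ConcertId);
public sealed record SketchConcertAnnounced(string ConcertId, string Name) : SketchConcertEvent(ConcertId);
public sealed record SketchConcertCancelled(string ConcertId) : SketchConcertEvent(ConcertId);

// Write side: enforces the rules and appends events. It exposes no query methods.
public sealed class SketchConcertCommands
{
    private readonly List<SketchConcertEvent> log;
    public SketchConcertCommands(List<SketchConcertEvent> log) => this.log = log;

    public void Announce(string concertId, string name)
    {
        if (log.Exists(e => e.ConcertId == concertId))
            throw new InvalidOperationException("Concert already exists.");
        log.Add(new SketchConcertAnnounced(concertId, name));
    }

    public void Cancel(string concertId)
    {
        if (!log.Exists(e => e is SketchConcertAnnounced && e.ConcertId == concertId))
            throw new InvalidOperationException("Unknown concert.");
        if (log.Exists(e => e is SketchConcertCancelled && e.ConcertId == concertId))
            throw new InvalidOperationException("Concert is already cancelled.");
        log.Add(new SketchConcertCancelled(concertId));
    }
}

// Read side: a denormalised view shaped for one query. It enforces no business rules.
public sealed class SketchUpcomingConcertsView
{
    private readonly SortedDictionary<string, string> namesById = new();

    public void Handle(SketchConcertEvent e)
    {
        switch (e)
        {
            case SketchConcertAnnounced a: namesById[a.ConcertId] = a.Name; break;
            case SketchConcertCancelled c: namesById.Remove(c.ConcertId); break;
        }
    }

    // The query is a plain in-process call; nothing travels on a bus.
    public IReadOnlyList<string> GetUpcomingNames() => new List<string>(namesById.Values);
}
```

Announcing two concerts and cancelling one leaves a single name in the view. A second `Cancel` of the same concert is rejected by the write side, and the view never needs to know that rule exists.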
+ +## Related + +See also [Event Sourcing](es.md) and [Domain-Driven Design](ddd.md) — the three concepts reinforce each other; they are the load-bearing assumptions behind every Cronus service. diff --git a/docs/cronus-framework/concepts/es.md b/docs/cronus-framework/concepts/es.md index 021bb893..4cacab24 100644 --- a/docs/cronus-framework/concepts/es.md +++ b/docs/cronus-framework/concepts/es.md @@ -4,34 +4,77 @@ description: ES # Event Sourcing -## What is Event Sourcing +## What Event Sourcing is in Cronus -Event Sourcing is a foundational concept in the Cronus framework, emphasizing the storage of state changes as a sequence of events. This approach ensures that every modification to an application's state is captured and stored, facilitating a comprehensive history of state transitions. +An event-sourced service does not persist the current state of its business objects. It persists the sequence of events that describe every change that ever happened to them, and derives the state on demand by replaying those events. Cronus treats the event store as the single source of truth for the write side of the service — everything else (aggregates in memory, projections, indices, analytical reports) is a derived artefact that can be thrown away and recomputed. -## Key Principles of Event Sourcing in Cronus: +Contrast this with an EF- or ORM-based service. There you persist the current row of a `Concert` table, and a change is an `UPDATE`. Any insight about _how_ the concert got to its current shape — when performers were added, when the venue changed, which commands were accepted and rejected — is lost unless you manually write an audit trail. An event-sourced service gets the audit trail for free because the events _are_ the persistence format. - Immutable Events: Each event represents a discrete change in the system and is immutable, ensuring a reliable audit trail. +{% hint style="info" %} +State is a cached summary of a sequence of events. 
The events are the truth. +{% endhint %} - Event Storage: Events are persistently stored, allowing the system to reconstruct the current state by replaying these events. +## The pattern - State Reconstruction: The current state of an entity is derived by sequentially applying all relevant events, ensuring consistency and traceability. +A Cronus aggregate inherits from `AggregateRoot<TState>` and changes its state by calling `Apply(IEvent)`: -## Benefits of Using Event Sourcing with Cronus: +```csharp +public class Concert : AggregateRoot<ConcertState> +{ + Concert() { } -{% hint style="success" %} -* Auditability: Maintains a complete history of all changes, facilitating debugging and compliance. -* Scalability: Efficiently handles high-throughput systems by focusing on event storage and processing. -* Flexibility: Supports complex business logic and workflows by modeling state changes as events. -{% endhint %} + public Concert(string name, Venue venue, DateTimeOffset startTime, TimeSpan duration) + { + Apply(new ConcertAnnounced(...)); + } + + public void RegisterPerformer(Performer performer) + { + if (state.HasStarted) throw new InvalidOperationException("..."); + Apply(new PerformerRegistered(...)); + } +} +``` + +The state lives in a separate class with `When(TEvent)` handlers — the only place the state is allowed to change: + +```csharp +public class ConcertState : AggregateRootState<Concert, ConcertId> +{ + public List<Performer> Performers { get; private set; } = new(); + + public void When(ConcertAnnounced @event) { /* mutate state */ } + public void When(PerformerRegistered @event) { /* append performer */ } +} +``` + +A command is turned into zero or more events by the aggregate; those events become an `AggregateCommit` in the event store. Rehydrating the aggregate later is symmetric: Cronus loads the stream of past events for that aggregate id, feeds them through `When` in revision order, and the aggregate ends up in the same state it was in after its last committed event. For the full walkthrough see [Aggregate](../domain-modeling/aggregate.md). 
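Stripped of the framework, rehydration is a left fold of the stored events over an empty state. The sketch below assumes hypothetical types (`SketchConcertState` and two event records) rather than the Cronus `AggregateRoot` and `AggregateRootState` base classes. It dispatches with a `switch`, whereas in Cronus the framework dispatches to the matching `When` overload itself.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public abstract record SketchEvent;
public sealed record SketchConcertAnnounced(string Name, DateTimeOffset StartTime) : SketchEvent;
public sealed record SketchPerformerRegistered(string Performer) : SketchEvent;

public sealed class SketchConcertState
{
    public string Name { get; private set; } = "";
    public DateTimeOffset StartTime { get; private set; }
    public List<string> Performers { get; } = new();

    // The When handlers are the only place state changes.
    public void When(SketchConcertAnnounced e) { Name = e.Name; StartTime = e.StartTime; }
    public void When(SketchPerformerRegistered e) => Performers.Add(e.Performer);

    // Route one event to its matching When overload.
    public void Apply(SketchEvent e)
    {
        switch (e)
        {
            case SketchConcertAnnounced a: When(a); break;
            case SketchPerformerRegistered p: When(p); break;
            default: throw new InvalidOperationException($"No When handler for {e.GetType().Name}.");
        }
    }

    // Rehydration: start from an empty state and apply every stored event in revision order.
    public static SketchConcertState Replay(IEnumerable<SketchEvent> history)
    {
        var state = new SketchConcertState();
        foreach (var e in history) state.Apply(e);
        return state;
    }
}
```

Replaying the same history twice produces identical state, which is what lets the event log, not the in-memory object, be the source of truth.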
+ +## Why Cassandra is a natural store -## Implementing Event Sourcing in Cronus: +An event-sourced workload has a very specific shape: every write is an append, and the vast majority of reads are _"give me all events for this aggregate id in revision order"_. Cassandra's storage model is a direct match — a wide row per aggregate id, ordered clustering by revision, append-only inserts, no updates. Reads are a single partition scan; writes do not compete for a shared row lock. Distribution and replication come for free because that is what Cassandra is for. -Define Events: Create events that represent meaningful changes in the domain. In Cronus, events are immutable and should be named in the past tense to reflect actions that have occurred. -ELDERS CRONUS +Cronus's canonical event-store backend is [`Cronus.Persistence.Cassandra`](https://github.com/Elders/Cronus.Persistence.Cassandra) and it has been in production use since 2013. See [Event Store](../event-store/README.md) for the details of how the contract is implemented on top of Cassandra. -Persist Events: Store events in the event store, which serves as the single source of truth for the system's state. -ELDERS OSS +## Rebuilding projections from events -Rehydrate State: Reconstruct the current state of aggregates by replaying the sequence of events associated with them. +The property you buy with event sourcing — that the event log is the source of truth and everything else is derived — is most visible at the projection layer. A projection is a read model that handlers update as events flow through the system. If you change the projection's shape, the old rows in its store no longer match the new code. With a relational model, that would mean writing a schema migration. With event sourcing, it means _bumping the projection version and replaying the events through the new handler_; see [Projections / Versioning](../projections/versioning.md). 
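The following framework-free sketch shows why a version bump replays instead of migrating, assuming hypothetical types. Version 1 of a projection keeps only a performer count per concert; version 2 keeps the performer names. Names cannot be recovered from counts, so version 2 is filled from an empty store by replaying the event log through its new handler. In Cronus the replay is driven by the rebuild job and the event-store player; here it is a plain loop.

```csharp
#nullable enable
using System.Collections.Generic;

public sealed record SketchPerformerRegistered(string ConcertId, string Performer);

// Version 1 shape: number of performers per concert.
public sealed class SketchPerformerCountV1
{
    public Dictionary<string, int> Rows { get; } = new();

    public void Handle(SketchPerformerRegistered e) =>
        Rows[e.ConcertId] = Rows.TryGetValue(e.ConcertId, out var n) ? n + 1 : 1;
}

// Version 2 shape: the performer names per concert. These rows cannot be derived from v1 rows.
public sealed class SketchPerformerListV2
{
    public Dictionary<string, List<string>> Rows { get; } = new();

    public void Handle(SketchPerformerRegistered e)
    {
        if (!Rows.TryGetValue(e.ConcertId, out var list)) Rows[e.ConcertId] = list = new List<string>();
        list.Add(e.Performer);
    }
}

public static class SketchRebuild
{
    // Rebuild = a fresh, empty store plus a replay of every relevant event, oldest first.
    public static SketchPerformerListV2 RebuildV2(IEnumerable<SketchPerformerRegistered> eventLog)
    {
        var v2 = new SketchPerformerListV2();
        foreach (var e in eventLog) v2.Handle(e);
        return v2;
    }
}
```

The old version keeps serving reads while the new one catches up, and nothing ever has to translate v1 rows into v2 rows.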
+ +The same property makes things like _"backfill a new subscriber with every event it cares about since the start of time"_ and _"recompute this analytical report after fixing a bug"_ routine. Both are a call to [`IEventStorePlayer.EnumerateEventStore`](../event-store/eventstore-player.md) with the right event-type filter. + +## When event sourcing helps and when it does not + +Event sourcing is not a free meal. The write side is more complex than a CRUD service, the read side requires an explicit projection pipeline, and the investment only pays off once you lean on the event log for rebuilding state, backfilling subscribers and auditing history. The shape of business that benefits most is also the shape DDD prefers — established, non-trivial, long-lived, with a rich set of distinct state transitions that people actually care about. + +{% hint style="success" %} +* you are building software for an established and already operating business +* audit trails, history and "how did we get here" matter to the business +* new read models keep showing up and you need to rebuild them from historical data +{% endhint %} + +{% hint style="danger" %} +* a single-service CRUD app over a small, stable set of entities +* a throwaway MVP where the events will never outlive the current code +{% endhint %} -By adhering to these principles, Cronus enables developers to build robust, event-driven systems that are both scalable and maintainable. \ No newline at end of file +See also [Domain-Driven Design](ddd.md) and [Command Query Responsibility Segregation](cqrs.md) — the three concepts reinforce each other. diff --git a/docs/cronus-framework/configuration.md b/docs/cronus-framework/configuration.md index 4c9158f9..e4f46e93 100644 --- a/docs/cronus-framework/configuration.md +++ b/docs/cronus-framework/configuration.md @@ -2,235 +2,550 @@ ## Overview -By default, Cronus and its sub-components have good default settings. 
However, not everything can be auto-configured, such as connection strings to databases or endpoints to various services. +Cronus and its satellite packages are configured through the standard ASP.NET Core configuration system. Every options class binds to a well-known section of `IConfiguration` under the root key `cronus:`, which means you can supply values from `appsettings.json`, environment variables, Consul/Vault providers, command-line arguments, or any other `IConfigurationProvider` you register. + +Defaults are chosen so that a local development environment works out of the box. In production you almost always override at least: + +* `cronus:boundedcontext` — the name of your service +* `cronus:tenants` — the list of tenants the service serves +* The connection strings for Cassandra, RabbitMQ and Redis + +Configuration keys are case-insensitive. The headings below use the exact casing emitted by the option providers as a reference. ## Cronus -| Name | Type | Required | Default Value | -| -------------------------------------------------------------------------------------------------- | --------- | -------- | ------------- | -| [Cronus:BoundedContext](configuration.md#cronus-boundedcontext) | string | yes | | -| [Cronus:Tenants](configuration.md#cronus-tenants-greater-than-greater-than-string-or-required-yes) | string\[] | yes | | -| [Cronus:ApplicationServicesEnabled](configuration.md#cronus-applicationservicesenabled) | bool | no | true | -| [Cronus:ProjectionsEnabled](configuration.md#cronus-projectionsenabled) | bool | no | true | -| [Cronus:PortsEnabled](configuration.md#cronus-portsenabled) | bool | no | true | -| [Cronus:SagasEnabled](configuration.md#cronus-sagasenabled) | bool | no | true | -| [Cronus:GatewaysEnabled](configuration.md#cronus-gatewaysenabled) | bool | no | true | +The core Cronus package exposes two required settings (the bounded context name and the tenant list) plus a family of boolean feature flags controlled through 
[`CronusHostOptions`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/CronusHostOptions.cs). The flags decide which consumers and background services Cronus starts inside the host process, so you can split a monolith into several specialised processes by toggling them. + +| Name | Type | Required | Default Value | +| --------------------------------------------------------------------------------------- | ---------- | -------- | ------------- | +| [Cronus:BoundedContext](configuration.md#cronus-boundedcontext) | string | yes | | +| [Cronus:Tenants](configuration.md#cronus-tenants) | string\[] | yes | | +| [Cronus:Heartbeat:IntervalInSeconds](configuration.md#cronus-heartbeat-intervalinseconds) | uint | no | 5 | +| [Cronus:ApplicationServicesEnabled](configuration.md#cronus-applicationservicesenabled) | bool | no | true | +| [Cronus:ProjectionsEnabled](configuration.md#cronus-projectionsenabled) | bool | no | true | +| [Cronus:PortsEnabled](configuration.md#cronus-portsenabled) | bool | no | true | +| [Cronus:SagasEnabled](configuration.md#cronus-sagasenabled) | bool | no | true | +| [Cronus:GatewaysEnabled](configuration.md#cronus-gatewaysenabled) | bool | no | true | +| [Cronus:TriggersEnabled](configuration.md#cronus-triggersenabled) | bool | no | true | +| [Cronus:SystemServicesEnabled](configuration.md#cronus-systemservicesenabled) | bool | no | true | +| [Cronus:MigrationsEnabled](configuration.md#cronus-migrationsenabled) | bool | no | false | +| [Cronus:RpcApiEnabled](configuration.md#cronus-rpcapienabled) | bool | no | false | #### Cronus:BoundedContext -Cronus uses this setting to personalize your application. This setting is used for naming the following components: +The logical name of the service, bound from [`BoundedContext`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/BoundedContext.cs). 
It is marked `[Required]` and validated against the regular expression `^\b([\w\d_]+$)`, so it may contain only alphanumeric characters and underscores. The value is lower-cased and trimmed by `BoundedContextProvider`. + +Cronus uses this value to personalise the infrastructure it creates on your behalf. It is used when naming: -* RabbiMQ exchange and queue names -* Cassandra EventStore names -* Cassandra Projection store names +* RabbitMQ exchanges and queues +* The Cassandra event-store keyspace and tables +* The Cassandra projections-store keyspace and tables -Allowed Characters: `Cronus:BoundedContext` must be an alphanumeric character or underscore only: `^\b([\w\d_]+$)`' +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "boundedcontext": "billing" + } +} +``` +{% endcode %} #### Cronus:Tenants -List of tenants allowed to use the system. Cronus is designed with multitenancy in mind from the beginning and requires at least one tenant to be configured in order to work properly. The multitenancy aspects are applied to many components and to give you a feel about this here is an incomplete list of different parts of the system using this setting: +The list of tenants the service is allowed to serve, bound from [`TenantsOptions`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Multitenancy/TenantsOptions.cs). The array is marked `[Required]`; every element is validated against `^\b([\w\d_]+$)` (alphanumeric plus underscore), lower-cased, trimmed and deduplicated by `TenantsOptionsProvider` before being used. -* Message - every message which is sent through Cronus is bound to a specific _tenant_ -* RabbitMQ exchanges and queues are tenant-aware -* Event Store - every tenant has a separate storage -* Projection Store - every tenant has a separate storage +Cronus is multitenant by design and will refuse to start if the list is empty. 
Many components of the framework are tenant-aware, including: -Each value you provide in the array is converted and used further to lower. +* Every `CronusMessage` is tagged with the tenant it belongs to +* RabbitMQ exchanges and queues are partitioned per tenant +* The Cassandra event store keeps a dedicated keyspace per tenant +* The Cassandra projection store keeps a dedicated keyspace per tenant -Allowed Characters: `Cronus:Tenants` must be an alphanumeric character or underscore only: `^\b([\w\d_]+$)`' +Once set, the `TenantsOptions` instance is available via dependency injection wherever you need to enumerate the tenants at runtime. See [Multitenancy](domain-modeling/multitenancy.md) for the full model. -Example value: `["tenant1","tenant2","tenant3"]` +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "tenants": [ "tenant1", "tenant2", "tenant3" ] + } +} +``` +{% endcode %} + +#### Cronus:Heartbeat:IntervalInSeconds -Once set you could use [`TenantsOptions`](https://github.com/Elders/Cronus/tree/f14b4918aa5862a73a0789cc868b5f08258410ea/src/Elders.Cronus/Multitenancy/TenantsOptions.cs) object via Dependency Injection for other purposes. +The interval, in seconds, between the signal messages Cronus emits to prove the host is alive. Bound from [`HeartbeatOptions`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/Heartbeat/HeartbeatOptions.cs). Validated with `[Range(5, 3600)]`. + +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "heartbeat": { + "intervalinseconds": 5 + } + } +} +``` +{% endcode %} #### Cronus:ApplicationServicesEnabled -Specifies whether to start a consumer for the Application Services +Starts the consumer that dispatches commands to application services (command handlers). Turn it off on processes that are not supposed to handle commands — for example a read-only projections host. 
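The validation rules listed for `Cronus:BoundedContext`, `Cronus:Tenants` and `Cronus:Heartbeat:IntervalInSeconds` can be approximated in a few lines. This is a sketch, not the code of `BoundedContextProvider` or `TenantsOptionsProvider`: the class name is hypothetical, and the order of the steps (trim and lower-case first, then validate, then deduplicate) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SketchCronusOptionsRules
{
    // The pattern documented for Cronus:BoundedContext and for each Cronus:Tenants entry.
    private static readonly Regex NamePattern = new(@"^\b([\w\d_]+$)");

    public static string NormalizeBoundedContext(string value)
    {
        var name = value.Trim().ToLowerInvariant();
        if (!NamePattern.IsMatch(name)) throw new ArgumentException($"Invalid bounded context '{value}'.");
        return name;
    }

    public static IReadOnlyList<string> NormalizeTenants(IEnumerable<string> tenants)
    {
        var result = tenants.Select(t => t.Trim().ToLowerInvariant()).ToList();
        foreach (var t in result)
            if (!NamePattern.IsMatch(t)) throw new ArgumentException($"Invalid tenant '{t}'.");
        result = result.Distinct().ToList();
        // Cronus refuses to start without at least one tenant.
        if (result.Count == 0) throw new ArgumentException("At least one tenant is required.");
        return result;
    }

    // Mirrors the documented [Range(5, 3600)] on Cronus:Heartbeat:IntervalInSeconds.
    public static bool IsValidHeartbeatInterval(uint seconds) => seconds >= 5 && seconds <= 3600;
}
```

Under these assumptions `[" Tenant1", "tenant1 ", "tenant_2"]` normalises to `tenant1, tenant_2`, while a value such as `bad-tenant` is rejected because the hyphen falls outside the pattern.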
#### Cronus:ProjectionsEnabled -Specifies whether to start a consumer for the Projections +Starts the consumer that writes events into the projection store. Turn it off on processes that only dispatch commands or act as API gateways, or when you want to pause projections while running a migration. #### Cronus:PortsEnabled -Specifies whether to start a consumer for the Ports +Starts the consumer that dispatches events to ports (outbound integrations such as sending e-mail or calling a third-party API). Disable it where you do not want outbound side effects, typically on read-only replicas. #### Cronus:SagasEnabled -Specifies whether to start a consumer for the Sagas +Starts the consumer that dispatches events to sagas (long-running processes). Disable it on hosts that must not advance saga state. #### Cronus:GatewaysEnabled -Specifies whether to start a consumer for the Gateways +Starts the consumer that dispatches events to gateways (components that publish to a second transport, typically the public RabbitMQ bus). Disable it on hosts that should not re-publish events outside the service. -## Cronus.Api +#### Cronus:TriggersEnabled -| Name | Type | Required | Default Value | -| --------------------------------------------------------------- | -------------------- | -------- | ------------- | -| [Cronus:Api:Kestrel](configuration.md#hosting) | configurationSection | no | | -| [Cronus:Api:JwtAuthentication](configuration.md#authentication) | configurationSection | no | | +Starts the consumer that dispatches events to triggers (hooks that react to specific events). Disable it on hosts where triggers should not fire. -### Hosting +#### Cronus:SystemServicesEnabled -The API is hosted with Kestrel. +Starts Cronus-internal system services such as the heartbeat publisher. Leave it on unless you are deliberately running a worker that must emit no system signals. -By default, the API is hosted on port `7477`. 
+#### Cronus:MigrationsEnabled -A configuration could be provided by [KestrelOptions](https://docs.microsoft.com/en-us/aspnet/core/fundamentals/servers/kestrel?view=aspnetcore-3.0#kestrel-options). You can supply them directly in the DI or through a configuration file. +Starts the replay/migration consumer. Defaults to `false` because running migrations in the same process that serves production traffic is usually undesirable; enable it on a dedicated migrator host while a migration is in progress. -#### Cronus:Api:Kestrel +#### Cronus:RpcApiEnabled -``` +Starts the RPC API host that exposes request/response endpoints over RabbitMQ (see [`RpcApiDiscovery`](https://github.com/Elders/Cronus.Transport.RabbitMQ/blob/master/src/Elders.Cronus.Transport.RabbitMQ/RpcAPI/RpcApiDiscovery.cs)). Defaults to `false`; enable it on the process that should answer RPC calls. + +## Cronus.Api + +`Cronus.Api` hosts an ASP.NET Core endpoint with Kestrel. Two configuration sections are read directly by the host at startup: + +| Name | Type | Required | Default Value | +| --------------------------------------------------------------- | --------------------- | -------- | ------------- | +| [Cronus:Api:Kestrel](configuration.md#cronus-api-kestrel) | configuration section | no | | +| [Cronus:Api:JwtAuthentication](configuration.md#cronus-api-jwtauthentication) | configuration section | no | | + +### Cronus:Api:Kestrel + +The API is hosted with Kestrel on port `7477` by default. The whole section is forwarded to the [standard Kestrel options](https://learn.microsoft.com/aspnet/core/fundamentals/servers/kestrel/options), so any endpoint, certificate or HTTP-protocol setting Kestrel understands is valid here. 
+ +{% code title="appsettings.json" %} +```json { - "Cronus": { - "Api": { - "Kestrel": { - "Endpoints": { - "Https": { - "Url": "https://*:7477", - "Certificate": { - "Subject": "*.example.com", - "Store": "My", - "Location": "CurrentUser", - "AllowInvalid": "true" - } - } - } + "Cronus": { + "Api": { + "Kestrel": { + "Endpoints": { + "Https": { + "Url": "https://*:7477", + "Certificate": { + "Subject": "*.example.com", + "Store": "My", + "Location": "CurrentUser", + "AllowInvalid": "true" } + } } + } } + } } ``` +{% endcode %} -### Authentication +### Cronus:Api:JwtAuthentication -The API could be protected using a JWT bearer authentication. +The API can be protected with JWT bearer authentication. If this section exists in configuration, `Cronus.Api.Startup` enables authentication and forwards the section to [`JwtBearerOptions`](https://learn.microsoft.com/dotnet/api/microsoft.aspnetcore.authentication.jwtbearer.jwtbeareroptions). If the section is absent, the API runs without authentication. -The configuration is provided by [JwtBearerOptions](https://docs.microsoft.com/en-us/dotnet/api/microsoft.aspnetcore.builder.jwtbeareroptions?view=aspnetcore-1.1\&viewFallbackFrom=aspnetcore-2.2). You can supply them directly in the DI or through a configuration file. - -#### Cronus:Api:JwtAuthentication - -``` +{% code title="appsettings.json" %} +```json { - "Cronus": { - "Api": { - "JwtAuthentication": { - "Authority": "https://example.com", - "Audience": "https://example.com/resources" - } - } + "Cronus": { + "Api": { + "JwtAuthentication": { + "Authority": "https://example.com", + "Audience": "https://example.com/resources" + } } + } } ``` +{% endcode %} -Remarks: [https://stackoverflow.com/a/58736850/224667](https://stackoverflow.com/a/58736850/224667) +Additional remarks: [https://stackoverflow.com/a/58736850/224667](https://stackoverflow.com/a/58736850/224667). 
## Cronus.Persistence.Cassandra +Configuration for the Cassandra-backed event store, bound from [`CassandraProviderOptions`](https://github.com/Elders/Cronus.Persistence.Cassandra/blob/master/src/Elders.Cronus.Persistence.Cassandra/CassandraProviderOptions.cs). + | Name | Type | Required | Default Value | | --------------------------------------------------------------------------------------------------------------------- | --------- | -------- | ------------- | | [Cronus:Persistence:Cassandra:ConnectionString](configuration.md#cronus-persistence-cassandra-connectionstring) | string | yes | | | [Cronus:Persistence:Cassandra:ReplicationStrategy](configuration.md#cronus-persistence-cassandra-replicationstrategy) | string | no | simple | -| [Cronus:Persistence:Cassandra:ReplicationFactor](configuration.md#cronus-persistence-cassandra-replicationstrategy) | int | no | 1 | -| [Cronus:Persistence:Cassandra:Datacenters](configuration.md#cronus-persistence-cassandra-replicationstrategy) | string\[] | no | | +| [Cronus:Persistence:Cassandra:ReplicationFactor](configuration.md#cronus-persistence-cassandra-replicationfactor) | int | no | 1 | +| [Cronus:Persistence:Cassandra:Datacenters](configuration.md#cronus-persistence-cassandra-datacenters) | string\[] | no | \[] | #### Cronus:Persistence:Cassandra:ConnectionString -The connection to the Cassandra database server +The connection string to the Cassandra cluster that holds the event store keyspaces. #### Cronus:Persistence:Cassandra:ReplicationStrategy -Configures Cassandra replication strategy. This setting has effect only in the first run when creating the database. +The replication strategy used when Cronus creates the event-store keyspace for a new tenant. This setting only has an effect when a keyspace is first provisioned. 
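The strategy works together with `ReplicationFactor` and `Datacenters`, and the three values end up in the `replication` map of Cassandra's `CREATE KEYSPACE` statement. The sketch below shows one plausible mapping for the `simple` and `network_topology` strategies. `BuildCreateKeyspace` is a hypothetical helper and the keyspace name is illustrative. The exact CQL Cronus issues is not shown in this document. Applying the single replication factor to every datacenter is an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not a Cronus API: turns ReplicationStrategy, ReplicationFactor and
// Datacenters into a CQL CREATE KEYSPACE statement for one tenant keyspace.
// Assumption: with network_topology the single ReplicationFactor is applied to every datacenter.
string BuildCreateKeyspace(string keyspace, string strategy, int replicationFactor, string[] datacenters)
{
    string replication = strategy switch
    {
        "simple" => $"{{'class': 'SimpleStrategy', 'replication_factor': {replicationFactor}}}",
        "network_topology" when datacenters.Length > 0 =>
            "{'class': 'NetworkTopologyStrategy', " +
            string.Join(", ", datacenters.Select(dc => $"'{dc}': {replicationFactor}")) + "}",
        "network_topology" => throw new ArgumentException("network_topology needs at least one datacenter."),
        _ => throw new ArgumentException($"Unknown replication strategy '{strategy}'.")
    };
    return $"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {replication};";
}

// "billing_tenant1" is an illustrative keyspace name.
Console.WriteLine(BuildCreateKeyspace("billing_tenant1", "network_topology", 3, new[] { "dc1", "dc2" }));
```

The `IF NOT EXISTS` clause reflects the documented behaviour: these settings only matter when a keyspace is first provisioned, and an existing keyspace is left untouched.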
Valid values: -* simple -* network\_topology - when using this setting you need to specify `Cronus:Persistence:Cassandra:ReplicationFactor` and `Cronus:Persistence:Cassandra:Datacenters` as well +* `simple` +* `network_topology` — when used you must also provide `ReplicationFactor` and `Datacenters`. + +#### Cronus:Persistence:Cassandra:ReplicationFactor + +The replication factor used together with `ReplicationStrategy`. Only used at keyspace creation. + +#### Cronus:Persistence:Cassandra:Datacenters + +The datacenter names used with the `network_topology` strategy. Unused for `simple`. + +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "persistence": { + "cassandra": { + "connectionstring": "Contact Points=cassandra;Default Keyspace=billing", + "replicationstrategy": "network_topology", + "replicationfactor": 3, + "datacenters": [ "dc1", "dc2" ] + } + } + } +} +``` +{% endcode %} ## Cronus.Projections.Cassandra -| Name | Type | Required | Default Value | -| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | -------- | ------------- | -| [Cronus:Projections:Cassandra:ConnectionString](configuration.md#cronus-projections-cassandra-connectionstring) | string | yes | | -| [Cronus:Pojections:Cassandra:ReplicationStrategy](configuration.md#cronus-projections-cassandra-replicationstrategy) | string | no | simple | -| [Cronus:Pojections:Cassandra:ReplicationFactor](configuration.md#cronus-projections-cassandra-replicationfactor) | int | no | 1 | -| [Cronus:Pojections:Cassandra:Datacenters](configuration.md#cronus-projections-cassandra-datacenters) | string\[] | no | | -| [Cronus:Pojections:Cassandra:TableRetention:DeleteOldProjectionTables](configuration.md#cronus-projections-cassandra-tableretention-deleteoldprojectiontables) | boolean | no | true | -| 
[Cronus:Projections:Cassandra:TableRetention:NumberOfOldProjectionTablesToRetain](configuration.md#cronus-projections-cassandra-tableretention-numberofoldprojectiontablestoretain) | uint | no | 2 | +Configuration for the Cassandra-backed projection store. The main connection settings come from [`CassandraProviderOptions`](https://github.com/Elders/Cronus.Projections.Cassandra/blob/master/src/Elders.Cronus.Projections.Cassandra/Infrastructure/CassandraProviderOptions.cs); a dedicated `TableRetention` sub-section controls how Cronus prunes old projection tables during a replay. + +| Name | Type | Required | Default Value | +| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------- | -------- | ------------- | +| [Cronus:Projections:Cassandra:ConnectionString](configuration.md#cronus-projections-cassandra-connectionstring) | string | yes | | +| [Cronus:Projections:Cassandra:ReplicationStrategy](configuration.md#cronus-projections-cassandra-replicationstrategy) | string | no | simple | +| [Cronus:Projections:Cassandra:ReplicationFactor](configuration.md#cronus-projections-cassandra-replicationfactor) | int | no | 1 | +| [Cronus:Projections:Cassandra:Datacenters](configuration.md#cronus-projections-cassandra-datacenters) | string\[] | no | \[] | +| [Cronus:Projections:Cassandra:TableRetention:DeleteOldProjectionTables](configuration.md#cronus-projections-cassandra-tableretention-deleteoldprojectiontables) | bool | no | false | +| [Cronus:Projections:Cassandra:TableRetention:NumberOfOldProjectionTablesToRetain](configuration.md#cronus-projections-cassandra-tableretention-numberofoldprojectiontablestoretain) | uint | no | 2 | -#### `Cronus:Projections:Cassandra:ConnectionString` +#### Cronus:Projections:Cassandra:ConnectionString -The connection to the Cassandra database server +The connection string to the Cassandra cluster that holds 
the projection keyspaces. -#### `Cronus:Projections:Cassandra:ReplicationStrategy` +#### Cronus:Projections:Cassandra:ReplicationStrategy -Configures Cassandra replication strategy. This setting has effect only in the first run when creating the database. +The replication strategy used when Cronus creates the projections keyspace for a new tenant. Only effective at keyspace creation. Valid values: -* simple -* network\_topology - when using this setting you need to specify `Cronus:Projections:Cassandra:ReplicationFactor` and `Cronus:Projections:Cassandra:Datacenters` as well +* `simple` +* `network_topology` — when used you must also provide `ReplicationFactor` and `Datacenters`. + +#### Cronus:Projections:Cassandra:ReplicationFactor + +The replication factor used with `ReplicationStrategy`. Only used at keyspace creation. + +#### Cronus:Projections:Cassandra:Datacenters + +The datacenter names used with the `network_topology` strategy. Unused for `simple`. + +#### Cronus:Projections:Cassandra:TableRetention:DeleteOldProjectionTables + +When a projection is rebuilt (replayed), Cronus creates a fresh table and, once the replay finishes, can optionally drop the older versions. This flag toggles that behaviour. The default is `false` — old tables are kept forever unless you opt in. The feature is experimental and was originally introduced for Cosmos DB, where every table has a direct cost. + +#### Cronus:Projections:Cassandra:TableRetention:NumberOfOldProjectionTablesToRetain + +When `DeleteOldProjectionTables` is enabled, this value controls how many previous versions of a projection table are kept around before older ones are deleted. The current (live) table is never counted against this limit; a value of `2` therefore means "the live table plus the two most recent historical tables". 
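The retention rule can be expressed as a small selection function. This is a minimal sketch under stated assumptions, not Cronus code. Table versions are modelled as integers, where a higher number means a newer table, and `TablesToDelete` is a hypothetical name. With `NumberOfOldProjectionTablesToRetain` left at `2`, only historical tables older than the two most recent ones are selected for deletion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model: each projection table is identified by an integer version,
// a higher number is newer, and `live` is the version currently serving reads.
IReadOnlyList<int> TablesToDelete(IEnumerable<int> versions, int live, bool deleteOldTables, uint tablesToRetain)
{
    if (!deleteOldTables)
        return Array.Empty<int>();          // default (false): keep every old table

    return versions
        .Where(v => v < live)               // only older versions are candidates; the live table never counts
        .OrderByDescending(v => v)          // newest historical table first
        .Skip((int)tablesToRetain)          // keep the N most recent historical tables
        .OrderBy(v => v)
        .ToList();
}

var doomed = TablesToDelete(new[] { 1, 2, 3, 4, 5 }, live: 5, deleteOldTables: true, tablesToRetain: 2);
Console.WriteLine(string.Join(",", doomed)); // prints "1,2"
```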
+ +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "projections": { + "cassandra": { + "connectionstring": "Contact Points=cassandra;Default Keyspace=billing_projections", + "replicationstrategy": "simple", + "replicationfactor": 1, + "tableretention": { + "deleteoldprojectiontables": false, + "numberofoldprojectiontablestoretain": 2 + } + } + } + } +} +``` +{% endcode %} + +## Cronus.Transport.RabbitMQ -#### `Cronus:Projections:Cassandra:ReplicationFactor` +Configuration for the primary (private) RabbitMQ connection, bound from [`RabbitMqOptions`](https://github.com/Elders/Cronus.Transport.RabbitMQ/blob/master/src/Elders.Cronus.Transport.RabbitMQ/RabbitMqOptions.cs) under the `cronus:transport:rabbitmq` section. This is the bus Cronus uses to route commands, events and signals between the services that share the same trust boundary. -#### `Cronus:Projections:Cassandra:Datacenters` +| Name | Type | Required | Default Value | +| ------------------------------------------------------------------------------------------------------- | --------------------- | -------- | ------------- | +| [Cronus:Transport:RabbitMQ:Server](configuration.md#cronus-transport-rabbitmq-server) | string | no | 127.0.0.1 | +| [Cronus:Transport:RabbitMQ:Port](configuration.md#cronus-transport-rabbitmq-port) | int | no | 5672 | +| [Cronus:Transport:RabbitMQ:VHost](configuration.md#cronus-transport-rabbitmq-vhost) | string | no | / | +| [Cronus:Transport:RabbitMQ:Username](configuration.md#cronus-transport-rabbitmq-username) | string | no | guest | +| [Cronus:Transport:RabbitMQ:Password](configuration.md#cronus-transport-rabbitmq-password) | string | no | guest | +| [Cronus:Transport:RabbitMQ:AdminPort](configuration.md#cronus-transport-rabbitmq-adminport) | int | no | 5672 | +| [Cronus:Transport:RabbitMQ:ApiAddress](configuration.md#cronus-transport-rabbitmq-apiaddress) | string | no | | +| 
[Cronus:Transport:RabbitMQ:BoundedContext](configuration.md#cronus-transport-rabbitmq-boundedcontext) | string | no | implicit | +| [Cronus:Transport:RabbitMQ:UseAsyncDispatcher](configuration.md#cronus-transport-rabbitmq-useasyncdispatcher) | bool | no | true | +| [Cronus:Transport:RabbitMQ:FederatedExchange](configuration.md#cronus-transport-rabbitmq-federatedexchange) | configuration section | no | | +| [Cronus:Transport:RabbitMQ:ExternalServices](configuration.md#cronus-transport-rabbitmq-externalservices) | object\[] | no | \[] | -#### `Cronus:Projections:Cassandra:TableRetention:DeleteOldProjectionTables` +#### Cronus:Transport:RabbitMQ:Server -Enables deletion of old projection tables +DNS name or IP address of the RabbitMQ broker. Multiple endpoints can be provided as a comma-separated list; they are parsed with `AmqpTcpEndpoint.ParseMultiple`. -#### `Cronus:Projections:Cassandra:TableRetention:NumberOfOldProjectionTablesToRetain` +#### Cronus:Transport:RabbitMQ:Port -Configures Cassandra number of old projection tables -> default: live table and 2 old tables +The AMQP port of the RabbitMQ broker. -## Cronus.Transport.RabbitMq +#### Cronus:Transport:RabbitMQ:VHost -#### `Cronus:Transport:RabbiMQ:ConsumerWorkersCount` >> _integer | Required: Yes | Default: 5_ +The virtual host to use. It is good practice not to use the default `/` vhost; see the [RabbitMQ virtual hosts documentation](https://www.rabbitmq.com/vhosts.html). Cronus does not use vhosts to implement multitenancy. -Configures the number of threads which will be dedicated to consuming messages from RabbitMQ for _every_ consumer. +#### Cronus:Transport:RabbitMQ:Username -#### `Cronus:Transport:RabbiMQ:Server` >> _string | Required: Yes | Default: 127.0.0.1_ +Username used by the AMQP connection. -DNS or IP to the RabbitMQ server +#### Cronus:Transport:RabbitMQ:Password -#### `Cronus:Transport:RabbiMQ:Port` >> _integer | Required: Yes | Default: 5672_ +Password used by the AMQP connection. 
-The port number on which the RabbitMQ server is running +#### Cronus:Transport:RabbitMQ:AdminPort -#### `Cronus:Transport:RabbiMQ:VHost` >> _string | Required: Yes | Default: /_ +The HTTP port of the RabbitMQ management API. Cronus uses this port together with `ApiAddress` to create and delete exchanges, queues, policies and federation links. -The name of the virtual host. It is a good practice to not use the default `/` vhost. For more details see the [official docs](https://www.rabbitmq.com/vhosts.html). Cronus is not using this for managing multitenancy. +#### Cronus:Transport:RabbitMQ:ApiAddress -#### `Cronus:Transport:RabbiMQ:Username` >> _string | Required: Yes | Default: guest_ +Base URL of the RabbitMQ management API (for example `http://rabbitmq:15672`). When it is not set, Cronus falls back to `Server` and `AdminPort`. -The RabbitMQ username +#### Cronus:Transport:RabbitMQ:BoundedContext -#### `Cronus:Transport:RabbiMQ:Password` >> _string | Required: Yes | Default: guest_ +The bounded context this particular RabbitMQ connection belongs to. Defaults to `implicit`, which means "this service's own bounded context". You only need to override it when declaring an entry inside `ExternalServices` that points at another bounded context. -The RabbitMQ password +#### Cronus:Transport:RabbitMQ:UseAsyncDispatcher -#### `Cronus:Transport:RabbiMQ:AdminPort` >> _integer | Required: Yes | Default: 5672_ +Whether the RabbitMQ client uses the async dispatcher. Leave it at the default (`true`) unless you have a specific reason to fall back to the synchronous dispatcher. -RabbitMQ admin port used to create, delete rabbitmq resources +#### Cronus:Transport:RabbitMQ:FederatedExchange + +Settings for a RabbitMQ federated exchange. Currently only one property is supported: + +* `MaxHops` (`int`, default `1`) — the maximum number of hops a federated message is allowed to travel.
+ +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "transport": { + "rabbitmq": { + "federatedexchange": { + "maxhops": 1 + } + } + } + } +} +``` +{% endcode %} + +#### Cronus:Transport:RabbitMQ:ExternalServices + +An optional array of additional RabbitMQ connections, one per external bounded context that this service needs to reach over its own transport. Each item has the same shape as the root `RabbitMqOptions` (including `BoundedContext`). If a property is left at its default value on an entry, Cronus fills it in with the value from the root section. `ExternalServices` is bound with `BindNonPublicProperties = true` because the property itself is internal to the options class. + +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "transport": { + "rabbitmq": { + "server": "rabbitmq.internal", + "username": "service", + "password": "secret", + "externalservices": [ + { + "boundedcontext": "identity", + "server": "rabbitmq.identity.internal" + } + ] + } + } + } +} +``` +{% endcode %} + +## Cronus.Transport.RabbitMQ — Public RabbitMQ + +A second, independent RabbitMQ connection used by the public-event fan-out. It is bound from [`PublicRabbitMqOptions`](https://github.com/Elders/Cronus.Transport.RabbitMQ/blob/master/src/Elders.Cronus.Transport.RabbitMQ/PublicRabbitMqOptions.cs) under the `cronus:transport:publicrabbitmq` section. Use it when you want to publish a subset of events onto a broker that lives outside your trust boundary, leaving the private broker untouched. 
+ +| Name | Type | Required | Default Value | +| ------------------------------------------------------------------------------------------------------------------- | --------------------- | -------- | ------------- | +| [Cronus:Transport:PublicRabbitMQ:Server](configuration.md#cronus-transport-publicrabbitmq-server) | string | no | 127.0.0.1 | +| [Cronus:Transport:PublicRabbitMQ:Port](configuration.md#cronus-transport-publicrabbitmq-port) | int | no | 5672 | +| [Cronus:Transport:PublicRabbitMQ:VHost](configuration.md#cronus-transport-publicrabbitmq-vhost) | string | no | / | +| [Cronus:Transport:PublicRabbitMQ:Username](configuration.md#cronus-transport-publicrabbitmq-username) | string | no | guest | +| [Cronus:Transport:PublicRabbitMQ:Password](configuration.md#cronus-transport-publicrabbitmq-password) | string | no | guest | +| [Cronus:Transport:PublicRabbitMQ:AdminPort](configuration.md#cronus-transport-publicrabbitmq-adminport) | int | no | 5672 | +| [Cronus:Transport:PublicRabbitMQ:ApiAddress](configuration.md#cronus-transport-publicrabbitmq-apiaddress) | string | no | | +| [Cronus:Transport:PublicRabbitMQ:UseAsyncDispatcher](configuration.md#cronus-transport-publicrabbitmq-useasyncdispatcher) | bool | no | false | +| [Cronus:Transport:PublicRabbitMQ:FederatedExchange](configuration.md#cronus-transport-publicrabbitmq-federatedexchange) | configuration section | no | `{ MaxHops = 1 }` | + +The semantics of each key match the corresponding one on the private connection; the only differences are: + +* `UseAsyncDispatcher` defaults to `false`. +* `FederatedExchange` is always initialised (to `{ MaxHops = 1 }`) rather than left null, because public events are typically federated across brokers. +* `PublicRabbitMqOptions.GetUpstreamUris()` parses `Server` with `AmqpTcpEndpoint.ParseMultiple` and combines it with `VHost` to build the upstream URIs for federation. 
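`GetUpstreamUris()` can be approximated without the RabbitMQ client library. The sketch below is hypothetical. It splits `Server` on commas by hand instead of calling `AmqpTcpEndpoint.ParseMultiple`, and it assumes `host` or `host:port` entries. It also assumes an `amqp://host:port/vhost` URI shape with the vhost percent-encoded. The real method may build the URIs differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for PublicRabbitMqOptions.GetUpstreamUris(). The real method relies on
// AmqpTcpEndpoint.ParseMultiple from RabbitMQ.Client; this sketch splits on commas by hand and
// assumes "host" or "host:port" entries (no IPv6) and an amqp://host:port/vhost URI shape.
List<string> GetUpstreamUris(string server, int defaultPort, string vhost)
{
    var encodedVHost = Uri.EscapeDataString(vhost); // "/" becomes "%2F" inside the URI path
    return server
        .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
        .Select(entry =>
        {
            var parts = entry.Split(':');
            var port = parts.Length > 1 ? int.Parse(parts[1]) : defaultPort;
            return $"amqp://{parts[0]}:{port}/{encodedVHost}";
        })
        .ToList();
}

foreach (var uri in GetUpstreamUris("rabbit1, rabbit2:5673", 5672, "/public"))
    Console.WriteLine(uri);
// amqp://rabbit1:5672/%2Fpublic
// amqp://rabbit2:5673/%2Fpublic
```

Percent-encoding matters because the AMQP URI scheme treats the path as the vhost name. A vhost such as `/public` therefore has to appear as `%2Fpublic`.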
+ +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "transport": { + "publicrabbitmq": { + "server": "rabbitmq.public.example.com", + "port": 5672, + "vhost": "/public", + "username": "public-service", + "password": "secret", + "apiaddress": "https://rabbitmq.public.example.com:15672", + "federatedexchange": { + "maxhops": 1 + } + } + } + } +} +``` +{% endcode %} + +## Cronus.Transport.RabbitMQ — Consumer + +Consumer-side settings, bound from [`RabbitMqConsumerOptions`](https://github.com/Elders/Cronus.Transport.RabbitMQ/blob/master/src/Elders.Cronus.Transport.RabbitMQ/RabbitMqConsumerOptions.cs) under the `cronus:transport:rabbitmq:consumer` section. This replaces the legacy flat `Cronus:Transport:RabbitMq:ConsumerWorkersCount` setting that earlier versions used. + +| Name | Type | Required | Default Value | +| --------------------------------------------------------------------------------------------------------- | ---- | -------- | ------------- | +| [Cronus:Transport:RabbitMQ:Consumer:WorkersCount](configuration.md#cronus-transport-rabbitmq-consumer-workerscount) | int | no | 10 | +| [Cronus:Transport:RabbitMQ:Consumer:RpcTimeout](configuration.md#cronus-transport-rabbitmq-consumer-rpctimeout) | int | no | 10 | +| [Cronus:Transport:RabbitMQ:Consumer:FanoutMode](configuration.md#cronus-transport-rabbitmq-consumer-fanoutmode) | bool | no | false | + +#### Cronus:Transport:RabbitMQ:Consumer:WorkersCount + +The number of worker threads each Cronus consumer (application services, projections, ports, sagas, gateways, triggers) spins up for processing messages pulled from RabbitMQ. Validated with `[Range(1, int.MaxValue)]`. + +#### Cronus:Transport:RabbitMQ:Consumer:RpcTimeout + +The timeout, in seconds, Cronus waits for a response when dispatching an RPC call over RabbitMQ. + +#### Cronus:Transport:RabbitMQ:Consumer:FanoutMode + +Drastically changes the infrastructure layout. 
When enabled, each node creates its own queue and every message is delivered to every node, i.e. the bus behaves as a fan-out rather than a worker pool. Use with care; the default `false` gives you classic competing-consumers semantics. + +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "transport": { + "rabbitmq": { + "consumer": { + "workerscount": 10, + "rpctimeout": 10, + "fanoutmode": false + } + } + } + } +} +``` +{% endcode %} ## Cronus.AtomicAction.Redis -An implementation of `Cronus.AtomicAction` using distributed locks with Redis +The Redis-backed implementation of `Cronus.AtomicAction` uses distributed locks following the [Redlock algorithm](https://redis.io/topics/distlock). The `cronus:atomicaction:redis` configuration section is bound by **two** option providers in this package: + +* [`RedisAtomicActionOptions`](https://github.com/Elders/Cronus.AtomicAction.Redis/blob/master/src/Elders.Cronus.AtomicAction.Redis/Config/RedisAtomicActionOptions.cs) — the atomic-action TTLs and connection string. +* [`RedLockOptions`](https://github.com/Elders/RedLock/blob/master/src/RedLock/RedLockOptions.cs) (from the `Elders.RedLock` library) — the Redlock-specific retry and drift settings. -(_Source:_ [https://redis.io/topics/distlock](https://redis.io/topics/distlock)) +Both classes read from the same section, so every key below goes into the same `cronus:atomicaction:redis` object in `appsettings.json`. 
-#### `Cronus:AtomicAction:Redis:ConnectionString` >> _string | Required: Yes_ +### RedisAtomicActionOptions -Configures the connection string where Redis is located +| Name | Type | Required | Default Value | +| ----------------------------------------------------------------------------------- | -------- | -------- | --------------- | +| [Cronus:AtomicAction:Redis:ConnectionString](configuration.md#cronus-atomicaction-redis-connectionstring) | string | yes | | +| [Cronus:AtomicAction:Redis:LockTtl](configuration.md#cronus-atomicaction-redis-lockttl) | TimeSpan | no | 00:00:01.000 | +| [Cronus:AtomicAction:Redis:LongTtl](configuration.md#cronus-atomicaction-redis-longttl) | TimeSpan | no | 00:00:05.000 | -#### `Cronus:AtomicAction:Redis:LockTtl` >> _TimeSpan | Required: No | Default: 00:00:01.000_ +#### Cronus:AtomicAction:Redis:ConnectionString -#### `Cronus:AtomicAction:Redis:ShorTtl` >> _TimeSpan | Required: No | Default: 00:00:01.000_ +Connection string to the Redis instance (or cluster) that holds the locks. Marked `[Required]`. -#### `Cronus:AtomicAction:Redis:LongTtl` >> _TimeSpan | Required: No | Default: 00:00:05.000_ +#### Cronus:AtomicAction:Redis:LockTtl -#### `Cronus:AtomicAction:Redis:LockRetryCount` >> _int | Required: No | Default: 3_ +The TTL applied at the start of an atomic action. While this lock is held, no other node may execute an action against the same aggregate root and revision. Defaults to `1 second`. -#### `Cronus:AtomicAction:Redis:LockRetryDelay` >> _TimeSpan | Required: No | Default: 00:00:00.100_ +#### Cronus:AtomicAction:Redis:LongTtl -#### `Cronus:AtomicAction:Redis:ClockDriveFactor` >> _double | Required: No | Default: 0.01_ +A second, longer TTL applied after the atomic action has succeeded. It prevents a late-arriving node from overwriting the last action on the same aggregate root and revision. Defaults to `5 seconds` and does not interfere with the normal action flow.
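The interplay of the two TTLs can be modelled with an in-memory dictionary. This is a hypothetical sketch for illustration only. The real package stores the keys in Redis through Redlock, and the key format here is made up. Releasing the key when the action fails is an assumption; this document does not state it.

```csharp
using System;
using System.Collections.Generic;

// In-memory model of the two TTLs; the real package keeps these locks in Redis via Redlock.
var lockTtl = TimeSpan.FromSeconds(1); // Cronus:AtomicAction:Redis:LockTtl default
var longTtl = TimeSpan.FromSeconds(5); // Cronus:AtomicAction:Redis:LongTtl default
var locks = new Dictionary<string, DateTimeOffset>(); // lock key -> expiry time

// Hypothetical helper; `key` stands for "aggregate root + revision".
bool TryExecute(string key, DateTimeOffset now, Func<bool> action)
{
    if (locks.TryGetValue(key, out var expiresAt) && expiresAt > now)
        return false; // another node holds this aggregate root and revision

    locks[key] = now + lockTtl; // short lock while the action runs
    if (!action())
    {
        locks.Remove(key); // assumption: a failed action releases the key so it can be retried
        return false;
    }

    locks[key] = now + longTtl; // success: keep the key to fence off late-arriving nodes
    return true;
}

var t0 = DateTimeOffset.UnixEpoch;
var lockKey = "tenant1/task-42/rev-7"; // illustrative key format
Console.WriteLine(TryExecute(lockKey, t0, () => true));               // True: first node wins
Console.WriteLine(TryExecute(lockKey, t0.AddSeconds(3), () => true)); // False: still inside LongTtl
```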
+ +### RedLockOptions + +| Name | Type | Required | Default Value | +| ------------------------------------------------------------------------------------------------------ | -------- | -------- | --------------- | +| [Cronus:AtomicAction:Redis:ConnectionString](configuration.md#cronus-atomicaction-redis-connectionstring) | string | yes | | +| [Cronus:AtomicAction:Redis:LockRetryCount](configuration.md#cronus-atomicaction-redis-lockretrycount) | ushort | no | 1 | +| [Cronus:AtomicAction:Redis:LockRetryDelay](configuration.md#cronus-atomicaction-redis-lockretrydelay) | TimeSpan | no | 00:00:00.010 | +| [Cronus:AtomicAction:Redis:ClockDriveFactor](configuration.md#cronus-atomicaction-redis-clockdrivefactor) | double | no | 0.01 | + +#### Cronus:AtomicAction:Redis:LockRetryCount + +How many times the Redlock client retries acquiring a contended lock before giving up. Defaults to `1`. + +#### Cronus:AtomicAction:Redis:LockRetryDelay + +How long the client waits between retries. Defaults to `10 ms`. + +#### Cronus:AtomicAction:Redis:ClockDriveFactor + +Safety margin used by Redlock to compensate for clock drift between Redis nodes. See the [Redlock safety arguments](https://redis.io/docs/manual/patterns/distributed-locks/#safety-arguments) for the full derivation. Defaults to `0.01`. 
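`ClockDriveFactor` feeds the validity check that Redlock performs after it tries to lock a majority of nodes. The lock is usable only if two conditions hold: a quorum (N/2 + 1) was reached, and the remaining validity, `TTL - elapsed - drift`, is still positive. The sketch follows the formula used by the Redis reference clients, where `drift = TTL * factor + 2 ms`. Whether `Elders.RedLock` uses the same 2 ms constant is an assumption, and `IsLockValid` is a hypothetical name.

```csharp
using System;

// Hypothetical helper mirroring the Redlock validity rule; the real client is Elders.RedLock.
// drift = ttl * factor + 2 ms follows the Redis reference clients and is an assumption here.
bool IsLockValid(TimeSpan ttl, TimeSpan elapsed, double clockDriftFactor,
                 int nodesLocked, int totalNodes, out TimeSpan validity)
{
    var drift = TimeSpan.FromMilliseconds(ttl.TotalMilliseconds * clockDriftFactor + 2);
    validity = ttl - elapsed - drift;          // time the holder can still rely on the lock
    var quorum = totalNodes / 2 + 1;           // majority of Redis nodes
    return nodesLocked >= quorum && validity > TimeSpan.Zero;
}

// 1 s TTL, 50 ms spent acquiring, default factor 0.01, 3 of 5 nodes locked.
var ok = IsLockValid(TimeSpan.FromSeconds(1), TimeSpan.FromMilliseconds(50), 0.01, 3, 5, out var remaining);
Console.WriteLine($"{ok} {remaining.TotalMilliseconds}"); // prints "True 938"
```

With the defaults, the drift is 12 ms for a 1 s TTL. A larger factor shortens the window in which a holder may safely assume it still owns the lock.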
+ +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "atomicaction": { + "redis": { + "connectionstring": "redis:6379", + "lockttl": "00:00:01.000", + "longttl": "00:00:05.000", + "lockretrycount": 1, + "lockretrydelay": "00:00:00.010", + "clockdrivefactor": 0.01 + } + } + } +} +``` +{% endcode %} diff --git a/docs/cronus-framework/domain-modeling.md b/docs/cronus-framework/domain-modeling.md index e98cbba3..53aa3200 100644 --- a/docs/cronus-framework/domain-modeling.md +++ b/docs/cronus-framework/domain-modeling.md @@ -1,2 +1,5 @@ # Domain Modeling +{% content-ref url="domain-modeling/README.md" %} +[README.md](domain-modeling/README.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/README.md b/docs/cronus-framework/domain-modeling/README.md index e98cbba3..ba96d96c 100644 --- a/docs/cronus-framework/domain-modeling/README.md +++ b/docs/cronus-framework/domain-modeling/README.md @@ -1,2 +1,20 @@ # Domain Modeling +Cronus is built to let you express Domain-Driven Design tactical patterns with minimal friction. Aggregates, entities, value objects, commands, events, and signals all have first-class primitives and serialize/route through a predictable, conventions-first pipeline. If you are new to the DDD vocabulary, start with [`concepts/ddd.md`](../concepts/ddd.md); if you are new to event sourcing, read [`concepts/es.md`](../concepts/es.md). + +The pages below cover the tactical building blocks you assemble into a Cronus-powered service. + +| Page | What it covers | +|---|---| +| [Aggregate](aggregate.md) | `AggregateRoot` and `AggregateRootState` — the unit of consistency. | +| [Entity](entity.md) | `Entity` — identified objects that live inside an aggregate boundary. | +| [Value Object](value-object.md) | Immutable, equality-by-value types — Cronus does not ship a base class; use `record` or hand-written equality. | +| [IDs](ids.md) | `Urn`, `AggregateRootId`, `EntityId` — stable identity on the wire. 
| +| [Bounded Context](bounded-context.md) | The `Cronus:BoundedContext` setting and the `DataContract.Namespace` convention. | +| [Published Language](published-language.md) | `[DataContract(Name = "")]` and how GUIDs decouple serialization from class names. | +| [Multitenancy](multitenancy.md) | `CronusContext.Tenant`, `TenantsOptions`, and the resolver chain. | +| [Signals](signals.md) | `ISignal`, `ISignalHandle`, and `ITrigger` — ambient pub/sub broadcasts. | +| [Messages](messages/) | Commands, events, public events, signals — the four lanes of the message bus. | +| [Handlers](handlers/) | Application services, gateways, ports, projections, sagas, triggers. | + +Each page is self-contained and cross-references the others where a single concept spans multiple files. diff --git a/docs/cronus-framework/domain-modeling/aggregate.md b/docs/cronus-framework/domain-modeling/aggregate.md index b5142805..07ca57b6 100644 --- a/docs/cronus-framework/domain-modeling/aggregate.md +++ b/docs/cronus-framework/domain-modeling/aggregate.md @@ -1,106 +1,144 @@ # Aggregate -Aggregates represent the business models explicitly. They are designed to fully match any needed requirements. Any change done to an instance of an aggregate goes through the aggregate root. +An **aggregate** is a cluster of domain objects treated as a single consistency boundary. Every change enters through the root; every invariant is enforced inside the root. With Cronus, an aggregate is event-sourced — its state is the fold of the events it has produced. + +Three classes work together: + +* `AggregateRoot` — the entry point for behaviour. It calls `Apply(IEvent)` to record a change. +* `AggregateRootState` — the snapshot of current data. It reacts to events via `public void When(TEvent e)`. +* `AggregateRootId` — the aggregate's URN-based identity. 
## Aggregate root -Creating an aggregate root with Cronus is as simple as writing a class that inherits`AggregateRoot` and a class for the state of the aggregate root. To publish an event from an aggregate root use the `Apply(IEvent @event)` method provided by the base class. +Inherit `AggregateRoot<TState>`, where `TState` is the aggregate's state class. Keep a private parameterless constructor so Cronus can rehydrate the aggregate during replay, and expose behaviour as plain methods that call `Apply` to record events. +{% code title="TaskAggregate.cs" %} ```csharp -public class Concert : AggregateRoot +public class TaskAggregate : AggregateRoot<TaskState> { - Concert() {} // keep the private parameterless constructor - - public Concert(string name, Venue venue, DateTimeOffset startTime, TimeSpan duration) + TaskAggregate() { } + + public TaskAggregate(TaskId id, UserId userId, string name, DateTimeOffset deadline) { - // business logic for creating a concert - Apply(new ConcertAnnounced(...)); + if (id is null) throw new ArgumentNullException(nameof(id)); + if (userId is null) throw new ArgumentNullException(nameof(userId)); + if (string.IsNullOrWhiteSpace(name)) throw new ArgumentException("Name is required", nameof(name)); + + Apply(new TaskCreated(id, userId, name, deadline, DateTimeOffset.UtcNow)); } - public void RegisterPerformer(Performer performer) + public void Rename(string newName) { - // business logic for registering a performer - Apply(new PerformerRegistered(...)); + if (string.IsNullOrWhiteSpace(newName)) throw new ArgumentException("Name is required", nameof(newName)); + if (state.Name == newName) return; // idempotent + + Apply(new TaskRenamed(state.Id, state.Name, newName, DateTimeOffset.UtcNow)); + } + + public void Close(UserId closedBy) + { + if (state.IsClosed) return; // already closed + Apply(new TaskClosed(state.Id, closedBy, DateTimeOffset.UtcNow)); } - - // ...
} ``` +{% endcode %} -## Aggregate root state +{% hint style="success" %} +**You can / should / must** -The aggregate root state keeps the current data of the aggregate root and is responsible for changing it based on events raised only by the root. +* an aggregate root **must** enforce its invariants before calling `Apply` +* an aggregate root **must** remain synchronous — no I/O, no `async` +* an aggregate root **should** be idempotent — calling the same method twice with the same input must not record the change twice; the repeated call emits no new events +* an aggregate root **must not** reference other aggregates directly; use ports or sagas for cross-aggregate flows +{% endhint %} -Use the abstract helper class `AggregateRootState` to create an aggregate root state. It can be accessed in the aggregate root using the `state` field provided by the base class. Also, you can implement the `IAggregateRootState` interface by yourself in case inheritance is not a viable option. +## Aggregate root state -To change the state of an aggregate root, create event-handler methods for each event with a method signature `public void When(Event e) { ... }`. +Inherit `AggregateRootState<TAggregateRoot, TId>`. State exposes the current data, maintains the `Id`, and folds events into itself via `public void When(TEvent)` handlers. Cronus discovers the handlers by reflection once per process lifetime.
+{% code title="TaskState.cs" %} ```csharp -public class ConcertState : AggregateRootState +public class TaskState : AggregateRootState<TaskAggregate, TaskId> { - public ConcertState() + public override TaskId Id { get; set; } + public UserId UserId { get; set; } + public string Name { get; set; } + public DateTimeOffset CreatedAt { get; set; } + public DateTimeOffset Deadline { get; set; } + public bool IsClosed { get; set; } + + public void When(TaskCreated e) { - Performers = new List(); + Id = e.Id; + UserId = e.UserId; + Name = e.Name; + CreatedAt = e.Timestamp; + Deadline = e.Deadline; } - public override ConcertId Id { get; set; } - - public string Name { get; private set; } - - public Venue Venue { get; private set; } - - public DateTimeOffset StartTime { get; private set; } + public void When(TaskRenamed e) => Name = e.NewName; - public TimeSpan Duration { get; private set; } - - public List Performers { get; private set; } - - public void When(ConcertAnnounced @event) - { - // change the state here ... - } - - public void When(PerformerRegistered @event) - { - // change the state here ... - } + public void When(TaskClosed e) => IsClosed = true; } ``` +{% endcode %} {% hint style="info" %} -You could read more about the state pattern [here](https://refactoring.guru/design-patterns/state/csharp/example) and [here](https://www.dofactory.com/net/state-design-pattern). +The state class is an implementation of the **state pattern** — behaviour (validation, event emission) stays in the aggregate root; pure data and event folding stay in the state. Background reading: [Refactoring Guru](https://refactoring.guru/design-patterns/state/csharp/example), [DoFactory](https://www.dofactory.com/net/state-design-pattern). {% endhint %} ## Aggregate root id -All aggregate root ids must implement the `IAggregateRootId` interface.
Since Cronus uses [URNs](https://en.wikipedia.org/wiki/Uniform\_Resource\_Name) for ids that will require implementing the [URN specification](https://tools.ietf.org/html/rfc8141) as well. If you don't want to do that, you can use the provided helper base class `AggregateRootId`. +An `AggregateRootId` is a URN with three segments: `tenant`, `aggregateRootName`, and `id`. The base class's primary constructor is: + +```csharp +public AggregateRootId(string tenant, string arName, string id) +``` + +Define your own typed ID to avoid stringly-typed code elsewhere: +{% code title="TaskId.cs" %} ```csharp -[DataContract(Name = "e96d90d0-4943-43f4-8a84-cd90b1217d06")] -public class ConcertId : AggregateRootId +[DataContract(Name = "d5e50e1f-5886-4608-9361-9fe0eb440a6b")] +public class TaskId : AggregateRootId { - const string RootName = "concert"; + TaskId() { } - public ConcertId(AggregateUrn urn) : base(RootName, urn) { } - public ConcertId(string idBase, string tenant) : base(idBase, RootName, tenant) { } - protected ConcertId() { } + public TaskId(string tenant, string arName, string id) : base(tenant, arName, id) { } + + public TaskId(string tenant, string id) : base(tenant, "task", id) { } } ``` +{% endcode %} + +{% hint style="warning" %} +The constructor order is `(tenant, arName, id)`. Older docs referenced a generic `AggregateRootId<T>` base with a different order — that generic form is commented out in the current source and is **not** available. Use the non-generic `AggregateRootId` with `Parse` / `TryParse` for URN hydration. +{% endhint %} -Another option is to use the `AggregateRootId` class. This will give you more flexibility in constructing instances of the id. Also, parsing URNs will return the specified type `T` instead of `AggregateUrn`.
+### Parsing a URN back into an ID ```csharp -[DataContract(Name = "e96d90d0-4943-43f4-8a84-cd90b1217d06")] -public class ConcertId : AggregateRootId +if (AggregateRootId.TryParse(urnString, out AggregateRootId parsed)) { - const string RootName = "concert"; + // parsed.Tenant, parsed.AggregateRootName, parsed.Id +} +``` - ConcertId() { } - public ConcertId(string id, string tenant) : base(id, RootName, tenant) { } +## Loading and saving - protected override ConcertId Construct(string id, string tenant) - { - return new ConcertId(id, tenant); - } +An aggregate is persisted through `IAggregateRepository`: + +```csharp +public interface IAggregateRepository +{ + Task SaveAsync<AR>(AR aggregateRoot) where AR : IAggregateRoot; + Task<ReadResult<AR>> LoadAsync<AR>(AggregateRootId id) where AR : IAggregateRoot; } ``` + +Application services are the only place that should call the repository — see: + +{% content-ref url="handlers/application-services.md" %} +[application-services.md](handlers/application-services.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/bounded-context.md b/docs/cronus-framework/domain-modeling/bounded-context.md index 3ddb108b..053fb8b6 100644 --- a/docs/cronus-framework/domain-modeling/bounded-context.md +++ b/docs/cronus-framework/domain-modeling/bounded-context.md @@ -1,16 +1,131 @@ # Bounded Context -[https://github.com/Elders/Cronus/issues/275](https://github.com/Elders/Cronus/issues/275) +In Domain-Driven Design a **bounded context** is the explicit boundary inside which a model is consistent and unambiguous. Two teams working for the same company may use the word "Order" to mean different things — DDD accepts that and asks you to name the context each meaning lives in. For the conceptual foundation, see [`concepts/ddd.md`](../concepts/ddd.md). -Imaginary example: +Cronus turns that concept into a single, load-bearing configuration value: `Cronus:BoundedContext`. -Imagine that you have to build an online store.
Until now, the business has been operating locally in a big city and the business has been very successful. The idea is to make it possible for other people outside of the big city to have the same experience which will allow the business to expand and reach a wider customer audience. There are a few questions you have to ask the business or discover somehow from the domain experts. +## One name, many side effects -Q: **What are the key advantages over the direct competition?** +The `Cronus:BoundedContext` setting is **one alphanumeric string** (underscores are also allowed). It appears in: -A: _We offer unique loyalty programs which enable good discounts to customers. In addition, we have a rich network of suppliers that gives a wide variety of goods to choose from._ +* every RabbitMQ exchange and queue the host declares, +* every Cassandra keyspace the event store and projection store write to, +* the default `Namespace` assigned to contracts whose type does not set one explicitly. -Q: **How the online store is going to generate profit?** +Change that string on a running system and you fork your whole store: new queues, new keyspaces, and the old streams become invisible.
-A: _Unlocking the loyalty program requires a paid monthly subscription._ +## The POCO +The setting is bound to the `BoundedContext` options class: + +{% code title="Elders.Cronus/BoundedContext.cs" %} +```csharp +public class BoundedContext +{ + [Required(AllowEmptyStrings = false, + ErrorMessage = "The configuration `Cronus:BoundedContext` is required.")] + [RegularExpression(@"^\b([\w\d_]+$)", + ErrorMessage = "Characters are not allowed for configuration `Cronus:BoundedContext`.")] + public string Name { get; set; } + + public override string ToString() => Name; +} + +public class BoundedContextProvider : CronusOptionsProviderBase<BoundedContext> +{ + public const string SettingKey = "cronus:boundedcontext"; + + public override void Configure(BoundedContext options) + { + options.Name = configuration[SettingKey]?.ToLower()?.Trim(); + } +} +``` +{% endcode %} + +Two rules are enforced by the class itself, and a third by its provider: + +* The value is **required** — a host will not start without it. +* The value must match `^\b([\w\d_]+$)`: alphanumeric characters and underscores only. No dots, dashes, or spaces. +* `BoundedContextProvider` lower-cases and trims the input, which keeps names canonical; this matters because RabbitMQ exchange and queue names are case-sensitive. + +Inject `IOptionsMonitor<BoundedContext>` to read the current value at runtime. + +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "boundedcontext": "billing", + "tenants": [ "acme" ] + } +} +``` +{% endcode %} + +## Bounded context on the wire: the `Namespace` field + +Every message type in Cronus carries a `[DataContract]` attribute. The `Name` parameter is the contract id (a GUID). The `Namespace` parameter is the bounded context the type belongs to.
+ +{% code title="MessageInfo.cs (extract)" %} +```csharp +public static string GetBoundedContext(this Type messageType, string defaultBoundedContext = "implicit") +{ + string boundedContext; + if (!typeToBoundedContext.TryGetValue(messageType, out boundedContext)) + { + boundedContext = GetAndCacheBoundedContextFromAttribute(messageType, defaultBoundedContext); + } + return boundedContext; +} +``` +{% endcode %} + +If the type sets `Namespace` explicitly, that wins; otherwise the caller's default is used. For a service's own contracts the default is `"implicit"`, meaning "route under this service's own bounded context". For contracts shared with other services, set `Namespace` explicitly so routing stays stable regardless of which host serializes the message. + +A common pattern is to keep a static constants class in your `.Contracts` project: + +{% code title="Elders.IdentityAndAccess.Contracts/BC.cs" %} +```csharp +public static class BC +{ + public const string IdentityAndAccess = "IdentityAndAccess"; +} +``` +{% endcode %} + +and reference it from every contract: + +```csharp +[DataContract(Namespace = BC.IdentityAndAccess, Name = "73ffd67d-b775-4e53-ac87-90de404fc58a")] +public class TenantId : AggregateRootId { /* ... */ } +``` + +That keeps the bounded-context string in exactly one place per contract assembly. + +## Choosing a name + +{% hint style="success" %} +You **can** name your bounded context after the business capability it encapsulates (`billing`, `identityandaccess`, `catalog`). + +You **should** keep the name short — it is concatenated into every RabbitMQ routing key and Cassandra keyspace name in the system. + +You **must** choose the name before the first production message flows through the service. Changing `Cronus:BoundedContext` later will orphan every existing stream, queue, and projection table. 
+{% endhint %} + +{% hint style="warning" %} +You **should not** include the environment (`billing_prod`, `billing_dev`) in the bounded-context name. Use separate configuration files or deployment environments instead — the bounded context is part of the domain, not the deployment topology. +{% endhint %} + +## Related + +{% content-ref url="../configuration.md" %} +[configuration.md](../configuration.md) +{% endcontent-ref %} + +{% content-ref url="published-language.md" %} +[published-language.md](published-language.md) +{% endcontent-ref %} + +{% content-ref url="../concepts/ddd.md" %} +[ddd.md](../concepts/ddd.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/commands.md b/docs/cronus-framework/domain-modeling/commands.md index 282b0794..58427964 100644 --- a/docs/cronus-framework/domain-modeling/commands.md +++ b/docs/cronus-framework/domain-modeling/commands.md @@ -1,55 +1,13 @@ # Commands -A command is used to dispatch domain model changes. It can be accepted or rejected depending on the domain model invariants. - -## Communication Guide Table - -| Triggered by | Description | -| :---: | :--- | -| UI | It is NOT common practice to send commands directly from the UI. Usually the UI communicates with web APIs | -| API | APIs sit in the middle between UI and Server translating web requests into commands | -| External System | It is NOT common practice to send commands directly from the External System. Usually the External System communicates with web APIs. | -| Port | Ports are a simple way for an aggregate to communicate with other aggregates. | -| Saga | Sagas are a simple way for an aggregate to do complex communication with other aggregates. 
| -| | | - -## Best Practices - -{% hint style="success" %} -**You can/should/must...** - -* a command **must** be immutable -* a command **must** clearly state a business intent with a name in imperative form -* a command **can** be rejected due to domain validation, error or other reason -* a command **must** update only one Aggregate -{% endhint %} - -## Examples - -```csharp -public class DeactivateAccount : ICommand -{ - DeactivateAccount() {} - public DeactivateAccount(AccountId id, Reason reason) - { - Id = id; - Reason = reason; - } - - public AccountId Id { get; private set; } - public Reason ReasonToDeactivate { get; private set; } -} - -[DataContract(Name = "24c59143-b95e-4fd6-8bbf-8d5efffe3185")] -public class AccountId : StringTenantId -{ - protected AccountId() { } - public AccountId(string id, string tenant) : base(id, "account", tenant) { } - public AccountId(IUrn urn) : base(urn, "account") { } -} - -public class Reason : ValueObject{...} -``` +Commands are documented under _Messages_, not at this level of the tree. This page is kept to avoid breaking existing deep links. Head over to the canonical page: +{% content-ref url="messages/commands.md" %} +[commands.md](messages/commands.md) +{% endcontent-ref %} +For the command handler (application service) see: +{% content-ref url="handlers/application-services.md" %} +[application-services.md](handlers/application-services.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/entity.md b/docs/cronus-framework/domain-modeling/entity.md index 5458858f..df2a6902 100644 --- a/docs/cronus-framework/domain-modeling/entity.md +++ b/docs/cronus-framework/domain-modeling/entity.md @@ -1,67 +1,121 @@ # Entity -An entity is an object that has an identity and is mutable. Each entity is uniquely identified by an ID rather than by its properties; therefore, two entities can be considered equal if both of them have the same ID even though they have different properties. 
+An **entity** is a domain object with identity. Two entities are equal when their IDs match, regardless of their other properties. In Cronus, an entity lives inside an aggregate and shares the aggregate's consistency boundary — the aggregate root is the only thing that mutates entities directly. -You can define an entity with Cronus using the `Entity` base class. To publish an event from an entity, use the `Apply(IEvent @event)` method provided by the base class. +Use entities when a part of the aggregate needs its own identity-based lifecycle, but that lifecycle is still owned by the root. Common examples are versions inside a document, line items in an order, or rooms inside a building. +## Defining an entity + +Inherit `Entity<TAggregateRoot, TEntityState>` and pass the root plus the entity ID into the base constructor. Events are emitted via the protected `Apply(IEvent)` method, just like in the aggregate root — but the event is wrapped in an `EntityEvent` so the right entity's state handler receives it. + +{% code title="Wallet.cs" %} ```csharp public class Wallet : Entity<UserAggregate, WalletState> { - public Wallet(UserAggregate root, WalletId entityId, string name, decimal amount) : base(root, entityId) + public Wallet(UserAggregate root, WalletId entityId, string name, decimal openingBalance) : base(root, entityId) { - state.EntityId = entityId; - state.Name = name; - state.Amount = amount; + Apply(new WalletOpened(entityId, name, openingBalance, DateTimeOffset.UtcNow)); } - public void AddMoney(decimal value, UserId userId) + public void AddMoney(decimal amount) { - - if (value > 0) - { - IEvent @event = new AddMoney(state.EntityId, userId, value, DateTimeOffset.UtcNow); - Apply(@event); - } + if (amount <= 0) throw new ArgumentOutOfRangeException(nameof(amount)); + Apply(new MoneyAdded(state.EntityId, amount, DateTimeOffset.UtcNow)); } } ``` +{% endcode %} {% hint style="info" %} -Set the initial state of the entity using the constructor.
The event responsible for creating the entity is being published by the root/parent to modify its state. That means that you can not \(and should not\) subscribe to that event in the entity state using `When(Event e)`. +The entity's creation event is emitted by the **root** (not by the entity itself). The root's state handles the "entity created" event to add the new entity to its collection. Inside the entity, use `When(TEvent)` on the entity state only for events that modify the entity _after_ creation. {% endhint %} ## Entity state -The entity state keeps current data of the entity and is responsible for changing it based on events raised only by the same entity. - -Use the abstract helper class `EntityState` to create an entity state. It can be accessed in the entity using the `state`field provided by the base class. Also, you can implement the `IEntityState` interface by yourself in case inheritance is not a viable option. - -To change the state of an entity, create event-handler methods for each event with a method signature `public void When(Event e) { ... }`. +Inherit `EntityState<TEntityId>`. Like the aggregate state, the entity state folds its events into itself via `public void When(TEvent e)` handlers. +{% code title="WalletState.cs" %} ```csharp public class WalletState : EntityState<WalletId> { public override WalletId EntityId { get; set; } - public string Name { get; set; } + public decimal Balance { get; set; } + + public void When(WalletOpened e) + { + EntityId = e.EntityId; + Name = e.Name; + Balance = e.OpeningBalance; + } - public decimal Amount { get; set; } + public void When(MoneyAdded e) => Balance += e.Amount; } ``` +{% endcode %} -## Entity id +## Entity ID -All entity ids must implement the `IEntityId` interface. Since Cronus uses [URNs](https://en.wikipedia.org/wiki/Uniform_Resource_Name) for ids that will require implementing the [URN specification](https://tools.ietf.org/html/rfc8141) as well.
If you don't want to do that, you can use the provided helper base class `EntityId`. +Inherit the generic `EntityId<TAggregateRootId>`. The base ctor takes the ID base plus the parent aggregate's ID, and you must override `EntityName`: +```csharp +public abstract class EntityId<TAggregateRootId> : EntityId + where TAggregateRootId : AggregateRootId +{ + public EntityId(ReadOnlySpan<char> idBase, TAggregateRootId rootId) { ... } + + protected abstract ReadOnlySpan<char> EntityName { get; } +} +``` + +{% code title="WalletId.cs" %} ```csharp [DataContract(Name = "1d23c591-219f-491e-bfb1-a775fe2751b6")] public class WalletId : EntityId<UserId> { + WalletId() { } + + public WalletId(string id, UserId rootId) : base(id.AsSpan(), rootId) { } + protected override ReadOnlySpan<char> EntityName => "wallet"; +} +``` +{% endcode %} - WalletId() { } +The URN of a wallet then looks like: `urn:tenant:user:<user-id>/wallet:<wallet-id>` — note that the entity name appears after the hierarchical delimiter `/`. + +{% hint style="success" %} +**You can / should / must** + +* an entity **must** live inside exactly one aggregate +* an entity **must not** be loaded or saved independently — the aggregate owns the lifecycle +* an entity **should** have its own behaviour methods; moving logic to the root defeats the purpose +* an entity ID **must** include its parent aggregate's URN so the entity cannot be confused across aggregates +{% endhint %} + +## Wiring the entity into the aggregate + +The aggregate's state reacts to the "entity created" event, constructs the entity, and adds it to its collection. From that point on, the root can route calls to the entity's methods.
- public WalletId(string id, UserId idBase) : base(id.AsSpan(), idBase) { } +```csharp +public class UserAggregate : AggregateRoot<UserState> +{ + public void OpenWallet(WalletId walletId, string name, decimal openingBalance) + { + new Wallet(this, walletId, name, openingBalance); // ctor applies WalletOpened + } + + public void AddToWallet(WalletId walletId, decimal amount) + { + var wallet = state.Wallets.First(w => w.EntityId.Equals(walletId)); + // In a real aggregate, state exposes wallets as live entities, not just state rows. + // See the handlers pages for the full pattern. + } } ``` +See the aggregate page for the companion pieces (root, state, ID): + +{% content-ref url="aggregate.md" %} +[aggregate.md](aggregate.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/events.md b/docs/cronus-framework/domain-modeling/events.md index 229fca85..768d0d91 100644 --- a/docs/cronus-framework/domain-modeling/events.md +++ b/docs/cronus-framework/domain-modeling/events.md @@ -1,41 +1,7 @@ # Events -Domain events represent business changes which already happened. - -## Communication Guide Table - -| Triggered by | Description | -| :--- | :--- | -| AggregateRoot | TODO | - -{% hint style="success" %} -**You can/should/must...** - -* an event **must** be immutable -* an event **must** represent a domain event which already happened with a name in past tense -* an event **can** be dispatched only by one aggregate -{% endhint %} - -## Examples - -```csharp -[DataContract(Name = "fff400a3-1af0-4332-9cf5-b86c1c962a01")] -public class AccountSuspended : IEvent -{ - AccountSuspended() { } - - public AccountSuspended(AccountId id) - { - Id = id; - } - - [DataMember(Order = 1)] - public AccountId Id { get; private set; } - - public override string ToString() - { - return "Account was suspended"; - } -} -``` +Events are documented under _Messages_, not at this level of the tree. This page is kept to avoid breaking existing deep links.
Head over to the canonical page: +{% content-ref url="messages/events.md" %} +[events.md](messages/events.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/handlers/README.md b/docs/cronus-framework/domain-modeling/handlers/README.md index ca8c495f..141e534d 100644 --- a/docs/cronus-framework/domain-modeling/handlers/README.md +++ b/docs/cronus-framework/domain-modeling/handlers/README.md @@ -1,2 +1,64 @@ # Handlers +A handler is a class that reacts to an incoming message and produces some outcome. In Cronus, handlers come in six shapes, each with a different job. + +Every handler implements the marker interface [`IMessageHandler`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/IMessageHandler.cs). Beyond that, the kind of handler determines which types of messages it listens to and what it is allowed to do in response. + +## The six kinds + +| Handler | Reacts to | Produces | Side effects allowed? | +| --- | --- | --- | --- | +| [Application Service](application-services.md) | `ICommand` | New events on a single aggregate | No — should not perform I/O outside the event store | +| [Projection](projections.md) | `IEvent` | A read model (snapshot or external store) | No — must not publish commands or events | +| [Saga](sagas.md) | `IEvent`, `IScheduledMessage` | New `ICommand` messages; scheduled timeouts | No business-facing side effects — coordinate aggregates | +| [Port](ports.md) | `IEvent` | New `ICommand` messages | Yes — the classic "send email", "call external API" place | +| [Trigger](triggers.md) | `IEvent`, `ISignal` | Anything — typically starts a job or a downstream workflow | Yes | +| [Gateway](gateways.md) | `IEvent` | New `ICommand` messages, with tracked infrastructure state | Yes — owns metadata required by an external system | + +## Choosing a handler + +The choice is about intent, not capability. 
+ +* Use an **Application Service** when a command must mutate one aggregate root. +* Use a **Projection** when you need a queryable read model built from the event stream. +* Use a **Saga** when the outcome of one event must lead to new commands that coordinate several aggregates. +* Use a **Port** when the outcome is a single follow-up command to another aggregate, or when you need a simple "when X happens, call the outside world" reaction. +* Use a **Trigger** when an event or signal should kick off a long-running job or workflow. +* Use a **Gateway** when you need a Port with a small amount of persistent infrastructure state, for example the badge count shown on iOS push notifications. + +## Subscriber toggles + +Each kind of handler is served by a dedicated subscriber that can be enabled or disabled per host process via [`CronusHostOptions`](../../configuration.md): + +* `Cronus:ApplicationServicesEnabled` +* `Cronus:ProjectionsEnabled` +* `Cronus:SagasEnabled` +* `Cronus:PortsEnabled` +* `Cronus:TriggersEnabled` +* `Cronus:GatewaysEnabled` + +All default to `true`. Turn one off when you want to split a monolith into specialised processes — for example, a projections host that never runs ports or sagas.
+ +{% content-ref url="application-services.md" %} +[application-services.md](application-services.md) +{% endcontent-ref %} + +{% content-ref url="projections.md" %} +[projections.md](projections.md) +{% endcontent-ref %} + +{% content-ref url="sagas.md" %} +[sagas.md](sagas.md) +{% endcontent-ref %} + +{% content-ref url="ports.md" %} +[ports.md](ports.md) +{% endcontent-ref %} + +{% content-ref url="triggers.md" %} +[triggers.md](triggers.md) +{% endcontent-ref %} + +{% content-ref url="gateways.md" %} +[gateways.md](gateways.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/handlers/application-services.md b/docs/cronus-framework/domain-modeling/handlers/application-services.md index e65e3cc2..e5ab0608 100644 --- a/docs/cronus-framework/domain-modeling/handlers/application-services.md +++ b/docs/cronus-framework/domain-modeling/handlers/application-services.md @@ -1,61 +1,112 @@ # Application Services -This is a handler where commands are received and delivered to the addressed aggregate. Such a handler is called an application service. This is the "_write"_ side in [CQRS](../../concepts/cqrs.md). +An **application service** is the write-side entry point of an aggregate. It is the handler where commands are received and translated into operations on a single aggregate root. This is the "write" side in [CQRS](../../concepts/cqrs.md). -An application service is a command handler for a specific aggregate. One aggregate has one application service whose purpose is to orchestrate how commands will be fulfilled. Its the application service's responsibility to invoke the appropriate aggregate methods and pass the command's payload. It mediates between Domain and infrastructure and it shields any domain model from the "_outside_". Only the application service __ interacts with the domain model. +An application service orchestrates a command: it loads (or creates) the aggregate, calls the right method on it, and saves the resulting events. 
It is the only place that bridges the infrastructure (the command bus, the repository) and the domain model. Nothing outside of an application service should ever touch the aggregate directly. {% content-ref url="../aggregate.md" %} [aggregate.md](../aggregate.md) {% endcontent-ref %} -You can create an application service with Cronus by using the `AggregateRootApplicationService` base class. Specifying which commands the application service can handle is done using the `ICommandHandler` interface. +## Defining an application service -`AggregateRootApplicationService` provides a property of type `IAggregateRepository` that you can use to load and save the aggregate state. There is also a helper method `Update(IAggregateRootId id, Action update)` that loads and aggregate based on the provided id invokes the action and saves the new state if there are any changes. +Inherit from `ApplicationService<AR>`, where `AR` is the aggregate root type. Implement `ICommandHandler<TCommand>` for every command you want to handle. The base class gives you a protected `repository` field of type `IAggregateRepository` and an `UpdateAsync` helper that loads, mutates, and saves in one call. ```csharp -public class ConcertAppService : AggregateRootApplicationService, - ICommandHandler, - ICommandHandler +public abstract class ApplicationService<AR> : IApplicationService where AR : IAggregateRoot { - ... - - public void Handle(AnnounceConcert command) + protected readonly IAggregateRepository repository; + + public ApplicationService(IAggregateRepository repository) { ... } + + public virtual async Task UpdateAsync(AggregateRootId id, Action<AR> update) { ...
} +} +``` + +`ICommandHandler<T>` requires one async method: + +```csharp +public interface ICommandHandler<T> where T : ICommand +{ + Task HandleAsync(T command); +} +``` + +## A canonical application service + +{% code title="TaskAppService.cs" %} +```csharp +public class TaskAppService : ApplicationService<TaskAggregate>, + ICommandHandler<CreateTask>, + ICommandHandler<RenameTask>, + ICommandHandler<CloseTask> +{ + public TaskAppService(IAggregateRepository repository) : base(repository) { } + + public async Task HandleAsync(CreateTask command) { - if (Repository.TryLoad(command.Id, out _)) - return; + ReadResult<TaskAggregate> result = await repository.LoadAsync<TaskAggregate>(command.Id).ConfigureAwait(false); + if (result.IsSuccess) + return; // already created — commands are idempotent - var concert = new Concert(...); - Repository.Save(concert); + var task = new TaskAggregate(command.Id, command.UserId, command.Name, command.Deadline); + await repository.SaveAsync(task).ConfigureAwait(false); } - - public void Handle(RegisterPerformer command) + + public Task HandleAsync(RenameTask command) { - Update(command.Id, x => x.RegisterPerformer(...)); + return UpdateAsync(command.Id, task => task.Rename(command.NewName)); } - ... + public Task HandleAsync(CloseTask command) + { + return UpdateAsync(command.Id, task => task.Close(command.ClosedBy)); + } } ``` +{% endcode %} + +### When to load, when to create + +Use the explicit `LoadAsync` + `SaveAsync` pattern when the command can create the aggregate. `ReadResult<T>` exposes `IsSuccess`, `NotFound` and `HasError`: + +```csharp +ReadResult<TaskAggregate> result = await repository.LoadAsync<TaskAggregate>(command.Id); +if (result.NotFound) +{ + var task = new TaskAggregate(...); + await repository.SaveAsync(task); +} +``` + +Use `UpdateAsync(id, action)` when you are certain the aggregate already exists.
It throws if the load fails and saves automatically on success: + +```csharp +await UpdateAsync(command.Id, task => task.Rename(command.NewName)); +``` + +{% hint style="info" %} +The `update` delegate passed to `UpdateAsync` is **synchronous** (`Action<AR>`). The aggregate's methods are expected to be synchronous — they compute and `Apply` events without any I/O. If you need async work, do it _before_ calling `UpdateAsync`, not inside the delegate. +{% endhint %} ## Best Practices {% hint style="success" %} -**You can/should/must...** +**You can / should / must** * an application service **can** load an aggregate root from the event store * an application service **can** save new aggregate root events to the event store -* an application service **can** establish calls to the read model (not a common practice but sometimes needed) -* an application service **can** establish calls to external services -* you **can** do dependency orchestration -* an application service **must** be stateless -* an application service **must** update only one aggregate root. Yes, you can create one aggregate and update another one but think twice before doing so. +* an application service **can** read from a projection to resolve a missing piece of context (not common — think twice) +* an application service **can** call an external service _before_ mutating the aggregate (e.g. to resolve an ID) +* an application service **must** be stateless — no fields beyond the injected repository +* an application service **must** update only one aggregate per command {% endhint %} {% hint style="warning" %} -**You should not...** +**You should not** -* an application service **should not** update more than one aggregate root in a single command/handler -* you **should not** place domain logic inside an application service -* you **should not** use an application service to send emails, push notifications etc.
Use a port or a gateway instead -* an application service **should not** update the read model +* an application service **should not** mutate more than one aggregate in a single `HandleAsync` call — use a [saga](sagas.md) instead +* an application service **should not** contain domain logic — keep decisions inside the aggregate +* an application service **should not** send emails, push notifications, or make side-effecting HTTP calls — use a [port](ports.md) or a [gateway](gateways.md) reacting to the resulting events +* an application service **should not** update a projection — projections are read models that rebuild themselves from events {% endhint %} diff --git a/docs/cronus-framework/domain-modeling/handlers/gateways.md b/docs/cronus-framework/domain-modeling/handlers/gateways.md index 4dbcfe9d..53f9d6f6 100644 --- a/docs/cronus-framework/domain-modeling/handlers/gateways.md +++ b/docs/cronus-framework/domain-modeling/handlers/gateways.md @@ -1,19 +1,73 @@ # Gateways -[https://github.com/Elders/Cronus/issues/260](https://github.com/Elders/Cronus/issues/260) +A gateway is a port that remembers things. It reacts to events and talks to an external system, just like a port, but in addition it maintains a small amount of persistent state that the external system — not the business — needs. -Compared to a Port, which can dispatch a command, a Gateway can do the same but it also has a persistent state. A scenario could be sending commands to external BC, such as push notifications, emails, etc. There is no need to event source this state and it's perfectly fine if this state is wiped. Example: iOS push notifications badge. This state should be used only for infrastructure needs and never for business cases. Compared to Projection, which tracks events, projects their data, and is not allowed to send any commands at all, a Gateway can store and track metadata required by external systems. Furthermore, Gateways are restricted and not touched when events are replayed.
+In Cronus, a gateway is any class that implements [`IGateway`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/IGateway.cs): -## Communication Guide Table +```csharp +public interface IGateway : IMessageHandler { } +``` -| Triggered by | Description | -| ------------ | -------------------------------------------------------------------- | -| Event | Domain events represent business changes which have already happened | +The marker interface is minimal. A gateway earns its behaviour through [`IEventHandler`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/IEventHandler.cs) implementations, just like a port or a saga. + +## Gateway vs Port vs Projection + +The XML doc on `IGateway` sums up the niche it fills: + +> Compared to `IPort`, which can dispatch a command, an `IGateway` can do the same but it also has a persistent state. A scenario could be sending commands to external BC like push notifications, emails etc. There is no need to event source this state and it's perfectly fine if this state is wiped. Example: iOS push notifications badge. This state should be used only for infrastructure needs and never for business cases. Compared to projections, which track events and project their data and are not allowed to send any commands at all, an `IGateway` store and track a metadata required by external systems. Also, `IGateway` are restricted and not touched when events are replayed. + +Three rules follow from that: + +1. A gateway is the right choice when the **external system** needs state (an APNS badge count, a last-seen device id, a third-party handle). Business state belongs on aggregates. +2. A gateway is **not replayed**. Unlike projections, gateway state survives a projection rebuild. That makes it safe to store values that are meaningful only in combination with live external resources. +3. A gateway, unlike a projection, **may publish commands**. 
That is why it is shaped more like a port than a projection. + +## Why a gateway encapsulates retries and serialisation + +Calling an external service by hand from an application service is a trap: you either hand-roll retry, timeout and serialisation logic every time, or you forget it. A gateway is the canonical place to centralise those concerns for a given external system, so the application services above it stay clean. + +In practice a gateway wraps the SDK of the external system (HTTP client, gRPC stub, legacy RPC proxy) and applies the retry policy you want for that system's failure modes. + +## Example + +```csharp +public class SampleGateway : IGateway, + IEventHandler<SampleReserved> +{ + public Task HandleAsync(SampleReserved @event) + { + Console.WriteLine($"Sample with ID: '{@event.Id}' was reserved!"); + return Task.CompletedTask; + } +} +``` + +A realistic gateway would inject the external-system client plus whatever persistent store holds the infrastructure metadata (for example a table of APNS device tokens and current badge counts). + +## Configuration + +The subscriber that dispatches events to gateways is toggled by `Cronus:GatewaysEnabled` (default: `true`). Disable it on hosts that should not talk to the external system, for instance when you split the host into a write-only and a read-only process.
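The APNS badge scenario mentioned above can be sketched without any Cronus dependency. Everything in this block is hypothetical: `NotificationReceived`, `IBadgeStore`, `IPushClient`, their in-memory implementations and `BadgeGateway` are names chosen for illustration, not Cronus or APNS APIs. A real class would also implement `IGateway` and `IEventHandler<NotificationReceived>`. The sketch shows two of the rules on this page. First, the gateway keeps only infrastructure state: a badge count that is safe to wipe. Second, it stays idempotent when the same event is delivered twice.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical event; a real Cronus event would carry [DataContract]/[DataMember].
public sealed record NotificationReceived(Guid EventId, string UserId, string Text);

// Hypothetical store for gateway-only infrastructure state. Wiping it only resets badge counts.
public interface IBadgeStore
{
    bool IsProcessed(Guid eventId);
    int GetBadge(string userId);
    void Commit(Guid eventId, string userId, int badge);
}

public sealed class InMemoryBadgeStore : IBadgeStore
{
    private readonly HashSet<Guid> processed = new();
    private readonly Dictionary<string, int> badges = new();

    public bool IsProcessed(Guid eventId) => processed.Contains(eventId);

    public int GetBadge(string userId) => badges.TryGetValue(userId, out int badge) ? badge : 0;

    public void Commit(Guid eventId, string userId, int badge)
    {
        processed.Add(eventId);
        badges[userId] = badge;
    }
}

// Hypothetical stand-in for an APNS SDK client.
public interface IPushClient
{
    Task SendAsync(string userId, string text, int badge);
}

public sealed class RecordingPushClient : IPushClient
{
    public List<(string UserId, string Text, int Badge)> Sent { get; } = new();

    public Task SendAsync(string userId, string text, int badge)
    {
        Sent.Add((userId, text, badge));
        return Task.CompletedTask;
    }
}

// In Cronus this class would also implement IGateway and IEventHandler<NotificationReceived>.
public sealed class BadgeGateway
{
    private readonly IBadgeStore store;
    private readonly IPushClient push;

    public BadgeGateway(IBadgeStore store, IPushClient push)
    {
        this.store = store;
        this.push = push;
    }

    public async Task HandleAsync(NotificationReceived @event)
    {
        // Idempotency: a redelivered event must not bump the badge a second time.
        if (store.IsProcessed(@event.EventId))
            return;

        int next = store.GetBadge(@event.UserId) + 1;

        // Nothing is persisted until the push succeeds, so a failed push can simply be redelivered.
        await push.SendAsync(@event.UserId, @event.Text, next).ConfigureAwait(false);
        store.Commit(@event.EventId, @event.UserId, next);
    }
}
```

This sketch persists the new badge only after the push succeeds, which is one possible design. With this ordering a failed push leaves no trace and can be retried. The cost is a possible duplicate push if the process dies between sending and committing.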
+ +{% content-ref url="../../configuration.md" %} +[configuration.md](../../configuration.md) +{% endcontent-ref %} ## Best Practices {% hint style="success" %} **You can/should/must...** -* a gateway **can** send new commands +* a gateway **can** publish new commands +* a gateway **can** call external services and SDKs +* a gateway **can** maintain persistent infrastructure state +* a gateway **must** tolerate the external service being slow or down +* a gateway **must** be idempotent — the same event may arrive more than once +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* a gateway **should not** hold business state — put that on an aggregate +* a gateway's state **should not** depend on event replay or event order; Cronus does not touch gateways when events are replayed +* a gateway **should not** be used when a plain port is enough; only reach for a gateway when you need to persist metadata on behalf of the external system {% endhint %} diff --git a/docs/cronus-framework/domain-modeling/handlers/ports.md b/docs/cronus-framework/domain-modeling/handlers/ports.md index 3e034d5f..bc3c3383 100644 --- a/docs/cronus-framework/domain-modeling/handlers/ports.md +++ b/docs/cronus-framework/domain-modeling/handlers/ports.md @@ -1,36 +1,105 @@ # Ports in the Cronus Framework -In the Cronus framework, **Ports** facilitate communication between aggregates, enabling one aggregate to react to events triggered by another. This design promotes a decoupled architecture, allowing aggregates to interact through well-defined events without direct dependencies. +A port is the place where an event from the domain meets the outside world. It reacts to events and does I/O: sending emails, calling third-party APIs, writing to disk, pushing a message to another system, or publishing a follow-up command to a related aggregate.
-## Key Characteristics of Ports +In Cronus, a port is any class that implements [`IPort`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/IPort.cs): -- **Event-Driven Communication:** Ports listen for domain events—representing business changes that have already occurred—and dispatch corresponding commands to other aggregates that need to respond. +```csharp +public interface IPort : IMessageHandler { } -- **Statelessness:** Ports do not maintain any persistent state. Their sole responsibility is to handle the routing of events to appropriate command handlers. +public abstract class Port : IPort +{ + protected readonly IPublisher<ICommand> publisher; -## When to Use Ports + public Port(IPublisher<ICommand> publisher) + { + this.publisher = publisher; + } +} +``` -Ports are ideal for straightforward interactions where an event from one aggregate necessitates a direct response from another. However, for more complex workflows involving multiple steps or requiring state persistence, implementing a **Saga** is recommended. Sagas provide a transparent view of the business process and manage the state across various interactions, ensuring consistency and reliability. +The `Port` base class gives you a command publisher for convenience; you are not required to use it. What makes a class a port is the `IPort` marker plus one or more [`IEventHandler<T>`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/IEventHandler.cs) implementations. -## Communication Guide Table +## Why a port and not an application service? -| Triggered by | Description | -|--------------|-------------------------------------------------------| -| Event | Domain events represent business changes that have already happened. | +Keep Application Services focused on a single aggregate and free of I/O. That leaves the domain model fast, testable and replayable. All the messy things — latency, failure, retries — belong in a port.
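Here is a minimal sketch of why I/O failures belong in a port. It assumes hypothetical `UserCreated`, `IEmailSender`, `FlakyEmailSender` and `FakeSubscriber` types. `FakeSubscriber` only imitates redelivery so that the example runs on its own. In a real host, acknowledgement and redelivery are handled by the Cronus subscriber and the transport, not by your code. The key point is that the port lets the exception propagate instead of swallowing it.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical event and email abstraction; not Cronus types.
public sealed record UserCreated(string Email, string Name);

public interface IEmailSender
{
    Task SendAsync(string to, string subject, string body);
}

// Simulates a third-party API that is down for the first N calls.
public sealed class FlakyEmailSender : IEmailSender
{
    private int failuresLeft;
    public int Delivered { get; private set; }

    public FlakyEmailSender(int failures) => failuresLeft = failures;

    public Task SendAsync(string to, string subject, string body)
    {
        if (failuresLeft > 0)
        {
            failuresLeft--;
            throw new InvalidOperationException("SMTP server unavailable");
        }
        Delivered++;
        return Task.CompletedTask;
    }
}

// The port does the I/O and deliberately does not catch exceptions:
// a failure means the message is not acknowledged.
public sealed class WelcomeEmailPort
{
    private readonly IEmailSender sender;

    public WelcomeEmailPort(IEmailSender sender) => this.sender = sender;

    public Task HandleAsync(UserCreated @event) =>
        sender.SendAsync(@event.Email, "Welcome", $"Hi {@event.Name}, welcome aboard.");
}

// Stand-in for transport redelivery, for illustration only.
public static class FakeSubscriber
{
    public static async Task<int> DeliverAsync(Func<Task> handler, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await handler().ConfigureAwait(false);
                return attempt; // handler succeeded: the message is acknowledged
            }
            catch (Exception)
            {
                // handler failed: the message is not acknowledged and is delivered again
            }
        }
        throw new TimeoutException($"message not acknowledged after {maxAttempts} attempts");
    }
}
```

Because the failure surfaces from the port rather than from an application service, the aggregate's write path is never blocked while the email provider is down.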
-By utilizing Ports appropriately, developers can design systems that are both modular and maintainable, adhering to the principles of Domain-Driven Design and Event Sourcing. +A port is a natural retry boundary. If the third-party API is down, the port fails, the message is not acknowledged, and the subscriber delivers it again later. If the same side effect lived in an application service, a transient failure would also block the aggregate's write path. + +## Port vs Saga + +Both react to events; both can publish commands. The difference is intent: + +* A **port** reaches out. Its reason to exist is the side effect — call the outside world, or fan out a single follow-up command. +* A **saga** orchestrates. Its reason to exist is a multi-step business process that spans several aggregates. -**Port example** +Use a port when the reaction is a one-shot "when X happens, do Y". Use a saga when Y is followed by Z and possibly a compensating W. + +## Example ```csharp [DataContract(Name = "a44e9a38-ab13-4f86-844a-86fefa925b53")] -public class AlertPort : IPort, +public class WelcomeEmailPort : IPort, IEventHandler<UserCreated> { + private readonly IEmailSender emailSender; + + public WelcomeEmailPort(IEmailSender emailSender) + { + this.emailSender = emailSender; + } + public Task HandleAsync(UserCreated @event) { - //Implement your custom logic here - return Task.CompletedTask; + return emailSender.SendAsync(@event.Email, "Welcome", $"Hi {@event.Name}, welcome aboard."); + } +} +``` + +And a port that reacts to an event by issuing a command to a different aggregate: + +```csharp +public class RegisterUserOnAccountRegistered : IPort, + IEventHandler<AccountRegistered> +{ + private readonly IPublisher<ICommand> commandPublisher; + + public RegisterUserOnAccountRegistered(IPublisher<ICommand> commandPublisher) + { + this.commandPublisher = commandPublisher; + } + + public Task HandleAsync(AccountRegistered @event) + { + var userId = new UserId(@event.Id.Tenant, @event.Id.Id); + return commandPublisher.PublishAsync(new
CreateUser(userId, @event.Email)); } } -``` \ No newline at end of file +``` + +## Configuration + +The subscriber that dispatches events to ports is toggled by `Cronus:PortsEnabled` (default: `true`). Turn it off on processes that must not perform outbound side effects, for example a read-only replica or a dedicated projections host. + +{% content-ref url="../../configuration.md" %} +[configuration.md](../../configuration.md) +{% endcontent-ref %} + +By utilizing Ports appropriately, developers can design systems that are both modular and maintainable, adhering to the principles of Domain-Driven Design and Event Sourcing. + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* a port **can** call external services (HTTP, SMTP, push services, file system, etc.) +* a port **can** publish new commands +* a port **must** be idempotent — the same event may be delivered more than once +* a port **must** tolerate the external service being slow or down — let the subscriber retry +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* a port **should not** load or mutate aggregate state — publish a command instead +* a port **should not** maintain persistent state; business state belongs on an aggregate and infrastructure metadata belongs in a gateway +* a port **should not** orchestrate a multi-step process — use a saga +{% endhint %} diff --git a/docs/cronus-framework/domain-modeling/handlers/projections.md b/docs/cronus-framework/domain-modeling/handlers/projections.md index fa09b116..7aa3e0c3 100644 --- a/docs/cronus-framework/domain-modeling/handlers/projections.md +++ b/docs/cronus-framework/domain-modeling/handlers/projections.md @@ -1,174 +1,181 @@ # Projections -A projection is a representation of an object using a different perspective. In the context of CQRS, projections are queryable models on the "_read_" side that never manipulate the original data (events in event-sourced systems) in any way. Projections should be designed in a way that is useful and convenient for the reader (API, UI, etc.).
Projections should be designed in a way that is useful and convenient for the reader (API, UI, etc.). +A **projection** is a read model. It is a derived representation of the events in the event store, shaped for the way a particular reader wants to consume data — an API endpoint, a dashboard, a search index. A projection never mutates the source of truth; it only applies events to its own state. -Cronus supports non-event-sourced and event-sourced projections with snapshots. +Cronus supports two flavours: -## Defining a projection +* **Event-sourced projections** — inherit `ProjectionDefinition` and subscribe to the events they care about. Their state is rebuilt by replaying events, optionally from a stored snapshot. +* **Non-event-sourced projections** — implement `IProjection` directly and persist state to an external store (for example Elasticsearch or a relational database). -To create a projection, create a class for it that inherits `ProjectionDefinition`. The id can be any type that implements the `IBlobId` interface. All ids provided by Cronus implement this interface but it is common to create your own for specific business cases. The `ProjectionDefinition` base class provides a `Subscribe()` the method that is used to create a projection id from an event. This will define an event-sourced projection with a state that will be used to persist snapshots. +## Defining an event-sourced projection -Use the `IEventHandler` interface to indicate that the projection can handle events of the specified event type. Implement this interface for each event type your projection needs to handle. +Inherit `ProjectionDefinition` where `TState` is a serializable plain class and `TId` is any type that implements `IBlobId`. Declare `IEventHandler` once per event you want to handle. In the constructor, call `Subscribe(event => projectionId)` for every event — that's how Cronus knows which projection instance a given event belongs to. 
+{% code title="TaskProjection.cs" %} ```csharp [DataContract(Name = "c94513d1-e5ee-4aae-8c0f-6e85b63a4e03")] -public class TaskProjection : ProjectionDefinition, - IEventHandler +public class TaskProjection : ProjectionDefinition<TaskProjectionState, UserId>, + IEventHandler<TaskCreated>, + IEventHandler<TaskClosed> { public TaskProjection() { - Subscribe(x => new TaskId(x.Id.NID)); + Subscribe<TaskCreated>(e => e.UserId); + Subscribe<TaskClosed>(e => e.UserId); } public Task HandleAsync(TaskCreated @event) { - Data task = new Data(); - - task.Id = @event.Id; - task.UserId = @event.UserId; - task.Name = @event.Name; - task.Timestamp = @event.Timestamp; - - State.Tasks.Add(task); - + if (State.Tasks.Any(x => x.Id.Equals(@event.Id))) + return Task.CompletedTask; // already applied: a redelivered event must not add a duplicate + + State.Tasks.Add(new TaskProjectionState.Entry + { + Id = @event.Id, + Name = @event.Name, + CreatedAt = @event.Timestamp, + IsClosed = false + }); return Task.CompletedTask; } - public IEnumerable GetTaskByName(string name) + + public Task HandleAsync(TaskClosed @event) { - return State.Tasks.Where(x => x.Name.Equals(name)); + var entry = State.Tasks.FirstOrDefault(x => x.Id.Equals(@event.Id)); + if (entry is not null) entry.IsClosed = true; + return Task.CompletedTask; } + + public IEnumerable<TaskProjectionState.Entry> TasksByName(string name) + => State.Tasks.Where(x => x.Name.Equals(name, StringComparison.OrdinalIgnoreCase)); } ``` +{% endcode %} -Create a class for the projection state. The state of the projection gets serialized and deserialized when persisting or restoring a snapshot. That's why it must have a parameterless constructor, a data contract and data members.
- -{% content-ref url="../../messaging/serialization.md" %} -[serialization.md](../../messaging/serialization.md) -{% endcontent-ref %} - +{% code title="TaskProjectionState.cs" %} ```csharp [DataContract(Name = "c135893e-b9e3-453a-b0e0-53545094ec5d")] -public class TaskProjectionData +public class TaskProjectionState { - public TaskProjectionData() - { - Tasks = new List(); - } + public TaskProjectionState() { Tasks = new List<Entry>(); } [DataMember(Order = 1)] - public List Tasks { get; set; } + public List<Entry> Tasks { get; set; } [DataContract(Name = "317b3cbb-593a-4ffc-8284-d5f5c599d8ae")] - public class Data + public class Entry { - [DataMember(Order = 1)] - public TaskId Id { get; set; } - - [DataMember(Order = 2)] - public UserId UserId { get; set; } - - [DataMember(Order = 3)] - public string Name { get; set; } - - [DataMember(Order = 4)] - public DateTimeOffset CreatedAt { get; set; } - - [DataMember(Order = 5)] - public DateTimeOffset Timestamp { get; set; } + [DataMember(Order = 1)] public TaskId Id { get; set; } + [DataMember(Order = 2)] public string Name { get; set; } + [DataMember(Order = 3)] public DateTimeOffset CreatedAt { get; set; } + [DataMember(Order = 4)] public bool IsClosed { get; set; } } } ``` +{% endcode %} + +The state class is serialised into a snapshot, so it needs a parameterless constructor, a `[DataContract]` attribute, and `[DataMember(Order = n)]` on every persisted property — the same rules as for events. + +{% content-ref url="../../messaging/serialization.md" %} +[serialization.md](../../messaging/serialization.md) +{% endcontent-ref %} {% hint style="info" %} -There is no guarantee the events will be handled in the order of publishing nor that every event will be handled at most once. That's why you should design projections in a way that solves those problems. Always assign all possible properties from the handled event to the state and make sure the projection is idempotent.
+The selector passed to `Subscribe<TEvent>(...)` returns the projection ID for a given event. One event can be routed to many projection IDs, and one projection ID can aggregate many events. {% endhint %} -{% hint style="info" %} -If the projection state contains a collection, make sure it doesn't get populated with duplicates. This can be achieved by using a `HashSet` and `ValueObject`. +## Idempotency + +Cronus does not guarantee that an event is delivered exactly once, and out-of-order delivery can happen during catch-up or rebuild. Design your projection accordingly: + +{% hint style="success" %} +**You can / should / must** + +* a projection **must** be idempotent — applying the same event twice must not produce different state +* a projection **should** store every piece of data an event carries so rebuilding does not require another query +* a projection **must not** issue new commands or publish new events +* you **should** guard against duplicate entries in collections (e.g. a `HashSet<T>` whose items use value equality) +{% endhint %} + +{% hint style="warning" %} +**You should not** + +* a projection **should not** query other projections — derive everything from the events it subscribes to +* an event-sourced projection **should not** perform external I/O inside `HandleAsync` {% endhint %} -You can define a non-event-sourced projection by decorating it with the `IProjection` interface. This is useful when you want to persist the state in an external system (e.g. ElasticSearch, relational database). +## Non-event-sourced projection + +Sometimes the read model lives in an external store that tracks its own consistency — a SQL database via EF Core, for example.
Implement `IProjection` directly and skip the snapshot machinery: ```csharp -// TODO: give a relevant example [DataContract(Name = "af157a4d-7608-4c9d-8e42-63bd483a8ad4")] -public class ExampleEfProjection : IProjection, - IEventHandler +public class TaskSqlProjection : IProjection, + IEventHandler<TaskCreated> { - public DbContext Context { get; set; } + private readonly TaskDbContext db; + + public TaskSqlProjection(TaskDbContext db) + { + this.db = db; + } - public void Handle(ExampleCreated @event) + public async Task HandleAsync(TaskCreated @event) { - var exampleDto = new ExampleDto(@event.Id, @event.Name); - Context.Examples.Add(exampleDto); - Context.SaveChanges(); + if (await db.Tasks.FindAsync(@event.Id.Value).ConfigureAwait(false) is not null) + return; // row already written: a redelivered event must not insert it twice + + db.Tasks.Add(new TaskRow(@event.Id.Value, @event.Name, @event.Timestamp)); + await db.SaveChangesAsync().ConfigureAwait(false); } } ``` -By default, all projections' states are being persisted as snapshots. If you want to disable this feature for a specific projection, use the `IAmNotSnapshotable` interface. +You own the write path; Cronus only fans the event into your handler. + +## Querying a projection + +Inject `IProjectionReader` and call `GetAsync<T>(id)`. The returned `ReadResult<T>` tells you whether the projection was found and whether there was an error. ```csharp -// TODO: give a relevant example -[DataContract(Name = "bae8bd10-9903-4960-95c4-b4fa4688a860")] -public class ExampleByIdProjection : ProjectionDefinition, - IEventHandler, - IAmNotSnapshotable +public interface IProjectionReader { - // ... + Task<ReadResult<T>> GetAsync<T>(IBlobId projectionId) where T : IProjectionDefinition; + Task<ReadResult<T>> GetAsOfAsync<T>(IBlobId projectionId, DateTimeOffset timestamp) where T : IProjectionDefinition; + + Task<ReadResult<IProjectionDefinition>> GetAsync(IBlobId projectionId, Type projectionType); + Task<ReadResult<IProjectionDefinition>> GetAsOfAsync(IBlobId projectionId, Type projectionType, DateTimeOffset timestamp); } ``` -## Querying a projection - -To query a projection, you need to inject an instance of `IProjectionReader` in your code and invoke the `Get()` or `GetAsync()` method.
The returned object will be of type `ReadResult` or `Task` containing the projection and a few properties indicating if the loading was successful. - +{% code title="TaskQueryController.cs" %} ```csharp -public class GetExampleController : ControllerBase +[ApiController] +[Route("[controller]/[action]")] +public class TaskQueryController : ControllerBase { - private IProjectionReader projectionReader; - - public GetExampleController(IProjectionReader projectionReader) + private readonly IProjectionReader reader; + + public TaskQueryController(IProjectionReader reader) { - this.projectionReader = projectionReader; + this.reader = reader; } - public async Task GetExample(GetExampleRequest request) + [HttpGet] + public async Task<IActionResult> GetByUser(string tenant, string userId) { - var id = ExampleId.New(request.Tenant, request.Id); - var result = await projectionReader.GetAsync(id); - if (result.IsSuccess) - return Ok(new GetExampleResponse(result.Data.State)); - else - return BadRequest(result.Error); - } + var id = new UserId(tenant, userId); + ReadResult<TaskProjection> result = await reader.GetAsync<TaskProjection>(id).ConfigureAwait(false); - public class GetExampleResponse - { - // ... - } + if (result.NotFound) return NotFound(); + if (result.HasError) return Problem(result.Error); + + // Map to an explicit response shape instead of returning the projection state itself + return Ok(result.Data.State.Tasks.Select(t => new { Name = t.Name, CreatedAt = t.CreatedAt, IsClosed = t.IsClosed })); + } } ``` +{% endcode %} + +`GetAsOfAsync` rebuilds the projection as it was at a given point in time — useful for time-travel queries and diagnostics. {% hint style="info" %} -Use separate models for the API responses from the projection states to ensure you won't introduce breaking changes if the projection gets modified. +Expose API DTOs separately from your projection state. If you return `result.Data.State` directly, renaming a property breaks the API contract. Map to a response DTO first.
{% endhint %} ## Projection versioning -TODO - -## Best Practices - -{% hint style="success" %} -**You can/should/must...** - -* a projection **must** be idempotent -* a projection **must not** issue new commands or events -{% endhint %} - -{% hint style="warning" %} -**You should not...** - -* a projection **should not** query other projections. All the data of a projection must be collected from the Events' data -{% endhint %} +When a projection's shape changes (new event subscription, different state schema, new aggregation) Cronus treats it as a new _version_: the old version keeps serving reads while the new one rebuilds in the background, then traffic switches over atomically. That flow is covered in the projections versioning documentation. diff --git a/docs/cronus-framework/domain-modeling/handlers/sagas.md b/docs/cronus-framework/domain-modeling/handlers/sagas.md index 2f03fab0..95c4fc19 100644 --- a/docs/cronus-framework/domain-modeling/handlers/sagas.md +++ b/docs/cronus-framework/domain-modeling/handlers/sagas.md @@ -4,81 +4,139 @@ description: Sometimes called a Process Manager # Sagas in the Cronus Framework -In the Cronus framework, **Sagas**—also known as **Process Managers**—are designed to handle complex workflows that span multiple aggregates. They provide a centralized mechanism to coordinate and manage long-running business processes, ensuring consistency and reliability across the system. +A saga is a message handler that watches the event stream and, as a reaction, publishes new commands. Its job is to coordinate a business process that spans more than one aggregate, or to schedule work for the future. -## Key Characteristics of Sagas +In Cronus, a saga is any class that implements [`ISaga`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/ISaga.cs). 
Nearly every real saga derives from the abstract `Saga` base class in the same file, which wires up the two publishers a saga needs: -- **Event-Driven Coordination:** Sagas listen for domain events, which represent business changes that have already occurred, and react accordingly to drive the process forward. - -- **State Management:** Unlike simple event handlers, Sagas maintain state to track the progress of the workflow, enabling them to handle complex scenarios and ensure that all steps are completed successfully. - -- **Command Dispatching:** Sagas can send new commands to aggregates or other components, orchestrating the necessary actions to achieve the desired business outcome. - -## When to Use Sagas +```csharp +public abstract class Saga : ISaga +{ + protected readonly IPublisher<ICommand> commandPublisher; + protected readonly IPublisher<IScheduledMessage> timeoutRequestPublisher; -Sagas are particularly useful when dealing with processes that: + public Saga(IPublisher<ICommand> commandPublisher, IPublisher<IScheduledMessage> timeoutRequestPublisher) + { + this.commandPublisher = commandPublisher ?? throw new ArgumentNullException(nameof(commandPublisher)); + this.timeoutRequestPublisher = timeoutRequestPublisher ?? throw new ArgumentNullException(nameof(timeoutRequestPublisher)); + } -- Involve multiple aggregates or bounded contexts. + public Task RequestTimeoutAsync<T>(T timeoutMessage) where T : IScheduledMessage + { + return timeoutRequestPublisher.PublishAsync(timeoutMessage, timeoutMessage.PublishAt); + } +} +``` -- Require coordination of several steps or actions. +A saga reacts to events through the standard [`IEventHandler<T>`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/IEventHandler.cs) interface. On top of that, it can receive scheduled messages through [`ISagaTimeoutHandler<T>`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/ISaga.cs), where `T : IScheduledMessage`.
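The next sketch is self-contained and covers the timeout round trip for one scenario: cancelling an unconfirmed reservation after 24 hours. In it, `FakeTimeoutScheduler` stands in for the timeout publisher, and a plain list stands in for the command publisher. All the types are hypothetical; none of this is Cronus API. Note that the saga never checks whether the reservation was confirmed. It simply publishes the command and leaves that decision to the aggregate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical messages as plain records; real ones would carry [DataContract]
// and implement ICommand / IScheduledMessage.
public sealed record ReservationPlaced(string ReservationId, DateTime PlacedAt);
public sealed record ReservationExpiryTimeout(string ReservationId, DateTime PublishAt);
public sealed record CancelUnconfirmedReservation(string ReservationId);

// Stand-in for the timeout publisher: it holds each message until its publish time has passed.
public sealed class FakeTimeoutScheduler
{
    private readonly List<(ReservationExpiryTimeout Message, DateTime PublishAt)> pending = new();

    public Task PublishAsync(ReservationExpiryTimeout message, DateTime publishAt)
    {
        pending.Add((message, publishAt));
        return Task.CompletedTask;
    }

    // Removes and returns every message that is due at 'now'.
    public List<ReservationExpiryTimeout> Due(DateTime now)
    {
        var due = pending.Where(p => p.PublishAt <= now).Select(p => p.Message).ToList();
        pending.RemoveAll(p => p.PublishAt <= now);
        return due;
    }
}

public sealed class ReservationSaga
{
    private readonly List<object> commands;         // stand-in for the command publisher
    private readonly FakeTimeoutScheduler timeouts; // stand-in for the timeout publisher

    public ReservationSaga(List<object> commands, FakeTimeoutScheduler timeouts)
    {
        this.commands = commands;
        this.timeouts = timeouts;
    }

    // Same shape as RequestTimeoutAsync: ask for the message to come back at PublishAt.
    public Task HandleAsync(ReservationPlaced @event)
    {
        var timeout = new ReservationExpiryTimeout(@event.ReservationId, @event.PlacedAt.AddHours(24));
        return timeouts.PublishAsync(timeout, timeout.PublishAt);
    }

    // No state lookup here: always publish the command; the aggregate is expected
    // to ignore it if the reservation is already confirmed.
    public Task HandleAsync(ReservationExpiryTimeout timeout)
    {
        commands.Add(new CancelUnconfirmedReservation(timeout.ReservationId));
        return Task.CompletedTask;
    }
}
```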
-Need to handle compensating actions in case of failures to maintain consistency. +## Saga vs Application Service -By encapsulating the workflow logic within a Saga, developers can manage complex business processes more effectively, ensuring that all parts of the system work together harmoniously. +Both are message handlers, but they sit on opposite sides of the command/event boundary. -## Communication Guide Table +| | Application Service | Saga | +| --- | --- | --- | +| Reacts to | `ICommand` | `IEvent`, `IScheduledMessage` | +| Produces | New events on one aggregate | New `ICommand` messages | +| Loads aggregate state? | Yes — from the event store | No | +| Purpose | Fulfil a single command | Orchestrate a multi-aggregate process | -| Triggered by | Description | -|--------------|-------------------------------------------------------| -| Event | Domain events represent business changes that have already happened. | +An Application Service mutates one aggregate. A Saga never loads an aggregate directly — it publishes commands so that the appropriate Application Services do that work. If you catch yourself wanting to load aggregate state inside a saga, that logic almost certainly belongs in an Application Service or in a new projection. -## Best Practices +## Timeouts -- A Saga can send new commands to drive the process forward. +A saga often needs to do something later — "if the customer has not confirmed in 24 hours, cancel the reservation". It expresses this by publishing an [`IScheduledMessage`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/ISaga.cs): -- Ensure that Sagas are idempotent to handle potential duplicate events gracefully. +```csharp +public interface IScheduledMessage : IMessage +{ + /// <summary> + /// The date when this message will be published. + /// </summary> + DateTime PublishAt { get; } +} +``` -- Maintain clear boundaries for each Saga to prevent unintended side effects.
+Call `RequestTimeoutAsync(message)` on the base class. Cronus will deliver the message back to the saga at `PublishAt`, and the saga handles it through `ISagaTimeoutHandler<T>`. -**Saga example** +## A small saga ```csharp [DataContract(Name = "d4eb8803-2cc7-48dd-9ca1-4512b8d9b88f")] -public class TaskSaga : Saga, +public class WelcomeSaga : Saga, IEventHandler<UserCreated>, - ISagaTimeoutHandler - + ISagaTimeoutHandler<SendWelcomeMessageTimeout> { - public TaskSaga(IPublisher commandPublisher, IPublisher timeoutRequestPublisher) : base(commandPublisher, timeoutRequestPublisher) + public WelcomeSaga(IPublisher<ICommand> commandPublisher, IPublisher<IScheduledMessage> timeoutRequestPublisher) + : base(commandPublisher, timeoutRequestPublisher) { } public Task HandleAsync(UserCreated @event) { - var message = new Message(); - message.Info = @event.Name + "was created yesterday."; - message.PublishAt = DateTimeOffset.UtcNow.AddDays(1).DateTime; - message.Timestamp = DateTimeOffset.UtcNow; - - RequestTimeout(message); + var timeout = new SendWelcomeMessageTimeout + { + UserId = @event.Id, + PublishAt = DateTime.UtcNow.AddDays(1) + }; - return Task.CompletedTask; + return RequestTimeoutAsync(timeout); } - public Task HandleAsync(Message sagaTimeout) - { - Console.WriteLine(sagaTimeout.Info); - return Task.CompletedTask; + public Task HandleAsync(SendWelcomeMessageTimeout timeout) + { + return commandPublisher.PublishAsync(new SendWelcomeMessage(timeout.UserId)); } - } [DataContract(Name = "543e8e28-0dcb-4d41-98de-f701e403dbb2")] -public class Message : IScheduledMessage +public class SendWelcomeMessageTimeout : IScheduledMessage { - public string Info { get; set; } - public DateTime PublishAt { get; set; } - public DateTimeOffset Timestamp { get; set; } + [DataMember(Order = 1)] public UserId UserId { get; set; } + [DataMember(Order = 2)] public DateTime PublishAt { get; set; } } ``` +## A real saga + +A production saga rarely has a single handler. It usually reacts to a broad slice of the lifecycle it owns.
For a realistic shape look at [`ExperienceVersionApprovalSaga`](https://github.com/Elders/locus.backend/blob/master/src/Elders.Locus/ExperinceApproval/Sagas/ExperienceVersionApprovalSaga.cs) in the Locus backend — it handles 13 events covering publish-for-approval, approval confirmation, rejection and every edit that invalidates a pending version — and in every case its only job is to publish a command to the right aggregate: + +```csharp +[DataContract(Name = "0528cd68-62f8-40e8-b258-72ced75d0f03")] +public class ExperienceVersionApprovalSaga : Saga, + IEventHandler, + IEventHandler, + IEventHandler, + IEventHandler, + IEventHandler, + IEventHandler, + /* ... remaining event handlers ... */ +``` + +The saga does not touch aggregate state. Each handler inspects the event, decides on the next command and publishes it through `commandPublisher`. That is the whole pattern. + +## Configuration + +The subscriber that feeds sagas is toggled by `Cronus:SagasEnabled` (default: `true`). Turn it off on processes that must not advance saga state. + +{% content-ref url="../../configuration.md" %} +[configuration.md](../../configuration.md) +{% endcontent-ref %} + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** +* a saga **can** subscribe to events from any aggregate in the bounded context +* a saga **can** publish new commands through `commandPublisher` +* a saga **can** schedule timeouts through `RequestTimeoutAsync` +* a saga **must** be idempotent — the same event may arrive more than once +* a saga **must** decide what to do from the incoming event alone, not from external state +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* a saga **should not** load or mutate aggregate state — publish a command instead +* a saga **should not** send emails, call external APIs or touch the file system.
Use a port or a gateway for that +* a saga **should not** write to the read model — that is a projection's job +{% endhint %} diff --git a/docs/cronus-framework/domain-modeling/handlers/triggers.md b/docs/cronus-framework/domain-modeling/handlers/triggers.md index dce70c5a..1d0b546c 100644 --- a/docs/cronus-framework/domain-modeling/handlers/triggers.md +++ b/docs/cronus-framework/domain-modeling/handlers/triggers.md @@ -1,4 +1,86 @@ # Triggers -[https://github.com/Elders/Cronus/issues/261](https://github.com/Elders/Cronus/issues/261) +A trigger is a message handler that starts something new in response to an event or a signal. The "something new" is typically a job, a long-running workflow, or a downstream orchestration that the current message flow is not supposed to wait for. +In Cronus, a trigger is any class that implements [`ITrigger`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/ITrigger.cs): + +```csharp +public interface ITrigger : IMessageHandler { } +``` + +The marker interface is deliberately minimal. A trigger gets its behaviour from the handler interfaces it implements on top — usually [`IEventHandler`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/IEventHandler.cs) or [`ISignalHandle`](https://github.com/Elders/Cronus.DomainModeling/blob/master/src/Elders.Cronus.DomainModeling/ISignalHandle.cs). + +## When to use a trigger + +Use a trigger when the reaction is: + +* Starting a background [job](../../jobs.md) that runs on its own schedule. +* Kicking off a workflow that is not bounded by the lifetime of the incoming message. +* Fanning a signal out into the job subsystem — signals are the idiomatic message type here because they are fire-and-forget and are not meant to participate in event sourcing. + +If you just need to publish a command or call the outside world, you want a [port](ports.md) or a [gateway](gateways.md) instead. 
+ +## Relationship to jobs + +Jobs (see [Jobs](../../jobs.md)) run under the Cronus job manager and execute independently of the subscriber that received the original message. A trigger is the common way to connect the two worlds: the trigger receives an event or signal, decides whether a job should run, and starts it. + +The trigger itself does no business-state work — it is a thin adapter between the message bus and the job subsystem. + +## Example + +A trigger that reacts to a notification signal and hands the push delivery over to an injected delivery service: + +```csharp +public class PushNotificationTrigger : ITrigger, + ISignalHandle<NotificationMessageSignal> +{ + private readonly IProjectionReader projections; + private readonly MultiPlatformDelivery delivery; + private readonly ILogger logger; + + public PushNotificationTrigger(IProjectionReader projections, MultiPlatformDelivery delivery, ILogger logger) + { + this.projections = projections; + this.delivery = delivery; + this.logger = logger; + } + + public Task HandleAsync(NotificationMessageSignal signal) + { + // resolve recipients, look up tokens, hand over to the delivery subsystem + // ... + return Task.CompletedTask; + } +} +``` + +## Configuration + +The subscriber that dispatches messages to triggers is toggled by `Cronus:TriggersEnabled` (default: `true`). Turn it off on hosts where triggers should not fire.
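+
+A minimal sketch of disabling triggers on a single host. It assumes the host loads `appsettings.json` through the standard .NET configuration providers. Those providers map the `:` in `Cronus:TriggersEnabled` to a nested JSON section:
+
+```json
+{
+  "Cronus": {
+    "TriggersEnabled": false
+  }
+}
+```
+
+The same key can also be supplied as the environment variable `Cronus__TriggersEnabled=false`. The .NET environment-variable configuration provider treats `__` as the section separator.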
+ +{% content-ref url="../../configuration.md" %} +[configuration.md](../../configuration.md) +{% endcontent-ref %} + +{% content-ref url="../../jobs.md" %} +[jobs.md](../../jobs.md) +{% endcontent-ref %} + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* a trigger **can** start jobs, workflows or other long-running work +* a trigger **can** handle both events and signals +* a trigger **must** be idempotent — the same message may arrive more than once +* a trigger **should** delegate the actual work to a service injected into its constructor +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* a trigger **should not** publish commands as a matter of course — use a port or a saga +* a trigger **should not** mutate aggregate state +* a trigger **should not** block the subscriber on long-running work; hand it over to a job +{% endhint %} diff --git a/docs/cronus-framework/domain-modeling/ids.md b/docs/cronus-framework/domain-modeling/ids.md index 28330c4f..b482310d 100644 --- a/docs/cronus-framework/domain-modeling/ids.md +++ b/docs/cronus-framework/domain-modeling/ids.md @@ -1,6 +1,170 @@ # IDs -TODO: describe all different types of ids Cronus provides, their purpose and hierarchy. Explain how and why to define custom ids \(simple and composite\) for aggregates, entities and projections. Explain URNs and the different parsing methods. +Identity in Cronus is built on URNs. Every aggregate root, every entity, every projection, and every message that carries an id uses a URN-shaped value object. This page walks through the concrete types you will use, what they encode, and how to derive your own. 
-[https://github.com/Elders/Cronus/issues/273](https://github.com/Elders/Cronus/issues/273) +## The URN foundation +The base class is `Urn` (from `Elders.Cronus.DomainModeling`): + +{% code title="Urn.cs" %} +```csharp +[DataContract(Name = "d3ff08b5-38e2-4aaf-b3a8-ccc423ed096d")] +public class Urn : IEquatable<Urn>, IBlobId +{ + public const char PARTS_DELIMITER = ':'; + public const char HIERARCHICAL_DELIMITER = '/'; + public const string UriSchemeUrn = "urn"; + // ... +} +``` +{% endcode %} + +A URN has three interesting parts for our purposes: + +* the scheme, always `urn`, +* the NID (namespace identifier) — in Cronus, the **tenant**, +* the NSS (namespace-specific string) — everything that identifies the resource inside the tenant. + +Cronus also accepts r-, q-, and f-components per [RFC 8141](https://tools.ietf.org/html/rfc8141), and by default is case-insensitive (`Urn.UseCaseSensitiveUrns = false`). NIDs must be 2–32 characters, alphanumeric plus `-`; they must not start or end with `-` and must not contain the string `urn`. + +## AggregateRootId + +`AggregateRootId` is the concrete URN used for aggregate roots. Its NSS is `arname:id`, so a full aggregate id looks like: + +``` +urn:acme:order:a2c19b5f-0d2e-45f3-81d1-7a5b6c9d4ee8 + │ │ └─ the per-tenant id part + │ └─ the aggregate-root name + └─ the tenant (NID) +``` + +The load-bearing constructor takes those three pieces in order: + +{% code title="AggregateRootId.cs" %} +```csharp +[DataContract(Name = "b78e63f3-1443-4e82-ba4c-9b12883518b9")] +public partial class AggregateRootId : Urn +{ + public AggregateRootId(string tenant, string arName, string id) + : base(tenant, $"{arName}{PARTS_DELIMITER}{id}") + { /* ... */ } + + public string Id { get; } + public string Tenant { get; } + public string AggregateRootName { get; } +} +``` +{% endcode %} + +Argument order is **`tenant, arName, id`** — a frequent source of copy-paste bugs.
`AggregateRootId` also exposes static parsing helpers: + +```csharp +AggregateRootId parsed = AggregateRootId.Parse("urn:acme:order:a2c1..."); +bool ok = AggregateRootId.TryParse("urn:acme:order:a2c1...", out AggregateRootId id); +``` + +`Parse` throws `ArgumentException` on an invalid URN; `TryParse` returns `false`. + +## Deriving a typed aggregate id + +In practice you never pass around a raw `AggregateRootId`. You derive a named subclass per aggregate, decorate it with a stable `[DataContract(Name = "<guid>")]`, and forward the three constructor arguments: + +{% code title="ExperienceId.cs (grounded in locus.backend)" %} +```csharp +[DataContract(Name = "1e8bf099-2cca-4760-bfbd-f029ca02d359")] +public class ExperienceId : AggregateRootId +{ + protected ExperienceId() { } + + public ExperienceId(string id, string tenant) + : base(tenant, "experience", id) { } + + public ExperienceId(string tenant) + : this(Guid.NewGuid().ToString(), tenant) { } +} +``` +{% endcode %} + +A private or protected parameterless constructor is required for deserialisation. The aggregate-root name (`"experience"`) is the only free-form part; keep it lower-case, short, and stable. Changing it later changes every URN of that aggregate and effectively forks the event store. + +## EntityId + +For entities hanging off an aggregate root, Cronus provides `EntityId<TAggregateRootId>`: + +{% code title="EntityId.cs" %} +```csharp +public abstract class EntityId<TAggregateRootId> : EntityId + where TAggregateRootId : AggregateRootId +{ + protected EntityId() { } + + public EntityId(ReadOnlySpan<char> idBase, TAggregateRootId rootId) { /* ...
*/ } + + protected abstract ReadOnlySpan<char> EntityName { get; } + + public TAggregateRootId AggregateRootId { get; } +} +``` +{% endcode %} + +Deriving an entity id looks like this: + +```csharp +[DataContract(Name = "1d23c591-219f-491e-bfb1-a775fe2751b6")] +public class WalletId : EntityId<UserId> +{ + protected override ReadOnlySpan<char> EntityName => "wallet"; + + WalletId() { } + + public WalletId(string idBase, UserId rootId) : base(idBase.AsSpan(), rootId) { } +} +``` + +The NSS becomes `arname:arid/entityname:entityid`, so a full entity URN reads: + +``` +urn:acme:user:u-001/wallet:main +``` + +The hierarchical `/` separates the aggregate from the entity, and the `:` separates name from id on each side. This is what `EntityId.EntityRegex()` validates against. + +## What does NOT exist + +Several names you may see in older code, blog posts, or scratch branches are **not** part of the current framework: + +* `AggregateRootId<T>` — a generic base class was sketched but left commented-out inside `AggregateRootId.cs`; there is no such generic today. Derive from the non-generic `AggregateRootId` instead. +* `AggregateUrn` — not defined anywhere in `Elders.Cronus.DomainModeling`. The previous separate type was collapsed into `AggregateRootId`. +* `IUrn` — no such interface. `Urn` implements `IBlobId`. +* `IAggregateRootId<T>` — no generic form of `IAggregateRootId` exists in master. +* `StringTenantId` — not in the current sources. Legacy sample code that references it is stale. + +If you are porting older code, delete the references and lean on `AggregateRootId` directly. + +## Guidelines + +{% hint style="success" %} +You **can** derive as many id types as you need — one per aggregate root and one per entity. + +You **should** give every id type a stable `[DataContract(Name = "<guid>")]` and a `protected` or `private` parameterless constructor. + +You **must** pass the constructor arguments in the order `(tenant, arName, id)` for `AggregateRootId`.
Mixing them up will produce URNs that look valid but cannot be re-parsed against the aggregate-root-name regex. +{% endhint %} + +{% hint style="warning" %} +You **should not** encode mutable data (emails, display names, tenant-friendly slugs) into the NSS. Ids are forever; only stable keys belong there. +{% endhint %} + +## Related + +{% content-ref url="multitenancy.md" %} +[multitenancy.md](multitenancy.md) +{% endcontent-ref %} + +{% content-ref url="aggregate.md" %} +[aggregate.md](aggregate.md) +{% endcontent-ref %} + +{% content-ref url="entity.md" %} +[entity.md](entity.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/messages/README.md b/docs/cronus-framework/domain-modeling/messages/README.md index 15939643..e0956b24 100644 --- a/docs/cronus-framework/domain-modeling/messages/README.md +++ b/docs/cronus-framework/domain-modeling/messages/README.md @@ -1,2 +1,67 @@ # Messages +A _message_ is the unit of communication that flows through a Cronus bounded context. Every message implements `IMessage` and carries a `Timestamp`. Cronus recognises four kinds of messages and each has its own role in the domain model. + +| Message type | Intent | Dispatched by | +| ----------------- | ------------------------------------------------------------------------------------------- | ----------------------- | +| **Command** | Request a business change. May be accepted or rejected by the aggregate. | API, saga, port | +| **Event** | Record a business change that already happened inside the bounded context. | Aggregate root / entity | +| **Public event** | Announce a change to the outside world (published language). Carries the originating tenant. | Aggregate root | +| **Signal** | Trigger arbitrary side-effects (heartbeats, rebuilds, process pings). | Anything | + +{% hint style="info" %} +All messages get serialised across the transport. They **must** have a parameterless constructor and a `[DataContract]` attribute with a stable GUID `Name`. 
That GUID is the wire identity of the contract — do not change it once the message is in production. +{% endhint %} + +{% content-ref url="../../messaging/serialization.md" %} +[serialization.md](../../messaging/serialization.md) +{% endcontent-ref %} + +## Contracts at a glance + +```csharp +public interface IMessage { DateTimeOffset Timestamp { get; } } + +public interface ICommand : IMessage { } +public interface IEvent : IMessage { } +public interface IPublicEvent : IMessage { string Tenant { get; } } +public interface ISignal : IMessage { } +public interface IScheduledMessage : IMessage { DateTime PublishAt { get; } } +``` + +Each kind is documented on its own page: + +{% content-ref url="commands.md" %} +[commands.md](commands.md) +{% endcontent-ref %} + +{% content-ref url="events.md" %} +[events.md](events.md) +{% endcontent-ref %} + +{% content-ref url="public-events.md" %} +[public-events.md](public-events.md) +{% endcontent-ref %} + +{% content-ref url="signals.md" %} +[signals.md](signals.md) +{% endcontent-ref %} + +## Publishing + +To send a message from outside an aggregate (for example from an API controller, a port, or a saga) inject the typed `IPublisher<TMessage>` and `await` one of its `PublishAsync` overloads: + +```csharp +Task<bool> PublishAsync(TMessage message, Dictionary<string, string> headers = null); +Task<bool> PublishAsync(TMessage message, DateTime publishAt, Dictionary<string, string> headers = null); +Task<bool> PublishAsync(TMessage message, TimeSpan publishAfter, Dictionary<string, string> headers = null); +``` + +{% hint style="success" %} +**You can / should / must** + +* every message **must** be immutable +* every message **must** carry a stable `[DataContract(Name = "<guid>")]` identifier +* you **should** override `ToString()` — Cronus uses it when writing structured logs +* you **must** treat the return value of `PublishAsync` as the publish outcome; `false` means the transport rejected the message +{% endhint %} diff --git a/docs/cronus-framework/domain-modeling/messages/application-services.md
b/docs/cronus-framework/domain-modeling/messages/application-services.md index 7c2d7c91..4c92cd3c 100644 --- a/docs/cronus-framework/domain-modeling/messages/application-services.md +++ b/docs/cronus-framework/domain-modeling/messages/application-services.md @@ -1,58 +1,7 @@ # Application Services -This is a handler where commands are received and delivered to the addressed Aggregate. Such handler is called an [_ApplicationService_](application-services.md). This is the _write side_ in [CQRS](../../concepts/cqrs.md). - -An [_ApplicationService_](application-services.md) is a command handler for a specific [Aggregate](../aggregate.md). One aggregate has one [_ApplicationService_ ](application-services.md)whose purpose is to orchestrate how a command will be fulfilled. Its the ApplicationService responsibility to invoke the appropriate Aggregate methods and pass the command's payload. It mediates between Domain and infrastructure and it shields any domain model from the "outside". Only the Application Service interacts with the domain model. - -You can create an application service with Cronus by using the `AggregateRootApplicationService` base class. Specifying which commands the application service can handle is done using the `ICommandHandler` interface. - -`AggregateRootApplicationService` provides a property of type `IAggregateRepository` that you can use to load and save the aggregate state. There is also a helper method `Update(IAggregateRootId id, Action update)` that loads and aggregate based on the provided id invokes the action and saves the new state if there are any changes. - -```csharp -public class ConcertAppService : AggregateRootApplicationService, - ICommandHandler, - ICommandHandler -{ - ... 
- - public void Handle(AnnounceConcert command) - { - if (Repository.TryLoad(command.Id, out _)) - return; - - var concert = new Concert(...); - Repository.Save(concert); - } - - public void Handle(RegisterPerformer command) - { - Update(command.Id, x => x.RegisterPerformer(...)); - } - - ... -} -``` - -## Best Practices - -{% hint style="success" %} -**You can/should/must...** - -* an application service **can** load an aggregate root from the event store -* an application service **can** save new aggregate root events to the event store -* an application service **can** establish calls to the read model \(not a common practice but sometimes needed\) -* an application service **can** establish calls to external services -* you **can** do dependency orchestration -* an application service **must** be stateless -* an application service **must** update only one aggregate root. Yes, you can create one aggregate and update another one but think twice before doing so. -{% endhint %} - -{% hint style="warning" %} -**You should not...** - -* an application service **should not** update more than one aggregate root in a single command/handler -* you **should not** place domain logic inside an application service -* you **should not** use an application service to send emails, push notifications etc. Use a port or a gateway instead -* an application service **should not** update the read model -{% endhint %} +Application services are the handlers for commands — they live under _Handlers_, not _Messages_. 
This page exists only for backwards-compatible links; head over to the canonical page: +{% content-ref url="../handlers/application-services.md" %} +[application-services.md](../handlers/application-services.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/messages/commands.md b/docs/cronus-framework/domain-modeling/messages/commands.md index 6abb5151..f48bd173 100644 --- a/docs/cronus-framework/domain-modeling/messages/commands.md +++ b/docs/cronus-framework/domain-modeling/messages/commands.md @@ -1,42 +1,33 @@ # Commands -A command is a simple immutable object that is sent to the domain to trigger a state change. There should be a single command handler for each command. It is recommended to use imperative verbs when naming commands together with the name of the aggregate they operate on. - -{% content-ref url="../handlers/application-services.md" %} -[application-services.md](../handlers/application-services.md) -{% endcontent-ref %} - -It is possible for a command to get rejected if the data it holds is incorrect or inconsistent with the current state of the aggregate. +A **command** expresses the intent to change the state of a single aggregate. It is an immutable message, authored by the caller, that the domain model can accept or reject based on the current state and its invariants. One command maps to one aggregate and one application service. 
{% hint style="success" %} -**You can/should/must...** +**You can / should / must** * a command **must** be immutable -* a command **should** clearly state a business intent with a name in the imperative form -* a command **can** be rejected due to domain validation, error or other reason -* a command **must** update only one aggregate +* a command **must** name a business intent in the imperative form (`CreateTask`, `SuspendAccount`) +* a command **must** update at most one aggregate +* a command **can** be rejected when validation or an invariant fails — return early in the application service +* a command **should not** be broadcast from the UI directly; go through an API {% endhint %} ## Defining a command -You can define a command with Cronus using the `ICommand` markup interface. All commands get serialized and deserialized, that's why you need to keep the parameterless constructor and specify data contracts. - -{% content-ref url="../../messaging/serialization.md" %} -[serialization.md](../../messaging/serialization.md) -{% endcontent-ref %} +Implement the `ICommand` marker. Keep a private parameterless constructor (serializers need it) and assign all properties through the public constructor so instances are effectively immutable. 
+{% code title="CreateTask.cs" %} ```csharp [DataContract(Name = "857d960c-4b91-49cc-98fd-fa543906c52d")] public class CreateTask : ICommand { - public CreateTask() { } + CreateTask() { } public CreateTask(TaskId id, UserId userId, string name, DateTimeOffset timestamp) { if (id is null) throw new ArgumentNullException(nameof(id)); if (userId is null) throw new ArgumentNullException(nameof(userId)); - if (name is null) throw new ArgumentNullException(nameof(name)); - if (timestamp == default) throw new ArgumentNullException(nameof(timestamp)); + if (string.IsNullOrWhiteSpace(name)) throw new ArgumentException("Name is required", nameof(name)); Id = id; UserId = userId; @@ -44,62 +35,73 @@ public class CreateTask : ICommand Timestamp = timestamp; } - [DataMember(Order = 1)] - public TaskId Id { get; private set; } - - [DataMember(Order = 2)] - public UserId UserId { get; private set; } + [DataMember(Order = 1)] public TaskId Id { get; private set; } + [DataMember(Order = 2)] public UserId UserId { get; private set; } + [DataMember(Order = 3)] public string Name { get; private set; } + [DataMember(Order = 4)] public DateTimeOffset Timestamp { get; private set; } - [DataMember(Order = 3)] - public string Name { get; private set; } - - [DataMember(Order = 4)] - public DateTimeOffset Timestamp { get; private set; } - - public override string ToString() - { - return $"Create a task with id '{Id}' and name '{Name}' for user [{UserId}]."; - } + public override string ToString() => $"Create task '{Name}' ({Id}) for user {UserId}."; } ``` +{% endcode %} {% hint style="info" %} -Cronus uses the `ToString()` method for logging, so you can override it to generate user-readable logs. Otherwise, the name of the command class will be used for log messages. +Cronus uses `ToString()` when writing structured logs. Override it to get readable output; otherwise the class name alone will appear in the logs. 
{% endhint %} ## Publishing a command -To publish a command, inject an instance of`IPublisher` into your code and invoke the `Publish()` method passing the command. This method will return `true` if the command has been published successfully through the configured transport. You can also use one of the overrides of the `Publish()` method to delay or schedule a command. +Inject `IPublisher<ICommand>` and `await` the `PublishAsync` method. The call returns `true` when the transport accepted the command. A `false` result means the command was not dispatched and the caller must decide how to recover. +{% code title="TaskController.cs" %} ```csharp [ApiController] [Route("[controller]/[action]")] public class TaskController : ControllerBase { - private readonly IPublisher _publisher; + private readonly IPublisher<ICommand> publisher; public TaskController(IPublisher<ICommand> publisher) { - _publisher = publisher; + this.publisher = publisher; } [HttpPost] - public IActionResult CreateTask(CreateTaskRequest request) + public async Task<IActionResult> CreateTask(CreateTaskRequest request, CancellationToken ct) { - string id = Guid.NewGuid().ToString(); - string Userid = Guid.NewGuid().ToString(); - TaskId taskId = new TaskId(id); - UserId userId = new UserId(Userid); - var expireDate = DateTimeOffset.UtcNow; - expireDate.AddDays(request.DaysActive); - - CreateTask command = new CreateTask(taskId, userId, request.Name, expireDate); - - if (_publisher.Publish(command) == false) - { - return Problem($"Unable to publish command.
{command.Id}: {command.Name}"); - }; - return Ok(id); + var taskId = new TaskId(request.Tenant, Guid.NewGuid().ToString()); + var userId = new UserId(request.Tenant, request.UserId); + + var command = new CreateTask(taskId, userId, request.Name, DateTimeOffset.UtcNow); + + if (await publisher.PublishAsync(command).ConfigureAwait(false) == false) + return Problem($"Unable to publish {command}."); + + return Accepted(taskId.Value); } } ``` +{% endcode %} + +{% hint style="info" %} +Commands are handled asynchronously by an application service running inside the Cronus host. The API returns `202 Accepted` because the command has been queued — not yet executed. +{% endhint %} + +## Delaying or scheduling a command + +`IPublisher<TMessage>` exposes overloads for deferred delivery: + +```csharp +await publisher.PublishAsync(command, TimeSpan.FromMinutes(5)); // delay +await publisher.PublishAsync(command, DateTime.UtcNow.AddHours(1)); // schedule +``` + +Deferred delivery requires the scheduled-message transport to be configured. See the transport-specific documentation (e.g. RabbitMQ) for setup. + +## Handling a command + +Commands are consumed by an _application service_ — the write-side entry point for a specific aggregate. + +{% content-ref url="../handlers/application-services.md" %} +[application-services.md](../handlers/application-services.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/messages/events.md b/docs/cronus-framework/domain-modeling/messages/events.md index d9bd010e..2f8fbf6b 100644 --- a/docs/cronus-framework/domain-modeling/messages/events.md +++ b/docs/cronus-framework/domain-modeling/messages/events.md @@ -1,54 +1,85 @@ # Events -An event is something significant that has happened in the domain. It encapsulates all relevant data of the action that happened. +A domain **event** is a fact: something that already happened inside the bounded context.
It is the unit of change in an event-sourced system — the aggregate's state is computed by replaying its events, and projections are built by handling them. {% hint style="success" %} -**You can/should/must...** +**You can / should / must** * an event **must** be immutable -* an event **must** represent a domain event that already happened with a name in the past tense -* an event **can** be dispatched only by one aggregate +* an event **must** be named in the past tense (`TaskCreated`, `AccountSuspended`) +* an event **must** be emitted by exactly one aggregate via `Apply(IEvent)` +* an event **should** carry every piece of information a consumer might need — an event is a historical record and will never be enriched +* an event **must** keep its `[DataContract(Name = "<guid>")]` forever; renaming or retyping breaks replay {% endhint %} -To create an event with Cronus, just use the `IEvent` markup interface. +## Defining an event +Implement `IEvent`. Keep the parameterless constructor private so serializers can hydrate the instance, and expose every property with a private setter so the event is immutable from the outside.
+ +{% code title="TaskCreated.cs" %} ```csharp [DataContract(Name = "728fc4e7-628b-4962-bd68-97c98aa05694")] public class TaskCreated : IEvent { TaskCreated() { } - public TaskCreated(TaskId id, UserId userId, string name, DateTimeOffset timestamp) + public TaskCreated(TaskId id, UserId userId, string name, DateTimeOffset deadline, DateTimeOffset timestamp) { Id = id; UserId = userId; Name = name; - CreatedAt = DateTimeOffset.UtcNow; + Deadline = deadline; Timestamp = timestamp; } - [DataMember(Order = 1)] - public TaskId Id { get; private set; } + [DataMember(Order = 1)] public TaskId Id { get; private set; } + [DataMember(Order = 2)] public UserId UserId { get; private set; } + [DataMember(Order = 3)] public string Name { get; private set; } + [DataMember(Order = 4)] public DateTimeOffset Deadline { get; private set; } + [DataMember(Order = 5)] public DateTimeOffset Timestamp { get; private set; } - [DataMember(Order = 2)] - public UserId UserId { get; private set; } + public override string ToString() => $"Task '{Name}' ({Id}) created for user {UserId}."; +} +``` +{% endcode %} - [DataMember(Order = 3)] - public string Name { get; private set; } +{% hint style="info" %} +Cronus uses `ToString()` when writing structured logs. Override it to produce human-readable output; otherwise only the class name appears in log scopes. +{% endhint %} - [DataMember(Order = 4)] - public DateTimeOffset CreatedAt { get; private set; } +## Emitting an event - [DataMember(Order = 5)] - public DateTimeOffset Timestamp { get; private set; } +Events are never published directly — they are always emitted through an aggregate root (or an entity inside one) by calling the protected `Apply` method. The aggregate's state handler (`public void When(TEvent e)`) is invoked synchronously and the event is added to the uncommitted stream. Persistence happens when the application service calls `repository.SaveAsync(aggregate)`. 
- public override string ToString() +```csharp +public class TaskAggregate : AggregateRoot +{ + TaskAggregate() { } + + public TaskAggregate(TaskId id, UserId userId, string name, DateTimeOffset deadline) { - return $"Task with id '{Id}' and name '{Name}' for user [{UserId}] at {CreatedAt} has been created."; + Apply(new TaskCreated(id, userId, name, deadline, DateTimeOffset.UtcNow)); } } ``` -{% hint style="info" %} -Cronus uses the `ToString()` method for logging, so you can override it to generate user-readable logs. Otherwise, the name of the event class will be used for log messages. +{% hint style="warning" %} +Do not inject `IPublisher<IEvent>` into application code. Domain events belong to the aggregate that produced them; Cronus publishes them to subscribers after the aggregate commit is persisted. Publishing events manually breaks event-sourcing guarantees. {% endhint %} + +## Subscribing to an event + +Any handler that needs to react to an event (projection, saga, port, trigger) declares `IEventHandler<TEvent>`: + +```csharp +public interface IEventHandler<T> where T : IEvent +{ + Task HandleAsync(T @event); +} +``` + +See the dedicated handler pages for the specifics of each subscriber kind. + +{% content-ref url="../handlers/projections.md" %} +[projections.md](../handlers/projections.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/messages/public-events.md b/docs/cronus-framework/domain-modeling/messages/public-events.md index a82de3e2..916e5b5c 100644 --- a/docs/cronus-framework/domain-modeling/messages/public-events.md +++ b/docs/cronus-framework/domain-modeling/messages/public-events.md @@ -1,4 +1,91 @@ # Public Events -[https://github.com/Elders/Cronus/issues/277](https://github.com/Elders/Cronus/issues/277) +A **public event** is an event you explicitly choose to share with the outside world — other bounded contexts, integration consumers, downstream services.
It is the Cronus expression of a _published language_: a deliberate, versioned contract that outlives the internal representation of your domain events. +A public event implements `IPublicEvent`, which adds a `Tenant` property on top of `IMessage`: + +```csharp +public interface IPublicEvent : IMessage +{ + string Tenant { get; } +} +``` + +{% hint style="info" %} +Public events are delivered over a separate transport exchange (for example `PublicRabbitMQ` in `appsettings.json`). Consumers in other services subscribe to them through their own Cronus host. +{% endhint %} + +{% hint style="success" %} +**You can / should / must** + +* a public event **must** be immutable and forward-compatible — once published it lives forever +* a public event **must** carry the `Tenant` that produced it +* a public event **should** contain enough context to be consumed without round-tripping back to the source +* you **should** keep public events thinner than internal events — only what the outside world actually needs +* you **must not** change existing `[DataContract(Name)]` GUIDs or property `Order` values +{% endhint %} + +## Defining a public event + +{% code title="AccountSuspended_public.cs" %} +```csharp +[DataContract(Name = "c6d9d1ae-5e54-4e1e-9121-9ab0c4f3f7a5")] +public class AccountSuspended_public : IPublicEvent +{ + AccountSuspended_public() { } + + public AccountSuspended_public(string tenant, string accountUrn, DateTimeOffset timestamp) + { + Tenant = tenant; + AccountUrn = accountUrn; + Timestamp = timestamp; + } + + [DataMember(Order = 1)] public string Tenant { get; private set; } + [DataMember(Order = 2)] public string AccountUrn { get; private set; } + [DataMember(Order = 3)] public DateTimeOffset Timestamp { get; private set; } + + public override string ToString() => $"Account {AccountUrn} suspended in tenant {Tenant}."; +} +``` +{% endcode %} + +{% hint style="warning" %} +Do not reuse internal domain IDs directly as wire payloads. 
 Convert them to their string URN (`id.Value`) before putting them on a public event; the receiving service will not share your assembly types.
+{% endhint %}
+
+## Emitting a public event
+
+Public events are emitted from the aggregate root via an overload of `Apply`:
+
+```csharp
+public void Suspend()
+{
+ if (state.IsSuspended) return;
+
+ Apply(new AccountSuspended(state.Id, DateTimeOffset.UtcNow));
+ Apply(new AccountSuspended_public(state.Id.Tenant, state.Id.Value, DateTimeOffset.UtcNow));
+}
+```
+
+Cronus tracks uncommitted public events separately from domain events (see `IUnderstandPublishedLanguage.UncommittedPublicEvents`). They are published **after** the aggregate commit has been persisted, so the outside world only ever sees events that actually happened.
+
+## Handling a public event
+
+Implement `IPublicEventHandler<T>` when another bounded context needs to react to the published language:
+
+```csharp
+public class OnAccountSuspendedPort : IPort,
+ IPublicEventHandler<AccountSuspended_public>
+{
+ public Task HandleAsync(AccountSuspended_public @event)
+ {
+ // translate external fact into a local command / projection update
+ return Task.CompletedTask;
+ }
+}
+```
+
+{% content-ref url="../published-language.md" %}
+[published-language.md](../published-language.md)
+{% endcontent-ref %}
diff --git a/docs/cronus-framework/domain-modeling/messages/signals.md b/docs/cronus-framework/domain-modeling/messages/signals.md
index 2bec7bc1..d36e87c7 100644
--- a/docs/cronus-framework/domain-modeling/messages/signals.md
+++ b/docs/cronus-framework/domain-modeling/messages/signals.md
@@ -1,4 +1,147 @@
 # Signals
-[https://github.com/Elders/Cronus/issues/262](https://github.com/Elders/Cronus/issues/262)
+A **signal** is a fire-and-forget trigger that tells something in the system to run. Unlike a command, it has no aggregate target, no invariants, and no success/failure contract — its only purpose is to fan out work.
+
+Cronus itself uses signals for heartbeats, projection rebuilds, and index maintenance. In your domain you might use them to pulse periodic reports, announce cron-like events, or ping a saga to re-evaluate its state.
+
+```csharp
+public interface ISignal : IMessage { }
+```
+
+{% hint style="success" %}
+**You can / should / must**
+
+* a signal **must** be immutable and self-contained
+* a signal **should not** encode business decisions — it's a wake-up, not an order
+* a signal **can** be broadcast to many handlers at once; design them to be independent
+* a signal handler **must** be idempotent — the same signal can arrive more than once
+{% endhint %}
+
+## Defining a signal
+
+{% code title="RefreshDailyReportSignal.cs" %}
+```csharp
+[DataContract(Name = "c04b3c09-4ad4-4a62-b2f6-d5d86d4f0e55")]
+public class RefreshDailyReportSignal : ISignal
+{
+ RefreshDailyReportSignal() { }
+
+ public RefreshDailyReportSignal(string tenant, DateTimeOffset timestamp)
+ {
+ Tenant = tenant;
+ Timestamp = timestamp;
+ }
+
+ [DataMember(Order = 1)] public string Tenant { get; private set; }
+ [DataMember(Order = 2)] public DateTimeOffset Timestamp { get; private set; }
+
+ public override string ToString() => $"Refresh daily report for {Tenant}.";
+}
+```
+{% endcode %}
+
+## Handling a signal
+
+Implement `ISignalHandle<T>`:
+
+```csharp
+public interface ISignalHandle<T> where T : ISignal
+{
+ Task HandleAsync(T signal);
+}
+```
+
+```csharp
+public class DailyReportTrigger : ITrigger,
+ ISignalHandle<RefreshDailyReportSignal>
+{
+ private readonly IPublisher<ICommand> commandPublisher;
+
+ public DailyReportTrigger(IPublisher<ICommand> commandPublisher)
+ {
+ this.commandPublisher = commandPublisher;
+ }
+
+ public async Task HandleAsync(RefreshDailyReportSignal signal)
+ {
+ var cmd = new RebuildDailyReport(signal.Tenant, signal.Timestamp);
+ await commandPublisher.PublishAsync(cmd).ConfigureAwait(false);
+ }
+}
+```
+
+## Publishing a signal
+
+Inject `IPublisher<ISignal>` and call `PublishAsync`:
+
+```csharp
+await
signalPublisher.PublishAsync(new RefreshDailyReportSignal("acme", DateTimeOffset.UtcNow)); +``` + +You can also schedule a signal with one of the `PublishAsync` overloads that accepts a `DateTime` or a `TimeSpan`. + +{% hint style="info" %} +Because a signal fans out, multiple handlers can answer it. If the work must happen only once per tick, make each handler idempotent and use the signal metadata (`Timestamp`, tenant) to deduplicate. +{% endhint %} + +## Examples in Cronus itself + +Two real signal types ship inside the framework and are good references when you write your own. + +### `HeartbeatSignal` + +`CronusHeartbeat` wakes up every `Cronus:Heartbeat:IntervalInSeconds` seconds, builds a [`HeartbeatSignal`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/Heartbeat/HeartbeatSignal.cs) and publishes it through `IPublisher`. The signal carries the bounded context and the full tenant list, so downstream services can use it as a liveness probe — see [Observability](../../extensibility/observability.md) for the consumer side. 
+
+```csharp
+[DataContract(Namespace = "cronus", Name = "c80739a6-b5dc-483e-8c11-06a85542416e")]
+public sealed class HeartbeatSignal : ISignal
+{
+ HeartbeatSignal() { Tenants = new List<string>(); }
+
+ public HeartbeatSignal(string boundedContext, List<string> tenants)
+ {
+ BoundedContext = boundedContext;
+ Tenants = tenants;
+ Timestamp = DateTimeOffset.Now;
+ Tenant = "cronus";
+ MachineName = Environment.MachineName;
+ EnvironmentConfig = Environment.GetEnvironmentVariable("ASPNETCORE_ENVIRONMENT");
+ }
+
+ [DataMember(Order = 0)] public string Tenant { get; private set; }
+ [DataMember(Order = 1)] public string BoundedContext { get; private set; }
+ [DataMember(Order = 2)] public List<string> Tenants { get; private set; }
+ [DataMember(Order = 3)] public DateTimeOffset Timestamp { get; private set; }
+ [DataMember(Order = 4)] public string MachineName { get; private set; }
+ [DataMember(Order = 5)] public string EnvironmentConfig { get; private set; }
+}
+```
+
+The publisher on the heartbeat side typically attaches a TTL header so that a consumer receiving the beat late can drop it as stale:
+
+```csharp
+var headers = new Dictionary<string, string> { { MessageHeader.TTL, "5000" } };
+await publisher.PublishAsync(signal, headers);
+```
+
+### `PublicEventsPlayer`
+
+[`PublicEventsPlayer`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Players/PublicEventsPlayer.cs) is a system trigger that handles `ReplayPublicEventsRequested` — a signal asking the host to republish a slice of the public-event stream to a newly added subscriber.
 It is a small but representative example of an `ISystemTrigger` that reacts to a signal by kicking off a [job](../../jobs.md):
+
+```csharp
+public sealed class PublicEventsPlayer : ISystemTrigger,
+ ISignalHandle<ReplayPublicEventsRequested>
+{
+ private readonly ICronusJobRunner jobRunner;
+ private readonly ReplayPublicEvents_JobFactory jobFactory;
+ private readonly ILogger<PublicEventsPlayer> logger;
+
+ public async Task HandleAsync(ReplayPublicEventsRequested signal)
+ {
+ ReplayPublicEvents_Job job = jobFactory.CreateJob(signal);
+ await jobRunner.ExecuteAsync(job).ConfigureAwait(false);
+ }
+}
+```
+
+The pattern — receive a signal, build a job, hand it to the runner — is the canonical way to express "kick off long-running work in response to an ambient nudge".
diff --git a/docs/cronus-framework/domain-modeling/multitenancy.md b/docs/cronus-framework/domain-modeling/multitenancy.md
index 61466a87..52073860 100644
--- a/docs/cronus-framework/domain-modeling/multitenancy.md
+++ b/docs/cronus-framework/domain-modeling/multitenancy.md
@@ -30,4 +30,175 @@ The Cronus framework supports **multitenancy**, enabling a single application in
 By adhering to these practices, developers can leverage Cronus's multitenancy capabilities to build scalable, secure, and efficient applications that serve multiple clients effectively.
+
+Cronus treats the **tenant** as a first-class dimension of every message. One running service can host many tenants side-by-side; their event streams, projections, message routing, and per-tenant services are all isolated by a single string.
+
+## What a tenant is in Cronus
+
+A tenant in Cronus is a short alphanumeric string (for example `acme`, `contoso`, `northwind`) that scopes everything the host does while processing a message:
+
+* the keyspace / table prefix used by the event store,
+* the keyspace / table prefix used by the projection store,
+* the RabbitMQ exchange and queue names,
+* any service you register with a per-tenant lifetime.
+ +The tenant is not a property you carry around manually. Cronus takes the inbound message, resolves the tenant from it once, pins that tenant on the current async scope, and every component downstream that asks "which tenant am I in?" gets the same answer. + +## Configuring the tenant list + +The set of tenants a host serves is controlled by the `Cronus:Tenants` configuration key. See [`configuration.md`](../configuration.md) for the full schema. + +{% code title="appsettings.json" %} +```json +{ + "cronus": { + "boundedcontext": "billing", + "tenants": [ "acme", "contoso" ] + } +} +``` +{% endcode %} + +Each entry is lower-cased, trimmed, and validated by `TenantsOptions` against the regex `^\b([\w\d_]+$)`, so tenant names may only contain alphanumeric characters and underscores. The binding is performed by `TenantsOptionsProvider` under the setting key `cronus:tenants`. + +{% code title="TenantsOptions.cs" %} +```csharp +public class TenantsOptions +{ + [Required(AllowEmptyStrings = false, ErrorMessage = "The configuration `Cronus:Tenants` is required.")] + [CollectionRegularExpression(@"^\b([\w\d_]+$)")] + public IEnumerable Tenants { get; set; } +} +``` +{% endcode %} + +Inject `IOptionsMonitor` anywhere you need to enumerate the configured tenants at runtime. + +## How the tenant travels + +While handling a message, Cronus creates an `IServiceScope` and attaches a `CronusContext` to it. The context holds the tenant and the scope's service provider: + +{% code title="CronusContext.cs" %} +```csharp +public sealed class CronusContext +{ + public CronusContext(string tenant, IServiceProvider serviceProvider) + { + if (string.IsNullOrEmpty(tenant)) + throw new ArgumentException( + "Unknown tenant. CronusContext is not properly built. 
" + + "Make sure that you have properly configured `cronus:tenants`."); + if (serviceProvider is null) throw new ArgumentNullException(nameof(serviceProvider)); + + Tenant = tenant; + ServiceProvider = serviceProvider; + Trace = new Dictionary(); + } + + public string Tenant { get; private set; } + public IServiceProvider ServiceProvider { get; private set; } + public Dictionary Trace { get; } + public bool IsNotInitialized => string.IsNullOrEmpty(Tenant) || ServiceProvider is null; + public bool IsInitialized => IsNotInitialized == false; +} +``` +{% endcode %} + +The context is exposed via `ICronusContextAccessor`, which stores it in an `AsyncLocal` so it flows across `await` boundaries without you passing it around. The `IsNotInitialized` short-circuit exists so callers outside a message-handling scope can detect that there is no current tenant rather than accidentally reading `null`. + +## The tenant-resolver chain + +Cronus resolves the tenant from the inbound message via a two-layer chain. + +* A non-generic **dispatcher**, `TenantResolver : ITenantResolver`, receives the raw `object` and looks up a typed resolver for its runtime type. It also implements the single-tenant fallback: if no resolver returns a value but `Cronus:Tenants` has exactly one entry, that entry is used. +* Typed resolvers implement `ITenantResolver`. `DefaultTenantResolver` ships implementations for `AggregateRootId`, `AggregateCommit`, `IMessage`, `IBlobId`, `CronusMessage`, and `string`. For `IMessage`, it first looks for a `Tenant` property, then falls back to scanning properties of type `IBlobId` and parsing their URN. 
+
+{% code title="TenantResolver.Resolve" %}
+```csharp
+string tenant = resolverCache.GetTenantFrom(source);
+
+if (string.IsNullOrEmpty(tenant) == false)
+ return tenant;
+
+if (tenants.Tenants.Count() == 1)
+ return tenants.Tenants.Single();
+
+throw new UnableToResolveTenantException("Unable to resolve tenant.");
+```
+{% endcode %}
+
+Registration is automated by `MultitenancyDiscovery`: every non-abstract type implementing `ITenantResolver<T>` found in the loaded assemblies is registered as a singleton for each closed `ITenantResolver<T>` interface it implements. To add your own resolver (for example, one that pulls the tenant from a custom transport header), implement `ITenantResolver<T>` and let discovery pick it up.
+
+## Per-tenant services
+
+Two helpers manage tenant-scoped singletons:
+
+* `SingletonPerTenant<T>` resolves, caches, and returns a `T` per tenant, setting `IHaveTenant.Tenant` on the instance if the type implements that marker.
+* `SingletonPerTenantContainer<T>` is the shared dictionary behind `SingletonPerTenant<T>` and disposes cached instances when the host shuts down.
+
+Register a tenant-scoped service with `AddTenantSingleton<TService, TImplementation>` from [`CronusServiceCollectionExtensions`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/CronusServiceCollectionExtensions.cs):
+
+{% code title="CronusServiceCollectionExtensions.cs (extract)" %}
+```csharp
+public static IServiceCollection AddTenantSingleton<TService, TImplementation>(this IServiceCollection services)
+ where TService : class
+ where TImplementation : class, TService
+{
+ services.AddTransient<TImplementation>();
+ services.AddTransient<TService>(provider =>
+ provider.GetRequiredService<SingletonPerTenant<TImplementation>>().Get());
+
+ return services;
+}
+```
+{% endcode %}
+
+`AddTenantSupport` wires `SingletonPerTenant<>` as transient and `SingletonPerTenantContainer<>` as singleton, so every resolution of `TService` transparently dispatches to the correct tenant-specific instance.
+
+## Aggregate ids carry the tenant
+
+Every `AggregateRootId` encodes the tenant in its URN.
The constructor takes the tenant first: + +```csharp +public AggregateRootId(string tenant, string arName, string id) +``` + +See [`ids.md`](ids.md) for the full URN layout. `DefaultTenantResolver.Resolve(AggregateRootId id)` simply returns `id.Tenant`; this is why messages that expose an `AggregateRootId` property do not need to carry a separate `Tenant` string. + +## End-to-end example + +```csharp +// 1. Boot Cronus with two tenants. +var builder = Host.CreateDefaultBuilder(args) + .ConfigureServices((ctx, services) => services.AddCronus(ctx.Configuration)); + +// 2. Register a tenant-scoped service. +builder.ConfigureServices(s => s.AddTenantSingleton()); + +// 3. Inside any handler, read the current tenant from the context. +public class OrderCreatedHandler : IEventHandler +{ + private readonly ICronusContextAccessor ctx; + public OrderCreatedHandler(ICronusContextAccessor ctx) => this.ctx = ctx; + + public Task HandleAsync(OrderCreated @event) + { + var tenant = ctx.CronusContext.Tenant; // "acme" or "contoso" + // ... + return Task.CompletedTask; + } +} +``` + +## Guidelines + +{% hint style="success" %} +You **can** read `ICronusContextAccessor.CronusContext.Tenant` anywhere inside a message-handling scope. + +You **should** prefer `AddTenantSingleton` over manually caching per-tenant state. + +You **must** register every tenant in `Cronus:Tenants` before sending messages for it — `DefaultCronusContextFactory` rejects unknown tenants at scope creation time. +{% endhint %} + +{% hint style="warning" %} +You **should not** put the tenant in the URL or a request body as a plain string and trust it. Resolve it from an authenticated claim or the aggregate id instead — that is what `ITenantResolver` is for. 
+{% endhint %} diff --git a/docs/cronus-framework/domain-modeling/published-language.md b/docs/cronus-framework/domain-modeling/published-language.md index 22b9f164..19b5ec3d 100644 --- a/docs/cronus-framework/domain-modeling/published-language.md +++ b/docs/cronus-framework/domain-modeling/published-language.md @@ -1,4 +1,140 @@ # Published Language -[https://github.com/Elders/Cronus/issues/203](https://github.com/Elders/Cronus/issues/203) +A **published language** is the shared vocabulary between two bounded contexts — the set of messages they agree on so one can consume what the other emits without coupling internal models. In Cronus the published language is defined by the messages you mark with `[DataContract]`. +## The DataContract convention + +Every serialisable message, aggregate id, entity id, and value object in Cronus carries `[DataContract(Name = "")]`. The GUID is the **stable serialization id** for the type. It is not a version number and not a human-friendly name — it is the thing the wire format and the event store use to address a CLR type. + +If you rename the class `OrderPlaced` to `OrderSubmitted`, the GUID in the attribute stays the same. Serialized events already on disk continue to resolve to the new class name, because look-up goes through the GUID, not through `typeof(T).FullName`. + +`MessageInfo` implements this indirection: + +{% code title="MessageInfo.cs (extract)" %} +```csharp +private static string GetAndCacheContractIdFromAttribute(Type contractType) +{ + DataContractAttribute contract = contractType + .GetCustomAttributes(false).Where(attr => attr is DataContractAttribute) + .SingleOrDefault() as DataContractAttribute; + + if (contract == null || String.IsNullOrEmpty(contract.Name)) + { + throw new Exception(String.Format( + "The message type '{0}' is missing a DataContract attribute. 
" + + "Example: [DataContract(\"00000000-0000-0000-0000-000000000000\")]", + contractType.FullName)); + } + + return contract.Name; +} +``` +{% endcode %} + +A type without `[DataContract(Name = ...)]` will throw the moment Cronus tries to serialise it — there is no fallback. + +## The Namespace field is the bounded context + +`DataContract.Namespace` doubles as the bounded context for the type: + +{% code title="MessageInfo.cs (extract)" %} +```csharp +public static string GetBoundedContext(this Type messageType, string defaultBoundedContext = "implicit") +{ + // ... cached look-up ... + if (contract is null == false && contract.IsNamespaceSetExplicitly) + boundedContext = contract.Namespace; + + return boundedContext.ToLower(); +} +``` +{% endcode %} + +If a type sets `Namespace` explicitly, that value wins; otherwise the default `"implicit"` is used. The value is always lower-cased, because RabbitMQ exchanges and queues are case-sensitive and Cronus keeps the routing keys canonical. + +See [`bounded-context.md`](bounded-context.md) for how that value flows into RabbitMQ routing and Cassandra keyspaces. + +## Real examples + +The convention is uniform across the Elders ecosystem. A few live attributes: + +* Framework-level: `Urn` and `AggregateRootId` both opt in. + +```csharp +// Cronus.DomainModeling/Urn.cs +[DataContract(Name = "d3ff08b5-38e2-4aaf-b3a8-ccc423ed096d")] +public class Urn : IEquatable, IBlobId { /* ... */ } + +// Cronus.DomainModeling/AggregateRootId.cs +[DataContract(Name = "b78e63f3-1443-4e82-ba4c-9b12883518b9")] +public partial class AggregateRootId : Urn { /* ... 
*/ } +``` + +* An event in a product code-base (no explicit namespace, so it routes under the service's own bounded context): + +```csharp +// locus.backend/Experiences/Events/ExperienceMovedToBin.cs +[DataContract(Name = "945df1c4-ae46-4849-a094-e83a67abc95b")] +public class ExperienceMovedToBin : IEvent +{ + [DataMember(Order = 1)] public ExperienceId Id { get; private set; } + [DataMember(Order = 2)] public UserId MovedBy { get; private set; } + [DataMember(Order = 3)] public DateTime UpdatedAt { get; private set; } +} +``` + +* A contract shared across services — the namespace is set so the routing stays stable regardless of which host serialises it: + +```csharp +// Elders.IdentityAndAccess.Contracts/Profiles/ProfileName.cs +[DataContract(Namespace = BC.IdentityAndAccess, + Name = "6f353a50-34ec-47dc-ba9f-1862caff7010")] +public sealed record ProfileName +{ + [DataMember(Order = 1)] public string Name { get; init; } + [DataMember(Order = 2)] public string FirstName { get; init; } + /* ... */ +} +``` + +{% hint style="info" %} +Cronus does not ship a `ValueObject` base class. Use `record` for value-object semantics, or hand-write `Equals`/`GetHashCode` on a plain class. Older Cronus packages exposed a `ValueObject` base — code you find in legacy services (e.g. older `Elders.IdentityAndAccess` or `locus.backend` branches) may still reference it. +{% endhint %} + +Note the companion convention: a static constants class (`BC`) holds the bounded-context strings so each contract references the same literal. + +## Why this matters for event sourcing + +Event sourcing persists events forever. If a rename, a move, or a refactor broke the mapping between a serialized payload and a CLR type, replay would stop working. The GUID in `[DataContract(Name = ...)]` is the insurance against that: it pins the serialization id to the type across every rename and every namespace move. 
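+
+The GUID-keyed look-up can be sketched with plain reflection. This is not the Cronus `MessageInfo` implementation: `ContractRegistry`, `OrderSubmitted` and its GUID are hypothetical placeholders. Because the dictionary is keyed by `DataContractAttribute.Name`, renaming `OrderPlaced` to `OrderSubmitted`, or moving it to another namespace, does not change the key that stored payloads point at.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Reflection;
+using System.Runtime.Serialization;
+
+// Hypothetical event, formerly named OrderPlaced; its contract GUID survived the rename.
+[DataContract(Name = "5a7c2e3b-9f1d-4c8e-a6b0-2d4f6e8a1c3e")]
+public class OrderSubmitted
+{
+    [DataMember(Order = 1)] public string OrderId { get; private set; }
+}
+
+// Hypothetical registry; not part of Cronus.
+public static class ContractRegistry
+{
+    // Maps every [DataContract(Name = "<guid>")] type in the assembly to its CLR type.
+    // ToDictionary throws ArgumentException if two types claim the same GUID,
+    // which surfaces an accidental reuse at startup instead of during replay.
+    public static Dictionary<string, Type> Build(Assembly assembly) =>
+        assembly.GetTypes()
+            .Select(type => (Type: type, Contract: type.GetCustomAttribute<DataContractAttribute>(inherit: false)))
+            .Where(x => x.Contract != null && !string.IsNullOrEmpty(x.Contract.Name))
+            .ToDictionary(x => x.Contract.Name, x => x.Type);
+}
+```
+
+Resolving a stored payload is then `registry[contractId]` followed by deserialization into the returned type; the CLR type name never takes part in the look-up.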
+ +`[DataMember(Order = N)]` on every persisted property is the field-level equivalent: removing or renaming a field without keeping its `Order` number stable breaks replay. + +## Versioning discipline + +When the shape of an event has to change in a way that is not backwards-compatible, do not mutate the existing type. Create a new type with a new GUID, then add a migration to transform old payloads into the new shape during replay. + +{% hint style="success" %} +You **can** rename the CLR type, move it between namespaces, or change its members' CLR names — as long as `[DataContract(Name)]` and the `[DataMember(Order)]` values are preserved. + +You **should** treat the `Name` GUID as effectively public API: once a message with that contract has been persisted or published, the GUID is burned in. + +You **must** mint a new GUID for every semantically new message type. Never reuse an old GUID for a type with new meaning. +{% endhint %} + +{% hint style="warning" %} +You **should not** change `[DataMember(Order = ...)]` values on an existing contract, and you **should not** delete persisted fields. Add a new contract with a new GUID and migrate instead. 
+{% endhint %} + +## Related + +{% content-ref url="bounded-context.md" %} +[bounded-context.md](bounded-context.md) +{% endcontent-ref %} + +{% content-ref url="messages/events.md" %} +[events.md](messages/events.md) +{% endcontent-ref %} + +{% content-ref url="messages/public-events.md" %} +[public-events.md](messages/public-events.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/domain-modeling/signals.md b/docs/cronus-framework/domain-modeling/signals.md deleted file mode 100644 index 720ae1cb..00000000 --- a/docs/cronus-framework/domain-modeling/signals.md +++ /dev/null @@ -1,2 +0,0 @@ -# Signals - diff --git a/docs/cronus-framework/domain-modeling/value-object.md b/docs/cronus-framework/domain-modeling/value-object.md index 3a785c8e..0daaff94 100644 --- a/docs/cronus-framework/domain-modeling/value-object.md +++ b/docs/cronus-framework/domain-modeling/value-object.md @@ -1,37 +1,107 @@ # Value Object -Value objects represent immutable and atomic data. They are distinguishable only by the state of their properties and do not have an identity or any identity tracking mechanism. Two value objects with the exact same properties can be considered equal. You can read more about value objects in [this](https://martinfowler.com/bliki/ValueObject.html) article. +A **value object** is an immutable, identity-less piece of the domain model. Two value objects are equal when their properties are equal — there is no "same-ness" tracked separately from the data itself. Prices, coordinates, phone numbers, date ranges, colour codes: these are all value objects. See Martin Fowler's [short piece](https://martinfowler.com/bliki/ValueObject.html) for the canonical definition. -To define a value object with Cronus, create a class that inherits the base helper class `ValueObject`. Keep all related to the value object business rules and data within the class. +{% hint style="warning" %} +Cronus does **not** ship a `ValueObject` base class. 
Older versions of the library had one, but it has been removed. You are free to use any immutable type that supports value-based equality. +{% endhint %} + +The two idiomatic options on modern .NET are records and hand-written immutable classes. + +## Option 1 — record (recommended) + +`record` types give you structural equality for free: +{% code title="Money.cs" %} ```csharp [DataContract(Name = "1b6187f0-88c7-46d5-a22d-b39301765412")] -public class Performer: ValueObject +public record Money { - Performer() {} + Money() { } - public Performer(string name, string coverImage) + public Money(decimal amount, string currency) { - // null check - Name = name; - CoverImage = coverImage; + if (amount < 0) throw new ArgumentOutOfRangeException(nameof(amount)); + if (string.IsNullOrWhiteSpace(currency)) throw new ArgumentException("Currency is required", nameof(currency)); + + Amount = amount; + Currency = currency; } - [DataMember(Order = 1)] - public string Name { get; private set; } + [DataMember(Order = 1)] public decimal Amount { get; init; } + [DataMember(Order = 2)] public string Currency { get; init; } - [DataMember(Order = 2)] - public string CoverImage { get; private set; } + public Money Add(Money other) + { + if (Currency != other.Currency) throw new InvalidOperationException("Cannot add different currencies."); + return new Money(Amount + other.Amount, Currency); + } } ``` +{% endcode %} + +Two `Money(10, "EUR")` instances are equal by definition. `GetHashCode`, `Equals` and the `==` / `!=` operators are generated for you. 
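+
+To see the generated members at work, the console sketch below repeats the `Money` record from above and exercises it. `MoneyDemo` is a hypothetical helper, not part of Cronus. Note the last two statements: because the properties use `init`, a `with` expression clones the record through the compiler-generated copy constructor and never runs the validating constructor, so an invalid `Money` can still be produced that way. If that matters for your domain, use `private set` accessors, which rule out `with` from outside the type.
+
+```csharp
+using System;
+using System.Runtime.Serialization;
+
+[DataContract(Name = "1b6187f0-88c7-46d5-a22d-b39301765412")]
+public record Money
+{
+    Money() { }
+
+    public Money(decimal amount, string currency)
+    {
+        if (amount < 0) throw new ArgumentOutOfRangeException(nameof(amount));
+        if (string.IsNullOrWhiteSpace(currency)) throw new ArgumentException("Currency is required", nameof(currency));
+
+        Amount = amount;
+        Currency = currency;
+    }
+
+    [DataMember(Order = 1)] public decimal Amount { get; init; }
+    [DataMember(Order = 2)] public string Currency { get; init; }
+
+    public Money Add(Money other)
+    {
+        if (Currency != other.Currency) throw new InvalidOperationException("Cannot add different currencies.");
+        return new Money(Amount + other.Amount, Currency);
+    }
+}
+
+// Hypothetical demo class; not part of Cronus.
+public static class MoneyDemo
+{
+    public static void Run()
+    {
+        var a = new Money(10m, "EUR");
+        var b = new Money(10m, "EUR");
+
+        Console.WriteLine(a == b);                             // True: compiler-generated value equality
+        Console.WriteLine(ReferenceEquals(a, b));              // False: two distinct instances
+        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True: equal values hash alike
+        Console.WriteLine(a.Add(b) == new Money(20m, "EUR"));  // True
+
+        // `with` clones via the generated copy constructor and sets Amount through `init`,
+        // so the validating constructor does not run and no exception is thrown.
+        var negative = a with { Amount = -5m };
+        Console.WriteLine(negative.Amount < 0);                // True
+    }
+}
+```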
+
+## Option 2 — immutable class with manual equality
+
+If you need more control (base classes, private backing fields, bespoke comparisons), write a regular class:
+
+{% code title="GeoCoordinate.cs" %}
+```csharp
+[DataContract(Name = "5d0b3b92-87bf-4f3c-b1a6-70de54d2cbfd")]
+public sealed class GeoCoordinate : IEquatable<GeoCoordinate>
+{
+ GeoCoordinate() { }
+
+ public GeoCoordinate(double latitude, double longitude)
+ {
+ if (latitude < -90 || latitude > 90) throw new ArgumentOutOfRangeException(nameof(latitude));
+ if (longitude < -180 || longitude > 180) throw new ArgumentOutOfRangeException(nameof(longitude));
+
+ Latitude = latitude;
+ Longitude = longitude;
+ }
+
+ [DataMember(Order = 1)] public double Latitude { get; private set; }
+ [DataMember(Order = 2)] public double Longitude { get; private set; }
+
+ public bool Equals(GeoCoordinate other) =>
+ other is not null && Latitude == other.Latitude && Longitude == other.Longitude;
+
+ public override bool Equals(object obj) => Equals(obj as GeoCoordinate);
+
+ public override int GetHashCode() => HashCode.Combine(Latitude, Longitude);
+
+ public static bool operator ==(GeoCoordinate left, GeoCoordinate right) =>
+ left is null ? right is null : left.Equals(right);
+
+ public static bool operator !=(GeoCoordinate left, GeoCoordinate right) => !(left == right);
+}
+```
+{% endcode %}
+
+{% hint style="success" %}
+**You can / should / must**
+
+* a value object **must** be immutable once constructed
+* a value object **must** validate its inputs in the constructor — an invalid value never exists
+* a value object **should** carry behaviour that makes sense for the value (`Money.Add`, `DateRange.Overlaps`)
+* a value object **must** override equality and `GetHashCode` (records do it for you)
+* a value object **should** keep a parameterless constructor and `[DataContract]` attributes so it round-trips through serialization
+{% endhint %}
+
+## Collections of value objects
-The base class `ValueObject` implements the `IEqualityComparer` and `IEquatable` interfaces. When comparing two value objects of the same type the properties from the first are being compared with the properties of the second using reflection. The base class also overrides the `==` and `!=` operators.
+If a property is a collection of value objects, make sure equality compares that collection element by element. Neither `List<T>` nor `HashSet<T>` overrides `Equals`, so two collections with the same contents are not equal under default equality, and a record that holds one inherits that reference comparison. Override `Equals` and `GetHashCode` on the owning value object and compare the collections with `SequenceEqual` (order matters) or `SetEquals` (order does not).
 {% hint style="info" %}
-If a value object contains a collection of items, make sure that the items are also value objects and the collection supports item-by-item comparison. Otherwise, you will have to override the default comparison algorithm.
+If you ever find yourself writing "mutate" methods on a value object, step back: you probably want an _entity_ instead.
 {% endhint %}
-Keep a parameterless constructor and specify a data contract for serialization.
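+
+For example, a record that owns a list still compares that list by reference. The sketch below contrasts a positional record holding a `List<string>` with one that overrides `Equals` and `GetHashCode` using `SequenceEqual`. `NaiveTags` and `Tags` are hypothetical types, and the `[DataContract]` attributes are left out to keep the focus on equality.
+
+```csharp
+#nullable enable
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Default record equality: the List<string> member is compared by reference,
+// so two NaiveTags with identical contents are NOT equal.
+public sealed record NaiveTags(List<string> Values);
+
+// Element-by-element equality for an ordered collection.
+public sealed record Tags
+{
+    public Tags(IEnumerable<string> values) => Values = values.ToList().AsReadOnly();
+
+    public IReadOnlyList<string> Values { get; }
+
+    // A sealed record may declare its own Equals(Tags?); the generated == and != call it.
+    public bool Equals(Tags? other) =>
+        other is not null && Values.SequenceEqual(other.Values);
+
+    // Kept consistent with Equals: hashes every element in order.
+    public override int GetHashCode()
+    {
+        var hash = new HashCode();
+        foreach (var value in Values) hash.Add(value);
+        return hash.ToHashCode();
+    }
+}
+```
+
+For an unordered collection, hold the items in a `HashSet<T>`, compare with `SetEquals`, and combine the element hashes in an order-independent way (for example, XOR).
+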
+## Serialization reminder -{% page-ref page="../messaging/serialization.md" %} +Value objects travel inside events, commands and projection state, so they must serialize cleanly: +{% content-ref url="../messaging/serialization.md" %} +[serialization.md](../messaging/serialization.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/event-store/README.md b/docs/cronus-framework/event-store/README.md index 65ca356a..4bc96ed6 100644 --- a/docs/cronus-framework/event-store/README.md +++ b/docs/cronus-framework/event-store/README.md @@ -1,4 +1,77 @@ # Event Store -[https://github.com/Elders/Cronus/issues/265](https://github.com/Elders/Cronus/issues/265) +The event store is the append-only log of domain events that backs every Cronus aggregate. It is the single source of truth for the write side of your service: everything your service can "remember" about what happened — every command outcome, every state change — is derived from the events that live in this log. Projections, indices and read models are all derived artefacts that can be thrown away and rebuilt from the event store. +## What the event store is + +Every time an aggregate successfully handles a command it produces zero or more events. Cronus groups those events together with the aggregate root id, the new revision and a timestamp into an [`AggregateCommit`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/AggregateCommit.cs) and appends it to the event store. An aggregate is later rehydrated by loading all commits for its id and replaying the events through the `When` methods of its state. + +The contract is deliberately small. 
It appends, loads, paginates and deletes — that is all: + +```csharp +public interface IEventStore +{ + Task AppendAsync(AggregateCommit aggregateCommit); + Task AppendAsync(AggregateEventRaw eventRaw); + Task LoadAsync(IBlobId aggregateId); + Task DeleteAsync(AggregateEventRaw eventRaw); + Task LoadWithPagingAsync(IBlobId aggregateId, PagingOptions pagingOptions); + Task LoadAggregateEventRaw(IndexRecord indexRecord); +} +``` + +## Core abstractions + +The event-store subsystem exposes a handful of types that you will see repeatedly throughout the framework: + +* [`IEventStore`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/IEventStore.cs) — append, load and delete aggregate commits. The canonical implementation you talk to from domain code is wrapped by [`CronusEventStore`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/CronusEventStore.cs), which adds structured logging on top of the backend store. +* [`IEventStorePlayer`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/IEventStorePlayer.cs) — enumerates every event in the store without caring about aggregate boundaries. Used by index and projection rebuilds. See [EventStore Player](eventstore-player.md). +* [`AggregateCommit`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/AggregateCommit.cs) — the on-the-wire unit of a write: `AggregateRootId`, `Revision`, list of private `IEvent`s, list of `IPublicEvent`s and a timestamp. +* [`EventStream`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/EventStream.cs) — the result of loading a single aggregate. It holds the ordered list of commits and exposes `TryRestoreFromHistory(out T aggregateRoot)` to rebuild an aggregate instance. +* [`AggregateRepository`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/AggregateRepository.cs) — the repository you normally interact with from an application service. 
It delegates to `IEventStore` and handles integrity checks, atomic actions and duplicate-timestamp detection. + +A write therefore looks like: application service loads the aggregate via `AggregateRepository.LoadAsync`, calls a domain method, which `Apply`s events, then `SaveAsync` wraps those events into an `AggregateCommit` and calls `IEventStore.AppendAsync`. + +## Canonical backend: Cassandra + +The production-grade event-store backend for Cronus is [`Cronus.Persistence.Cassandra`](https://github.com/Elders/Cronus.Persistence.Cassandra). It has been in production use since 2013 and is marked `olympus` in the [ecosystem](https://github.com/Elders/Cronus). Cassandra is a natural fit: the access pattern of an event store (append-only, partitioned by aggregate id, ordered by revision) maps cleanly to Cassandra's wide rows, and its distributed replication story covers the durability requirements without extra effort. + +If you are wiring up a new service, use the Cassandra persister. The connection strings and replication settings are documented under `Cronus:Persistence:Cassandra:*` in [Configuration](../configuration.md#cronus-persistence-cassandra). + +## Secondary backends + +Cronus ships with a handful of alternative stores; their maturity and intended use are described in the [ecosystem](https://github.com/Elders/Cronus) reference: + +* [`Cronus.Persistence.CosmosDb`](https://github.com/Elders/Cronus.Persistence.CosmosDb) — alternative cloud-native store. +* [`Cronus.Persistence.MSSQL`](https://github.com/Elders/Cronus.Persistence.MSSQL) — `styx`. Worked for Cronus v1, but a relational database is a poor match for an append-only log and we do not recommend it for new work. +* [`Cronus.Persistence.Git-`](https://github.com/Elders/Cronus.Persistence.Git-) — `tartarus`. Exists "just for fun". + +## Schema evolution + +The events you persist today will outlive your current code. 
When a business rule changes, a field is renamed or an event is split in two, the old events stay on disk forever; unlike rows in a relational table, you cannot simply edit them in place with a schema migration. Cronus provides a set of migration primitives for those situations (copy an event store with transformations, delete events, validate a store after a migration). See [Migrations](migrations/README.md) and [Copy EventStore](migrations/copy-eventstore.md). + +## Indices + +Reading events by aggregate id is cheap. Reading events by type, or counting them, is not — the store is partitioned by aggregate, not by event type. Cronus maintains a secondary index subsystem to make those queries possible. See [Indices](../indices.md) for the list of built-in indices and how they are rebuilt. + +## Projections + +A projection is a read model derived from events. It lives in its own store, keeps its own versions and can be rebuilt from scratch by replaying events through [`IEventStorePlayer`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/IEventStorePlayer.cs). See [Projections](../projections/README.md) and [Handlers/Projections](../domain-modeling/handlers/projections.md).
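The append-and-replay cycle described on this page can be illustrated without any Cronus dependency. The sketch below is hypothetical: `Commit`, `InMemoryEventStore` and `TaskState` are illustrative stand-ins for `AggregateCommit`, `IEventStore` and an aggregate state class, not Cronus types, and the revision check is only a simplified example of the kind of integrity check a repository performs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for AggregateCommit: the events one command produced,
// the aggregate they belong to and the revision they bring it to.
public sealed record Commit(string AggregateId, int Revision, IReadOnlyList<object> Events);

// Hypothetical append-only store: commits are added, never updated in place.
public sealed class InMemoryEventStore
{
    private readonly Dictionary<string, List<Commit>> _streams = new();

    public void Append(Commit commit)
    {
        if (!_streams.TryGetValue(commit.AggregateId, out List<Commit>? stream))
        {
            stream = new List<Commit>();
            _streams[commit.AggregateId] = stream;
        }

        // Simplified integrity check: each commit must follow the previous revision.
        int expected = stream.Count == 0 ? 1 : stream[stream.Count - 1].Revision + 1;
        if (commit.Revision != expected)
            throw new InvalidOperationException($"Expected revision {expected}, got {commit.Revision}.");

        stream.Add(commit);
    }

    public IReadOnlyList<Commit> Load(string aggregateId)
    {
        if (_streams.TryGetValue(aggregateId, out List<Commit>? stream))
            return stream.AsReadOnly();
        return Array.Empty<Commit>();
    }
}

public sealed record TaskCreated(string Name);
public sealed record TaskRenamed(string Name);

// Rehydration: replay every event, in commit order, through a When handler.
public sealed class TaskState
{
    public string? Name { get; private set; }
    public int Revision { get; private set; }

    public static TaskState Rehydrate(IEnumerable<Commit> commits)
    {
        var state = new TaskState();
        foreach (Commit commit in commits)
        {
            foreach (object e in commit.Events) state.When(e);
            state.Revision = commit.Revision;
        }
        return state;
    }

    private void When(object e)
    {
        switch (e)
        {
            case TaskCreated created: Name = created.Name; break;
            case TaskRenamed renamed: Name = renamed.Name; break;
        }
    }
}
```

Because nothing is ever updated in place, the current state is always a pure function of the stored commits, which is what makes projections and other read models rebuildable from the same log.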
+ +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **must** pick a backend that supports append-only semantics — Cassandra is the recommended choice +* you **should** treat the event store as the only durable source of truth; everything else is derived +* you **can** add new backends behind `IEventStore` without changing your domain code +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **must not** mutate events in place; changes to events require an explicit migration +* you **must not** delete events because "they are not used anymore"; they are still needed to rebuild projections and to audit history +* you **should not** bypass `AggregateRepository` when writing from domain code; direct `IEventStore.AppendAsync` skips integrity checks +{% endhint %} diff --git a/docs/cronus-framework/event-store/eventstore-player.md b/docs/cronus-framework/event-store/eventstore-player.md index fc04444e..cc716321 100644 --- a/docs/cronus-framework/event-store/eventstore-player.md +++ b/docs/cronus-framework/event-store/eventstore-player.md @@ -1,2 +1,54 @@ # EventStore Player +The event-store _player_ is the read counterpart of `IEventStore` used by rebuilds. Where `IEventStore.LoadAsync(IBlobId)` loads the stream of a single aggregate, the player enumerates the whole store — scoped by event type, timestamp window and pagination token — without caring about aggregate boundaries. It is the mechanism that powers projection rebuilds, index rebuilds and public-event replay. 
+ +## The contract + +The player contract lives at [`Cronus/src/Elders.Cronus/EventStore/IEventStorePlayer.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/IEventStorePlayer.cs): + +```csharp +public interface IEventStorePlayer +{ + Task EnumerateEventStore(PlayerOperator @operator, PlayerOptions replayOptions, CancellationToken cancellationToken = default); +} +``` + +A caller hands the player two values: + +* A [`PlayerOperator`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/IEventStorePlayer.cs) — a bag of callbacks the player invokes per raw event, per aggregate stream and on pagination checkpoints. The two you will set most often are `OnLoadAsync` (one raw event at a time) and `NotifyProgressAsync` (checkpoint signal, used to persist a pagination token between batches). +* A [`PlayerOptions`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/ReplayOptions.cs) — `EventTypeId`, `PaginationToken`, `BatchSize` (defaults to `1000`), `After` and `Before` bounds and `MaxDegreeOfParallelism` (defaults to `2`). + +Raw events travel as [`AggregateEventRaw`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/AggregateEventRaw.cs) so a player can move bytes without deserialising them. The typed variant [`AggregateStream`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/EventStream.cs) groups raw events of one aggregate into `AggregateCommitRaw` by revision — useful when you want to see a single aggregate's commits together. + +## How it is used + +The player is the workhorse under several framework jobs: + +* [`RebuildProjection_Job`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Rebuilding/RebuildProjection_Job.cs) iterates once per event type that the projection handles and calls `EnumerateEventStore` for each. 
The `OnLoadAsync` callback deserialises the raw bytes, writes a projection commit and updates the [`ProgressTracker`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Rebuilding/ProgressTracker.cs). `NotifyProgressAsync` persists the pagination token into job data so the work survives a process restart. +* [`ReplayPublicEvents_Job`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Players/ReplayPublicEvents_Job.cs) republishes the stream of one public-event type so a newly-added subscriber can catch up. +* The index rebuild jobs under [`Cronus/src/Elders.Cronus/EventStore/Index/`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/) use the same mechanism to walk every event and update the secondary indices. + +The typed sibling `IEventStorePlayer` exists so that different backends can register their own players into DI without clashing. + +## When to use it + +Reach for the player when you need to walk every event of a given type (or a time slice of them) and do something idempotent with each — rebuild a projection, backfill a new index, republish a stream to a new subscriber. The player already knows how to paginate, resume from a token and report progress. + +Reach for a one-off [migration](migrations/README.md) instead when you need to _transform_ the contents of the store — copy it into a new keyspace with some events rewritten, or remove events that were persisted in error. Migrations give you a destination store; the player only reads. 
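The pagination-and-checkpoint loop described above can be sketched with an in-memory log. This is a hypothetical illustration, not the Cronus implementation: `InMemoryPlayer`, `RawEvent` and the use of a plain string index as the pagination token are assumptions. The parameters only mirror the roles of `EventTypeId`, `PaginationToken`, `BatchSize`, `OnLoadAsync` and `NotifyProgressAsync`, and the sketch runs sequentially rather than honouring `MaxDegreeOfParallelism`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical raw event: the payload stays as bytes, in the spirit of AggregateEventRaw.
public sealed record RawEvent(string EventType, byte[] Data);

// Hypothetical player over an in-memory log. The pagination token is simply the
// index of the next unread entry, written out after every completed batch.
public sealed class InMemoryPlayer
{
    private readonly IReadOnlyList<RawEvent> _log;

    public InMemoryPlayer(IReadOnlyList<RawEvent> log) => _log = log;

    public async Task EnumerateAsync(
        string eventType,
        string? paginationToken,
        int batchSize,
        Func<RawEvent, Task> onLoadAsync,
        Func<string, Task> notifyProgressAsync,
        CancellationToken cancellationToken = default)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        // Resume from the checkpoint if one was saved, otherwise start at the beginning.
        int index = paginationToken is null ? 0 : int.Parse(paginationToken, CultureInfo.InvariantCulture);

        while (index < _log.Count)
        {
            cancellationToken.ThrowIfCancellationRequested();
            int end = Math.Min(index + batchSize, _log.Count);

            for (int i = index; i < end; i++)
            {
                // Filter by event type, in the spirit of PlayerOptions.EventTypeId.
                if (_log[i].EventType == eventType)
                    await onLoadAsync(_log[i]);
            }

            index = end;
            // Checkpoint only after the whole batch has been handled.
            await notifyProgressAsync(index.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Since the checkpoint is written only after a batch completes, a crash in the middle of a batch makes the next run re-deliver that batch from the last saved token. That is why the handler passed as `onLoadAsync` has to be idempotent.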
+ +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **should** make `OnLoadAsync` idempotent; a replay may re-deliver events if the process restarts +* you **should** persist the pagination token via `NotifyProgressAsync` if you run long replays +* you **can** cap the blast radius with `After` / `Before` when backfilling historical data +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **must not** mutate the source store from within a player — use a [migration](migrations/README.md) if you need that +* you **should not** assume an event is delivered exactly once; design for at-least-once +{% endhint %} diff --git a/docs/cronus-framework/event-store/migrations/README.md b/docs/cronus-framework/event-store/migrations/README.md index 0876c182..e1e41c57 100644 --- a/docs/cronus-framework/event-store/migrations/README.md +++ b/docs/cronus-framework/event-store/migrations/README.md @@ -1,37 +1,68 @@ # Migrations -## What is data migration? +A migration is a one-time transformation applied to the durable state of a Cronus service. You reach for one whenever the shape of persisted data has to change in a way that the runtime cannot repair on its own — renaming a contract that is still referenced in historical events, scrubbing a field that legally has to leave the store, splitting an event into two, or moving the entire store into a new keyspace. -Data migration is the process of moving data from one system to another. And there are many reasons why a system may require such a move. To name most common ones: +## When do you migrate events vs projections -* Natural system evolution which requires the data to be optimized for performance or maintainability. -* Legal issues where some parts of the data have to be deleted or encrypted -* Bad data created by a bug in the system -* Business reason. When businesses merge or split. 
+There is a hard asymmetry between event stores and projection stores in Cronus, and it drives the whole migrations design: -{% hint style="warning" %} -It is important that the business value of the data is not changed during the process. -{% endhint %} +* **Projections are derived and disposable.** You can always drop the projection tables and replay the events back in to rebuild them. Projection-shape changes are handled by the [projection versioning](../../projections/versioning.md) mechanism, not by migrations. Cronus will detect that the shape hash changed, request a new version and replay events from the event store through the projection handler. +* **Events are forever.** Every change that ever happened to the business is recorded as an event, and the events are the only record. If an event's shape must change, you need to migrate the event-store data itself — and because the original store is the source of truth for the entire system you cannot mutate it blindly. -There are many different strategies when and how to do data migration. You must carefully plan and execute because damages could be significant. +A rule of thumb: if you are changing the shape of a _write_ (an event, an aggregate-commit, a tenant-owned keyspace), you need a migration. If you are changing the shape of a _read_ (a projection, an index, a dashboard), you rebuild the derived artefact from the existing events. -## Challenges +## What Cronus provides -Depending on the data volume the migration process could take hours, even days. During that time there are many things which could fail and corrupt the data in a irreversible way. To avoid such scenarios you should always migrate the data into a new storage repository. +The abstractions live under [`Cronus/src/Elders.Cronus/Migrations/`](../../../../src/Elders.Cronus/Migrations/). The shapes you will interact with most are: -{% hint style="info" %} -Always migrate the data into a new storage repository. 
-{% endhint %} +* [`IMigration`](../../../../src/Elders.Cronus/Migrations/IMigration.cs) and `IMigration` — the single-event predicate-plus-transformation: -Make sure the migration process does not overwhelm the live system. You should be in control when the data is being migrated so you could pause the migration during peek times of the live system. To achieve this, use a separate process to run data migration. Always keep in mind that migrating data takes from your system resources and you must account for that. + ```csharp + public interface IMigration : IMigration + where T : class where V : class + { + bool ShouldApply(T current); + V Apply(T current); + } + ``` -{% hint style="info" %} -Use a separate process to run data migration. -{% endhint %} + Implementations decide whether a given `AggregateEventRaw` or `AggregateCommit` should be rewritten, and return the replacement. +* [`MigrationRunnerBase`](../../../../src/Elders.Cronus/Migrations/MigrationRunnerBase.cs) — the base class all runners inherit from. It holds an `IEventStorePlayer` (the source) and an `IEventStore` (the destination) and declares `Task RunAsync(IEnumerable> migrations)`. +* [`CopyEventStore`](../../../../src/Elders.Cronus/Migrations/CopyEventStore.cs) — the runner used for the _"move everything from store A to store B, applying these transformations on the way"_ workflow. See [Copy EventStore](copy-eventstore.md) for the end-to-end description. +* [`DeleteEventStoreEvents`](../../../../src/Elders.Cronus/Migrations/DeleteEventStoreEvents.cs) — the runner that walks the store and deletes events for which `migration.ShouldApply` returns true. Use it with caution. +* [`ValidateEventStore`](../../../../src/Elders.Cronus/Migrations/ValidateEventStore.cs) — re-plays the source events into the target to verify that the migration preserves what it should. 
+* [`CronusMigrator`](../../../../src/Elders.Cronus/Migrations/CronusMigrator.cs) and [`MigrationHandler`](../../../../src/Elders.Cronus/Migrations/MigrationHandler.cs) — the live pipeline: given an incoming `AggregateCommit`, apply each registered `IMigration` in order and then invoke `IMigrationCustomLogic.OnAggregateCommitAsync`. This is how the migrator service processes new events that arrive while a migration is still running. +* [`AggregateCommitMigrationWorkflow`](../../../../src/Elders.Cronus/Migrations/AggregateCommitMigrationWorkflow.cs) — a [`Workflow>`](../../../../src/Elders.Cronus/Migrations/MigrationWorkflowBase.cs) step that produces zero, one or many resulting commits from a single input commit. +* [`IMigrationEventStorePlayer`](../../../../src/Elders.Cronus/Migrations/IMigrationEventStorePlayer.cs) — a marker `IEventStorePlayer` that isolates the migration player from the rest of the system, so the live application is not accidentally driven by the historical replay. + +The flag that turns the whole subsystem on is `Cronus:MigrationsEnabled` (see [Configuration](../../configuration.md#cronus-migrationsenabled)). It defaults to `false` because you almost never want a production host to also act as a migrator. + +## A sensible workflow + +The pattern that comes up most in practice: -When you are migrating a +1. Stand up a separate _migrator_ host, configured with the new target store as its `IEventStore` and the old store as its `IEventStorePlayer`. +2. Subscribe the migrator to the same live events the production host subscribes to, and write them into the new store through `CronusMigrator.MigrateAsync` so nothing is lost while the backfill runs. +3. Run [`CopyEventStore`](copy-eventstore.md) against the old store to copy history into the new store, applying the required `IMigration` transformations on the way. +4.
Run [`ValidateEventStore`](../../../../src/Elders.Cronus/Migrations/ValidateEventStore.cs) to confirm the new store is internally consistent. +5. Flip the production host to read from the new store, retire the old one. -## How to do +Related packaging lives in the satellite [`Cronus.Migration.Middleware`](https://github.com/Elders/Cronus.Migration.Middleware), which ships the middleware needed to drop the above pipeline into a host. -1. Create a separate process which migrates the existing data into the new data repository -2. Live system must push any new data to the migration service. Could be easily achieved by sending it to a message broker. +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **must** migrate into a new store and keep the old one until you have validated the new one +* you **should** run migrations in a dedicated process (`Cronus:MigrationsEnabled = true`) +* you **should** make each `IMigration.Apply` deterministic so a re-run produces the same output +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **must not** migrate events in place without a backup; the original is irreplaceable +* you **should not** use a migration where a projection rebuild or a new projection version would do +* you **must not** run migrations in your production host; it competes with live traffic for I/O and for the event stream +{% endhint %} diff --git a/docs/cronus-framework/event-store/migrations/copy-eventstore.md b/docs/cronus-framework/event-store/migrations/copy-eventstore.md index f7437b61..205ccca6 100644 --- a/docs/cronus-framework/event-store/migrations/copy-eventstore.md +++ b/docs/cronus-framework/event-store/migrations/copy-eventstore.md @@ -1,67 +1,109 @@ # Copy EventStore -## **Issue at hand** +`CopyEventStore` is the migration runner you use when the goal is _"take every event from store A, optionally transform it, write it into store B"_. 
It is the workhorse behind every production migration where the event-store schema or contents need to change. -An issue that came up in the past was that we serialized a huge amount of information in an event. The event contained a structure that in itself had a very innocent-looking property called `TimeZoneInfo`: +## The runner -``` - [DataContract(Namespace = BC.ContextName, Name = "dce741fb-8671-42b8-af59-d30aaae27bad")] - public struct Cycle - { - [DataMember(Order = 1)] - private DateTimeOffset _start; +`CopyEventStore` lives at [`Cronus/src/Elders.Cronus/Migrations/CopyEventStore.cs`](../../../../src/Elders.Cronus/Migrations/CopyEventStore.cs) and extends [`MigrationRunnerBase`](../../../../src/Elders.Cronus/Migrations/MigrationRunnerBase.cs): - [DataMember(Order = 2)] - private DateTimeOffset _end; - - [DataMember(Order = 3)] - private TimeSpan _duration; +```csharp +public class CopyEventStore + : MigrationRunnerBase + where TSourceEventStorePlayer : IEventStorePlayer + where TTargetEventStore : IEventStore +{ + public override async Task RunAsync(IEnumerable> migrations) + { + PlayerOperator @operator = new PlayerOperator() + { + OnLoadAsync = target.AppendAsync + }; - [DataMember(Order = 4)] - private readonly TimeZoneInfo _timezone; + PlayerOptions playerOptions = new PlayerOptions(); + await source.EnumerateEventStore(@operator, playerOptions); } +} ``` -After releasing the software, we noticed that the project was taking up an unusually large amount of space. After checking out a couple of persisted events, we found out that each time we used the struct `Cycle`, we persisted some 6200 lines of serialized json. Of which, the 6000 lines were attributed to the `TimeZoneInfo`.This severely impacted event serialization and deserialization. The issue came up after we had done the following assignment +There are two open generics — the _source player_ and the _target event store_. 
The player drives enumeration of the old store; the target receives each event. The runner itself is a thin loop: for every `AggregateEventRaw` the player yields, it calls `target.AppendAsync` on the destination. Note that the `RunAsync` shown above never reads its `migrations` argument: the operator forwards each raw event unchanged and the runner treats it as opaque bytes. + +> The current implementation applies migrations via the `IMigration` pipeline your host wires into the player operator chain. Expect to see a small migrator service that sets this up — see [Migrations](README.md) for the overall topology. +## An end-to-end example + +A migration we ran in production: an event contained a struct `Cycle` with a `TimeZoneInfo` member. Serialising the full `TimeZoneInfo` object produced roughly 6000 lines of JSON for every persisted `Cycle`, which inflated storage and slowed both serialization and deserialization. The fix was to replace the `TimeZoneInfo` with a `string` timezone id. Because the old, bloated shape was already in production, changing only the code was not enough — the persisted bytes had to be rewritten. + +### Step 1 — change the contract + +The struct definition kept its `DataContract` name (it is the identity of the contract, not of the field) but its last member changed: + +```csharp +[DataContract(Namespace = BC.ContextName, Name = "dce741fb-8671-42b8-af59-d30aaae27bad")] +public struct Cycle +{ + [DataMember(Order = 1)] private DateTimeOffset _start; + [DataMember(Order = 2)] private DateTimeOffset _end; + [DataMember(Order = 3)] private TimeSpan _duration; + [DataMember(Order = 4)] private readonly string _timezoneId; +} ``` + +### Step 2 — write the migration + +Implement [`IMigration`](../../../../src/Elders.Cronus/Migrations/IMigration.cs) and mutate the serialized payload so that every occurrence of the old `Cycle` shape is rewritten with the new `_timezoneId`: + +```csharp +public class TimeZoneInfoToIdMigration : IMigration { - ...
- _timezone = TZConvert.GetTimeZoneInfo("Central Standard Time"); - ... + public bool ShouldApply(AggregateEventRaw current) + { + // Cheap test on the raw bytes: only apply when the old shape marker is present. + // Keep this fast — it runs for every event in the source store. + // BytesContain and OldCycleMarker are your own helpers, not Cronus APIs. + return BytesContain(current.Data, OldCycleMarker); + } + + public AggregateEventRaw Apply(AggregateEventRaw current) + { + // RewriteCyclePayload is also your own helper: it must replace the + // serialised TimeZoneInfo with its id. + byte[] rewritten = RewriteCyclePayload(current.Data); + return new AggregateEventRaw( + current.AggregateRootId, + rewritten, + current.Revision, + current.Position, + current.Timestamp); + } } ``` -## **Decision** +### Step 3 — stand up a migrator host -We decided that in order to lower the amount of data, we needed to migrate the event store while keeping up a live version of the old one, to avoid downtime. +The migrator runs in its own process with `Cronus:MigrationsEnabled = true` (see [Configuration](../../configuration.md#cronus-migrationsenabled)). Two things happen concurrently: -## **Migration Challenges** +1. The migrator subscribes to the live event stream of the old service. Every incoming `AggregateCommit` goes through [`CronusMigrator.MigrateAsync`](../../../../src/Elders.Cronus/Migrations/CronusMigrator.cs), which applies each `IMigration` in order and then writes the result into the new store. This keeps the new store consistent while the backfill runs. +2. A background task runs `CopyEventStore.RunAsync(new[] { new TimeZoneInfoToIdMigration() })` against the historical data of the old store. The source player enumerates every event and the runner appends each one to the target; whether an event is rewritten on the way depends on how the migrations are wired into the player operator chain, as noted above. -In order to avoid having downtime, we decided to create a single deployable service (let's call it Migrator) that subscribed to the same events as the original application service. However, the Migrator would write the events directly in the new event store.
Furthermore, the Migrator would be responsible for once it boots, to start copying data over from the old event store while applying the needed changes. In our case, we needed to modify all events that had the `Cycle` in them, and replace the `TimeZoneInfo` with just a `TimeZoneId` which is a simple string. +### Step 4 — validate, cut over, retire -## **How To Do this** +Once the historical copy has finished and the live subscription has caught up, run [`ValidateEventStore`](../../../../src/Elders.Cronus/Migrations/ValidateEventStore.cs) to confirm the old and new stores agree, then flip production traffic over to the new store. Retire the old store only after a grace period, for example one billing cycle, so you can still compare against it or fall back to it if a discrepancy turns up. -### Changing the structure +## Notes -We changed the structure of the Cycle to this: +* The source and target of `CopyEventStore` are open generics, so you are not tied to a single backend. A typical migration copies a Cassandra store into a new Cassandra keyspace; you can also move from one store implementation to another (`Cronus.Persistence.MSSQL` → `Cronus.Persistence.Cassandra`). +* `CopyEventStore` is idempotent at the _event_ level only if the target event store treats re-appends as a no-op. Cassandra inserts are upserts, so re-appending an event under the same primary key overwrites it rather than duplicating it; a re-run therefore rewrites already-migrated events instead of adding copies. +* The pagination token and retries are the responsibility of the source `IEventStorePlayer`; see [EventStore Player](../eventstore-player.md) for the hooks you will wire up.
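Because the `RunAsync` shown at the top of this page forwards events without touching its `migrations` argument, it can help to see what an explicit apply-then-append loop looks like. The sketch is hypothetical: `IMigrationSketch<T>` only mirrors the `ShouldApply`/`Apply` shape of `IMigration`, items are plain strings rather than `AggregateEventRaw`, the target is a list rather than an `IEventStore`, and `RenameFieldMigration` is a toy transformation, not the `TimeZoneInfo` fix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IMigration<T>: a predicate plus a transformation.
public interface IMigrationSketch<T>
{
    bool ShouldApply(T current);
    T Apply(T current);
}

public static class CopyWithMigrations
{
    // Applies every matching migration, in order, to each source item and
    // appends the result to the target. Non-matching items are copied as-is.
    public static void Copy<T>(IEnumerable<T> source, ICollection<T> target, IReadOnlyList<IMigrationSketch<T>> migrations)
    {
        foreach (T item in source)
        {
            T current = item;
            foreach (IMigrationSketch<T> migration in migrations)
            {
                if (migration.ShouldApply(current))
                    current = migration.Apply(current);
            }
            target.Add(current);
        }
    }
}

// Toy migration over a JSON-like string payload: renames one field.
// Apply is a pure function of its input, so re-running it is deterministic.
public sealed class RenameFieldMigration : IMigrationSketch<string>
{
    public bool ShouldApply(string current) => current.Contains("\"timezone\"");
    public string Apply(string current) => current.Replace("\"timezone\"", "\"timezoneId\"");
}
```

Running the copy a second time over its own output changes nothing, because `ShouldApply` no longer matches the rewritten payload. That is the determinism property the best practices below ask for.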
-``` - [DataContract(Namespace = BC.ContextName, Name = "dce741fb-8671-42b8-af59-d30aaae27bad")] - public struct Cycle - { - [DataMember(Order = 1)] - private DateTimeOffset _start; +## Best Practices - [DataMember(Order = 2)] - private DateTimeOffset _end; +{% hint style="success" %} +**You can/should/must...** - [DataMember(Order = 3)] - private TimeSpan _duration; +* you **should** keep `ShouldApply` cheap — it runs for every event in the store +* you **should** produce deterministic output in `Apply`; re-running the migration must produce byte-for-byte identical results +{% endhint %} - [DataMember(Order = 4)] - private readonly string _timezoneId; - } -``` +{% hint style="warning" %} +**You should not...** -### Creating the project +* you **must not** run `CopyEventStore` against your production host; stand up a dedicated migrator +* you **must not** rewrite the `DataContract` name of an event during a copy — keep the contract id, change the payload +{% endhint %} diff --git a/docs/cronus-framework/extensibility/README.md b/docs/cronus-framework/extensibility/README.md new file mode 100644 index 00000000..d1448172 --- /dev/null +++ b/docs/cronus-framework/extensibility/README.md @@ -0,0 +1,27 @@ +# Extensibility + +Cronus is built on a reflection-based extensibility seam called **discoveries**. Every satellite package — `Cronus.Transport.RabbitMQ`, `Cronus.Persistence.Cassandra`, `Cronus.Projections.Cassandra`, and so on — ships one or more discovery classes. When `services.AddCronus(configuration)` is called at boot, Cronus scans the loaded assemblies, invokes every discovery it finds, and applies the resulting `ServiceDescriptor` entries to the host's DI container. Your own code can ship discoveries too; that is how you inject or replace services in a way that composes cleanly with the framework defaults. 
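Cronus applies each discovered model with `TryAdd`, `Replace` or `Add`, and the difference between composing with a framework default and overriding it comes down to which of those standard `Microsoft.Extensions.DependencyInjection` calls registers the descriptor. The sketch below uses only that library (it needs the `Microsoft.Extensions.DependencyInjection` package), not Cronus; `IGreeter` and its implementations are hypothetical.

```csharp
using System;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Hypothetical service and implementations, used only to show registration order.
public interface IGreeter { string Greet(); }
public sealed class DefaultGreeter : IGreeter { public string Greet() => "default"; }
public sealed class SatelliteGreeter : IGreeter { public string Greet() => "satellite"; }
public sealed class ExtraGreeter : IGreeter { public string Greet() => "extra"; }

public static class RegistrationDemo
{
    public static IServiceCollection Build()
    {
        IServiceCollection services = new ServiceCollection();

        // A framework default, registered first.
        services.TryAdd(ServiceDescriptor.Singleton<IGreeter, DefaultGreeter>());

        // TryAdd: ignored, because IGreeter is already registered.
        services.TryAdd(ServiceDescriptor.Singleton<IGreeter, SatelliteGreeter>());

        // Replace: removes the first IGreeter registration and adds this one.
        services.Replace(ServiceDescriptor.Singleton<IGreeter, SatelliteGreeter>());

        // Add: appends another IGreeter registration alongside the existing one.
        services.Add(ServiceDescriptor.Singleton<IGreeter, ExtraGreeter>());

        return services;
    }
}
```

After `Build()` the collection holds two `IGreeter` registrations, `SatelliteGreeter` followed by `ExtraGreeter`; the default was ignored-then-replaced rather than duplicated.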
+ +On top of discoveries sit three more extensibility points, covered in the pages below: + +* **`[CronusStartup]`** — a way to schedule `ICronusStartup` code to run at a particular phase of the boot sequence. +* **Fault handling** — the retry policies Cronus ships with, and how to replace them. +* **Observability** — the `DiagnosticListener`, `ActivitySource` and heartbeat subsystem Cronus wires up by default, and how to bolt on your own exporter. + +## In this section + +{% content-ref url="discoveries.md" %} +[discoveries.md](discoveries.md) +{% endcontent-ref %} + +{% content-ref url="startup-attribute.md" %} +[startup-attribute.md](startup-attribute.md) +{% endcontent-ref %} + +{% content-ref url="fault-handling.md" %} +[fault-handling.md](fault-handling.md) +{% endcontent-ref %} + +{% content-ref url="observability.md" %} +[observability.md](observability.md) +{% endcontent-ref %} diff --git a/docs/cronus-framework/extensibility/discoveries.md b/docs/cronus-framework/extensibility/discoveries.md new file mode 100644 index 00000000..0c8040ee --- /dev/null +++ b/docs/cronus-framework/extensibility/discoveries.md @@ -0,0 +1,169 @@ +# Discoveries + +A discovery is a class that Cronus runs at boot to decide which services should be added to the host's `IServiceCollection`. Every satellite package ships one or more discoveries; your own code can ship them too. This is how Cronus is assembled out of a shared core plus pluggable satellites without any of them having a compile-time reference to the host's `Program.cs`. + +## How the pieces fit together + +The moving parts live under [`Elders.Cronus.Discoveries`](https://github.com/Elders/Cronus/tree/master/src/Elders.Cronus/Discoveries). They are, in rough dependency order: + +* [`DiscoveryBase`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Discoveries/DiscoveryBase.cs) — abstract base that implements `IDiscovery` and defines the one method you override, `DiscoverFromAssemblies(DiscoveryContext)`. 
Only the generic form exists; there is no non-generic `DiscoveryBase`. +* [`IDiscovery`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Discoveries/IDiscovery.cs) — exposes a `Name` and a `Discover(DiscoveryContext)` method that returns `IDiscoveryResult`. +* [`DiscoveryContext`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Discoveries/DiscoveryContext.cs) — the read-only state the scanner hands to every discovery. It holds `Assemblies` (the set of assemblies Cronus has loaded), `Configuration` (the `IConfiguration` passed to `AddCronus`), a derived `Types` enumeration, and the helper `FindService()` which scans the assemblies for concrete implementations of `TService`. **It does not hold `IServiceCollection`** — that is the job of `CronusServicesProvider`. +* [`IDiscoveryResult` and `DiscoveryResult`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Discoveries/IDiscoveryResult.cs) — the result a discovery returns. It wraps an `IEnumerable` plus an optional `Action` for registrations that don't fit the `DiscoveredModel` shape (typically `AddOptions` calls). Both are generic only; there is no non-generic form. +* [`DiscoveredModel`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Discoveries/IDiscoveryResult.cs) — extends the standard `Microsoft.Extensions.DependencyInjection.ServiceDescriptor` with two flags: `CanOverrideDefaults` (use `Services.Replace`) and `CanAddMultiple` (use `Services.Add`). If neither flag is set the model is applied with `Services.TryAdd`, i.e. it registers the service only if nothing else has claimed it. The class is non-generic; there is no `DiscoveredModel`. +* [`DiscoveryScanner`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Discoveries/DiscoveryScanner.cs) — the reflection-based invoker. Its only public member is `IEnumerable> Scan(DiscoveryContext context)`, which returns one result per discovery type it finds. 
+* [`CronusServicesProvider`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Discoveries/CronusServicesProvider.cs) — the object that applies a discovery result to the `IServiceCollection`. It calls the `AddServices` action first, then walks `Models` and dispatches each one to `Services.Add`, `Services.Replace`, or `Services.TryAdd` depending on the two flags above. + +## What the scanner actually does + +`DiscoveryScanner.Scan` looks like this: + +```csharp +public IEnumerable> Scan(DiscoveryContext context) +{ + IEnumerable allTypes = context.Assemblies + .SelectMany(asm => asm + .GetLoadableTypes() + .Where(type => type.IsAbstract == false && type.IsClass && typeof(IDiscovery).IsAssignableFrom(type))); + + IEnumerable> discoveries = allTypes + .Where(candidate => allTypes.Where(t => t.BaseType == candidate).Any() == false) // filter out discoveries which inherit from each other. We remove the base discoveries + .Select(dt => (IDiscovery)FastActivator.CreateInstance(dt)); + + foreach (var discovery in discoveries) + { + logger.LogInformation("Discovered {name}.", discovery.Name); + + yield return discovery.Discover(context); + } +} +``` + +Two details are worth calling out: + +1. The filter `t.BaseType == candidate` removes any discovery that has a subclass elsewhere in the assemblies. That is how the `EventStoreDiscovery` in Cronus core is replaced by `CassandraEventStoreDiscovery` in the satellite — only the subclass runs. +2. Discoveries are instantiated by `FastActivator.CreateInstance` with no constructor parameters. A discovery class must have a public parameterless constructor. + +## A canonical publisher discovery + +Here is the RabbitMQ publisher discovery in full. 
It registers the three publisher lanes Cronus has (generic `IPublisher<>`, public-event publisher, signal publisher) and marks each as overriding the in-memory defaults: + +```csharp +public class RabbitMqPublisherDiscovery : DiscoveryBase> +{ + protected override DiscoveryResult> DiscoverFromAssemblies(DiscoveryContext context) + { + return new DiscoveryResult>(GetModels(), services => services + .AddOptions() + .AddOptions()); + } + + IEnumerable GetModels() + { + yield return new DiscoveredModel(typeof(BoundedContextRabbitMqNamer), typeof(BoundedContextRabbitMqNamer), ServiceLifetime.Singleton); + yield return new DiscoveredModel(typeof(PublicMessagesRabbitMqNamer), typeof(PublicMessagesRabbitMqNamer), ServiceLifetime.Singleton); + yield return new DiscoveredModel(typeof(SignalMessagesRabbitMqNamer), typeof(SignalMessagesRabbitMqNamer), ServiceLifetime.Singleton); + + yield return new DiscoveredModel(typeof(PrivateRabbitMqPublisher<>), typeof(PrivateRabbitMqPublisher<>), ServiceLifetime.Singleton); + yield return new DiscoveredModel(typeof(PublicRabbitMqPublisher), typeof(PublicRabbitMqPublisher), ServiceLifetime.Singleton); + yield return new DiscoveredModel(typeof(SignalRabbitMqPublisher), typeof(SignalRabbitMqPublisher), ServiceLifetime.Singleton); + + var publisherModel = new DiscoveredModel(typeof(IPublisher<>), typeof(PrivateRabbitMqPublisher<>), ServiceLifetime.Singleton); + publisherModel.CanOverrideDefaults = true; + yield return publisherModel; + + var publicPublisherModel = new DiscoveredModel(typeof(IPublisher), typeof(PublicRabbitMqPublisher), ServiceLifetime.Singleton); + publicPublisherModel.CanOverrideDefaults = true; + yield return publicPublisherModel; + + var signalPublisherModel = new DiscoveredModel(typeof(IPublisher), typeof(SignalRabbitMqPublisher), ServiceLifetime.Singleton); + signalPublisherModel.CanOverrideDefaults = true; + yield return signalPublisherModel; + + yield return new DiscoveredModel(typeof(RabbitMqInfrastructure), 
typeof(RabbitMqInfrastructure), ServiceLifetime.Singleton); + + yield return new DiscoveredModel(typeof(ConnectionResolver), typeof(ConnectionResolver), ServiceLifetime.Singleton); + yield return new DiscoveredModel(typeof(PublisherChannelResolver), typeof(PublisherChannelResolver), ServiceLifetime.Singleton); + } +} +``` + +Source: [`Cronus.Transport.RabbitMQ/src/Elders.Cronus.Transport.RabbitMQ/Publisher/RabbitMqPublisherDiscovery.cs`](https://github.com/Elders/Cronus.Transport.RabbitMQ/blob/master/src/Elders.Cronus.Transport.RabbitMQ/Publisher/RabbitMqPublisherDiscovery.cs). + +What to notice: + +* The type parameter on `DiscoveryBase>` is the *conceptual service* the discovery owns — it is informational, not a constraint. The scanner does not act on it. +* `AddServices` is where the `AddOptions` calls live, because `AddOptions` does not fit the `DiscoveredModel` shape (it registers three services — `IConfigureOptions`, `IOptionsChangeTokenSource`, `IOptionsFactory`). +* The publisher models set `CanOverrideDefaults = true` because they are meant to replace the in-memory publishers registered by `InMemoryPublisherDiscovery`. + +## Extending an existing discovery + +A discovery can inherit from another discovery and extend its model set. 
The Cassandra event store does exactly that: + +```csharp +public class CassandraEventStoreDiscovery : EventStoreDiscovery +{ + protected override DiscoveryResult DiscoverFromAssemblies(DiscoveryContext context) + { + IEnumerable models = base.DiscoverFromAssemblies(context).Models + .Concat(GetModels(context)) + .Concat(DiscoverCassandraTableNameStrategy(context)); + + return new DiscoveryResult(models, services => services.AddOptions()); + } + + IEnumerable GetModels(DiscoveryContext context) + { + yield return new DiscoveredModel(typeof(IEventStore<>), typeof(CassandraEventStore<>), ServiceLifetime.Transient) { CanOverrideDefaults = true }; + yield return new DiscoveredModel(typeof(IEventStore), typeof(CassandraEventStore), ServiceLifetime.Transient) { CanOverrideDefaults = true }; + // ... + } +} +``` + +Source: [`Cronus.Persistence.Cassandra/src/Elders.Cronus.Persistence.Cassandra/CassandraEventStoreDiscovery.cs`](https://github.com/Elders/Cronus.Persistence.Cassandra/blob/master/src/Elders.Cronus.Persistence.Cassandra/CassandraEventStoreDiscovery.cs). + +Because `CassandraEventStoreDiscovery` inherits from `EventStoreDiscovery`, the scanner's "filter out base discoveries" rule kicks in: `EventStoreDiscovery` will not run on its own — only the subclass runs — which is exactly what you want. + +## Which assemblies are scanned + +The assemblies fed to every `DiscoveryContext` come from [`AssemblyLoader`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/AssemblyLoader.cs). Its static constructor runs once, on first access, and does the following: + +1. Takes `Assembly.GetEntryAssembly().Location` and derives the directory the host was launched from. +2. Enumerates every `*.dll` and `*.exe` file in that directory. +3. Skips a hard-coded list of unmanaged runtime files (`sni.dll`, `coreclr.dll`, `clrjit.dll`, and so on) and anything whose file name contains one of the wildcards `microsoft`, `api-ms`, `sos_`, `mscordaccore`, `mscor`. +4. 
For every remaining file, checks if the assembly is already loaded in `AppDomain.CurrentDomain.GetAssemblies()`; if not, loads it with `AssemblyLoadContext.Default.LoadFromAssemblyPath`. +5. Adds the loaded assemblies to the static `Assemblies` dictionary keyed by full name. + +`DiscoveryContext.Assemblies` is this dictionary. If the directory the host runs from does not contain your satellite's DLL, the discovery will not run. This is why the recommended hosting pattern is to deploy every satellite package into the host process's output directory. + +## Writing your own discovery + +A discovery is the right extension point whenever you want to add, replace, or reconfigure services that Cronus will consume at runtime — a custom publisher, a custom event store, a custom handler factory, and so on. The checklist: + +1. Create a class that derives from `DiscoveryBase`, where `TCronusService` is the conceptual service the discovery owns. It is informational; pick whatever type documents intent. +2. Give the class a public parameterless constructor so `FastActivator.CreateInstance` can build it. +3. Override `DiscoverFromAssemblies(DiscoveryContext context)` and return a `DiscoveryResult` containing: + * An `IEnumerable` — one entry per service you want to register. + * An optional `Action` for registrations that do not fit the `DiscoveredModel` shape (typically `AddOptions`). +4. On each `DiscoveredModel`, set `CanOverrideDefaults = true` if you want to replace a default registration, or `CanAddMultiple = true` if you want the service to be registered alongside others. Leave both false if you only want to fill in when nothing else has. +5. Make sure the assembly containing your discovery class is deployed into the host process's directory so `AssemblyLoader` picks it up. 
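The effect of the two flags in step 4 can be modelled without Cronus at all. The block below is a minimal sketch, not Cronus code: it stands in for `IServiceCollection` with a plain list of (service, implementation) pairs and mimics `TryAdd`, `Replace` and `Add` by hand. The `Apply` helper and all service and implementation names are hypothetical, and checking `CanOverrideDefaults` before `CanAddMultiple` when both are set is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IServiceCollection: an ordered list of registrations.
var services = new List<(string Service, string Implementation)>();

// Applies one "discovered model" the way the two flags describe.
// Assumption: CanOverrideDefaults is checked first if both flags are set.
void Apply(string service, string implementation, bool canOverrideDefaults = false, bool canAddMultiple = false)
{
    int existing = services.FindIndex(r => r.Service == service);

    if (canOverrideDefaults)
    {
        // Replace: drop the first registration for this service (if any), then add ours.
        if (existing >= 0) services.RemoveAt(existing);
        services.Add((service, implementation));
    }
    else if (canAddMultiple)
    {
        // Add: always append, so several implementations coexist.
        services.Add((service, implementation));
    }
    else if (existing < 0)
    {
        // TryAdd: register only if nothing else has claimed the service yet.
        services.Add((service, implementation));
    }
}

// A default, a transport that overrides it, and a late plain registration that is ignored.
Apply("IPublisher<>", "InMemoryPublisher<>");
Apply("IPublisher<>", "PrivateRabbitMqPublisher<>", canOverrideDefaults: true);
Apply("IPublisher<>", "SomeOtherPublisher<>");

// Two implementations of a hypothetical IAuditSink registered side by side.
Apply("IAuditSink", "FileAuditSink", canAddMultiple: true);
Apply("IAuditSink", "ConsoleAuditSink", canAddMultiple: true);

foreach (var (svc, impl) in services)
    Console.WriteLine($"{svc} -> {impl}");
```

Running it leaves a single publisher registration (the RabbitMQ one, because the default was replaced and the later plain registration was skipped) followed by both `IAuditSink` entries.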
+ +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* a discovery **can** add new services, replace defaults, or register multiple implementations +* a discovery **can** inherit from another discovery to extend its model set — the scanner then runs only the subclass +* a discovery **must** have a public parameterless constructor +* a discovery **must** be deterministic — every run on the same assemblies and configuration produces the same registrations +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* a discovery **should not** read external resources (databases, HTTP endpoints) — it runs during `AddCronus`, well before the process is ready for that +* a discovery **should not** resolve services from the container it is about to mutate +* a discovery **should not** assume an execution order relative to other discoveries; use `CanOverrideDefaults` if the final wiring depends on overriding something +{% endhint %} diff --git a/docs/cronus-framework/extensibility/fault-handling.md b/docs/cronus-framework/extensibility/fault-handling.md new file mode 100644 index 00000000..b6c34d98 --- /dev/null +++ b/docs/cronus-framework/extensibility/fault-handling.md @@ -0,0 +1,152 @@ +# Fault handling + +Cronus retries on its own whenever a transient failure is worth one more attempt. There are two separate retry stacks in the codebase — knowing which one kicks in for your scenario is half of the useful knowledge on this page. + +## The two stacks + +### 1. 
Publisher retries — `RetryableOperation` + +Every publisher that derives from [`Publisher`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Publisher.cs) wraps its `Publish` call in a retry loop built from [`RetryableOperation`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Userfull/RetryableOperation.cs): + +```csharp +public abstract class Publisher : PublisherBase where TMessage : IMessage +{ + private RetryPolicy retryPolicy; + + public Publisher(IEnumerable handlers) : base(handlers) + { + retryPolicy = new RetryPolicy(RetryableOperation.RetryPolicyFactory.CreateLinearRetryPolicy(5, TimeSpan.FromMilliseconds(300))); + } + + public override bool Publish(TMessage message, Dictionary messageHeaders) + { + bool isPublished = RetryableOperation.TryExecute(() => base.Publish(message, messageHeaders), retryPolicy); + + return isPublished; + } +} +``` + +The default is **5 attempts with a fixed 300 ms delay** between them. The `RetryableOperation.RetryPolicyFactory` ships three delegate factories: `CreateLinearRetryPolicy`, `CreateExponentialRetryPolicy`, and `CreateInfiniteLinearRetryPolicy`. The `RetryPolicy` type this loop uses is `public delegate ShouldRetry RetryPolicy()` — a delegate, not a class. `Publish` itself returns a `bool`: when all retries are exhausted, no exception is thrown and the method simply returns `false`. + +> **Note on the synchronous overload.** The snippet above is the in-repo `Publisher` base, which exposes a synchronous `Publish(...)` for the framework's own retry loop. The user-facing `IPublisher` contract that ships in the [`Cronus.DomainModeling`](https://github.com/Elders/Cronus.DomainModeling) NuGet (11.0.x) is **async-only** — the methods you call from your code are `PublishAsync(...)`. The synchronous form is a framework-internal extension point, not part of the published surface. + +### 2.
Subscriber retries — `InMemoryRetryWorkflow` + +Subscriber-side retries are implemented by [`InMemoryRetryWorkflow`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/FaultHandling/InMemoryRetryWorkflow.cs), which wraps an inner workflow in a class-based `RetryPolicy` from [`Elders.Cronus.FaultHandling`](https://github.com/Elders/Cronus/tree/master/src/Elders.Cronus/FaultHandling): + +```csharp +public class InMemoryRetryWorkflow : Workflow where TContext : class +{ + private RetryPolicy retryPolicy; + + readonly Workflow workflow; + + public InMemoryRetryWorkflow(Workflow workflow, ILogger logger) + { + this.workflow = workflow; + var retryStrategy = new Incremental(5, TimeSpan.FromMilliseconds(250), TimeSpan.FromMilliseconds(500));//Total 3 etries + retryPolicy = new RetryPolicy(new TransientErrorCatchAllStrategy(), retryStrategy, logger); + } + + protected override async Task RunAsync(Execution execution) + { + if (execution is null) throw new ArgumentNullException(nameof(execution)); + + await retryPolicy.ExecuteActionAsync(() => workflow.RunAsync(execution.Context)); + } +} +``` + +This uses the richer [`RetryPolicy`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/FaultHandling/RetryPolicy.cs) class combined with an [`ITransientErrorDetectionStrategy`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/FaultHandling/ITransientErrorDetectionStrategy.cs) and a [`RetryStrategy`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/FaultHandling/RetryStrategy.cs). 
The defaults for `RetryStrategy` are: + +* `DefaultClientRetryCount` — `10` +* `DefaultRetryInterval` — `1 s` +* `DefaultClientBackoff` — `10 s` +* `DefaultMaxBackoff` — `30 s` +* `DefaultMinBackoff` — `1 s` +* `DefaultRetryIncrement` — `1 s` + +The three built-in strategies live in [`FaultHandling/Strategies`](https://github.com/Elders/Cronus/tree/master/src/Elders.Cronus/FaultHandling/Strategies): + +* `FixedInterval` — constant delay between retries +* `Incremental` — linearly increasing delay +* `ExponentialBackoff` — randomised exponential delay + +and the two built-in transient-error detection strategies are `TransientErrorCatchAllStrategy` (everything is transient) and `TransientErrorIgnoreStrategy` (nothing is transient — never retries). + +## Replacing the defaults + +Both stacks can be replaced; the mechanism is the same as for any other Cronus default — register your replacement with a discovery marked `CanOverrideDefaults = true`. + +For the subscriber workflow, swap the concrete `InMemoryRetryWorkflow<>` registration for your own: + +```csharp +public class CustomRetryWorkflowDiscovery : DiscoveryBase> +{ + protected override DiscoveryResult> DiscoverFromAssemblies(DiscoveryContext context) + { + var model = new DiscoveredModel( + typeof(InMemoryRetryWorkflow), + typeof(CustomRetryWorkflow), + ServiceLifetime.Transient) + { + CanOverrideDefaults = true + }; + + return new DiscoveryResult>(new[] { model }); + } +} +``` + +For the publisher retry loop, either subclass `Publisher` with your own retry configuration or, in a transport, replace the publisher type altogether with a discovery — the RabbitMQ and CosmosDb satellites both do this. 
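If you subclass `Publisher` with your own retry configuration, the contract worth keeping is the one described for `RetryableOperation`: a bounded number of attempts, a fixed delay between them, and a `false` result instead of an exception once every attempt has failed. What follows is a minimal sketch of such a loop. `TryExecuteLinear` is a hypothetical helper rather than a Cronus API, and treating a thrown exception the same as a `false` result is an assumption.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not a Cronus API): runs `operation` up to `maxAttempts` times with a
// fixed `delay` between attempts. Returns true on the first success and false once every
// attempt has failed; it never rethrows.
bool TryExecuteLinear(Func<bool> operation, int maxAttempts, TimeSpan delay)
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            if (operation()) return true;
        }
        catch (Exception ex)
        {
            // Assumption: an exception counts as a failed attempt, like a false result.
            Console.WriteLine($"Attempt {attempt} failed: {ex.Message}");
        }

        // Wait before the next attempt, but not after the last one.
        if (attempt < maxAttempts) Thread.Sleep(delay);
    }

    return false;
}

// Usage: a fake publish that fails twice and succeeds on the third call.
int calls = 0;
bool published = TryExecuteLinear(() => ++calls >= 3, maxAttempts: 5, delay: TimeSpan.FromMilliseconds(300));

Console.WriteLine($"published={published} after {calls} calls");
```

Returning a `bool` keeps the caller in charge of what "gave up" means, which matches the publisher behaviour described above. A subscriber-side workflow would instead want the exception to surface.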
+ +## Example — a custom retry strategy + +A retry strategy that only treats network-level exceptions as transient and backs off exponentially: + +```csharp +public sealed class NetworkOnlyTransientStrategy : ITransientErrorDetectionStrategy +{ + public bool IsTransient(Exception ex) + => ex is HttpRequestException or SocketException or TimeoutException; +} + +public static class CustomRetryPolicies +{ + public static RetryPolicy NetworkExponential(ILogger logger) + { + var strategy = new ExponentialBackoff( + retryCount: 5, + minBackoff: TimeSpan.FromMilliseconds(250), + maxBackoff: TimeSpan.FromSeconds(10), + deltaBackoff: TimeSpan.FromMilliseconds(500)); + + return new RetryPolicy(new NetworkOnlyTransientStrategy(), strategy, logger); + } +} +``` + +Pass the resulting `RetryPolicy` into your replacement workflow. The built-in `InMemoryRetryWorkflow` hard-codes `TransientErrorCatchAllStrategy` plus `Incremental` — if you want anything else, you replace the whole workflow. + +## Circuit breaking with Cronus.Hystrix + +For more advanced fault tolerance — circuit breakers, bulkheads, fallbacks — there is a legacy satellite called [`Cronus.Hystrix`](https://github.com/Elders/Cronus.Hystrix). It predates the current extensibility model and is documented separately; it is mentioned here only because some older services still depend on it. + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **can** replace the retry workflow via a discovery marked `CanOverrideDefaults = true` +* you **should** pair a custom `RetryStrategy` with an `ITransientErrorDetectionStrategy` that tells transient from permanent errors; retrying a validation error wastes time +* you **should** log at each retry; the built-in `RetryPolicy` does this through the `ILogger` you inject +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **should not** retry forever in a subscriber — unbounded retries block the whole queue.
`CreateInfiniteLinearRetryPolicy` exists but is appropriate only for the publisher loop, not for handlers +* you **should not** catch and swallow exceptions inside a handler just to avoid the retry — the subscriber cannot tell success from failure if you do +* you **should not** mix `Elders.Cronus.FaultHandling.RetryPolicy` (the class) with `Elders.Cronus.RetryPolicy` (the delegate) — they live in different namespaces on purpose +{% endhint %} diff --git a/docs/cronus-framework/extensibility/observability.md b/docs/cronus-framework/extensibility/observability.md new file mode 100644 index 00000000..aec2c996 --- /dev/null +++ b/docs/cronus-framework/extensibility/observability.md @@ -0,0 +1,169 @@ +# Observability + +Cronus ships with three built-in pieces you can tap into: + +1. A `DiagnosticListener` and an `ActivitySource` for distributed tracing. +2. Structured log scopes carrying the tenant, aggregate id, message id and handler name. +3. A heartbeat signal emitted on a configurable interval so monitors can tell the host is alive. + +Everything below is wired up for you the moment you call `services.AddCronus(configuration)`. + +## `DiagnosticListener` and `ActivitySource` + +`AddCronus` calls [`AddOpenTelemetry`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/CronusServiceCollectionExtensions.cs) which registers the two low-level primitives that every distributed-tracing stack on .NET builds on: + +```csharp +internal static IServiceCollection AddOpenTelemetry(this IServiceCollection services) +{ + // https://github.com/dotnet/aspnetcore/blob/f3f9a1cdbcd06b298035b523732b9f45b1408461/src/Hosting/Hosting/src/WebHostBuilder.cs#L334 + // By default aspnet core registers a DiagnosticListener and if we add our own you will loose the http insights + // However, for worker services we need to register our own Listener. 
+ if (services.Any(x => x.ServiceType == typeof(DiagnosticListener)) == false) + { + services.AddSingleton(new DiagnosticListener("cronus")); + + services.AddSingleton(new ActivitySource("Elders.Cronus", "11.0.0")); + } + + return services; +} +``` + +Both singletons are added only if no `DiagnosticListener` has been registered yet. On an ASP.NET Core host, ASP.NET Core registers its own listener first, so this method adds neither singleton and Cronus piggy-backs on the existing listener. On a worker-service host, Cronus registers a new listener named `cronus` together with an `ActivitySource` named `Elders.Cronus`, version `11.0.0`. + +### Activities emitted by Cronus + +Two places start and stop `Activity` objects: + +* **Publish path** — [`ActivityPublishHandler`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/PublisherBase.cs) in the publisher pipeline. It starts an activity named `Publish {messageTypeName}` for every outgoing message, propagates `telemetry_traceparent` through the message headers, and writes the completed activity to the diagnostic listener under the event name `Elders.Cronus.Hosting.Workflow`. +* **Handle path** — [`DiagnosticsWorkflow`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Workflow/DiagnosticsWorkflow.cs) in every subscriber. It starts an activity named `{HandlerType}__{MessageType}` for every handler invocation, reads the incoming `telemetry_traceparent` so the span joins the upstream trace, and writes to the listener under the same event name, `Elders.Cronus.Hosting.Workflow`. + +Every activity carries at least one tag: + +* `cronus_messageId` — the ID of the `CronusMessage` the activity tracks. + +### Log scopes + +`DiagnosticsWorkflow` also enriches every log scope with: + +* `cronus_tenant` — the tenant the message belongs to, via `Message.GetTenant()`. +* `cronus_arid` — the aggregate root id, when the message payload exposes one.
+ +These constants are defined in [`Log`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Userfull/CronusLogger.cs): + +```csharp +public static class Log +{ + public const string Tenant = "cronus_tenant"; + public const string AggregateId = "cronus_arid"; + public const string AggregateName = "cronus_arname"; + + public const string MessageId = "cronus_messageId"; + public const string MessageData = "cronus_messageData"; + public const string MessageType = "cronus_messageType"; + + public const string MessageHandler = "cronus_messageHandler"; + + // ... plus job and projection keys +} +``` + +Because these are plain string constants used as log-scope keys and activity-tag keys, they flow through any structured logger and any OpenTelemetry exporter with no further configuration on Cronus's side. Your tenant and bounded-context dimensions are already there; all you have to do is scrape them. + +## Subscribing your own listener + +The built-in `DiagnosticListener` is named `cronus`. To observe everything Cronus emits, subscribe to it from your host: + +```csharp +DiagnosticListener.AllListeners.Subscribe(new AllListenersObserver()); + +class AllListenersObserver : IObserver +{ + public void OnCompleted() { } + public void OnError(Exception error) { } + + public void OnNext(DiagnosticListener listener) + { + if (listener.Name == "cronus") + { + listener.Subscribe(new CronusEventsObserver()); + } + } +} + +class CronusEventsObserver : IObserver> +{ + public void OnCompleted() { } + public void OnError(Exception error) { } + + public void OnNext(KeyValuePair kv) + { + if (kv.Key == "Elders.Cronus.Hosting.Workflow" && kv.Value is Activity activity) + { + // record duration, tags, parent span id, etc. + } + } +} +``` + +For most production setups you will not write this yourself — you will let the OpenTelemetry SDK subscribe for you via `AddSource("Elders.Cronus")`. 
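The `DiagnosticListener` observer above is one way in. The `Elders.Cronus` `ActivitySource` can also be consumed directly with `System.Diagnostics.ActivityListener`, the same runtime API that tracing SDKs subscribe through. The sketch below is self-contained: a local `ActivitySource` with the same name stands in for the one Cronus registers, and it assumes that `telemetry_traceparent` simply carries the W3C id of the publishing activity. The activity names and the message id are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

Activity.DefaultIdFormat = ActivityIdFormat.W3C;
Activity.ForceDefaultIdFormat = true;

// Every Cronus activity that finishes: display name, trace id and the cronus_messageId tag.
var completed = new List<(string Name, string TraceId, object? MessageId)>();

using var listener = new ActivityListener
{
    // Only listen to the source named like Cronus's; ignore every other source in the process.
    ShouldListenTo = s => s.Name == "Elders.Cronus",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
    SampleUsingParentId = (ref ActivityCreationOptions<string> options) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = stopped => completed.Add(
        (stopped.DisplayName, stopped.TraceId.ToString(), stopped.GetTagItem("cronus_messageId")))
};
ActivitySource.AddActivityListener(listener);

// Stand-in for the source Cronus registers (same name and version).
using var cronus = new ActivitySource("Elders.Cronus", "11.0.0");
var headers = new Dictionary<string, string>();

// Publish side: start a span and copy its W3C id into the outgoing message headers.
using (var publish = cronus.StartActivity("Publish TaskCreated", ActivityKind.Producer))
{
    publish?.SetTag("cronus_messageId", "msg-1");
    if (publish?.Id is string id) headers["telemetry_traceparent"] = id;
}

// Handle side: start the handler span with the incoming traceparent as its parent.
using (var handle = cronus.StartActivity("TaskProjection__TaskCreated", ActivityKind.Consumer, headers["telemetry_traceparent"]))
{
    handle?.SetTag("cronus_messageId", "msg-1");
}

foreach (var (name, traceId, messageId) in completed)
    Console.WriteLine($"{name} trace={traceId} cronus_messageId={messageId}");
```

Both captured spans share one trace id, which is what lets a tracing backend show the handler invocation as a child of the publish.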
+ +## Turning on an OTLP exporter + +A minimal OTLP wiring in a host that already calls `services.AddCronus(...)`: + +```csharp +using OpenTelemetry.Resources; +using OpenTelemetry.Trace; + +services.AddOpenTelemetry() + .ConfigureResource(r => r.AddService("my-bounded-context")) + .WithTracing(tracing => tracing + .AddSource("Elders.Cronus") + .AddOtlpExporter(otlp => + { + otlp.Endpoint = new Uri("http://otel-collector:4317"); + })); +``` + +`AddSource("Elders.Cronus")` is the one line that matters — it tells the OpenTelemetry SDK to listen to the `ActivitySource` Cronus creates. Every handle and publish activity then goes out over OTLP carrying the `cronus_messageId` tag; `cronus_tenant` and `cronus_arid` are log-scope entries, so they reach your log pipeline rather than necessarily your traces. + +## Heartbeat + +Cronus also registers a background [heartbeat service](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/Heartbeat/CronusHeartbeatService.cs) that publishes a [`HeartbeatSignal`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/Heartbeat/HeartbeatSignal.cs) on a timer, carrying the current bounded-context name and the full tenant list. This is useful as a liveness probe from outside the host: downstream services that consume signals will either see the heartbeat on schedule or infer that the host is dead. + +```csharp +internal static IServiceCollection AddCronusHeartbeat(this IServiceCollection services) +{ + services.AddOptions(); + services.AddSingleton(); + services.AddHostedService(); + + return services; +} +``` + +The interval is configured by `Cronus:Heartbeat:IntervalInSeconds`, an unsigned integer between `5` and `3600` seconds that defaults to `5`; the range is enforced by a `[Range]` attribute on [`HeartbeatOptions`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/Heartbeat/HeartbeatOptions.cs).
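On the consuming side, inferring that a host is dead comes down to comparing the time since its last heartbeat with the configured interval. What follows is a minimal sketch. `IsAlive` and the tolerance of three missed beats are hypothetical choices for illustration, not Cronus behaviour. It also rejects intervals outside the documented range of 5 to 3600 seconds.

```csharp
using System;

// Hypothetical consumer-side check: a host counts as alive while less than
// `missedBeatsTolerated` heartbeat intervals have elapsed since its last signal.
bool IsAlive(DateTimeOffset lastHeartbeat, DateTimeOffset now, uint intervalInSeconds, int missedBeatsTolerated = 3)
{
    // Mirror the documented bounds of Cronus:Heartbeat:IntervalInSeconds.
    if (intervalInSeconds < 5 || intervalInSeconds > 3600)
        throw new ArgumentOutOfRangeException(nameof(intervalInSeconds), "Expected a value between 5 and 3600 seconds.");

    TimeSpan silence = now - lastHeartbeat;
    return silence < TimeSpan.FromSeconds(intervalInSeconds * missedBeatsTolerated);
}

var lastSeen = new DateTimeOffset(2026, 4, 28, 12, 0, 0, TimeSpan.Zero);

// With a 5 s interval and three tolerated misses, the cut-off is 15 s of silence.
Console.WriteLine(IsAlive(lastSeen, lastSeen.AddSeconds(12), intervalInSeconds: 5)); // 12 s of silence
Console.WriteLine(IsAlive(lastSeen, lastSeen.AddSeconds(20), intervalInSeconds: 5)); // 20 s of silence
```

Tolerating a few missed beats rather than one avoids flagging a host as dead because of a single delayed signal. The right multiplier depends on how noisy your transport is.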
+ +{% content-ref url="../configuration.md" %} +[configuration.md](../configuration.md) +{% endcontent-ref %} + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **can** subscribe to the `cronus` `DiagnosticListener` for custom telemetry sinks +* you **should** call `AddSource("Elders.Cronus")` in your OpenTelemetry setup; it is the one-line integration +* you **should** keep the heartbeat interval short (`5`–`30` seconds) for production services and long (up to `3600`) for batch hosts +* you **must** enrich your log sinks to print `cronus_tenant` and `cronus_arid`; they are the two most useful dimensions in a multitenant event-sourced system +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **should not** register a second `DiagnosticListener` with the name `cronus`; there is already one +* you **should not** filter the heartbeat signal out of every exporter — it is often the only signal you have that a worker is alive +* you **should not** add extra tags by mutating `Activity.Current` inside a handler without checking whether it is null; the activity only exists when a listener is enabled +{% endhint %} diff --git a/docs/cronus-framework/extensibility/startup-attribute.md b/docs/cronus-framework/extensibility/startup-attribute.md new file mode 100644 index 00000000..14ffdc29 --- /dev/null +++ b/docs/cronus-framework/extensibility/startup-attribute.md @@ -0,0 +1,121 @@ +# `[CronusStartup]` and boot phases + +Once `AddCronus` has run — and every discovery has registered its services — Cronus still has to invoke one-time startup code (create Cassandra keyspaces, provision RabbitMQ exchanges, register event-store indices for each tenant, and so on). 
That one-time code lives in classes that implement [`ICronusStartup`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/ICronusStartup.cs): + +```csharp +public interface ICronusStartup +{ + void Bootstrap(); +} + +public interface ICronusTenantStartup +{ + void Bootstrap(); +} +``` + +`ICronusStartup` runs **once per host**. `ICronusTenantStartup` runs **once per tenant per host** — Cronus creates a scoped `ICronusContext` for each tenant before calling it. + +Startup classes are discovered by [`CronusStartupScanner`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/StartupScanner.cs) during `CronusBooter.BootstrapCronus()`. The scanner finds every concrete class that implements the interface, orders them by the phase declared on `[CronusStartup]`, resolves each from the service provider, and calls `Bootstrap()`. + +## The `Bootstraps` enum + +The phase is declared with [`CronusStartupAttribute`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/CronusStartupAttribute.cs): + +```csharp +[AttributeUsage(AttributeTargets.Class, Inherited = false, AllowMultiple = false)] +public class CronusStartupAttribute : Attribute +{ + public CronusStartupAttribute() : this(Bootstraps.Runtime) { } + + public CronusStartupAttribute(Bootstraps bootstraps) + { + Bootstraps = bootstraps; + } + + public Bootstraps Bootstraps { get; } +} +``` + +The enum is small and stable. Each value is literally the numeric rank used by the scanner's `OrderBy` — smaller runs earlier. 
+ +| Value | Integer | Intent | +| --- | ---: | --- | +| `Environment` | `0` | Prepare the environment for Cronus (set process-wide switches, configure loggers) | +| `ExternalResource` | `10` | Provision external resources such as a database keyspace or a message broker exchange | +| `Configuration` | `20` | Finalise configuration and options | +| `Aggregates` | `30` | One-time work for aggregates | +| `Ports` | `40` | One-time work for ports | +| `Sagas` | `50` | One-time work for sagas | +| `EventStoreIndices` | `55` | Register per-tenant event-store indices (see [`EventStoreIndicesStartup`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/EventStoreIndicesStartup.cs)) | +| `Projections` | `60` | One-time work for projections | +| `Gateways` | `70` | One-time work for gateways | +| `Runtime` | `1000` | Anything else; this is the default when the attribute is omitted | + +Most values are spaced ten apart and `Runtime` sits far out at `1000`, which leaves room for satellites to slot in between phases without colliding with the framework's own numbers. That is deliberate — you can use `(Bootstraps)15` if you really need to run between `ExternalResource` and `Configuration`, because the enum is just an `int` under the hood. + +## Not to be confused with discoveries + +The `[CronusStartup]` attribute is only for `ICronusStartup` and `ICronusTenantStartup` implementations. **It does not affect discoveries.** Discoveries are found by `DiscoveryScanner`, not by `CronusStartupScanner`, and `DiscoveryScanner` does not look at this attribute at all. A discovery runs when `AddCronus` is called; a startup runs later, when `CronusBooter.BootstrapCronus()` is called. Do not put `[CronusStartup(Bootstraps.X)]` on a discovery expecting the phase to apply — it will be silently ignored. + +If you need a discovery to run in a particular order relative to others, the right tool is the `CanOverrideDefaults` flag on `DiscoveredModel`, not a bootstrap phase.
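To see how a custom rank such as `(Bootstraps)15` slots into the table above, here is a minimal sketch that orders hypothetical startup classes by rank with LINQ's `OrderBy`, smaller first. It does not reference Cronus. The ranks are copied from the table as plain integers, and the startup names are made up.

```csharp
using System;
using System.Linq;

// Hypothetical startups paired with ranks copied from the Bootstraps table.
var startups = new (string Name, int Phase)[]
{
    ("MyRuntimeWarmup", 1000),            // Runtime (also the default)
    ("MyConfigurationCheck", 20),         // Configuration
    ("MySearchIndexProvisioning", 10),    // ExternalResource
    ("MyBetweenResourcesAndConfig", 15),  // (Bootstraps)15
    ("MyEnvironmentSwitches", 0),         // Environment
};

// Smaller rank runs earlier.
string[] bootOrder = startups.OrderBy(s => s.Phase).Select(s => s.Name).ToArray();

Console.WriteLine(string.Join(" -> ", bootOrder));
```

Startups that share a rank keep whatever order they were found in, which is why the best practices below advise against relying on ordering within a phase.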
+ +## Writing a custom startup + +Example — a startup that provisions a third-party resource during the `ExternalResource` phase: + +```csharp +[CronusStartup(Bootstraps.ExternalResource)] +public class SearchIndexStartup : ICronusStartup +{ + private readonly ISearchIndexProvisioner provisioner; + private readonly ILogger logger; + + public SearchIndexStartup(ISearchIndexProvisioner provisioner, ILogger logger) + { + this.provisioner = provisioner; + this.logger = logger; + } + + public void Bootstrap() + { + logger.LogInformation("Provisioning search index..."); + provisioner.EnsureIndexExists(); + } +} +``` + +For the class to be resolvable, it must be registered in the container. The idiomatic way is to ship it as part of a discovery in the same assembly: + +```csharp +public class SearchIndexDiscovery : DiscoveryBase +{ + protected override DiscoveryResult DiscoverFromAssemblies(DiscoveryContext context) + { + return new DiscoveryResult(new[] + { + new DiscoveredModel(typeof(SearchIndexStartup), typeof(SearchIndexStartup), ServiceLifetime.Singleton), + new DiscoveredModel(typeof(ISearchIndexProvisioner), typeof(SearchIndexProvisioner), ServiceLifetime.Singleton) + }); + } +} +``` + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* an `ICronusStartup` **can** provision external resources, register indices, warm caches +* an `ICronusStartup` **must** be safe to run repeatedly — the host may restart +* an `ICronusTenantStartup` **must** be safe to run per tenant, repeatedly +* you **should** pick the smallest phase number that still satisfies your ordering requirements +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **should not** put `[CronusStartup]` on a discovery — it does nothing there +* you **should not** rely on startup phases for fine-grained ordering within a phase; classes with the same rank run in an unspecified order +* you **should not** do heavy, long-running work in a startup — startups 
block the boot sequence +{% endhint %} diff --git a/docs/cronus-framework/indices.md b/docs/cronus-framework/indices.md index 79ebe4e3..66f45afd 100644 --- a/docs/cronus-framework/indices.md +++ b/docs/cronus-framework/indices.md @@ -1,4 +1,84 @@ # Indices -[https://github.com/Elders/Cronus/issues/267](https://github.com/Elders/Cronus/issues/267) +The event store is partitioned by aggregate id, which is exactly what you want for the write side — loading every commit of an aggregate by id is a single hot read. It is not what you want for read queries that are shaped differently ("every event of type X", "every commit whose payload references aggregate Y"). Cronus maintains a set of _secondary indices_ to answer those queries without scanning the whole store. +The index subsystem lives under [`Cronus/src/Elders.Cronus/EventStore/Index/`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/). Three indices ship with the framework. + +## What is an index + +An `IEventStoreIndex` is a write-time hook: every time a message is processed, Cronus dispatches it through the list of registered indices and each index records what it needs to. + +```csharp +public interface IEventStoreIndex : IMessageHandler +{ + Task IndexAsync(CronusMessage message); +} + +public interface ICronusEventStoreIndex : IEventStoreIndex, ISystemHandler +{ } +``` + +See [`IEventStoreIndex`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/IEventStoreIndex.cs). Implementations are system handlers — Cronus owns them; you do not write your own indices in application code. + +The payloads each index writes live in an `IIndexStore` (and an `IIndexStatusStore` tracks the lifecycle — `NotPresent` / `Building` / `Present`, expressed by [`IndexStatus`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/IndexStatus.cs)). 
+ 

## The shipping indices

### EventToAggregateRootId

File: [`EventToAggregateRootId.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/EventToAggregateRootId/EventToAggregateRootId.cs). Contract id `3d59f948-870f-4b12-ada6-9603627aaab6`.

This index records _"event of type X was written, here is the aggregate root id and the revision"_. It answers the question "give me every event of type X", which is the primary access pattern of projection rebuilds: a projection interested in events of a given type walks this index to find the aggregates it should load events from. Public events are indexed too, but only when the event originates in the host's own bounded context, so a host that merely subscribes to another bounded context's public event does not index it.

### MessageCounterIndex

File: [`MessageCounterIndex.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/MessageCounterIndex.cs). Contract id `f8c532eb-57ad-469f-9002-6c286bdd88f2`.

A counter of how many events of each type the store contains. It is used to display rebuild progress as `processed / total`: the [`ProgressTracker`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Rebuilding/ProgressTracker.cs) reads the counter to learn the total number of events it has to process. The counter is updated through `IMessageCounter.IncrementAsync`.

### ProjectionIndex

File: [`ProjectionIndex.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/ProjectionIndex.cs). Contract id `37336a18-573a-4e9e-b4a2-085033b74353`.

This one bridges events and projections. When a `CronusMessage` arrives, the index looks up every registered projection type, checks whether any of its `IEventHandler<T>` interfaces would accept the message, and, if so, routes the message to `IProjectionWriter.SaveAsync(projectionType, event)`. It is the mechanism behind live projection updates.
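
The matching that `ProjectionIndex` performs can be illustrated with plain reflection. This is a sketch, not Cronus source: `IEvent`, `IEventHandler<T>`, the events and the projections below are local stand-ins so the snippet compiles on its own, and the rule it applies (a projection accepts a message when the event is assignable to the `T` of one of its `IEventHandler<T>` interfaces) is an assumption based on the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Cronus contracts, declared locally so the sketch is self-contained.
public interface IEvent { }
public interface IEventHandler<T> where T : IEvent { }

public sealed class TaskCreated : IEvent { }
public sealed class TaskRenamed : IEvent { }
public sealed class TaskDeleted : IEvent { }

// Handles two concrete events.
public sealed class TaskListProjection : IEventHandler<TaskCreated>, IEventHandler<TaskRenamed> { }

// Handles every event through the base interface.
public sealed class TaskAuditProjection : IEventHandler<IEvent> { }

// Not an event handler at all.
public sealed class UnrelatedType { }

public static class ProjectionRouting
{
    // Returns, in input order, every candidate type that implements IEventHandler<T>
    // for some T that an instance of eventType can be assigned to.
    public static IReadOnlyList<Type> FindProjectionsFor(Type eventType, IEnumerable<Type> candidateTypes)
    {
        return candidateTypes
            .Where(type => type.GetInterfaces().Any(i =>
                i.IsGenericType &&
                i.GetGenericTypeDefinition() == typeof(IEventHandler<>) &&
                i.GetGenericArguments()[0].IsAssignableFrom(eventType)))
            .ToList();
    }
}
```

Because the check uses `IsAssignableFrom`, a projection declared for a base interface also receives derived events in this sketch; whether Cronus's own lookup treats base types the same way is not confirmed here. A production router would likely cache the result per event type instead of reflecting on every message.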
+ +## Lifecycle + +An index transitions through three states: + +* `NotPresent` — the index has never been built (new tenant, fresh deploy, or it has been dropped). +* `Building` — an index-rebuild job is running. +* `Present` — the index is up to date. + +The state lives in [`EventStoreIndexStatus`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/Handlers/EventStoreIndexStatus.cs) — a system projection whose contract id is `1bcdb806-dbd0-45e7-b781-e3d2fd0589c1`. Its `State.Status` moves from `NotPresent` to `Building` (on `EventStoreIndexRequested`) to `Present` (on `EventStoreIndexIsNowPresent`). + +## Rebuilding + +Rebuilding an index is orchestrated by [`EventStoreIndexBuilder`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/Handlers/EventStoreIndexBuilder.cs) — a system saga that reacts to `EventStoreIndexRequested` and schedules the appropriate rebuild job: + +* [`RebuildIndex_EventToAggregateRootId_Job`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/EventToAggregateRootId/RebuildIndex_EventToAggregateRootId_Job.cs) for the `EventToAggregateRootId` index. +* [`RebuildIndex_MessageCounter_Job`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/RebuildIndex_MessageCounter_Job.cs) for the message-counter index. + +The saga routes between them based on the requested index's contract id. The rebuild jobs reuse the framework-wide [Jobs](jobs.md) machinery, so they survive process restarts, coordinate across cluster nodes and report progress via `IClusterOperations.PingAsync`. + +Each job enumerates the event store through an [`IEventStorePlayer`](event-store/eventstore-player.md), checkpoints its pagination token into `IJobData` and keeps the index status at `Building` until the last page has been read — at which point the saga finalises the request and publishes `EventStoreIndexIsNowPresent`, taking the index to `Present`. 
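
The checkpointing pattern described above can be sketched without any Cronus types. `RebuildIndexData`, `EventPage` and the `readPage` delegate are hypothetical stand-ins for the job's `IJobData`, a page read through `IEventStorePlayer` and the player call itself; the real signatures are not shown on this page. What the sketch does show is the order of operations: index a page, then persist the token, so a restarted host resumes from the last saved page.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the job's IJobData: only the fields this sketch needs.
public sealed class RebuildIndexData
{
    public string? PaginationToken { get; set; }
    public bool IsCompleted { get; set; }
}

// Hypothetical page of raw events plus the token for the next page (null when there is none).
public sealed record EventPage(IReadOnlyList<string> Events, string? NextToken);

public static class CheckpointedRebuild
{
    // Resumes from data.PaginationToken, indexes each event on the page, then saves the
    // next token. maxPages lets the caller stop early to simulate a host going away.
    public static int Run(
        Func<string?, EventPage> readPage,
        Action<string> indexEvent,
        RebuildIndexData data,
        int maxPages = int.MaxValue)
    {
        int pagesRead = 0;
        while (!data.IsCompleted && pagesRead < maxPages)
        {
            EventPage page = readPage(data.PaginationToken);
            foreach (string rawEvent in page.Events)
                indexEvent(rawEvent);

            // Checkpoint only after the whole page is indexed. A crash before this line
            // replays the page on resume, so indexing must tolerate duplicates.
            data.PaginationToken = page.NextToken;
            data.IsCompleted = page.NextToken is null;
            pagesRead++;
        }
        return pagesRead;
    }
}
```

Saving the token only after the page is processed means an interruption replays at most one page, which is why the index writes themselves must be safe to repeat.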
+ 

### Triggering a rebuild manually

A rebuild is started with a `RebuildIndexCommand` (see [`RebuildIndexCommand.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Index/Commands/RebuildIndexCommand.cs)). From the administration-side tooling you publish it against the `EventStoreIndexManager` aggregate for the target tenant; the resulting `EventStoreIndexRequested` event is what the `EventStoreIndexBuilder` saga reacts to. Wait for the saga to run its course and monitor `EventStoreIndexStatus` for the state change to `Present`.

## Best Practices

{% hint style="success" %}
**You can/should/must...**

* you **should** rebuild the `EventToAggregateRootId` index first when bootstrapping a new tenant; projection rebuilds depend on it
* you **should** monitor the `EventStoreIndexStatus` projection and alert if an index stays in `Building` longer than expected
{% endhint %}

{% hint style="warning" %}
**You should not...**

* you **must not** write your own `IEventStoreIndex` — the subsystem is system-owned
* you **should not** trigger an index rebuild during peak hours; it competes with live traffic for I/O
{% endhint %} diff --git a/docs/cronus-framework/jobs.md b/docs/cronus-framework/jobs.md index 81172ba6..d4fbe8b9 100644 --- a/docs/cronus-framework/jobs.md +++ b/docs/cronus-framework/jobs.md @@ -1,4 +1,146 @@ # Jobs -[https://github.com/Elders/Cronus/issues/268](https://github.com/Elders/Cronus/issues/268) +A Cronus _job_ is a long-running, idempotent piece of work coordinated across the cluster. Think "rebuild this projection", "rebuild this index", "replay these public events to a new subscriber" — work that cannot finish inside the timeout of a single message handler and must survive a process restart. Jobs are how those tasks are modelled. +## The subsystem

Job code lives under [`Cronus/src/Elders.Cronus/Cluster/Job/`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/).
The core types are: + +```csharp +public interface ICronusJob : ICronusJobb + where TData : class +{ + public TData Data { get; } + + Task SyncInitialStateAsync(IClusterOperations cluster, CancellationToken cancellationToken = default); + Task RunAsync(IClusterOperations cluster, CancellationToken cancellationToken = default); +} + +public enum JobExecutionStatus +{ + Completed, + Canceled, + Failed, + Running +} +``` + +See [`ICronusJob.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/ICronusJob.cs) and [`JobExecutionStatus.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/JobExecutionStatus.cs). A job pairs a strongly-typed `TData` (its durable state, implementing [`IJobData`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/IJobData.cs)) with a `RunAsync` method that pushes the work forward one increment at a time. + +The entry point is [`ICronusJobRunner`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/ICronusJobRunner.cs): + +```csharp +public interface ICronusJobRunner : IDisposable +{ + Task ExecuteAsync(ICronusJob job, CancellationToken cancellationToken = default); + JobManager JobManager { get; } +} +``` + +Callers build a job (typically via a factory, not by hand), hand it to the runner, and receive a `JobExecutionStatus`. The runner registers the job under its `Name` in the `JobManager` so it can be cancelled by id. + +The default runner is the single-process, no-cluster [`InMemoryCronusJobRunner`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/InMemory/InMemoryCronusJobRunner.cs), which runs against a [`NoClusterOperations`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/InMemory/NoClusterOperations.cs). 
For the multi-host story the Consul-backed runner in [`Cronus.Cluster.Consul`](https://github.com/Elders/Cronus.Cluster.Consul) coordinates which node runs which job — see [Cluster](cluster/README.md).

## The base class

Most jobs do not implement `ICronusJob` directly; they extend [`CronusJob`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/CronusJob.cs):

```csharp
public abstract class CronusJob<TData> : ICronusJob<TData>
    where TData : class, IJobData, new()
{
    public abstract string Name { get; set; }
    public TData Data { get; protected set; }

    protected abstract Task<JobExecutionStatus> RunJobAsync(IClusterOperations cluster, CancellationToken cancellationToken = default);
}
```

The base class handles the two things every job cares about:

* `SyncInitialStateAsync` — ping the cluster for the last-known `TData` so a new host picks up the work where a previous one stopped.
* `Override(fromCluster, fromLocal)` — merge the cluster state with the local initial data. The default implementation prefers `fromCluster` unless the cluster state is older than the local one _and_ already completed, in which case the local data is used.

From inside `RunJobAsync` the job reads and writes `Data`, and periodically calls `cluster.PingAsync(Data)` to publish its progress. See [`IClusterOperations`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/IClusterOperations.cs) for the contract.

## Discovery

Jobs are wired up through [`JobDiscovery`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/JobDiscovery.cs). It walks the assemblies, registers every implementation of `ICronusJob` as a transient service, wires `ICronusJobRunner` to `InMemoryCronusJobRunner` by default, and builds a `TypeContainer` of the discovered job types so the system knows how to resolve them.
+ 

The same discovery also registers factory types for the framework-owned jobs:

* `RebuildIndex_EventToAggregateRootId_JobFactory`, `RebuildIndex_MessageCounter_JobFactory` (see [Indices](indices.md))
* `ReplayPublicEvents_JobFactory` (see [EventStore Player](event-store/eventstore-player.md))
* `RebuildProjection_JobFactory`, `RebuildProjectionSequentially_JobFactory` (see [Projections / Versioning](projections/versioning.md))

## When to write a job

Reach for a job when the work has all of these properties:

* Idempotent — re-running the same increment against the same `TData` must be safe.
* Too long for a message handler timeout, or requires pagination through large data sets.
* Cluster-coordinated — only one host should advance it at a time.
* Stateful — progress should survive process restarts.

If the work is short, stateless and tenant-scoped, write a signal handler or a saga instead.

## A small example

A sketch of a minimal job. The only interesting thing it does is increment a counter until it reaches ten, pinging the cluster on each iteration:

```csharp
public sealed class CountToTenJob : CronusJob<CountToTenJobData>
{
    public CountToTenJob(ILogger logger) : base(logger) { }

    public override string Name { get; set; } = "count-to-ten";

    protected override async Task<JobExecutionStatus> RunJobAsync(IClusterOperations cluster, CancellationToken cancellationToken = default)
    {
        while (Data.Counter < 10)
        {
            Data.Counter++;
            Data.Timestamp = DateTimeOffset.UtcNow;
            Data = await cluster.PingAsync(Data, cancellationToken).ConfigureAwait(false);

            if (cancellationToken.IsCancellationRequested)
                return JobExecutionStatus.Canceled;
        }

        Data.IsCompleted = true;
        Data = await cluster.PingAsync(Data, cancellationToken).ConfigureAwait(false);
        return JobExecutionStatus.Completed;
    }
}

public sealed class CountToTenJobData : IJobData
{
    public bool IsCompleted { get; set; }
    public DateTimeOffset Timestamp { get; set; } = DateTimeOffset.UtcNow;
    public int
Counter { get; set; } +} +``` + +You execute it through the runner — never by calling `RunAsync` directly, because the runner is what registers the cancellation and tracks the job name: + +```csharp +var status = await jobRunner.ExecuteAsync(countToTenJob); +``` + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **must** make `RunJobAsync` idempotent; the cluster will re-invoke it after failures +* you **should** ping the cluster after every meaningful state change so another host can resume the work +* you **should** check `cancellationToken` between units of work; the runner cancels jobs by name +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **must not** use a job where a signal handler or saga would do +* you **should not** store large blobs inside `IJobData`; the cluster is not a document store +* you **should not** share mutable state between `RunJobAsync` invocations; rely on `Data` and `PingAsync` +{% endhint %} diff --git a/docs/cronus-framework/messaging/README.md b/docs/cronus-framework/messaging/README.md index ef08e688..09a32e41 100644 --- a/docs/cronus-framework/messaging/README.md +++ b/docs/cronus-framework/messaging/README.md @@ -1,2 +1,78 @@ # Messaging +Messaging is the transport layer that ties the pieces of a Cronus service — and of a federation of Cronus services — together. Commands flow from the edge to the application services; events flow out of the aggregates to the projections, ports, sagas and triggers that react to them; signals coordinate across tenants and across hosts. All of that travels through a small, transport-agnostic contract. + +## The shape of messaging in Cronus + +Every message that crosses a boundary is wrapped in a [`CronusMessage`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/CronusMessage.cs). 
It carries the payload (a typed `IMessage` or the serialized `PayloadRaw` bytes plus a contract id) and a `Dictionary<string, string>` of headers — tenant, bounded context, causation id, traceparent, and the usual routing metadata. Every subscriber operates on `CronusMessage` and inspects those headers to decide how to handle the payload.

On the publishing side the entry point is the generic [`IPublisher`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/PublisherBase.cs), closed over each message kind (for example `IPublisher<ICommand>`, `IPublisher<IEvent>`, `IPublisher<IPublicEvent>` and `IPublisher<ISignal>`). Publishers are implemented by transport packages; the framework ships the shared pipeline types [`PublisherBase`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/PublisherBase.cs) and [`Publisher`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Publisher.cs) (which adds a linear retry policy on publish failure).

On the subscription side the entry point is [`ISubscriber`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/MessageProcessing/ISubscriber.cs):

```csharp
public interface ISubscriber
{
    string Id { get; }
    IEnumerable<Type> GetInvolvedMessageTypes();
    Type HandlerType { get; }
    Task ProcessAsync(CronusMessage message);
}
```

A subscriber declares the message types it is interested in and knows how to dispatch a `CronusMessage` into its handler. The abstract [`SubscriberBase`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/MessageProcessing/SubscriberBase.cs) wires the handler type's `DataContract` id as the subscriber's id — that is the stable value the transport uses to route messages and that appears in headers like `RecipientHandlers`.
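
A minimal sketch of deriving a subscriber id from the handler type's `DataContract` name, using the standard `DataContractAttribute` and the `GetCustomAttribute<T>()` reflection extension. `SubscriberIds.For`, the handler classes and the GUID are made up for illustration; this is not the code in `SubscriberBase`.

```csharp
#nullable enable
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical handler type; the GUID below is made up for this sketch.
[DataContract(Name = "5c1f3a0e-9d2b-4a47-8f61-2b7e0c9d4a10")]
public sealed class TaskListProjection { }

// A handler with no contract id, which this sketch refuses to route.
public sealed class UnmarkedHandler { }

public static class SubscriberIds
{
    // Uses the handler type's DataContract name as the subscriber id, so renaming
    // or moving the class does not change how messages are routed to it.
    public static string For(Type handlerType)
    {
        string? id = handlerType.GetCustomAttribute<DataContractAttribute>()?.Name;
        if (string.IsNullOrEmpty(id))
            throw new InvalidOperationException(
                $"{handlerType.Name} has no [DataContract(Name = ...)], so it has no stable id.");
        return id;
    }
}
```

Renaming `TaskListProjection` or moving it to another namespace leaves the id unchanged, which is the property that keeps routing stable across refactors.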
+ 

Both sides meet in the consumer, [`IConsumer`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IConsumer.cs):

```csharp
public interface IConsumer<T> where T : IMessageHandler
{
    Task StartAsync();
    Task StopAsync();
}
```

Each handler kind (application services, projections, sagas, ports, triggers, gateways) has its own consumer, which is started or stopped by the corresponding `Cronus:*Enabled` flag (see [Configuration](../configuration.md#cronus.applicationservicesenabled)).

## Workflows

A message is processed through a pipeline of [`Workflow`](../workflows.md) steps. The standard pipeline creates a scoped service provider and a per-message `CronusContext` ([`ScopedMessageWorkflow`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/MessageProcessing/ScopedMessageWorkflow.cs)), dispatches the message to the handler ([`MessageHandleWorkflow`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/MessageProcessing/MessageHandleWorkflow.cs)) and wraps everything in a diagnostic envelope ([`DiagnosticsWorkflow`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Workflow/DiagnosticsWorkflow.cs)). On the publishing side the pipeline sets tenant and bounded-context headers ([`CronusHeadersPublishHandler`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/PublisherBase.cs)), emits structured log lines ([`LoggingPublishHandler`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/PublisherBase.cs)) and propagates the OpenTelemetry traceparent ([`ActivityPublishHandler`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/PublisherBase.cs)).

## Transports

The transport abstraction sits behind `IPublisher` and the consumer contracts, so the framework itself does not care which broker is under it. Cronus ships with two implementations:

* [`Cronus.Transport.RabbitMQ`](https://github.com/Elders/Cronus.Transport.RabbitMQ) — the canonical transport, marked `olympus`.
One private RabbitMQ broker per trust boundary, with optional federation and an independent public broker for cross-boundary public events. Configuration lives under `Cronus:Transport:RabbitMQ:*` and `Cronus:Transport:PublicRabbitMQ:*` — see [Configuration](../configuration.md#cronus.transport.rabbitmq).
* [`Cronus.Transport.AzureServiceBus`](https://github.com/Elders/Cronus.Transport.AzureServiceBus) — a secondary transport used where Azure Service Bus is the operational default.

RabbitMQ is the recommended transport for new services. It is the transport with the most production mileage, its federation model maps naturally onto multi-tenant and multi-bounded-context deployments, and the `Cronus:Transport:RabbitMQ:Consumer:FanoutMode` switch lets you flip the semantics from _competing consumers_ to _every node sees every message_ without changing the application code.

## Serialization

Messages have to be turned into bytes before they go over the wire, and back into typed objects when a subscriber picks them up. That responsibility lives behind `ISerializer`. See [Serialization](serialization.md) for the shipped implementations, the contract-id convention and the rules for keeping `DataContract` attributes stable.
+ +## Related pages + +{% content-ref url="serialization.md" %} +[serialization.md](serialization.md) +{% endcontent-ref %} + +{% content-ref url="../workflows.md" %} +[workflows.md](../workflows.md) +{% endcontent-ref %} + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **should** standardise on RabbitMQ for new services unless you have an operational reason to pick Azure Service Bus +* you **should** rely on `CronusMessage.Headers` for routing data; the payload is the business contract, not the routing contract +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **must not** bypass the publisher pipeline; the pipeline is what stamps headers and trace ids +* you **should not** implement `ISubscriber` by hand when a handler kind already exists; write the handler and let the framework wire the subscriber +{% endhint %} diff --git a/docs/cronus-framework/messaging/serialization.md b/docs/cronus-framework/messaging/serialization.md index 66a64565..6d04f532 100644 --- a/docs/cronus-framework/messaging/serialization.md +++ b/docs/cronus-framework/messaging/serialization.md @@ -1,28 +1,85 @@ # Serialization -[https://github.com/Elders/Cronus/issues/269](https://github.com/Elders/Cronus/issues/269) +Every message that lives longer than a single process call — an event in the store, a command on the wire, a public event crossing a bounded context — goes through a serializer on the way out and through the same serializer on the way in. Serialization is the contract between "now" and "every version of the service that will ever run". -[ISerializer](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Serializer/ISerializer.cs#L5-L9) interface is simple. You can plug your own implementation in but should not change it once you are in production. +## The contract - The samples in this manual work with JSON and Proteus-protobuf serializers. 
very `ICommand`, `IEvent`, `ValueObject` or anything which is persisted is marked with a `DataContractAttribute` and the properties are marked with a `DataMemberAttribute`. [Here is a quick sample how this works \(just ignore the WCF or replace it with Cronus while reading\)](https://msdn.microsoft.com/en-us/library/bb943471%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396). We use `Guid` for the name of the DataContract because it is unique. +The interface in [`Cronus/src/Elders.Cronus/ISerializer.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/ISerializer.cs) is deliberately minimal:

```csharp
public interface ISerializer
{
    byte[] SerializeToBytes<T>(T message);
    string SerializeToString<T>(T message);
    T DeserializeFromBytes<T>(byte[] bytes);
}
```

The implementation you wire into DI is the one the entire framework uses — there is no per-message switch. You should not change it once a service is in production without doing a full event-store migration, because every byte stored before the switch was produced by the previous serializer.

## The contract-id convention

Every `IMessage` (every `ICommand`, `IEvent`, `IPublicEvent`, `ISignal`) and every value object or other record/class whose state is persisted is annotated with a `DataContractAttribute` whose `Name` is a GUID:

```csharp
[DataContract(Name = "f69daa12-171c-43a1-b049-be8a93ff137f")]
public class AggregateCommit : IMessage { ... }
```

That guid is the _contract id_ — the stable identifier the serializer writes alongside the payload. It is the id the system uses to look up the type when deserialising:

* `GetContractId()` on a `Type` returns the guid.
* `GetTypeByContract(string contractId)` goes the other way.
* Serialized messages carry their contract id so the runtime can pick the right type to materialise into.
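
The two lookups listed above can be pictured as a pair of dictionaries built once by reflection. `ContractRegistry` and its `GetContractId` / `GetTypeByContract` methods are hypothetical local re-creations, not the Cronus helpers of the same name, whose implementation this page does not show. The sketch also rejects two types that claim the same GUID, which is the failure the "never reuse guids" rule of thumb on this page guards against.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization;

[DataContract(Name = "728fc4e7-628b-4962-bd68-97c98aa05694")]
public sealed class TaskCreated { }

[DataContract(Name = "0b6f2e55-3c1a-4d8e-9a7b-6e2f4c1d8a90")]
public sealed class TaskRenamed { }

// Deliberately reuses TaskCreated's id to trigger the duplicate check.
[DataContract(Name = "728fc4e7-628b-4962-bd68-97c98aa05694")]
public sealed class TaskCreatedCopy { }

// Hypothetical two-way lookup between contract ids and CLR types.
public sealed class ContractRegistry
{
    private readonly Dictionary<string, Type> typesById = new();
    private readonly Dictionary<Type, string> idsByType = new();

    public ContractRegistry(IEnumerable<Type> types)
    {
        foreach (Type type in types)
        {
            string? id = type.GetCustomAttribute<DataContractAttribute>()?.Name;
            if (string.IsNullOrEmpty(id))
                continue; // not a contract type, nothing to register

            if (typesById.TryGetValue(id, out Type? existing))
                throw new InvalidOperationException(
                    $"Contract id {id} is claimed by both {existing.Name} and {type.Name}.");

            typesById.Add(id, type);
            idsByType.Add(type, id);
        }
    }

    public string GetContractId(Type type) => idsByType[type];

    public Type GetTypeByContract(string contractId) => typesById[contractId];
}
```

Nothing in the registry depends on the CLR type name, so renaming `TaskRenamed` would not affect lookups by its GUID.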
+ +Because the contract id is just a guid in an attribute, you can freely rename the C# class, the namespace, the fields (as long as `[DataMember(Order = N)]` is stable), or move it between assemblies, and the persisted bytes keep working. That is the property that makes long-lived event stores maintainable — the wire shape and the code shape are allowed to diverge. + +The rules that govern contract evolution are explained in full under [Published Language](../domain-modeling/published-language.md); the hint block at the bottom of this page summarises them. + +## The shipped serializers + +Two serializers live in the ecosystem: + +* [`Cronus.Serialization.NewtonsoftJson`](https://github.com/Elders/Cronus.Serialization.NewtonsoftJson) — the canonical serializer, marked `olympus`. JSON, driven by `[DataContract]` and `[DataMember]` attributes. Human-readable bytes in the store (very helpful for debugging), broad compatibility, and zero warm-up cost. This is the one you should use. +* [`Cronus.Serialization.Proteus`](https://github.com/Elders/Cronus.Serialization.Proteus) (legacy) — the protobuf-based serializer, marked `styx`. Faster once warm, more compact on disk — but it pays a significant warm-up penalty on large projects (the type graph is walked once on first use) and the implementation has a small protocol deviation from stock protobuf. The recommendation today is the JSON serializer. + +## Rules of thumb + +The pattern that keeps serialization safe long-term is exactly the pattern `DataContract` encodes: + +1. Each message type gets a `DataContractAttribute` with `Name` set to a new GUID. Never reuse guids. +2. Each persisted field gets `[DataMember(Order = N)]` where `N` is unique within the type and never changes. +3. Every type you persist has a private parameterless constructor (the serializer needs to build the instance before it fills it). +4. 
Collection fields are initialised in the constructor (otherwise a freshly-deserialised instance may expose a null list). + +The best-practices block below is the full `can / should / must` form. + +## Related pages + +{% content-ref url="../domain-modeling/published-language.md" %} +[published-language.md](../domain-modeling/published-language.md) +{% endcontent-ref %} + +{% content-ref url="README.md" %} +[README.md](README.md) +{% endcontent-ref %} ## Best Practices {% hint style="success" %} **You can/should/must...** -* you **must** add private parameterless constructor -* you **must** initialize all collections in the constructor\(s\) +* you **must** add a private parameterless constructor on every persisted type +* you **must** initialise all collection members in the constructor(s) * you **can** rename any class whenever you like even when you are already in production * you **can** rename any property whenever you like even when you are already in production -* you **can** add new properties +* you **can** add new properties — on deserialisation they get the default value {% endhint %} {% hint style="warning" %} **You should not...** -* you **must not** delete a class when already deployed to production -* you **must not** remove/change the `Name` of the `DataContractAttribute` when already deployed to production -* you **must not** remove/change the `Order` of the `DataMemberAttribute` when deployed to production. 
You can change the visibility modifier from `public` to `private` +* you **must not** delete a class when already deployed to production — the store still references its contract id +* you **must not** remove or change the `Name` of the `DataContractAttribute` on a deployed type +* you **must not** remove or change the `Order` of a `DataMemberAttribute` on a deployed type; you may change visibility (`public` → `private`) but never the number {% endhint %} - diff --git a/docs/cronus-framework/projections/README.md b/docs/cronus-framework/projections/README.md new file mode 100644 index 00000000..f7203981 --- /dev/null +++ b/docs/cronus-framework/projections/README.md @@ -0,0 +1,17 @@ +# Projections + +A projection is a read model derived from the events in the [Event Store](../event-store/README.md). It subscribes to the event types it cares about, maintains its own state in its own store, and exposes query-shaped answers to the rest of the service through `IProjectionReader`. The tactical reference — how to _write_ a projection, the `ProjectionDefinition` base class, the `Subscribe()` pattern — lives under [Domain Modeling / Handlers / Projections](../domain-modeling/handlers/projections.md). + +This section covers the two topics that are orthogonal to _how_ you write the handler, and that Cronus owns on your behalf once your projection is deployed: + +{% content-ref url="versioning.md" %} +[versioning.md](versioning.md) +{% endcontent-ref %} + +How a projection evolves over its lifetime. The shape of the handler changes across releases — the signature of `HandleAsync`, the fields on the state, the set of events subscribed to. Cronus detects that the shape has changed, issues a new _version_ of the projection, replays the events through the updated handler into a new storage slot, and promotes the new slot to _live_ once the replay completes. 
+ 

{% content-ref url="snapshots.md" %}
[snapshots.md](snapshots.md)
{% endcontent-ref %}

How projection snapshots relate to rebuilds. Snapshots are not currently shipped with Cronus: a projection's state is rebuilt by replaying its events, and the page explains what to reach for when full replay becomes expensive. diff --git a/docs/cronus-framework/projections/snapshots.md b/docs/cronus-framework/projections/snapshots.md new file mode 100644 index 00000000..5fa2cd18 --- /dev/null +++ b/docs/cronus-framework/projections/snapshots.md @@ -0,0 +1,14 @@ +# Projection snapshots + +{% hint style="warning" %} +**Snapshots are not currently shipped with Cronus.** Earlier revisions of the framework had a snapshot subsystem keyed off a marker interface plus pluggable strategies, but those types were removed and no replacement has been merged. This page is kept so existing deep links keep resolving. +{% endhint %}

In the current codebase a projection's state is rebuilt by replaying every event of every type the projection handles — see [Versioning](versioning.md) and [Handlers / Projections](../domain-modeling/handlers/projections.md). There is no `ISnapshotStore`, no `IAmNotSnapshotable` marker, no `EventsCountSnapshotStrategy` / `TimeOffsetSnapshotStrategy` and no `Cronus:Projections:Cassandra:Snapshot…` configuration family in either [`Elders/Cronus`](https://github.com/Elders/Cronus) or [`Elders/Cronus.Projections.Cassandra`](https://github.com/Elders/Cronus.Projections.Cassandra) at the time of writing.

If a projection's event volume per id grows large enough that full replay becomes expensive, the practical lever today is the projection-versioning machinery — bumping a projection's hash forces a fresh rebuild, after which the new version serves all reads. See [Versioning](versioning.md) for that flow.
+ +## Related + +* [Versioning](versioning.md) — how a projection's shape change kicks off a replay. +* [Handlers / Projections](../domain-modeling/handlers/projections.md) — how to write the projection. diff --git a/docs/cronus-framework/projections/versioning.md b/docs/cronus-framework/projections/versioning.md new file mode 100644 index 00000000..fa9460d4 --- /dev/null +++ b/docs/cronus-framework/projections/versioning.md @@ -0,0 +1,65 @@ +# Projection versioning + +A projection's shape is not stable across the lifetime of a service. You add a new field to the state; you subscribe to a new event; you change the way an existing event updates the state. The projection code is different after the change — but the rows already in the projection store were written by the _old_ code, against the _old_ shape, and querying them from the new code would return wrong data. Projection versioning is Cronus's answer to that: every shape change is a new version, old versions keep serving reads until the new one is ready, and the switchover is atomic. + +## What changes between versions + +The projection _hash_ is the fingerprint of the projection's handler type. It is computed by [`ProjectionHasher`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/ProjectionHasher.cs) from the type's contract id plus the set of `IEventHandler` interfaces it implements. When the hash changes, Cronus considers the projection's shape to have changed. + +A [`ProjectionVersion`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/ProjectionVersion.cs) bundles three pieces of information: the projection's contract id (`ProjectionName`), a numeric `Revision` and the current `Hash`. It also carries a `Status` — `New`, `Building`, `Fixing`, `Live`, `Canceled`, `Timedout`, `NotPresent` or `Unknown` — which drives the lifecycle. 
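
To make the versioning trigger concrete, here is a toy fingerprint built from the same two inputs the paragraph above names: the projection's contract id and the event types it handles (represented here by their contract ids). It is a hypothetical sketch, not `ProjectionHasher`: the SHA-256 digest, the ordinal sorting and the separators are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class ToyProjectionHash
{
    // Fingerprints a projection from its own contract id and the contract ids of the
    // events it handles. Sorting makes the result independent of declaration order.
    public static string Compute(string projectionContractId, IEnumerable<string> handledEventContractIds)
    {
        string canonical = projectionContractId + "|" +
            string.Join(",", handledEventContractIds.OrderBy(id => id, StringComparer.Ordinal));
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(digest); // 64 upper-case hex characters
    }
}
```

Under this toy scheme, subscribing to an additional event produces a new hash while editing a handler body does not; check `ProjectionHasher` for exactly which inputs Cronus includes.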
+ +## The lifecycle + +Four system components cooperate to turn a shape change into a safe replay: + +* [`ProjectionVersionManager`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/ProjectionVersionManager.cs) — the aggregate. One instance per projection contract id per tenant, it owns the list of known versions and decides whether a new version should be requested. +* [`MarkupInterfaceProjectionVersioningPolicy`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/MarkupInterfaceProjectionVersioningPolicy.cs) — the default policy. A projection is versionable unless it implements the marker `INonVersionableProjection` (system projections opt out this way). +* [`VersionRequestTimebox`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/VersionRequestTimebox.cs) — the time window within which a version request is expected to complete. If the request has not finished by `FinishRequestUntil`, the manager cancels it. +* [`ProjectionBuilder`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/Handlers/ProjectionBuilder.cs) — the system saga that turns a `ProjectionVersionRequested` event into actual work. It decides whether the projection is "fast" (`IAmEventSourcedProjectionFast` or `IProjectionDefinition`) or sequential, builds the appropriate job, and hands it to the cluster. + +The end-to-end flow: + +1. **Detect change.** When the host starts, each projection's hash is computed. If it differs from the hash currently marked `Live` for that projection, `ProjectionVersionManager.NotifyHash(hash, policy, replayOptions)` is called. +2. 
**Request a version.** The manager checks that no replay is already in progress, then calls `Replay(hash, policy, replayOptions)`, which applies a `ProjectionVersionRequested` event with a fresh [`VersionRequestTimebox`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/VersionRequestTimebox.cs) that starts immediately and, by default, expires `int.MaxValue` milliseconds (roughly 24.8 days) later. +3. **Start the builder.** The `ProjectionBuilder` saga handles `ProjectionVersionRequested` and schedules a `CreateNewProjectionVersion` timeout at the requested start. When it fires, the saga calls `GetJob(version, options, timebox)` to choose between [`RebuildProjection_Job`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Rebuilding/RebuildProjection_Job.cs) (fast — for event-sourced projections) and `RebuildProjectionSequentially_Job` (sequential — for projections that require strict event ordering), then runs the job through the [`ICronusJobRunner`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Cluster/Job/ICronusJobRunner.cs). +4. **Replay events.** Inside the job, [`ProgressTracker.InitializeAsync`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Rebuilding/ProgressTracker.cs) seeds the counter from the [`IMessageCounter`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/IMessageCounter.cs) so you can display "X of Y events processed". The job walks each event type the projection handles, calls [`IEventStorePlayer.EnumerateEventStore`](../event-store/eventstore-player.md) with the type id, deserialises each raw event and writes a projection commit into the new version's storage slot. Progress is pinged to the cluster on every page. +5.
**Announce milestones.** A [`RebuildProjectionStarted`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Signals/RebuildProjectionStarted.cs) signal goes out when the replay starts; [`RebuildProjectionProgress`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Signals/RebuildProjectionProgress.cs) is published once a second for monitoring; [`RebuildProjectionFinished`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Signals/RebuildProjectionFinished.cs) marks the end. +6. **Promote to live.** When the job returns `JobExecutionStatus.Completed`, the saga publishes `FinalizeProjectionVersionRequest`. The manager handles it by applying `NewProjectionVersionIsNowLive`, which moves the new version's `Status` from `Building` to `Live`. From that moment `IProjectionReader` answers queries from the new version. +7. **Retain old versions.** The previous live version is not dropped. The Cassandra projection store keeps old versions around according to the `Cronus:Projections:Cassandra:TableRetention:*` options (see [Configuration](../configuration.md#cronus-projections-cassandra)). By default `DeleteOldProjectionTables` is `false`, so nothing is deleted; when enabled, `NumberOfOldProjectionTablesToRetain` keeps that many historical versions around before garbage-collecting the oldest. + +## What happens if something goes wrong + +* **Timeout.** If the timebox expires before the replay completes, `ProjectionVersionManager.VersionRequestTimedout` fires a `ProjectionVersionRequestTimedout` event and the version's status becomes `Timedout`. A new request can then be issued. +* **Cancelled replay.** Operators can pause a replay through `ProjectionVersionRequestPaused`; the builder responds by calling `jobRunner.JobManager.CancelAsync(job.Name)`. 
+* **Outdated building version.** If a newer version of the projection is already `Live` by the time a `Building` one catches up, `CancelVersionRequest` retires the stale one. +* **Disaster recovery.** For system projections like [`ProjectionVersionsHandler`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/Handlers/ProjectionVersionsHandler.cs), `Rebuild(hash, policy, options)` bypasses the usual checks and forces a fresh rebuild, because the versioning subsystem itself depends on this projection being correct. + +## A short example + +You rarely call versioning APIs yourself — the framework orchestrates the whole dance. What you do is add the new field to the state, change the `HandleAsync` methods, deploy, and let Cronus notice: + +```csharp +[DataContract(Name = "c94513d1-e5ee-4aae-8c0f-6e85b63a4e03")] +public class TaskProjection : ProjectionDefinition<TaskProjectionState, TaskId>, + IEventHandler<TaskCreated>, + IEventHandler<TaskCompleted> // new — previously only TaskCreated +{ + public TaskProjection() + { + Subscribe<TaskCreated>(x => new TaskId(x.Id.Tenant, x.Id.Id)); + Subscribe<TaskCompleted>(x => new TaskId(x.Id.Tenant, x.Id.Id)); + } + + public Task HandleAsync(TaskCreated @event) { /* ... */ return Task.CompletedTask; } + public Task HandleAsync(TaskCompleted @event) { /* updated state shape */ return Task.CompletedTask; } +} +``` + +On the next deploy, Cronus hashes the new handler, notices the hash has changed, requests a new `ProjectionVersion` and kicks off a replay. The old version keeps answering reads until the new one is `Live`. + +## Related + +* [Handlers / Projections](../domain-modeling/handlers/projections.md) — how to write the handler itself. +* [Jobs](../jobs.md) — the job runner the replay sits on top of. +* [Indices](../indices.md) — specifically the `EventToAggregateRootId` index, which the rebuild depends on to locate events efficiently. +* [Snapshots](snapshots.md) — note: snapshots are not currently shipped; that page documents the situation.
diff --git a/docs/cronus-framework/unit-testing.md b/docs/cronus-framework/unit-testing.md index 6854a7d1..90acc48f 100644 --- a/docs/cronus-framework/unit-testing.md +++ b/docs/cronus-framework/unit-testing.md @@ -1,4 +1,104 @@ # Unit testing -[https://github.com/Elders/Cronus/issues/280](https://github.com/Elders/Cronus/issues/280) +Cronus aggregates are a joy to test. An aggregate is a pure state machine — given a history of events, applying a command produces zero or more new events. No databases, no message brokers, no mocks: just a sequence of inputs and a sequence of outputs. The `Elders.Cronus.Testing` helpers shipped in [`Cronus.DomainModeling`](https://github.com/Elders/Cronus.DomainModeling) lean into that, and the pattern you end up using is always the same. +## The `Aggregate.FromHistory(...)` pattern + +The shipped helper is `Aggregate` under the `Elders.Cronus.Testing` namespace. It hands you a fluent stream builder, replays the events through the aggregate's `When` handlers, and returns an instance that is ready to have a command executed against it. 
+ +A representative test from the Cronus test suite — [`When_projection_version_with_status_building_is_outside_of_the_timebox - Copy.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus.Tests/Projections/When_projection_version_with_status_building_is_outside_of_the_timebox%20-%20Copy.cs) — reads: + +```csharp +using Elders.Cronus.Testing; + +ar = Aggregate + .FromHistory(stream => stream + .AddEvent(new ProjectionVersionRequested(id, new ProjectionVersion(...), ...)) + .AddEvent(new ProjectionVersionRequested(id, new ProjectionVersion(...), ...)) + .AddEvent(new ProjectionVersionRequestTimedout(id, new ProjectionVersion(...), ...))); + +// When: execute a method on the aggregate root +ar.Replay(hash, new MarkupInterfaceProjectionVersioningPolicy(), new ReplayEventsOptions()); + +// Then: assert on the uncommitted events produced by the command +ar.PublishedEvents().Count().ShouldEqual(2); +``` + +The three-step shape — `Given` the history, `When` the command, `Then` the published events — is the shape of every aggregate test. The `PublishedEvents()` helper filters the uncommitted events on the aggregate by type so you can assert on exactly the facts you expect the command to have produced. + +## Why this works + +The reason the test is so compact is structural: `AggregateRoot` is a reducer. Its state changes _only_ through `When(TEvent)` handlers, and the aggregate methods never do I/O — they compute the next event and call `Apply`, which runs through `When` and appends to `UncommittedEvents`. `Aggregate.FromHistory` is doing in a few lines what `AggregateRepository.LoadAsync` does at runtime: it creates a fresh instance of the aggregate and feeds the history through `ReplayEvents`. + +The same property makes aggregate tests uniquely valuable. 
Writing a failing test for a bug is usually one copy-paste away from the production event log that revealed it: the same events, in the same order, and the assertion that the next command produces the event that should have been produced. + +## A small full example + +Suppose you are modelling a `Concert` aggregate with an `Announce` method and a `RegisterPerformer` method. The test that a performer cannot be registered after the concert has already started: + +```csharp +using Elders.Cronus.Testing; +using Machine.Specifications; + +[Subject("Concert")] +public class When_registering_a_performer_after_the_concert_has_started +{ + Establish context = () => + { + concertId = new ConcertId("eldersoss", "summer-festival"); // (tenant, id) + + concert = Aggregate<Concert> + .FromHistory(stream => stream + .AddEvent(new ConcertAnnounced(concertId, "Summer Festival", venue, startTime, duration)) + .AddEvent(new ConcertStarted(concertId, startTime))); + }; + + Because of = () => registerResult = Catch.Exception( + () => concert.RegisterPerformer(new Performer("Some Band"))); + + It should_throw = () => registerResult.ShouldBeOfExactType(); + + It should_not_publish_a_performer_registered_event = + () => concert.PublishedEvents<PerformerRegistered>().ShouldBeEmpty(); + + static ConcertId concertId; + static Concert concert; + static Exception registerResult; +} +``` + +Five lines of arrange; one line of act; two lines of assert. No fixtures, no setup, no async, no mocks — because the aggregate is a pure reducer. + +## Integration-style tests + +When you want to test the path _through_ the repository — integrity checks, atomic-action retries, the aggregate-commit interceptor — use the in-memory event store that Cronus wires by default in tests: + +```csharp +var services = new ServiceCollection() + .AddLogging() + .AddCronus(...)
// uses InMemory everything + .BuildServiceProvider(); + +var repository = services.GetRequiredService(); + +await repository.SaveAsync(concert); // exercises the real IEventStore, with the in-memory backend +var roundtripped = await repository.LoadAsync(concertId); +``` + +This is slower than `Aggregate.FromHistory` and you should still prefer the pure-reducer test as the first-class unit test. Reach for the integration test when the question you are answering is about the repository, not about the aggregate. + +## Best Practices + +{% hint style="success" %} +**You can/should/must...** + +* you **should** write one aggregate test per business rule, shaped as _Given events → When method → Then events_ +* you **should** use the events directly from a failing production replay as your "Given" when you reproduce a bug +{% endhint %} + +{% hint style="warning" %} +**You should not...** + +* you **should not** mock `AggregateRepository` in an aggregate test; you are testing the reducer, not the repository +* you **must not** assert on the aggregate's private state; assert on the events, they are the only public output +{% endhint %} diff --git a/docs/cronus-framework/workflows.md b/docs/cronus-framework/workflows.md index 0bcda41a..15097a8f 100644 --- a/docs/cronus-framework/workflows.md +++ b/docs/cronus-framework/workflows.md @@ -1,34 +1,100 @@ # Workflows -[https://github.com/Elders/Cronus/issues/266](https://github.com/Elders/Cronus/issues/266) +Workflows are Cronus's message-processing pipeline. When a command, event, signal or scheduled message arrives from the transport, a workflow is responsible for resolving the right handler, invoking it inside the right scope, and surfacing any failures. 
The design mirrors the [ASP.NET Core middleware pipeline](https://learn.microsoft.com/aspnet/core/fundamentals/middleware/): each workflow wraps an inner workflow and can add cross-cutting behaviour — logging, activity tracing, retries — before or after the inner `RunAsync` call. -Workflows are the center of message processing. It is very similar to the [ASP.NET middleware pipeline](https://docs.microsoft.com/en-us/aspnet/core/fundamentals/middleware/?view=aspnetcore-3.1). +## The building block -With a workflow you can: +Every workflow inherits `Workflow<TContext>`: -* define what logic will be executed when a message arrives -* execute an action before or after the actual execution -* override or stop a workflow pipeline +```csharp +public abstract class Workflow<TContext> : WorkflowBase<TContext> where TContext : class +{ + protected abstract Task RunAsync(Execution<TContext> execution); +} +``` -## Default workflows +The context is the envelope that carries state down the pipeline. For message handling the context is `HandleContext`, which holds the `CronusMessage`, the handler type, and (once the scope has been created) the scoped `IServiceProvider`. -By default, all messages are handled in an isolated fashion via [`ScopedMessageWorkflow`](../../src/Elders.Cronus/MessageProcessing/ScopedMessageWorkflow.cs) using scopes. Once the scope is created then the next workflow ([`MessageHandleWorkflow`](../../src/Elders.Cronus/MessageProcessing/MessageHandleWorkflow.cs)) is invoked with the current message and scope. In addition, [`DiagnosticsWorkflow`](../../src/Elders.Cronus/Workflow/DiagnosticsWorkflow.cs) wraps the entire pipeline bringing insights into the performance of the message handling pipeline.
+```csharp +public class HandleContext : IWorkflowContextWithServiceProvider +{ + public CronusMessage Message { get; } + public Type HandlerType { get; } + public IServiceProvider ServiceProvider { get; set; } +} +``` -#### ScopedMessageWorkflow +## The default message pipeline -The primary focus of the workflow is to prepare an isolated scope and context within which a message is being processed. Usually, you should not interact with this workflow directly. +When Cronus starts a subscriber (application services, projections, sagas, ports, triggers, gateways, indices) it composes a pipeline like this, from outer to inner: -The workflow creates an instance of [`IServiceScope`](https://docs.microsoft.com/en-us/dotnet/api/microsoft.extensions.dependencyinjection.iservicescope?view=dotnet-plat-ext-3.1) which allows using Dependency Injection in a familiar to a dotnet developer way. In addition, the workflow initializes an instance of [`CronusContext`](../../src/Elders.Cronus/MessageProcessing/CronusContext.cs) which holds information about the current [tenant ](domain-modeling/multitenancy.md)handling the message. +1. **`ExceptionEaterWorkflow`** — last line of defence; swallows and logs exceptions that escape the rest of the pipeline so the consumer does not crash. +2. **`DiagnosticsWorkflow`** — starts a `System.Diagnostics.Activity`, emits a structured log scope containing the tenant and aggregate root id, and writes a `handled in Xms` info log on success. +3. **`InMemoryRetryWorkflow`** — retries transient failures in-process before giving up. +4. **`ScopedMessageWorkflow`** — creates a fresh `IServiceScope` for the message, initialises a `CronusContext` with the tenant resolved from the headers, and attaches a logger scope. The inner workflow receives the scoped `ServiceProvider` via `HandleContext.ServiceProvider`. +5. 
**`MessageHandleWorkflow`** — resolves the handler via `CreateHandler`, invokes `BeginHandle` → `ActualHandle` → `EndHandle`, and routes exceptions into the `Error` sub-workflow. The default `ActualHandle` calls `DynamicMessageHandle` which dispatches to `HandleAsync(message)` on the resolved handler. -Additionally, Cronus uses structured logging and a new log scope is created every time a new message arrives so you could co-relate log messages. +You can see the composition in [`ApplicationServiceSubscriberWorkflow`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/MessageProcessing/ApplicationServiceSubscriberWorkflow.cs); each subscriber kind has its own factory that assembles a pipeline tailored to its needs. + +## Why it is generic + +Each workflow is parameterised by its context type so the compiler can enforce that a `DiagnosticsWorkflow<TContext>` only wraps another `Workflow<TContext>`. The generic parameter is the context, not the message — there is one `DiagnosticsWorkflow<TContext>` type that works for any context that derives from `HandleContext`. + +```csharp +public sealed class DiagnosticsWorkflow<TContext> : Workflow<TContext> + where TContext : HandleContext +{ + public DiagnosticsWorkflow( + Workflow<TContext> workflow, + DiagnosticListener diagnosticListener, + ActivitySource activitySource) { ...
} +} +``` + +## Customising the pipeline + +`MessageHandleWorkflow` exposes five extension points, each itself a workflow. `ActualHandle` defaults to `DynamicMessageHandle`; the other four default to a no-op lambda: + +| Extension point | When it runs | Typical use | +| --------------- | --------------------------------------------------------------- | ------------------------------------- | +| `BeginHandle` | Before the actual handler invocation | Instrumentation, authorisation checks | +| `ActualHandle` | The handler call itself (default: `DynamicMessageHandle`) | Replace or wrap dispatch logic | +| `EndHandle` | After a successful handler call | Post-commit hooks, metrics | +| `Error` | When `BeginHandle` / `ActualHandle` / `EndHandle` throws | Error enrichment, dead-letter routing | +| `Finalize` | Always, at the end (after success or after `Error`) | Cleanup | + +Use `OnHandle(...)` to inject your own `Workflow` around the actual handler call: + +```csharp +messageHandleWorkflow.OnHandle(inner => + WorkflowExtensions.Lamda<HandleContext>() + .Use(async ctx => + { + // pre-handle + await inner.RunAsync(ctx.Context).ConfigureAwait(false); + // post-handle + })); +``` {% hint style="info" %} -Read more about the [Dependency Injection](https://docs.microsoft.com/en-us/archive/msdn-magazine/2016/june/essential-net-dependency-injection-with-net-core) and [service lifetimes](https://docs.microsoft.com/en-us/aspnet/core/fundamentals/dependency-injection?view=aspnetcore-3.1#service-lifetimes) if this is a new concept for you. +You rarely need to touch workflows directly. Reach for the extensibility points (ports, sagas, triggers, gateways) first — workflows are the primitive behind those abstractions. {% endhint %} -#### MessageHandleWorkflow +## Diagnostics and tracing + +`DiagnosticsWorkflow` writes Activities to the `Elders.Cronus` `ActivitySource` and a `DiagnosticListener` named `"cronus"`. If you configure OpenTelemetry (or any other APM) to listen for that source, you get end-to-end traces of every message handled in your host.
-TODO: Explain message handling workflow responsibilities +Log scopes include: - +* `cronus_messageHandler` — the handler type name +* `cronus_messageType` — the message payload type name +* `cronus_tenant` — the tenant extracted from the message headers +* `cronus_arid` — the aggregate root id if the message carries one +{% hint style="success" %} +**You can / should / must** + +* you **can** wrap the default `MessageHandleWorkflow` with your own `BeginHandle` / `EndHandle` to add cross-cutting behaviour +* you **should** keep `DiagnosticListener` subscribers cheap, because `DiagnosticListener` writes run synchronously on the calling thread +* you **must not** block inside a workflow; `await` asynchronous work instead, because a blocked workflow stalls the consumer +{% endhint %} diff --git a/docs/getting-started/quick-start/README.md b/docs/getting-started/quick-start/README.md index 3bf291d8..69fdcba1 100644 --- a/docs/getting-started/quick-start/README.md +++ b/docs/getting-started/quick-start/README.md @@ -1,20 +1,31 @@ --- description: >- - To help you get started quickly on the Cronus we will build an application - that will satisfy all future business requirements. + Build a small task-management service end-to-end with Cronus: API, worker, + commands, events, and projections. --- # Quick Start -### Business requirements +## Business requirements -* We need a new task management system. -* We need data to be consistent. -* We need to be able to reassign tasks inside the user group. -* We need an accurate progress report for every user. -* Groups progress report needs to be secured such that only group members can access it. -* We need a notification to the group members when a user finishes his task. -* We need a screen to view the historical changes in user activity. -* When users close their accounts we need to ask them why (optional survey). -* We need to generate a monthly report that indicates why lost users closed their accounts.
+To drive the examples in the following pages, we'll build a task-management service that meets these real-world requirements: +* A new task-management system for multiple tenants. +* Data must be consistent — no partial updates, no silent loss of state. +* Tasks can be reassigned inside the same user group. +* Every user has an accurate progress report. +* Group progress reports are restricted to group members. +* When a user finishes a task, the group is notified. +* A screen shows the historical changes in user activity. +* When a user closes their account, an optional exit survey is recorded. +* A monthly report summarises why lost users closed their accounts. + +We won't implement every single bullet in the quick start. The goal is to get a working skeleton in place and see a command become an event become a projection. + +## Path through the quick start + +1. [Setup](setup.md) — create two processes (API + worker), add the Cronus packages, start Cassandra & RabbitMQ in Docker, and wire `AddCronus(Configuration)`. +2. [Persist first event](persist-first-event.md) — model `TaskAggregate`, publish a `CreateTask` command, persist a `TaskCreated` event. +3. [Explore projections](explore-projections.md) — build a `TaskProjection` over the events and query it from the API. + +By the end you will have a two-process Cronus service you can grow into a real domain. Each page ends with a pointer to the deeper documentation on the building block it introduced. 
diff --git a/docs/getting-started/quick-start/explore-projections.md b/docs/getting-started/quick-start/explore-projections.md index d27a4a49..e3230130 100644 --- a/docs/getting-started/quick-start/explore-projections.md +++ b/docs/getting-started/quick-start/explore-projections.md @@ -1,182 +1,165 @@ # Explore Projections -[Projections ](../../cronus-framework/domain-modeling/handlers/projections.md)are [queryable ](../../cronus-framework/domain-modeling/handlers/projections.md#querying-a-projection)models used for the reading part of our application. We can design projections in such a way that we can manage what data we want to store and by what will be searched. Events are the basis for projections data. +With a `TaskCreated` event in Cassandra we can now build a **projection** — a queryable read model derived from events. This page walks through adding a `TaskProjection` and exposing it through the API. -For using projections we should update the [configuration ](../../cronus-framework/configuration.md#cronus-projectionsenabled)file for both API and Service. +{% content-ref url="../../cronus-framework/domain-modeling/handlers/projections.md" %} +[projections.md](../../cronus-framework/domain-modeling/handlers/projections.md) +{% endcontent-ref %} -{% code title="appsettings.json" %} -```csharp - "Persistence": { /* ... */ }, - "Projections": { - "Cassandra": { - "ConnectionString": "Contact Points=127.0.0.1;Port=9042;Default Keyspace=taskmanager_projections" - } - } +## 1. Install the projections package + +Projections are persisted by the `Cronus.Projections.Cassandra` package. It should already be added to `TaskManager.Service` from the [setup](setup.md) step; double-check: + +```shell +dotnet list TaskManager.Service package | grep Projections ``` -{% endcode %} -And add some dependencies. +{% hint style="warning" %} +The correct package name is **`Cronus.Projections.Cassandra`** (plural). 
An older, obsolete variant called `Cronus.Projection.Cassandra` still exists on NuGet — don't install it. +{% endhint %} -```csharp -dotnet add package Cronus.Projection.Cassandra +Make sure the `Cronus:Projections:Cassandra:ConnectionString` is set in both API and worker `appsettings.json`: + +```json +"Projections": { + "Cassandra": { + "ConnectionString": "Contact Points=127.0.0.1;Port=9042;Default Keyspace=taskmanager_projections" + } +} ``` -### Create a projection for querying tasks +## 2. Define the projection -You can choose whitch implementation to use. You can get hte tasks(_commented in the controller_) with same name, or all tasks. +We want to query all tasks belonging to a given user. The projection ID will therefore be `UserId`, and the projection subscribes to `TaskCreated`. {% tabs %} {% tab title="TaskProjection" %} ```csharp [DataContract(Name = "c94513d1-e5ee-4aae-8c0f-6e85b63a4e03")] -public class TaskProjection : ProjectionDefinition<TaskProjectionData, TaskId>, +public class TaskProjection : ProjectionDefinition<TaskProjectionState, UserId>, IEventHandler<TaskCreated> { public TaskProjection() { - //Id.NID - here we are subscribing by tenant - //in our case the tenant is: "tenant" - //so we well get all events - Subscribe<TaskCreated>(x => new TaskId(x.Id.NID)); + // route each TaskCreated event to the projection instance of the user who owns the task + Subscribe<TaskCreated>(e => e.UserId); } public Task HandleAsync(TaskCreated @event) { - Data task = new Data(); - - task.Id = @event.Id; - task.UserId = @event.UserId; - task.Name = @event.Name; - task.Timestamp = @event.Timestamp; - - State.Tasks.Add(task); + // HandleAsync runs on every event; design it to be idempotent + if (State.Tasks.Any(x => x.Id.Equals(@event.Id))) + return Task.CompletedTask; + + State.Tasks.Add(new TaskProjectionState.Entry + { + Id = @event.Id, + Name = @event.Name, + CreatedAt = @event.Timestamp, + Deadline = @event.Deadline + }); return Task.CompletedTask; } - public IEnumerable<TaskProjectionData.Data> GetTaskByName(string name) - { - return State.Tasks.Where(x => x.Name.Equals(name)); - } + +
public IEnumerable<TaskProjectionState.Entry> WithName(string name) + => State.Tasks.Where(x => x.Name.Equals(name, StringComparison.OrdinalIgnoreCase)); } ``` {% endtab %} -{% tab title="TaskProjectionData" %} +{% tab title="TaskProjectionState" %} ```csharp [DataContract(Name = "c135893e-b9e3-453a-b0e0-53545094ec5d")] -public class TaskProjectionData +public class TaskProjectionState { - public TaskProjectionData() - { - Tasks = new List<Data>(); - } + public TaskProjectionState() { Tasks = new List<Entry>(); } [DataMember(Order = 1)] - public List<Data> Tasks { get; set; } + public List<Entry> Tasks { get; set; } [DataContract(Name = "317b3cbb-593a-4ffc-8284-d5f5c599d8ae")] - public class Data + public class Entry { - [DataMember(Order = 1)] - public TaskId Id { get; set; } - - [DataMember(Order = 2)] - public UserId UserId { get; set; } - - [DataMember(Order = 3)] - public string Name { get; set; } - - [DataMember(Order = 4)] - public DateTimeOffset CreatedAt { get; set; } - - [DataMember(Order = 5)] - public DateTimeOffset Timestamp { get; set; } + [DataMember(Order = 1)] public TaskId Id { get; set; } + [DataMember(Order = 2)] public string Name { get; set; } + [DataMember(Order = 3)] public DateTimeOffset CreatedAt { get; set; } + [DataMember(Order = 4)] public DateTimeOffset Deadline { get; set; } } } ``` {% endtab %} {% endtabs %} -Every time the event will occur it will be handled and persist in its state. +{% hint style="info" %} +`Subscribe<TEvent>(e => projectionId)` is how Cronus maps an event to a projection instance. Every time `TaskCreated` is handled, the framework asks the projection for the target ID (here, `e.UserId`), loads (or creates) the projection row for that ID, applies the event, and saves it. +{% endhint %} -### Read the state -Inject `IProjectionReader` that will be responsible for getting the projection state by Id on which projection was subscribed before: `Subscribe(x => x.UserId).` +Inject `IProjectionReader` into a controller and call `GetAsync<TaskProjection>(id)`.
The reader returns a `ReadResult<TaskProjection>`; always branch on its `NotFound` / `HasError` / `IsSuccess` flags. +{% code title="TaskQueryController.cs" %} ```csharp [ApiController] [Route("[controller]/[action]")] -public class TaskController : ControllerBase +public class TaskQueryController : ControllerBase { -private readonly IPublisher _publisher; -private readonly IProjectionReader _projectionReader; - -public TaskController(IPublisher publisher, IProjectionReader reader) -{ - _publisher = publisher; - _projectionReader = reader; -} + private readonly IProjectionReader reader; -//.... create task code ..// + public TaskQueryController(IProjectionReader reader) + { + this.reader = reader; + } -[HttpGet] -public async Task<IActionResult> GetTasksByName(string name) -{ + [HttpGet] + public async Task<IActionResult> GetByUser(string userId, CancellationToken ct) + { + const string tenant = "tenant"; + var id = new UserId(tenant, userId); - ReadResult<TaskProjection> readResult = await _projectionReader.GetAsync<TaskProjection>(new TaskId("tenant")); + ReadResult<TaskProjection> result = await reader.GetAsync<TaskProjection>(id).ConfigureAwait(false); - if (readResult.IsSuccess == false) - return NotFound(); + if (result.NotFound) return NotFound(); + if (result.HasError) return Problem(result.Error); - var TasksByName = readResult.Data.GetTaskByName(name); + return Ok(result.Data.State.Tasks); + } +} +``` +{% endcode %} +{% hint style="info" %} +The first time the worker starts with a new projection, Cronus builds and activates a new _projection version_ by replaying the existing event stream into it. This can take a moment on large stores. The projection is considered live only once the version is `Live`; until then `NotFound` is a possible result. +{% endhint %} - return Ok(TasksByName); +## 4. Run it end-to-end - ////Get all tasks - //return Ok(readResult.Data.State.Tasks.Select(x => new TaskData - //{ - // CreatedAt = x.CreatedAt, - // Id = x.Id, - // Name = x.Name, - // Timestamp = x.Timestamp, - // UserId = x.UserId - //})); -} -``` +1.
Start the worker and the API (`dotnet run --project ...`). +2. `POST /Task/CreateTask` with `userId=alice`. +3. Wait a moment for the worker to handle the command and update the projection. +4. `GET /TaskQuery/GetByUser?userId=alice` → the created task appears in the list. -### Connect Dashboard +If the projection returns `NotFound`, check the worker logs for `CronusWorkflowHandle` entries involving `TaskProjection`. A missing entry means the event was not routed — usually a `Subscribe` is missing or the tenant does not match. -(_The dashboard is not requerd_) +## 5. Optional — plug in the Cronus Dashboard -If we hit this controller immediately after the first start, it could lead to a probable read error. \ -We need to give it some time to initialize our new projection store and build new versions of the projections. For an empty event store, it could take less than a few seconds but in order not to wait for this and verify that all working properly, we will check it manually. +[Cronus Dashboard](https://cronus-dashboard.github.io/) is a browser UI that inspects running hosts: tenants, projections, versions, rebuilds, and event traffic. It talks to the Cronus RPC endpoint that the worker exposes. -[Cronus Dashboard](https://cronus-dashboard.github.io/) is a UI management tool for the Cronus framework.\ -It hosts inside our Application so add this missing code to our background service. +Enable the RPC endpoint in the worker's `appsettings.json`: -```csharp -protected override async Task ExecuteAsync(CancellationToken stoppingToken) +```json { - logger.LogInformation("Starting service..."); - cronusHost.Start(); - - // Dashboard configuration - cronusDashboard = CronusApi.GetHost(); - cronusApi.Provider = cronusDashboard.Services; - await cronusDashboard.StartAsync().ConfigureAwait(false); - - logger.LogInformation("Service started!"); + "Cronus": { + "RpcApiEnabled": true + } } ``` -Start our Cronus Service and API. 
- -In the dashboard select the `Connections` tab and click `New Connection`.\ -Set the predefined port for the Cronus endpoint: [http://localhost:7477](http://localhost:7477) and specify your connection name. Click `Check` and then `Add Connection`.\ -After you add a connection select it from the drop-down menu and navigate to the Projections tab.\ -You would be able to see all projections in the system. +Then open the dashboard, add a connection to `http://localhost:7477`, and navigate to the _Projections_ tab. A green "live" badge means the projection is synchronised with the event store. -![A live green badge means that the projection is synchronized with ES and ready to use.](<../../.gitbook/assets/image (1).png>) +## Where to next -Now we would be able to request a controller with `userId`. `GetAsync` method of `IProjectionReader` will restore all events related to projection and apply them to the state. +* [Aggregate](../../cronus-framework/domain-modeling/aggregate.md) — deeper on the write model. +* [Projections handler](../../cronus-framework/domain-modeling/handlers/projections.md) — snapshots, versioning, non-event-sourced projections. +* [Configuration](../../cronus-framework/configuration.md) — every Cronus option. diff --git a/docs/getting-started/quick-start/persist-first-event.md b/docs/getting-started/quick-start/persist-first-event.md index c07a870e..e5895e9f 100644 --- a/docs/getting-started/quick-start/persist-first-event.md +++ b/docs/getting-started/quick-start/persist-first-event.md @@ -1,8 +1,21 @@ # Persist First Event -### Create Ids, commands and events +With the skeleton from the [setup page](setup.md) running, we can model a tiny slice of the task manager and send a command that ends up as an event in Cassandra. -First, we need to add a UserId and TaskId to have the [Identifications ](../../cronus-framework/domain-modeling/ids.md)of these two entities +The slice is: + +1. Two aggregate IDs — `TaskId` and `UserId`. +2. 
A command — `CreateTask`. +3. An event — `TaskCreated`. +4. An aggregate root and its state — `TaskAggregate` / `TaskState`. +5. An application service — `TaskAppService`. +6. An API controller that publishes the command. + +Put commands/events/IDs in a shared project (for example `TaskManager.Contracts`) that both the API and the worker reference. Aggregates, states and app services live in the worker project only. + +## 1. IDs + +`AggregateRootId`'s ctor takes `(tenant, arName, id)` — in that order. {% tabs %} {% tab title="TaskId" %} @@ -12,7 +25,7 @@ public class TaskId : AggregateRootId { TaskId() { } - public TaskId(string id) : base("tenant", "task", id) { } + public TaskId(string tenant, string id) : base(tenant, "task", id) { } } ``` {% endtab %} @@ -24,165 +37,148 @@ public class UserId : AggregateRootId { UserId() { } - public UserId(string id) : base("tenant", "user", id) { } + public UserId(string tenant, string id) : base(tenant, "user", id) { } } ``` {% endtab %} {% endtabs %} -Then we need to create a Cronus [command](../../cronus-framework/domain-modeling/messages/commands.md) for task creation and an [Event](../../cronus-framework/domain-modeling/messages/events.md) that will indicate that the event has occurred. +{% hint style="warning" %} +The constructor order is `(tenant, arName, id)`. Older docs and NuGet packages exposed a generic `AggregateRootId` with a different order (`id, arName, tenant`); the generic form is commented out in current master and you should use the non-generic base instead. +{% endhint %} + +## 2. 
Command and event {% tabs %} -{% tab title="Command" %} +{% tab title="CreateTask" %} ```csharp [DataContract(Name = "857d960c-4b91-49cc-98fd-fa543906c52d")] public class CreateTask : ICommand { - public CreateTask() { } + CreateTask() { } - public CreateTask(TaskId id, UserId userId, string name, DateTimeOffset timestamp) + public CreateTask(TaskId id, UserId userId, string name, DateTimeOffset deadline, DateTimeOffset timestamp) { if (id is null) throw new ArgumentNullException(nameof(id)); if (userId is null) throw new ArgumentNullException(nameof(userId)); - if (name is null) throw new ArgumentNullException(nameof(name)); - if (timestamp == default) throw new ArgumentNullException(nameof(timestamp)); + if (string.IsNullOrWhiteSpace(name)) throw new ArgumentException("Name is required", nameof(name)); Id = id; UserId = userId; Name = name; + Deadline = deadline; Timestamp = timestamp; } - [DataMember(Order = 1)] - public TaskId Id { get; private set; } + [DataMember(Order = 1)] public TaskId Id { get; private set; } + [DataMember(Order = 2)] public UserId UserId { get; private set; } + [DataMember(Order = 3)] public string Name { get; private set; } + [DataMember(Order = 4)] public DateTimeOffset Deadline { get; private set; } + [DataMember(Order = 5)] public DateTimeOffset Timestamp { get; private set; } - [DataMember(Order = 2)] - public UserId UserId { get; private set; } - - [DataMember(Order = 3)] - public string Name { get; private set; } - - [DataMember(Order = 4)] - public DateTimeOffset Timestamp { get; private set; } - - public override string ToString() - { - return $"Create a task with id '{Id}' and name '{Name}' for user [{UserId}]."; - } + public override string ToString() => $"Create task '{Name}' ({Id}) for user {UserId}."; } ``` {% endtab %} -{% tab title="Event" %} +{% tab title="TaskCreated" %} ```csharp [DataContract(Name = "728fc4e7-628b-4962-bd68-97c98aa05694")] public class TaskCreated : IEvent { TaskCreated() { } - public 
TaskCreated(TaskId id, UserId userId, string name, DateTimeOffset timestamp) + public TaskCreated(TaskId id, UserId userId, string name, DateTimeOffset deadline, DateTimeOffset timestamp) { Id = id; UserId = userId; Name = name; - CreatedAt = DateTimeOffset.UtcNow; + Deadline = deadline; Timestamp = timestamp; } - [DataMember(Order = 1)] - public TaskId Id { get; private set; } + [DataMember(Order = 1)] public TaskId Id { get; private set; } + [DataMember(Order = 2)] public UserId UserId { get; private set; } + [DataMember(Order = 3)] public string Name { get; private set; } + [DataMember(Order = 4)] public DateTimeOffset Deadline { get; private set; } + [DataMember(Order = 5)] public DateTimeOffset Timestamp { get; private set; } - [DataMember(Order = 2)] - public UserId UserId { get; private set; } - - [DataMember(Order = 3)] - public string Name { get; private set; } - - [DataMember(Order = 4)] - public DateTimeOffset CreatedAt { get; private set; } - - [DataMember(Order = 5)] - public DateTimeOffset Timestamp { get; private set; } - - public override string ToString() - { - return $"Task with id '{Id}' and name '{Name}' for user [{UserId}] at {CreatedAt} has been created."; - } + public override string ToString() => $"Task '{Name}' ({Id}) created for user {UserId}."; } ``` {% endtab %} {% endtabs %} -### Create an Aggregate and Application Service +## 3. Aggregate and state -Add [Aggregate ](../../cronus-framework/domain-modeling/aggregate.md)that inherits [AggregateRoot ](../../cronus-framework/domain-modeling/aggregate.md#aggregate-root)with the generic [state](../../cronus-framework/domain-modeling/aggregate.md#aggregate-root-state). +The aggregate is the only object allowed to call `Apply`. The state folds each event into itself via a `When(TEvent)` handler. 
+{% code title="TaskAggregate.cs" %} ```csharp public class TaskAggregate : AggregateRoot<TaskState> { - public TaskAggregate() { } + TaskAggregate() { } - public void CreateTask(TaskId id, UserId userId, string name, DateTimeOffset deadline) + public TaskAggregate(TaskId id, UserId userId, string name, DateTimeOffset deadline) { - IEvent @event = new TaskCreated(id, userId, name, deadline); - Apply(@event); + Apply(new TaskCreated(id, userId, name, deadline, DateTimeOffset.UtcNow)); } } ``` +{% endcode %} -Apply method will pass the event to the state of an aggregate and change its state. - +{% code title="TaskState.cs" %} ```csharp public class TaskState : AggregateRootState<TaskAggregate, TaskId> { public override TaskId Id { get; set; } - public UserId UserId { get; set; } - public string Name { get; set; } - public DateTimeOffset CreatedAt { get; set; } - public DateTimeOffset Deadline { get; set; } - public void When(TaskCreated @event) + public void When(TaskCreated e) { - Id = @event.Id; - UserId = @event.UserId; - Name = @event.Name; - CreatedAt = @event.CreatedAt; - Deadline = @event.Timestamp; + Id = e.Id; + UserId = e.UserId; + Name = e.Name; + CreatedAt = e.Timestamp; + Deadline = e.Deadline; } } ``` +{% endcode %} + +## 4. Application service -Finally, we can create an [Application Service](../../cronus-framework/domain-modeling/handlers/application-services.md) for command handling. +`ApplicationService` gives you the `repository` field. `ICommandHandler.HandleAsync` is the async entry point — load the aggregate, decide whether to create it, save it.
+{% code title="TaskAppService.cs" %} ```csharp -[DataContract(Name = "ef669879-5d35-4cb7-baea-39a7c46c9e13")] -public class TaskService : ApplicationService, -ICommandHandler +public class TaskAppService : ApplicationService<TaskAggregate>, + ICommandHandler<CreateTask> { - public TaskService(IAggregateRepository repository) : base(repository) { } + public TaskAppService(IAggregateRepository repository) : base(repository) { } public async Task HandleAsync(CreateTask command) { - ReadResult taskResult = await repository.LoadAsync(command.Id).ConfigureAwait(false); - if (taskResult.NotFound) - { - var task = new TaskAggregate(); - task.CreateTask(command.Id, command.UserId, command.Name, DateTimeOffset.UtcNow); - await repository.SaveAsync(task).ConfigureAwait(false); - } + ReadResult<TaskAggregate> existing = await repository.LoadAsync<TaskAggregate>(command.Id).ConfigureAwait(false); + if (existing.IsSuccess) return; // idempotent — already created + + var task = new TaskAggregate(command.Id, command.UserId, command.Name, command.Deadline); + await repository.SaveAsync(task).ConfigureAwait(false); } } ``` +{% endcode %} -We register a handler by inheriting from `ICommandHandler<>.` When the command arrives we read the state of the aggregate, and if it is not found we create a new one and call `SaveAsync` to save its state to the database. +{% hint style="info" %} +`ReadResult` exposes `IsSuccess`, `NotFound`, `HasError`, and `Error` — use them to branch deliberately instead of catching exceptions. +{% endhint %} -### Create Controller and send a request +## 5. API controller -Now we need a controller to publish our commands and create tasks. +Inject `IPublisher` and `await publisher.PublishAsync(...)`. The method returns `true` when the transport accepted the command. {% tabs %} {% tab title="Controller" %} @@ -191,30 +187,28 @@ Now we need a controller to publish our commands and create tasks.
[Route("[controller]/[action]")] public class TaskController : ControllerBase { - private readonly IPublisher _publisher; + private readonly IPublisher publisher; public TaskController(IPublisher publisher) { - _publisher = publisher; + this.publisher = publisher; } [HttpPost] - public IActionResult CreateTask(CreateTaskRequest request) + public async Task<IActionResult> CreateTask(CreateTaskRequest request, CancellationToken ct) { - string id = Guid.NewGuid().ToString(); - string Userid = Guid.NewGuid().ToString(); - TaskId taskId = new TaskId(id); - UserId userId = new UserId(Userid); - var expireDate = DateTimeOffset.UtcNow; - expireDate.AddDays(request.DaysActive); - - CreateTask command = new CreateTask(taskId, userId, request.Name, expireDate); - - if (_publisher.Publish(command) == false) - { - return Problem($"Unable to publish command. {command.Id}: {command.Name}"); - }; - return Ok(id); + const string tenant = "tenant"; // must match Cronus:Tenants from appsettings + + var taskId = new TaskId(tenant, Guid.NewGuid().ToString()); + var userId = new UserId(tenant, request.UserId); + var deadline = DateTimeOffset.UtcNow.AddDays(request.DaysActive); + + var command = new CreateTask(taskId, userId, request.Name, deadline, DateTimeOffset.UtcNow); + + if (await publisher.PublishAsync(command).ConfigureAwait(false) == false) + return Problem($"Unable to publish {command}."); + + return Accepted(taskId.Value); } } ``` @@ -224,28 +218,52 @@ public class TaskController : ControllerBase ```csharp public class CreateTaskRequest { - [Required] - public string Name { get; set; } - - [Required] - public int DaysActive { get; set; } + [Required] public string UserId { get; set; } + [Required] public string Name { get; set; } + [Required] public int DaysActive { get; set; } } ``` {% endtab %} {% endtabs %} -Here we create _TaskId_ and _UserId_ and inject_`IPublisher`_to publish the command. After this, the command will be sent to RabbitMq and then handled in Application Service. +## 6.
Run it -Now let's start our Service and API. \ -We should be able to make post requests to our Controller throw the Swagger and create our first Task in the system. It must be persisted in the [Event Store](../../cronus-framework/event-store/). +With both the worker and the API running (see [setup.md](setup.md)), `POST /Task/CreateTask`: -![I highly recommend debugging on the first run to better understand the flow of program execution.](<../../.gitbook/assets/image (10).png>) +```json +{ + "userId": "alice", + "name": "Write the quick start", + "daysActive": 7 +} +``` + +The API returns `202 Accepted` with the task URN (for example `urn:tenant:task:c9b3…`). Internally: + +1. The controller publishes `CreateTask` onto RabbitMQ. +2. The worker's subscriber picks it up and runs `TaskAppService.HandleAsync`. +3. The service constructs a new `TaskAggregate`, which applies a `TaskCreated` event. +4. `repository.SaveAsync(task)` writes the event to Cassandra. + +{% hint style="success" %} +If you watch the worker logs you should see a `CronusWorkflowHandle` line saying `TaskAppService handled CreateTask in X.XXXXms.` — that is the diagnostics workflow confirming the handler ran. +{% endhint %} -### Inspection of the Event Store +## 7. Inspect the Event Store + +Take the task URN from the response, Base64-encode it, and query the Cassandra `taskmanagerevents` table: + +```shell +echo -n 'urn:tenant:task:c9b3...' | base64 +# e.g. dXJuOnRlbmFudDp0YXNrOmM5YjMu... + +cqlsh -e "select * from taskmanager_es.taskmanagerevents where id = 'dXJuOnRlbmFudDp0YXNrOmM5YjMu...';" +``` -Download [DevCenter ](https://downloads.datastax.com/#devcenter)or any other UI tool for Cassandra. +You should see exactly one row — the `TaskCreated` event, serialised, tagged with its `[DataContract(Name = …)]` GUID. Restart the worker and run a `LoadAsync(taskId)`: Cronus rebuilds the aggregate by replaying that single event back into the state. 
-Let's take an Id from the response and encode it to Base64.\ -Than try: `select * from taskmanagerevents where id = 'dXJuOnRlbmFudDp0YXNrOmU1MjA1NTA3LWYyNmUtNGExMy05OTU4LTNjMzVlYzAwY2I1Yw=='` +Next stop — turning those events into a read model. -![Use DevCenter tool for Cassandra visualization.](<../../.gitbook/assets/image (9).png>) +{% content-ref url="explore-projections.md" %} +[explore-projections.md](explore-projections.md) +{% endcontent-ref %} diff --git a/docs/getting-started/quick-start/setup.md b/docs/getting-started/quick-start/setup.md index 9afd4634..9b16933b 100644 --- a/docs/getting-started/quick-start/setup.md +++ b/docs/getting-started/quick-start/setup.md @@ -1,49 +1,82 @@ --- -description: 'Prerequisite software: Docker' +description: Prepare a two-process TaskManager skeleton — API + worker — with Cronus, Cassandra and RabbitMQ. --- # Setup -### Creating a projects +This quick start builds a tiny task-management service in two processes: -Create a new console application project in a new folder using dotnet command. +* **TaskManager.Api** — an ASP.NET Core Web API that accepts requests from clients and publishes commands to Cronus. +* **TaskManager.Service** — a worker host that consumes commands, persists events, builds projections, and runs the rest of the Cronus pipeline. -``` -> dotnet new console --name TaskManager.Service -``` +Splitting API and worker is the recommended production topology. The API process stays fast and stateless; the worker process owns all long-running background work. -Also, create a Web API project using the same folder for communicating with our Service. Then add both projects to the common solution. +## Prerequisites +* .NET 8 or .NET 9 SDK +* Docker (for Cassandra and RabbitMQ) +* An IDE — Visual Studio, Rider, or VS Code + +## 1. 
Create the solution + +```shell +mkdir TaskManager && cd TaskManager +dotnet new sln --name TaskManager + +dotnet new webapi --name TaskManager.Api +dotnet new worker --name TaskManager.Service + +dotnet sln add TaskManager.Api TaskManager.Service ``` - dotnet new webapi --name TaskManager.Api -``` -Then we add the Cronus dependency. +## 2. Add the Cronus packages + +The API only publishes commands, so it needs the core package; the worker persists events, builds projections and moves data over RabbitMQ, so it needs the full transport and persistence story. ```shell -cd TaskManager.Api -dotnet add package Cronus - -cd ../TaskManager.Service -dotnet add package Cronus -dotnet add package Cronus.Transport.RabbitMQ -dotnet add package Cronus.Persistence.Cassandra -dotnet add package Cronus.Serialization.NewtonsoftJson -dotnet add package Microsoft.Extensions.Hosting +# API — publisher only +dotnet add TaskManager.Api package Cronus + +# Worker — full Cronus host +dotnet add TaskManager.Service package Cronus +dotnet add TaskManager.Service package Cronus.Transport.RabbitMQ +dotnet add TaskManager.Service package Cronus.Persistence.Cassandra +dotnet add TaskManager.Service package Cronus.Projections.Cassandra +dotnet add TaskManager.Service package Cronus.Serialization.NewtonsoftJson ``` -This is the minimum set of packages for our Cronus host to work. +{% hint style="warning" %} +The projections package is `Cronus.Projections.Cassandra` (plural). The older, singular `Cronus.Projection.Cassandra` is a different and obsolete package name. +{% endhint %} -### Run docker images +## 3. 
Start the backing services -* Setup Cassandra (Container memory is limited to 2GB):\ - `docker run --restart=always -d --name cassandra -p 9042:9042 -p 9160:9160 -p 7199:7199 -p 7001:7001 -p 7000:7000 cassandra` -* Setup RabbitMq (Container memory is limited to 512MB):\ - `docker run --restart=always -d --hostname node1 -e RABBITMQ_NODENAME=docker-UNIQUENAME-rabbitmq --name rabbitmq -p 15672:15672 -p 5672:5672 elders/rabbitmq:3.8.3` +From a clean Docker environment, the simplest way to get Cassandra and RabbitMQ running is: + +```shell +# Cassandra 4.0 — Cronus EventStore +docker run --restart=always -d --name cassandra \ + -p 9042:9042 \ + cassandra:4.0 + +# RabbitMQ 3.9.11 (eldersoss image — management plugin included) +docker run --restart=always -d --name rabbitmq \ + -p 5672:5672 -p 15672:15672 \ + -e RABBITMQ_DEFAULT_USER=user \ + -e RABBITMQ_DEFAULT_PASS=pass \ + -e RABBITMQ_DEFAULT_VHOST=rabbit \ + eldersoss/rabbitmq:3.9.11 +``` -### Setup configuration file +The image tags above match the ones used by the Elders platform's shared `docker-compose.yml`. Use them to keep local development aligned with CI/CD. -Add _appsettings.json_ with the following configuration into the project folder. +{% hint style="info" %} +The RabbitMQ management UI is at [http://localhost:15672](http://localhost:15672) (user `user` / pass `pass`). Cassandra has no bundled UI — use [DataStax DevCenter](https://downloads.datastax.com/#devcenter) or `cqlsh` to inspect tables. +{% endhint %} + +## 4. Write `appsettings.json` + +Both processes share the same Cronus configuration. Create identical `appsettings.json` files in `TaskManager.Api` and `TaskManager.Service`: //This should be int the Service and in the Api. @@ -54,97 +87,123 @@ Add _appsettings.json_ with the following configuration into the project folder. 
"BoundedContext": "taskmanager", "Tenants": [ "tenant" ], "Transport": { - "RabbitMQ": { - "Server": "127.0.0.1", - "VHost": "taskmanager" - }, - "PublicRabbitMQ": [ - { - "Server": "127.0.0.1", - "VHost": "unicom-public", - "FederatedExchange": { - "UpstreamUri": "guest:guest@localhost:5672", - "VHost": "unicom-public", - "UseSsl": false, - "MaxHops": 1 - } - } - ] + "RabbitMQ": { + "Server": "127.0.0.1", + "Port": 5672, + "VHost": "rabbit", + "Username": "user", + "Password": "pass" + } }, "Persistence": { - "Cassandra": { - "ConnectionString": "Contact Points=127.0.0.1;Port=9042;Default Keyspace=taskmanager_es" - } + "Cassandra": { + "ConnectionString": "Contact Points=127.0.0.1;Port=9042;Default Keyspace=taskmanager_es" + } }, "Projections": { - "Cassandra": { - "ConnectionString": "Contact Points=127.0.0.1;Port=9042;Default Keyspace=taskmanager_projections" - } - }, - "Cluster": { - "Consul": { - "Address": "127.0.0.1" - } - }, - "AtomicAction": { - "Redis": { - "ConnectionString": "127.0.0.1:6379" - } + "Cassandra": { + "ConnectionString": "Contact Points=127.0.0.1;Port=9042;Default Keyspace=taskmanager_projections" + } } } } ``` {% endcode %} -You can also see how the Cronus application can be configured in more detail in [Configuration.](../../cronus-framework/configuration.md) +The full list of configuration keys lives on the configuration page: -This is the code that your _Program.cs_ in TaskManager.Service should contain. +{% content-ref url="../../cronus-framework/configuration.md" %} +[configuration.md](../../cronus-framework/configuration.md) +{% endcontent-ref %} -{% code title="Program.cs" %} -```c# -using Cronus11Service; -using Elders.Cronus; +## 5. 
Turn off background consumers in the API -IHost host = Host.CreateDefaultBuilder(args) - .ConfigureServices((hostContext, services) => - { - services.AddHostedService(); - services.AddCronus(hostContext.Configuration); - - }) - .UseDefaultServiceProvider((context, options) => - { - options.ValidateScopes = context.HostingEnvironment.IsDevelopment(); - options.ValidateScopes = false; - options.ValidateOnBuild = false; - }) - .Build(); - -host.Run(); +The API process must not run application-service or projection consumers — that is the worker's job. Switch the relevant feature flags off in `TaskManager.Api/appsettings.json`: + +{% code title="TaskManager.Api/appsettings.json" %} +```json +{ + "Cronus": { + "ApplicationServicesEnabled": false, + "ProjectionsEnabled": false, + "PortsEnabled": false, + "SagasEnabled": false, + "GatewaysEnabled": false, + "TriggersEnabled": false + } +} ``` {% endcode %} -This is the code that you should add in the _Program.cs_ in TaskManager.Api. +{% hint style="success" %} +You **should** turn off every consumer in the API process. The API only needs the publisher side of Cronus — it pushes commands to the bus and returns. The worker process keeps the defaults (all consumers `true`) and does the heavy lifting. +{% endhint %} -{% code title="Program.cs" %} -```c# -builder.Services.AddCronus(builder.Configuration); +## 6. 
+## 6. Wire the worker + +{% code title="TaskManager.Service/Program.cs" %} +```csharp +using Elders.Cronus; + +IHost host = Host.CreateDefaultBuilder(args) + .ConfigureServices((ctx, services) => + { + services.AddCronus(ctx.Configuration); + services.AddHostedService<CronusBackgroundService>(); + }) + .Build(); + +await host.RunAsync(); -builder.Host.UseDefaultServiceProvider((context, options) => +sealed class CronusBackgroundService : BackgroundService { - options.ValidateScopes = context.HostingEnvironment.IsDevelopment(); - options.ValidateScopes = false; - options.ValidateOnBuild = false; + private readonly ICronusHost cronus; + public CronusBackgroundService(ICronusHost cronus) => this.cronus = cronus; + + protected override async Task ExecuteAsync(CancellationToken _) + { + await cronus.StartAsync().ConfigureAwait(false); + } } -); +``` +{% endcode %} + +`services.AddCronus(configuration)` registers every Cronus core service, scans your assemblies for application services, projections, sagas, ports, triggers and gateways, and binds the configuration options. `ICronusHost.StartAsync()` starts the subscribers based on your feature flags. -.... +## 7. Wire the API -app.UseCronusAspNetCore(); +{% code title="TaskManager.Api/Program.cs" %} +```csharp +using Elders.Cronus; +var builder = WebApplication.CreateBuilder(args); + +builder.Services.AddControllers(); +builder.Services.AddCronus(builder.Configuration); + +var app = builder.Build(); + +app.MapControllers(); +await app.RunAsync(); ``` {% endcode %} -### F5 +The API can now inject `IPublisher` into controllers and dispatch commands. + +## 8. Run both processes + +Open two terminals: + +```shell +# Terminal 1 — worker +dotnet run --project TaskManager.Service + +# Terminal 2 — API +dotnet run --project TaskManager.Api +``` + +If both processes start cleanly (no exceptions in the logs), your Cronus skeleton is ready. Proceed to the next page to model a domain and persist your first event.
-![Ensure that service has been started properly.](../../.gitbook/assets/CronusStarting.gif) +{% content-ref url="persist-first-event.md" %} +[persist-first-event.md](persist-first-event.md) +{% endcontent-ref %} diff --git a/docs/message-handlers/application-services.md b/docs/message-handlers/application-services.md deleted file mode 100644 index fc35a3c6..00000000 --- a/docs/message-handlers/application-services.md +++ /dev/null @@ -1,53 +0,0 @@ -# Application Services - -This is a handler where commands are received and delivered to the addressed Aggregate. We call these handlers _ApplicationService_. This is the _write side_ in CQRS. - -## Communication Guide Table - -| Triggered by | Description | -| :--- | :--- | -| Command | A command is used to dispatch domain model changes. It can either be accepted or rejected depending on the domain model invariants | - -## Best Practices - -{% hint style="success" %} -**You can/should/must...** - -* an appservice **can** load an aggregate root from the event store -* an appservice **can** save new aggregate root events to the event store -* an appservice **can** establish calls to the ReadModel \(not common practice but sometimes needed\) -* an appservice **can** establish calls to external services -* you **can** do dependency orchestration -* an appservice **must** be stateless -* an appservice **must** update only one aggregate root. Yes, this means that you can create one aggregate and update another one but think twice -{% endhint %} - -{% hint style="warning" %} -**You should not...** - -* an appservice **should not** update more than one aggregate root in single command/handler -* you **should not** place domain logic inside an application service -* you **should not** use application service to send emails, push notifications etc. 
Use Port or Gateway instead -* an appservice **should not** update the ReadModel -{% endhint %} - -## Examples - -```csharp -public class AccountAppService : AggregateRootApplicationService, - ICommandHandler, - ICommandHandler, - ICommandHandler, - ICommandHandler, - ICommandHandler, - ICommandHandler -{ - public void Handle(SuspendAccount message) - { - Update(message.Id, account => account.Suspend()); - } - - ... -} -``` - diff --git a/docs/message-handlers/gateways.md b/docs/message-handlers/gateways.md deleted file mode 100644 index 9a6cd735..00000000 --- a/docs/message-handlers/gateways.md +++ /dev/null @@ -1,18 +0,0 @@ -# Gateways - -Compared to Port, which can dispatch a command, a Gateway can do the same but it also has a persistent state. A scenario could be sending commands to external BC, such as push notifications, emails etc. There is no need to event source this state and its perfectly fine if this state is wiped. Example: iOS push notifications badge. This state should be used only for infrastructure needs and never for business cases. Compared to Projection, which tracks events, projects their data, and are not allowed to send any commands at all, a Gateway can store and track metadata required by external systems. Furthermore, Gateways are restricted and not touched when events are replayed. - -## Communication Guide Table - -| Triggered by | Description | -| :--- | :--- | -| Event | Domain events represent business changes which have already happened | - -## Best Practices - -{% hint style="success" %} -**You can/should/must...** - -* a gateway **can** send new commands -{% endhint %} - diff --git a/docs/message-handlers/ports.md b/docs/message-handlers/ports.md deleted file mode 100644 index 02e9aac8..00000000 --- a/docs/message-handlers/ports.md +++ /dev/null @@ -1,22 +0,0 @@ -# Ports - -Port is the mechanism to establish communication between aggregates. 
Usually this involves one aggregate who triggered an event and one aggregate which needs to react. - -If you feel the need to do more complex interactions, it is advised to use Saga. The reason for this is that ports do not provide a transparent view of the business flow because they do not have persistent state. - -## Communication Guide Table - -| Triggered by | Description | -| :--- | :--- | -| Event | Domain events represent business changes which have already happened | - -## Best Practices - -{% hint style="success" %} -**You can/should/must...** - -* a port can send a command -{% endhint %} - - - diff --git a/docs/message-handlers/projections.md b/docs/message-handlers/projections.md deleted file mode 100644 index 17d00de2..00000000 --- a/docs/message-handlers/projections.md +++ /dev/null @@ -1,30 +0,0 @@ -# Projections - -Projection tracks events and project their data for specific purposes. - -## Communication Guide Table - -| Triggered by | Description | -| :--- | :--- | -| Event | Domain events represent business changes which have already happened | - -## Best Practices - -{% hint style="success" %} -**You can/should/must...** - -* a projection **must** be idempotent -* a projection **must not** issue new commands or events -{% endhint %} - -{% hint style="warning" %} -**You should not...** - -* a projection **should not** query other projections. All the data of a projection must be collected from the Events' data -* a projection **should not** establish calls to external systems -{% endhint %} - - - - - diff --git a/docs/message-handlers/sagas.md b/docs/message-handlers/sagas.md deleted file mode 100644 index b499847e..00000000 --- a/docs/message-handlers/sagas.md +++ /dev/null @@ -1,22 +0,0 @@ ---- -description: Sometimes called a Process Manager ---- - -# Sagas - -When we have a workflow, which involves several aggregates it is recommended to have the whole process described in a single place such as а Saga/ProcessManager. 
- -## Communication Guide Table - -| Triggered by | Description | -| :--- | :--- | -| Event | Domain events represent business changes which have already happened | - -## Best Practices - -{% hint style="success" %} -**You can/should/must...** - -* a saga **can** send new commands -{% endhint %} - diff --git a/docs/message-handlers/triggers.md b/docs/message-handlers/triggers.md deleted file mode 100644 index e92ff692..00000000 --- a/docs/message-handlers/triggers.md +++ /dev/null @@ -1,2 +0,0 @@ -# Triggers - From e8b288462634cc4a4a1f0307306c960637f17d40 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Tue, 28 Apr 2026 17:48:03 +0300 Subject: [PATCH 06/21] docs: Adds five framework reference pages (atomic actions, Aspire wiring, multi-process topology, projection markers, integrity validation) - atomic-actions.md: documents IAggregateRootAtomicAction + ILock, Redis/Consul/Missing implementations, configuration anchors and failure modes. - aspire-cronus-wiring.md: documents the AppHost/worker wiring contract, env-var-to-config mapping and cross-links to multi-process topology. - multi-process-topology.md: documents the Cronus:*Enabled toggle pattern and four common topologies (all-in-one, API+worker, read-only, migration). - projections/projection-markers.md: documents INonVersionableProjection and INonRebuildableProjection, including their interaction with MarkupInterfaceProjectionVersioningPolicy and ProjectionHasher. - integrity-validation-and-dangerzone.md: documents IIntegrityPolicy and the EventStreamIntegrityPolicy default rules; explicitly notes that no DangerZone bypass is shipped. - SUMMARY.md: adds entries for all five new pages. - versioning.md: adds a "see also" cross-link to projection-markers.md (existing INonVersionableProjection mention preserved). 
--- docs/SUMMARY.md | 5 + docs/cronus-framework/aspire-cronus-wiring.md | 128 ++++++++++++++ docs/cronus-framework/atomic-actions.md | 83 +++++++++ .../integrity-validation-and-dangerzone.md | 158 ++++++++++++++++++ .../multi-process-topology.md | 134 +++++++++++++++ .../projections/projection-markers.md | 121 ++++++++++++++ .../projections/versioning.md | 1 + 7 files changed, 630 insertions(+) create mode 100644 docs/cronus-framework/aspire-cronus-wiring.md create mode 100644 docs/cronus-framework/atomic-actions.md create mode 100644 docs/cronus-framework/integrity-validation-and-dangerzone.md create mode 100644 docs/cronus-framework/multi-process-topology.md create mode 100644 docs/cronus-framework/projections/projection-markers.md diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 8accb58f..3b88e61f 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -41,6 +41,7 @@ * [Copy EventStore](cronus-framework/event-store/migrations/copy-eventstore.md) * [Projections](cronus-framework/projections/README.md) * [Versioning](cronus-framework/projections/versioning.md) + * [Projection Markers](cronus-framework/projections/projection-markers.md) * [Snapshots](cronus-framework/projections/snapshots.md) * [Workflows](cronus-framework/workflows.md) * [Indices](cronus-framework/indices.md) @@ -49,7 +50,11 @@ * [Jobs](cronus-framework/cluster/jobs.md) * [Messaging](cronus-framework/messaging/README.md) * [Serialization](cronus-framework/messaging/serialization.md) +* [Atomic Actions](cronus-framework/atomic-actions.md) * [Configuration](cronus-framework/configuration.md) +* [Aspire and Cronus](cronus-framework/aspire-cronus-wiring.md) +* [Multi-process topology](cronus-framework/multi-process-topology.md) +* [Integrity Validation](cronus-framework/integrity-validation-and-dangerzone.md) * [Extensibility](cronus-framework/extensibility/README.md) * [Discoveries](cronus-framework/extensibility/discoveries.md) * [Startup 
Attribute](cronus-framework/extensibility/startup-attribute.md) diff --git a/docs/cronus-framework/aspire-cronus-wiring.md b/docs/cronus-framework/aspire-cronus-wiring.md new file mode 100644 index 00000000..63493d42 --- /dev/null +++ b/docs/cronus-framework/aspire-cronus-wiring.md @@ -0,0 +1,128 @@ +# Aspire and Cronus + +[.NET Aspire](https://learn.microsoft.com/dotnet/aspire/) is the canonical way to compose a Cronus solution today. The AppHost owns the topology — Cassandra, RabbitMQ, Redis, Consul, Elasticsearch, plus every Cronus process that consumes them — and it injects connection strings into each process as environment variables that map directly onto the `Cronus:*` configuration keys. The Cronus side stays standard ASP.NET Core hosting; the only thing the AppHost replaces is "how does the worker find the broker?". + +This page is the wiring contract: what the AppHost must do, what the worker must do, and how the two halves talk through `IConfiguration`. + +## What the AppHost is responsible for + +The AppHost is a [`DistributedApplication`](https://learn.microsoft.com/dotnet/aspire/fundamentals/app-host-overview) project. It does three things for Cronus: + +1. **Stand up infrastructure resources** — Cassandra, RabbitMQ, Redis, optionally Consul, Elasticsearch — using the standard Aspire builder API (`builder.AddRedis`, `builder.AddRabbitMQ`, `builder.AddContainer` for things without a first-party integration). +2. **Reference each resource from each Cronus process** with `WithReference(...)`, so Aspire knows the dependency graph and provides the connection strings. +3. **Map every connection string onto the `Cronus:*` configuration key the worker expects**, using `WithEnvironment(...)` and the double-underscore convention that ASP.NET Core configuration interprets as `Cronus:Foo:Bar`. + +The third step is the only Cronus-specific bit. The worker process expects (for example) `Cronus:Persistence:Cassandra:ConnectionString`. 
Aspire only knows it has a Cassandra container at some endpoint. The AppHost is the place where you say "the contact points the worker should use are this Aspire endpoint": + +```csharp +.WithEnvironment("Cronus__Persistence__Cassandra__ConnectionString", + () => $"Contact Points={cassandra.GetEndpoint("cql").Host};Port={cassandra.GetEndpoint("cql").Port};Default Keyspace=billing") +``` + +Pulled from the Locus AppHost — see [`Elders.Locus.AppHost/AppHost.cs`](https://github.com/Elders/locus.backend/blob/master/src/Elders.Locus.AppHost/AppHost.cs) for the whole file. + +## What the worker is responsible for + +The worker stays a vanilla ASP.NET Core or worker-service host. It calls `AddServiceDefaults()` (the Aspire defaults extension method), `AddCronus(configuration)`, and that is the wiring done. Nothing in the Cronus call chain knows about Aspire — the configuration providers built into `IConfiguration` already turn `Cronus__Persistence__Cassandra__ConnectionString` (env var) into `Cronus:Persistence:Cassandra:ConnectionString` (configuration key), and `AddCronus` reads from there. + +The Locus API's `Program.cs` shows the canonical ordering — service defaults first, then `AddCronus`: + +```csharp +var builder = WebApplication.CreateBuilder(args); +builder.AddServiceDefaults(); + +// ... + +builder.Services.AddCronusAspNetCore(); +builder.Services.AddCronus(configuration); +``` + +Source: [`Elders.Locus.Api/Program.cs`](https://github.com/Elders/locus.backend/blob/master/src/Elders.Locus.Api/Program.cs). + +[`AddCronus`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/CronusServiceCollectionExtensions.cs) takes the `IConfiguration` from the host, runs the discovery scan ([`DiscoveryScanner`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Discoveries/DiscoveryScanner.cs)) over every assembly that `AssemblyLoader` found in the output directory, and registers everything the satellites declare. 
There is no Aspire-specific call. + +## A minimal AppHost + +A worker host that needs Cassandra, RabbitMQ and Redis looks like this: + +```csharp +var builder = DistributedApplication.CreateBuilder(args); + +var redis = builder.AddRedis("redis") + .WithLifetime(ContainerLifetime.Persistent); + +var cassandra = builder.AddContainer("cassandra", "cassandra", "latest") + .WithLifetime(ContainerLifetime.Persistent) + .WithEndpoint(port: 9042, targetPort: 9042, name: "cql") + .WithEnvironment("CASSANDRA_CLUSTER_NAME", "BillingCluster"); + +var rabbitmq = builder.AddRabbitMQ("rabbitmq") + .WithLifetime(ContainerLifetime.Persistent) + .WithManagementPlugin(15672); + +builder.AddProject("billing-worker") + .WithReference(redis) + .WithReference(rabbitmq) + .WithEnvironment("Cronus__Persistence__Cassandra__ConnectionString", + () => $"Contact Points={cassandra.GetEndpoint("cql").Host};Port={cassandra.GetEndpoint("cql").Port};Default Keyspace=billing") + .WithEnvironment("Cronus__Projections__Cassandra__ConnectionString", + () => $"Contact Points={cassandra.GetEndpoint("cql").Host};Port={cassandra.GetEndpoint("cql").Port};Default Keyspace=billing_projections") + .WithEnvironment("Cronus__Transport__RabbitMQ__Server", + () => rabbitmq.GetEndpoint("tcp").Host) + .WithEnvironment("Cronus__Transport__RabbitMQ__Port", + () => rabbitmq.GetEndpoint("tcp").Port.ToString()) + .WithEnvironment("Cronus__Transport__RabbitMQ__Username", rabbitmq.Resource.UserNameReference) + .WithEnvironment("Cronus__Transport__RabbitMQ__Password", rabbitmq.Resource.PasswordParameter) + .WithEnvironment("Cronus__AtomicAction__Redis__ConnectionString", + () => $"{redis.GetEndpoint("tcp").Host}:{redis.GetEndpoint("tcp").Port}") + .WaitFor(redis) + .WaitFor(cassandra) + .WaitFor(rabbitmq); + +builder.Build().Run(); +``` + +The pattern repeats for every Cronus process the AppHost owns. 
+ +## Configuration mapping cheat sheet + +| Aspire resource | Maps to Cronus key | +| --- | --- | +| `cassandra.GetEndpoint("cql")` | [`Cronus:Persistence:Cassandra:ConnectionString`](configuration.md#cronus-persistence-cassandra-connectionstring), [`Cronus:Projections:Cassandra:ConnectionString`](configuration.md#cronus-projections-cassandra-connectionstring) | +| `rabbitmq.GetEndpoint("tcp")` | [`Cronus:Transport:RabbitMQ:Server`](configuration.md#cronus-transport-rabbitmq-server), [`Cronus:Transport:RabbitMQ:Port`](configuration.md#cronus-transport-rabbitmq-port) | +| `rabbitmq.GetEndpoint("management")` | [`Cronus:Transport:RabbitMQ:AdminPort`](configuration.md#cronus-transport-rabbitmq-adminport) | +| `redis.GetEndpoint("tcp")` | [`Cronus:AtomicAction:Redis:ConnectionString`](configuration.md#cronus-atomicaction-redis-connectionstring) | +| `consul.GetEndpoint("http")` | `Cronus:Cluster:Consul:Address` (where used) | + +For the full key reference see: + +{% content-ref url="configuration.md" %} +[configuration.md](configuration.md) +{% endcontent-ref %} + +## Multi-process topologies + +The AppHost is also where you decide how many Cronus processes there are and what each one does. A typical Aspire-era Cronus solution has at least two — an API host that dispatches commands and a worker host that runs projections, sagas, ports and gateways. The split is controlled by `Cronus:*Enabled` flags injected per process. 
That story has its own page: + +{% content-ref url="multi-process-topology.md" %} +[multi-process-topology.md](multi-process-topology.md) +{% endcontent-ref %} + +## Best Practices + +{% hint style="success" %} +**You can / should / must** + +* the AppHost **must** map every required `Cronus:*` connection string with `WithEnvironment(...)` — Cronus does not auto-discover Aspire endpoints +* the AppHost **must** declare `WithReference(...)` for every infrastructure resource a process touches, so Aspire wires service discovery and credentials +* the AppHost **should** call `WaitFor(...)` on every infrastructure dependency so processes do not start before their broker is reachable +* the worker **should** call `AddServiceDefaults()` before `AddCronus(configuration)` so OpenTelemetry, health checks and resilience are in place when Cronus boots +{% endhint %} + +{% hint style="warning" %} +**You should not** + +* the AppHost **should not** use the colon-separated form (`Cronus:Foo:Bar`) for environment-variable names — use the double-underscore form (`Cronus__Foo__Bar`), because `:` is not supported in environment-variable names on every platform (Bash, for example, rejects it) +* the worker **should not** call `AddCronus` more than once or before `IConfiguration` has all of its providers — the discovery scan runs eagerly during the call +* a process **should not** ship its own bootstrap of Cassandra/RabbitMQ/Redis next to the AppHost — let the AppHost own the resource lifetime +{% endhint %} diff --git a/docs/cronus-framework/atomic-actions.md b/docs/cronus-framework/atomic-actions.md new file mode 100644 index 00000000..900f36bc --- /dev/null +++ b/docs/cronus-framework/atomic-actions.md @@ -0,0 +1,83 @@ +# Atomic Actions + +An atomic action is the cross-process lock Cronus takes around the "load aggregate, append events" critical section.
It is the safety net that turns optimistic event-store concurrency into something safe to run on multiple hosts at once: at most one node, anywhere in the cluster, may append the next revision of a given aggregate at any given moment. + +You will see two interfaces under [`Elders.Cronus.AtomicAction`](https://github.com/Elders/Cronus/tree/master/src/Elders.Cronus/AtomicAction) — [`IAggregateRootAtomicAction`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/AtomicAction/IAggregateRootAtomicAction.cs) and [`ILock`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/AtomicAction/ILock.cs). The first is what `AggregateRepository.SaveAsync` uses; the second is the primitive lock that the atomic action is built on. Most projects only ever wire up an implementation; very few write a new one. + +## When you need it + +Read paths do not need an atomic action — they replay the stream and produce a state. Write paths do, because two hosts that load the same aggregate at the same revision and both try to append revision `N+1` would otherwise race. The Cronus event store catches a duplicate revision and rejects the second writer with [`AggregateStateFirstLevelConcurrencyException`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/AtomicAction/AggregateStateFirstLevelConcurrencyException.cs) — but the lost work has already happened. The atomic action stops the second writer earlier, before any side effect runs. + +In practice every process that calls `SaveAsync` needs a real implementation, because the default `MissingAggregateRootAtomicAction` throws on the first save (see *Choosing an implementation* below). The lock itself earns its keep as soon as more than one process can write to the same aggregate, which is always the case in an Aspire-era Cronus solution where API hosts and worker hosts both call `SaveAsync`. + +## The interfaces + +`IAggregateRootAtomicAction` is the contract `AggregateRepository` calls.
The full surface is one method: + +```csharp +public interface IAggregateRootAtomicAction : IDisposable +{ + Task> ExecuteAsync(AggregateRootId arId, int aggregateRootRevision, Func action); +} +``` + +`ExecuteAsync` is given the aggregate id, the revision the caller intends to append, and the work that should run while the lock is held. The implementation acquires a lock keyed on `arId.Value`, runs `action`, releases the lock and returns a [`Result`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Userfull/Result.cs) — `IsSuccessful = true` if both the lock and the action succeeded. + +`ILock` is the lower-level primitive the atomic action sits on: + +```csharp +public interface ILock +{ + Task IsLockedAsync(string resource); + Task LockAsync(string resource, TimeSpan ttl); + Task UnlockAsync(string resource); +} +``` + +A lock is named (`resource`) and has a time-to-live. `LockAsync` returns `true` when the lock was acquired. `UnlockAsync` releases it explicitly; the TTL releases it implicitly if the holder dies. The same `ILock` instance is reused across many atomic actions. + +## Configuration keys + +The Redis-backed implementation binds two option classes against the `cronus:atomicaction:redis` section. 
The full reference, including defaults and validation, lives next to the rest of the framework configuration: + +{% content-ref url="configuration.md" %} +[configuration.md](configuration.md) +{% endcontent-ref %} + +Specifically: [`Cronus:AtomicAction:Redis:ConnectionString`](configuration.md#cronus-atomicaction-redis-connectionstring), [`Cronus:AtomicAction:Redis:LockTtl`](configuration.md#cronus-atomicaction-redis-lockttl), [`Cronus:AtomicAction:Redis:LongTtl`](configuration.md#cronus-atomicaction-redis-longttl), [`Cronus:AtomicAction:Redis:LockRetryCount`](configuration.md#cronus-atomicaction-redis-lockretrycount), [`Cronus:AtomicAction:Redis:LockRetryDelay`](configuration.md#cronus-atomicaction-redis-lockretrydelay) and [`Cronus:AtomicAction:Redis:ClockDriveFactor`](configuration.md#cronus-atomicaction-redis-clockdrivefactor). + +`LockTtl` is the short TTL applied while the action runs; `LongTtl` is the longer TTL applied after the action succeeds, to prevent a late-arriving node from re-running the same revision. The retry count and delay are passed through to RedLock; the clock-drive factor compensates for clock drift between Redis nodes. + +## Choosing an implementation + +Three implementations exist. The first is the only one you should default to. + +* [**Redis**](https://github.com/Elders/Cronus.AtomicAction.Redis) — the production choice. 
[`RedisAggregateRootAtomicAction`](https://github.com/Elders/Cronus.AtomicAction.Redis/blob/master/src/Elders.Cronus.AtomicAction.Redis/RedisAggregateRootAtomicAction.cs) wraps a [`RedisAggregateRootLock`](https://github.com/Elders/Cronus.AtomicAction.Redis/blob/master/src/Elders.Cronus.AtomicAction.Redis/AggregateRootLock/RedisAggregateRootLock.cs) (an `ILock` over `Elders.RedLock` 9.0.2 — the [Redlock algorithm](https://redis.io/topics/distlock)) plus an [`IRevisionStore`](https://github.com/Elders/Cronus.AtomicAction.Redis/blob/master/src/Elders.Cronus.AtomicAction.Redis/RevisionStore/IRevisionStore.cs) that remembers the last revision committed for each aggregate. Together they reject a stale `aggregateRootRevision` even if the lock acquisition itself was successful. The discovery — [`RedisAggregateRootAtomicActionDiscovery`](https://github.com/Elders/Cronus.AtomicAction.Redis/blob/master/src/Elders.Cronus.AtomicAction.Redis/RedisAggregateRootAtomicActionDiscovery.cs) — registers both `IAggregateRootAtomicAction` and `ILock` with `CanOverrideDefaults = true`, so adding the package is enough to replace the in-memory default. +* [**Consul**](https://github.com/Elders/Cronus.AtomicAction.Consul) — historical. The Consul implementation predates Cronus 11 and references the removed `IAggregateRootId` interface; it does not currently compile against Cronus 11.x and is not currently shipped as a working option. If you need a non-Redis implementation, use it as a sketch rather than as something you can drop in. +* [`MissingAggregateRootAtomicAction`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/AtomicAction/MissingAggregateRootAtomicAction.cs) — the in-process default. It implements both `IAggregateRootAtomicAction` and `ILock` by throwing `NotImplementedException` with a clear message: "The AggregateRootAtomicAction is not configured. Please install a nuget package which provides aggregate sync capabilities such as IAggregateRootAtomicAction. 
ex.: Cronus.AtomicAction.Redis." It is registered by Cronus core so the container resolves; the first save call is what fails. Treat it as "atomic actions are not wired up yet", not as "atomic actions are off". + +## Failure modes + +* **Lock not acquired.** `LockAsync` returned `false`. `RedisAggregateRootAtomicAction.ExecuteAsync` returns `Result(false).WithError("Failed to lock and execute atomic action.")`. `AggregateRepository.SaveInternalAsync` raises `AggregateStateFirstLevelConcurrencyException` from the result errors. The caller — typically an application service — sees the exception and is expected to either retry or abort the command. +* **Revision mismatch.** The lock was acquired but the revision the caller wants to append does not follow the last one the revision store remembers. The Redis implementation rolls back to the previous revision under `LongTtl` and returns failure. Same exception path as above. +* **Action threw.** The action ran but threw. The atomic action captures it (`Result.Error(ex)`), releases the lock, and the exception surfaces back through `SaveInternalAsync`. +* **Lock holder died.** Whoever held the lock crashed before calling `UnlockAsync`. The TTL releases it; the next caller acquires cleanly. This is why `LockTtl` defaults to one second — long enough for the work, short enough that nobody waits long for a dead holder. 
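To make the TTL behaviour in the failure modes above concrete, here is a single-process sketch of a lock with the same three operations as `ILock`. It assumes `IsLockedAsync` and `LockAsync` return `Task<bool>`, inferred from the prose ("`LockAsync` returns `true` when the lock was acquired"). `InMemoryTtlLock` is hypothetical and not distributed, so it cannot stand in for the Redis implementation across processes. It only shows why a crashed holder stops blocking other callers once its TTL elapses. The clock is injectable so expiry can be exercised without sleeping.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, single-process stand-in for ILock; NOT a distributed lock.
public sealed class InMemoryTtlLock
{
    private readonly Dictionary<string, DateTimeOffset> expiries = new();
    private readonly object gate = new();
    private readonly Func<DateTimeOffset> now;

    public InMemoryTtlLock(Func<DateTimeOffset>? clock = null) =>
        now = clock ?? (() => DateTimeOffset.UtcNow);

    public Task<bool> IsLockedAsync(string resource)
    {
        lock (gate)
            return Task.FromResult(expiries.TryGetValue(resource, out var until) && until > now());
    }

    public Task<bool> LockAsync(string resource, TimeSpan ttl)
    {
        lock (gate)
        {
            // A lock whose TTL has elapsed counts as released: this is how a dead holder is recovered.
            if (expiries.TryGetValue(resource, out var until) && until > now())
                return Task.FromResult(false);

            expiries[resource] = now() + ttl;
            return Task.FromResult(true);
        }
    }

    public Task UnlockAsync(string resource)
    {
        lock (gate)
            expiries.Remove(resource);
        return Task.CompletedTask;
    }
}
```

A production lock such as the Redlock-based one also records a per-holder token, so that `UnlockAsync` from one node cannot release a lock another node has since acquired. The sketch leaves that out for brevity.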
+ + ## Best Practices + +{% hint style="success" %} +**You can / should / must** + +* an atomic action **must** be configured on every process that calls `IAggregateRepository.SaveAsync` +* an atomic action **must** key its lock on `AggregateRootId.Value` so different aggregates do not contend +* a Redis atomic action **should** sit on a Redis instance with the same availability profile as the event store — if Redis is gone, every write fails +* an `ILock` **can** be reused as a generic distributed-lock primitive within the same bounded context; the `ILock` registration is independent of `IAggregateRootAtomicAction` +{% endhint %} + +{% hint style="warning" %} +**You should not** + +* an atomic action **should not** be configured with a `LockTtl` much longer than the slowest expected `SaveAsync` — a long TTL turns a crash into a long stall (the TTL must still cover that save, or the lock expires while the write is in flight) +* an atomic action **should not** be skipped on "read-only" hosts that occasionally write — there is no such thing as "occasional" when it comes to event-store concurrency +* a process **should not** rely on `MissingAggregateRootAtomicAction` for anything other than the very first integration test; ship a real implementation before more than one writer exists +{% endhint %} diff --git a/docs/cronus-framework/integrity-validation-and-dangerzone.md b/docs/cronus-framework/integrity-validation-and-dangerzone.md new file mode 100644 index 00000000..eb4c18d5 --- /dev/null +++ b/docs/cronus-framework/integrity-validation-and-dangerzone.md @@ -0,0 +1,158 @@ +# Integrity validation + +When Cronus loads an aggregate from the event store, it does not blindly fold every event it finds into the state. The stream is first run through an **integrity policy** that checks that the events form a coherent history — no duplicates, no out-of-order revisions, no holes. If the policy reports a violation that no resolver could fix, the load fails.
This page covers the policy extension point, the default rules, and the (deliberate) absence of a "skip the policy" bypass in the shipped framework. + +## What integrity validation does + +Every load goes through [`AggregateRepository.LoadAsync`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/AggregateRepository.cs): + +```csharp +public async Task> LoadAsync(AggregateRootId id) where AR : IAggregateRoot +{ + EventStream eventStream = await eventStore.LoadAsync(id).ConfigureAwait(false); + var integrityResult = integrityPolicy.Apply(eventStream); + if (integrityResult.IsIntegrityViolated) + throw new EventStreamIntegrityViolationException($"AR integrity is violated for ID={id.Value}"); + eventStream = integrityResult.Output; + // ... fold into aggregate state ... +} +``` + +The repository asks the registered [`IIntegrityPolicy`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/IIntegrityPolicy.cs) to inspect the freshly loaded `EventStream`. If the policy returns `IsIntegrityViolated = true`, the load throws and the application service does not see a half-broken aggregate. If the policy is satisfied, the (possibly rewritten) stream is folded into the state. + +The policy in core is [`EventStreamIntegrityPolicy`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Integrity/EventStreamIntegrityPolicy.cs). It is composed of three rules applied in order: + +1. `DuplicateRevisionsValidator` — fails if two commits in the stream share a revision number. +2. `OrderedRevisionsValidator` paired with `UnorderedRevisionsResolver` — flags out-of-order revisions and lets the resolver attempt to put them back in order. +3. `MissingRevisionsValidator` — fails if a revision number is missing from the sequence. 
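The three default rules can be approximated over the bare list of commit revisions. The `RevisionChecks` class below is a hypothetical simplification, not the Cronus validators. It ignores everything in an `EventStream` except revision numbers and assumes the first revision is 1. `IsViolated` mirrors how the default policy treats the three results: disorder is repairable by sorting, while duplicates and gaps are not.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplification of the default EventStreamIntegrityPolicy rules.
// It only looks at revision numbers and assumes the first revision is 1.
public static class RevisionChecks
{
    // Rule 1: no two commits may share a revision.
    public static bool HasDuplicates(IReadOnlyList<int> revisions) =>
        revisions.Distinct().Count() != revisions.Count;

    // Rule 2: revisions should be ascending; a resolver can repair this by sorting.
    public static bool IsOutOfOrder(IReadOnlyList<int> revisions) =>
        revisions.Zip(revisions.Skip(1), (previous, next) => previous > next).Any(isDescending => isDescending);

    // Rule 3: after sorting, the revision at index i must be exactly i + 1 (no holes).
    public static bool HasGaps(IReadOnlyList<int> revisions)
    {
        var sorted = revisions.Distinct().OrderBy(r => r).ToList();
        for (int i = 0; i < sorted.Count; i++)
        {
            if (sorted[i] != i + 1) return true;
        }
        return false;
    }

    // Unordered streams are resolvable; duplicates and gaps are violations.
    public static bool IsViolated(IReadOnlyList<int> revisions) =>
        HasDuplicates(revisions) || HasGaps(revisions);
}
```

With this model, `[1, 3, 2]` is out of order but not violated, `[1, 2, 2]` fails on duplicates, and `[1, 2, 4]` fails on the missing revision 3.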
+ +Each rule is an [`IntegrityRule`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/IntegrityRule.cs) — a pair of an [`IValidator`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/IValidator.cs) (does the stream look right?) and an [`IResolver`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/IResolver.cs) (if it does not, can we recover?). The default resolver is [`EmptyResolver`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/IResolver.cs), which always reports the stream as violated — no recovery, raise the exception. + +## The IIntegrityPolicy extension point + +The interface is small: + +```csharp +public interface IIntegrityPolicy +{ + IEnumerable> Rules { get; } + + IntegrityResult Apply(T candidate); +} +``` + +A policy returns a sealed [`IntegrityResult`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/IntegrityResult.cs): + +```csharp +public sealed class IntegrityResult +{ + public IntegrityResult(T output, bool isIntegrityViolated) { /* ... */ } + public bool IsIntegrityViolated { get; } + public T Output { get; } +} +``` + +`Output` is the (possibly rewritten) candidate. `IsIntegrityViolated` is the verdict. + +Validators implement [`IValidator`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/IValidator.cs): + +```csharp +public interface IValidator : IComparable> +{ + IValidatorResult Validate(T candidate); + uint PriorityLevel { get; } +} +``` + +[`IValidatorResult`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/IValidatorResult.cs) carries `IsValid`, an `ErrorType` discriminator and an `Errors` collection. The default concrete result is [`ValidatorResult`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/ValidatorResult.cs) — a list of error strings plus the type tag. 
+ + Resolvers implement [`IResolver`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/IntegrityValidation/IResolver.cs): + +```csharp +public interface IResolver : IComparable> +{ + IntegrityResult Resolve(T eventStream, IValidatorResult validatorResult); + uint PriorityLevel { get; } +} +``` + +A resolver is given the offending stream and the validator's report. It returns either a recovered `IntegrityResult` with `IsIntegrityViolated = false` and a corrected `Output`, or it gives up and returns `IsIntegrityViolated = true`. + +## Replacing the policy + +The repository takes its `IIntegrityPolicy` from the container, so a satellite — or your own discovery — can replace it with a stricter or more permissive variant. Use a [discovery](extensibility/discoveries.md) with `CanOverrideDefaults = true`: + +```csharp +public class CustomIntegrityPolicyDiscovery : DiscoveryBase> +{ + protected override DiscoveryResult> DiscoverFromAssemblies(DiscoveryContext context) + { + var policyModel = new DiscoveredModel( + typeof(IIntegrityPolicy), + typeof(MyStrictEventStreamIntegrityPolicy), + ServiceLifetime.Singleton); + policyModel.CanOverrideDefaults = true; + + return new DiscoveryResult>([policyModel]); + } +} +``` + +The same shape lets you write a permissive policy that knows how to repair a specific corruption pattern that turned up once in production, by adding a custom resolver to the list. Keep that policy package gated to the bounded context that needs it. + +## Worked example: an unordered-revisions resolver + +Suppose a Cassandra topology change once produced streams where revisions are correct but the rows came back in the wrong order. The `OrderedRevisionsValidator` flags this; paired with the default `EmptyResolver` rather than the `UnorderedRevisionsResolver` that [`EventStreamIntegrityPolicy`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/EventStore/Integrity/EventStreamIntegrityPolicy.cs) now wires in, it would refuse the load.
A resolver that sorts the commits into a new stream and returns a healthy result fixes the case without dropping data: + +```csharp +public sealed class SortByRevisionResolver : IResolver +{ + public uint PriorityLevel => 100; + + public int CompareTo(IResolver other) => PriorityLevel.CompareTo(other.PriorityLevel); + + public IntegrityResult Resolve(EventStream eventStream, IValidatorResult validatorResult) + { + var sorted = eventStream.Commits.OrderBy(c => c.Revision).ToList(); + var fixedStream = new EventStream(sorted); + return new IntegrityResult(fixedStream, isIntegrityViolated: false); + } +} +``` + +Pair it with the validator, register the rule on a custom policy, override the default through a discovery, and the next load that hits the bad pattern recovers without a manual repair. + +## "DangerZone" — what is and is not in the framework + +Some Cronus discussions refer to a **DangerZone** — a deliberate, ugly bypass of the integrity policy for one-off corrective actions. **There is no `DangerZone` namespace, no `IDangerZone` interface, and no `Cronus:DangerZone:*` configuration key in the shipped framework today**. Every load goes through the registered `IIntegrityPolicy`, and the only way to "bypass" the standard checks is to register a more permissive policy via a discovery — which is itself the first-class extension point. + +If you find code or scripts that mention DangerZone, treat them as out-of-band repair tooling that lives outside the Cronus host, not as a documented framework feature. The expected pattern for a corrective action is: + +1. Stop the host (or the relevant consumers). +2. Apply the repair directly against the event store, in a one-shot tool that you can audit. +3. Restart the host. + +Do not try to encode the bypass as a permanent flag.
+ +## Best Practices + +{% hint style="success" %} +**You can / should / must** + +* an integrity policy **must** treat a violation as a failure unless a resolver can prove the stream is recoverable +* a custom validator **should** populate `ErrorType` with a stable string so log searches can find every occurrence of the same defect +* a custom resolver **should** preserve every commit it can — repair, do not drop +* a stricter policy **can** be shipped as a satellite discovery and applied to a single bounded context where the constraint matters most +{% endhint %} + +{% hint style="warning" %} +**You should not** + +* a policy **should not** be replaced with a no-op "always succeed" implementation in production — the integrity check exists because the alternative is a silently corrupted aggregate +* a resolver **should not** mutate the input stream in place; build a new `EventStream` and return it via `IntegrityResult` +* an integrity violation **should not** be swallowed in the application-service layer; let `EventStreamIntegrityViolationException` surface to the caller and the operator +{% endhint %} + +{% hint style="danger" %} +**Repair is a one-off action, not a feature flag.** Even when there is a real, justified reason to load a stream that the policy refuses, do it from a separate one-shot tool that you can audit, with the host stopped, and put the policy back in place when you are done. There is no shipped flag that says "skip integrity validation for the next N loads", and that is on purpose. +{% endhint %} diff --git a/docs/cronus-framework/multi-process-topology.md b/docs/cronus-framework/multi-process-topology.md new file mode 100644 index 00000000..fd4dab65 --- /dev/null +++ b/docs/cronus-framework/multi-process-topology.md @@ -0,0 +1,134 @@ +# Multi-process topology + +A Cronus solution rarely runs as a single process in production. 
Once traffic grows, the workload splits: an API host has to answer HTTP fast, a worker host has to chew through events at its own pace, an offline scraper has to pour data in without holding up either, and a console may need to run a one-off bulk operation against the same event store. All four are the same Cronus solution — same bounded context, same configuration, same event store and broker — but each does only the part of the work it is meant to do. + +The split is controlled by a small family of boolean flags. This page explains the pattern, the toggles, and the topologies that fall out of them. + +## Why split + +You split a Cronus host when one of these things stops being true: + +* The HTTP-facing process is fast enough to keep the user waiting only for the command write, not for projections, sagas and gateways to settle. +* A long replay (a versioning rebuild, a migration, a backfill) does not steal capacity from production traffic. +* External-facing side effects — emails, push notifications, third-party calls — happen on a host that can be restarted, throttled or paused without taking the API down. +* Read-only workloads can scale horizontally without each replica trying to advance write-side state. + +Each of those is a `Cronus:*Enabled` flag flipped on a different host. 
+ + ## The toggle pattern + +Cronus core ships nine flags, all on [`CronusHostOptions`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/CronusHostOptions.cs) and bound under `cronus:`: + +| Toggle | Default | What it starts | +| --- | --- | --- | +| `Cronus:ApplicationServicesEnabled` | `true` | The application-services consumer (commands → aggregates) and the event-store indices consumer | +| `Cronus:ProjectionsEnabled` | `true` | The projections consumer (events → projection store) | +| `Cronus:PortsEnabled` | `true` | The ports consumer (events → outbound side effects with no persistent state) | +| `Cronus:SagasEnabled` | `true` | The sagas consumer (events → new commands and timeouts) | +| `Cronus:GatewaysEnabled` | `true` | The gateways consumer (events → external systems with persistent infrastructure state) | +| `Cronus:TriggersEnabled` | `true` | The triggers consumer (event hooks) | +| `Cronus:SystemServicesEnabled` | `true` | Cronus-internal system app services, sagas, ports, triggers, projections, system indices | +| `Cronus:MigrationsEnabled` | `false` | The replay/migration consumer | +| `Cronus:RpcApiEnabled` | `false` | The RPC API host (request/response over RabbitMQ) | + +The flags are read by [`CronusHost`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Hosting/CronusHost.cs) at start. Each flag gates a `consumer.StartAsync()` call — if the flag is `false`, the consumer is built (the container still resolves it) but never started, so it never pulls from RabbitMQ. The flags are also re-evaluated when the options are reloaded, which means you can toggle a host between modes by changing configuration. + +For the full reference of these keys see: + +{% content-ref url="configuration.md" %} +[configuration.md](configuration.md) +{% endcontent-ref %} + +Two flags, `MigrationsEnabled` and `RpcApiEnabled`, default to `false`; every other flag defaults to `true`.
The default Cronus host is therefore an "all-in-one" worker; you turn things off as you split it. + +## Common topologies + +### All-in-one (development, small services) + +Every flag at default. One process does everything. Simple, fine until traffic forces you to split. + +### API host + worker host (the canonical split) + +Two processes share the same configuration except for the flags: + +| Flag | API host | Worker host | +| --- | --- | --- | +| `ApplicationServicesEnabled` | `true` | `true` | +| `ProjectionsEnabled` | `false` | `true` | +| `PortsEnabled` | `false` | `true` | +| `SagasEnabled` | `false` | `true` | +| `GatewaysEnabled` | `false` | `true` | +| `TriggersEnabled` | `false` | `true` | +| `SystemServicesEnabled` | `true` | `true` | + +The API host accepts commands over HTTP, dispatches them to application services and returns; nothing else runs in-process. The worker picks up the resulting events and turns them into projections, command-publishing sagas, side-effect ports and external-system gateways. Both processes call `IAggregateRepository.SaveAsync` (the API on every command, the worker through sagas), so both must have a real `IAggregateRootAtomicAction` configured — see [Atomic Actions](atomic-actions.md). + +### Read-only host (scale-out reads) + +Add a third process that does no writes at all: + +| Flag | Read-only host | +| --- | --- | +| `ApplicationServicesEnabled` | `false` | +| `ProjectionsEnabled` | `false` | +| `PortsEnabled` | `false` | +| `SagasEnabled` | `false` | +| `GatewaysEnabled` | `false` | +| `TriggersEnabled` | `false` | +| `SystemServicesEnabled` | `false` | + +It serves reads from the projection store and never advances any write-side state. You can run as many of these as the load needs, behind a load balancer. 
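The topology tables above can be read as presets over the nine flags. The sketch below is a minimal, self-contained model of that idea, not Cronus code: `HostFlags`, its presets and both helper methods are hypothetical names introduced for illustration, and the defaults are copied from the toggle table. It shows which consumers each host would start and which `Cronus__*` environment overrides an AppHost would have to pass to get there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the nine toggles. The real options type is
// CronusHostOptions in Elders.Cronus and is bound from configuration.
public sealed record HostFlags(
    bool ApplicationServicesEnabled = true,
    bool ProjectionsEnabled = true,
    bool PortsEnabled = true,
    bool SagasEnabled = true,
    bool GatewaysEnabled = true,
    bool TriggersEnabled = true,
    bool SystemServicesEnabled = true,
    bool MigrationsEnabled = false,
    bool RpcApiEnabled = false)
{
    // Presets copied from the topology tables; unlisted flags keep their defaults.
    public static HostFlags AllInOne => new();

    public static HostFlags ApiHost => new(
        ProjectionsEnabled: false, PortsEnabled: false, SagasEnabled: false,
        GatewaysEnabled: false, TriggersEnabled: false);

    public static HostFlags WorkerHost => new();

    public static HostFlags ReadOnlyHost => new(
        ApplicationServicesEnabled: false, ProjectionsEnabled: false,
        PortsEnabled: false, SagasEnabled: false, GatewaysEnabled: false,
        TriggersEnabled: false, SystemServicesEnabled: false);

    // Every flag as a (key suffix, value) pair, in the same order as the toggle table.
    public IReadOnlyList<(string Name, bool Value)> Flags() => new (string Name, bool Value)[]
    {
        (nameof(ApplicationServicesEnabled), ApplicationServicesEnabled),
        (nameof(ProjectionsEnabled), ProjectionsEnabled),
        (nameof(PortsEnabled), PortsEnabled),
        (nameof(SagasEnabled), SagasEnabled),
        (nameof(GatewaysEnabled), GatewaysEnabled),
        (nameof(TriggersEnabled), TriggersEnabled),
        (nameof(SystemServicesEnabled), SystemServicesEnabled),
        (nameof(MigrationsEnabled), MigrationsEnabled),
        (nameof(RpcApiEnabled), RpcApiEnabled),
    };

    // Consumers this host would start; a false flag means "built but never started".
    public IReadOnlyList<string> ConsumersToStart() =>
        Flags().Where(f => f.Value)
               .Select(f => f.Name.Replace("Enabled", ""))
               .ToList();

    // Environment overrides for every flag that differs from its default.
    // .NET configuration maps the "__" separator to ":", so Cronus__X binds to Cronus:X.
    public IReadOnlyDictionary<string, string> ToEnvironmentOverrides()
    {
        var defaults = new HostFlags().Flags();
        var current = Flags();
        var overrides = new Dictionary<string, string>();
        for (int i = 0; i < current.Count; i++)
        {
            if (current[i].Value != defaults[i].Value)
                overrides[$"Cronus__{current[i].Name}"] = current[i].Value ? "true" : "false";
        }
        return overrides;
    }
}
```

For example, `HostFlags.ApiHost.ToEnvironmentOverrides()` returns five entries (`Cronus__ProjectionsEnabled`, `Cronus__PortsEnabled`, `Cronus__SagasEnabled`, `Cronus__GatewaysEnabled`, `Cronus__TriggersEnabled`, all `"false"`), one per `WithEnvironment(...)` call the API host would need. Emitting only the differences keeps each host's overrides short and leaves the defaults in one place.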
+ +### Migration host + +A short-lived process that runs a replay or a backfill without disturbing production: + +| Flag | Migration host | +| --- | --- | +| `ApplicationServicesEnabled` | `false` | +| `ProjectionsEnabled` | `false` | +| `MigrationsEnabled` | `true` | + +Production hosts keep `MigrationsEnabled = false` so they never compete for the migration queue. When the replay finishes, the migration host shuts down. + +### Scrapers and consoles + +A scraper is a long-running process that pulls data from an external source and publishes commands; a console is a short one-off. Both typically run with most consumers off (they do not need to handle commands or run projections) and only the message-publishing infrastructure on. The Locus solution's [Scrapers](https://github.com/Elders/locus.backend/tree/master/src/Scrapers) and [`Elders.Locus.BulkOperations.Console`](https://github.com/Elders/locus.backend/tree/master/src/Elders.Locus.BulkOperations.Console) are the reference shape. + +## Shared infrastructure + +Every process in the topology shares the same infrastructure: + +* **Event store** ([`Cronus:Persistence:Cassandra:*`](configuration.md#cronus-persistence-cassandra)) — appends from any host go into the same keyspaces; loads from any host hit the same data. +* **Projection store** ([`Cronus:Projections:Cassandra:*`](configuration.md#cronus-projections-cassandra)) — the worker writes; everyone reads. +* **Transport** ([`Cronus:Transport:RabbitMQ:*`](configuration.md#cronus-transport-rabbitmq)) — the API publishes commands and events; the worker consumes them. Each consumer type has its own queue, so flipping a flag on one host does not starve another. +* **Atomic action** ([`Cronus:AtomicAction:Redis:*`](configuration.md#cronus-atomicaction-redis)) — every process that writes must reach the same Redis. See [Atomic Actions](atomic-actions.md).
+ +The bounded-context name and tenant list — [`Cronus:BoundedContext`](configuration.md#cronus-boundedcontext) and [`Cronus:Tenants`](configuration.md#cronus-tenants) — must match across every process, because they are part of how queues, exchanges and keyspaces are named. + +## Wiring the topology with Aspire + +The natural place to define a topology is the AppHost. Each process is a `builder.AddProject<...>()` with its own `WithEnvironment("Cronus__ProjectionsEnabled", "false")` overrides, while all of them share the same `cassandra`, `rabbitmq` and `redis` resources. See: + +{% content-ref url="aspire-cronus-wiring.md" %} +[aspire-cronus-wiring.md](aspire-cronus-wiring.md) +{% endcontent-ref %} + +## Best Practices + +{% hint style="success" %} +**You can / should / must** + +* every process **must** share the same `Cronus:BoundedContext` and `Cronus:Tenants` — they identify the queues and keyspaces the topology lives in +* every writing process **must** be configured with a real atomic action; "writing" means anywhere `SaveAsync` is called, including sagas +* a topology **should** start as all-in-one and split only when a real bottleneck appears — the toggles are reversible, the architecture should be too +* a topology **can** mix Aspire-managed and externally-managed processes (a long-lived worker in Aspire, a CI-triggered console outside it) as long as both see the same configuration +{% endhint %} + +{% hint style="warning" %} +**You should not** + +* a topology **should not** turn off `SystemServicesEnabled` on a writing host — the system consumers are how versioning, indices and similar internal flows progress +* a topology **should not** run two processes with `MigrationsEnabled = true` against the same bounded context at the same time +* a topology **should not** depend on flag changes at runtime to fence off bad behaviour — disable the flag *and* stop the host if you really do not want it doing the work +{% endhint %} diff --git
a/docs/cronus-framework/projections/projection-markers.md b/docs/cronus-framework/projections/projection-markers.md new file mode 100644 index 00000000..294f933c --- /dev/null +++ b/docs/cronus-framework/projections/projection-markers.md @@ -0,0 +1,121 @@ +# Projection markers + +Most projections should participate in versioning and replay — that is the point of [`ProjectionVersionManager`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/ProjectionVersionManager.cs). A few should not. The framework expresses that with two marker interfaces declared in [`CronusAssembly.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/CronusAssembly.cs): `INonVersionableProjection` and `INonRebuildableProjection`. Both are opt-out signals — you implement them on your projection class, and the framework changes how it treats the projection at boot and during replay. + +This page covers when to reach for either marker, what changes when you do, and how the markers interact with the rest of the versioning subsystem. + +## The problem the markers solve + +The default versioning policy ([`MarkupInterfaceProjectionVersioningPolicy`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/MarkupInterfaceProjectionVersioningPolicy.cs)) treats every projection that does not implement `INonVersionableProjection` as versionable. Whenever the projection's hash changes, the version manager requests a new version, the builder replays every event the projection handles, and once the new version is live, reads switch over. That is the right thing for almost every projection. + +It is the wrong thing for two narrow cases: + +1. **System projections that the versioning subsystem itself depends on.** [`ProjectionVersionsHandler`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/Handlers/ProjectionVersionsHandler.cs) is the projection that *records* what versions exist.
If it tried to participate in versioning, the system would have to be running before it could be running. +2. **Projections that should not be wiped and rebuilt at all** — typically because their state cannot be reconstructed from events alone, or because rebuilding them would be ruinously expensive and the on-disk shape is stable enough that no rebuild is ever needed. + +`INonVersionableProjection` covers the first; `INonRebuildableProjection` covers the second. + +## INonVersionableProjection + +The full interface: + +```csharp +public interface INonVersionableProjection +{ + string GetHash() => "ver"; + + int GetRevision() => 1; +} +``` + +When a projection implements this marker, two things change: + +* [`ProjectionHasher.CalculateHash`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/ProjectionHasher.cs) does not derive a hash from the set of `IEventHandler<T>` interfaces. It instantiates the projection with the parameterless constructor and returns whatever `GetHash()` says — `"ver"` by default. Adding or removing event handlers does not change this value. +* `MarkupInterfaceProjectionVersioningPolicy.IsVersionable(projectionName)` returns `false`. The version manager — which reads the policy on every `Replay`, `NotifyHash` and `Rebuild` call — treats the projection as non-versionable, which means it does not request a new version when the handler set changes. + +The combined effect is that the projection has exactly one version for its lifetime. Reads always come from that version's storage slot; replays do not happen automatically when you change the code. + +The framework uses this on `ProjectionVersionsHandler` because that projection has to be available before the version manager can do anything, so versioning it through the version manager would be a chicken-and-egg problem.
Your code uses it when you have a projection whose state must survive deploys verbatim — for example, a projection that records the latest known offset of an external feed, where re-deriving the value from the event log is not the right behaviour even if you change the handler shape. + +A worked example: + +```csharp +[DataContract(Name = "8a4f06d3-3f9b-46ab-8f5c-6f6b6e64a3f5")] +public class FeedOffsetProjection : ProjectionDefinition, + INonVersionableProjection, + IEventHandler<FeedAdvanced> +{ + public FeedOffsetProjection() + { + Subscribe<FeedAdvanced>(x => new FeedId(x.Id.Tenant, x.Id.Id)); + } + + public Task HandleAsync(FeedAdvanced @event) + { + // ... + return Task.CompletedTask; + } +} +``` + +If you later add another `IEventHandler<T>`, the hash stays `"ver"`, no replay is requested, and the new handler simply starts running for new events from the moment it is deployed. State accumulated under the previous handler set is preserved. + +## INonRebuildableProjection + +The interface is empty — it is a pure marker: + +```csharp +public interface INonRebuildableProjection { } +``` + +The semantics are "this projection's state must not be wiped and reconstructed from the event log". A projection that cannot survive a rebuild is one where, for example, the state was bootstrapped from a one-off import that the event log does not contain, or where part of the state is computed from an external source at write time and the source is no longer available. + +The framework declares the marker but does not currently apply it inside the versioning policy classes that ship in [`Elders.Cronus`](https://github.com/Elders/Cronus).
The type is exercised by the Cronus tests ([`TestData.cs`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus.Tests/Projections/TestData.cs)) and is available to satellite packages, particularly the projection store implementations such as [`Elders.Cronus.Projections.Cassandra`](https://github.com/Elders/Cronus.Projections.Cassandra), so that they can detect it and refuse to drop or recreate the projection's tables during a rebuild. + +The recommended treatment, until the marker has more first-class support in core, is to use it as a **declaration of intent** in your domain code so that operators reading the projection know not to issue a rebuild against it. Wire a real refusal — failing the call, or branching at the projection-store level — in whatever component manages your rebuild operations. + +```csharp +[DataContract(Name = "1d6f2c79-1e8b-4a07-9db6-2e0c8b0f2d3f")] +public class ImportedCustomerLedgerProjection : ProjectionDefinition, + INonRebuildableProjection, + IEventHandler +{ + // ... +} +``` + +## Interaction with the version manager + +The version manager runs the same flow regardless of markers — it requests versions, it tracks timeboxes, it transitions a `Building` version to `Live`. The markers change two of its inputs: + +* [`ProjectionHasher.CalculateHash`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/ProjectionHasher.cs) returns the marker-overridden hash for `INonVersionableProjection`, so the manager never sees a hash change for those projections in normal operation. +* [`MarkupInterfaceProjectionVersioningPolicy.IsVersionable`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/MarkupInterfaceProjectionVersioningPolicy.cs) returns `false` for `INonVersionableProjection`, which is what the manager uses to decide whether `Replay` is allowed. The first-time-ever case is special-cased: even a non-versionable projection gets a single creation pass when no live version exists yet.
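The two inputs described above can be modelled in a few lines. This is a hypothetical sketch, not the framework's implementation: `VersioningSketch`, `ShouldRequestNewVersion`, the sample projection classes and the `FeedReset` event are illustrative names, the interfaces are local stand-ins for the Cronus ones, and the SHA-256 over sorted event-type names is an assumed hash scheme chosen only to make the branching visible. It shows that a marked projection keeps the hash `"ver"` when a handler is added, that an unmarked one gets a new hash and therefore a new version, and that the first-time case creates a version either way.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-ins; the real interfaces live in Elders.Cronus, where
// IEventHandler<T> also declares HandleAsync.
public interface IEventHandler<TEvent> { }

public interface INonVersionableProjection
{
    string GetHash() => "ver";
    int GetRevision() => 1;
}

public static class VersioningSketch
{
    // Marked projections report GetHash(); others hash their sorted handled event types.
    // The real ProjectionHasher computes its value differently; only the branching is mirrored.
    public static string CalculateHash(Type projectionType)
    {
        if (typeof(INonVersionableProjection).IsAssignableFrom(projectionType))
        {
            // Instantiate via the parameterless constructor and ask the marker for its hash.
            var marker = (INonVersionableProjection)Activator.CreateInstance(projectionType);
            return marker.GetHash();
        }

        string handledEvents = string.Join(",", projectionType.GetInterfaces()
            .Where(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IEventHandler<>))
            .Select(i => i.GetGenericArguments()[0].FullName)
            .OrderBy(name => name, StringComparer.Ordinal));

        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(handledEvents));
        return Convert.ToHexString(digest, 0, 8).ToLowerInvariant();
    }

    public static bool IsVersionable(Type projectionType) =>
        !typeof(INonVersionableProjection).IsAssignableFrom(projectionType);

    // liveHash is null when no live version exists yet (the first-time-ever case).
    public static bool ShouldRequestNewVersion(Type projectionType, string liveHash)
    {
        if (liveHash is null) return true;                // single creation pass, marker or not
        if (!IsVersionable(projectionType)) return false; // the policy forbids a replay
        return liveHash != CalculateHash(projectionType); // the handler set changed
    }
}

// Illustrative events and projections; FeedReset is a hypothetical second event.
public sealed class FeedAdvanced { }
public sealed class FeedReset { }
public class FeedOffsetV1 : INonVersionableProjection, IEventHandler<FeedAdvanced> { }
public class FeedOffsetV2 : INonVersionableProjection, IEventHandler<FeedAdvanced>, IEventHandler<FeedReset> { }
public class OrdersV1 : IEventHandler<FeedAdvanced> { }
public class OrdersV2 : IEventHandler<FeedAdvanced>, IEventHandler<FeedReset> { }
```

The order of the checks follows the text: the first-time case wins, then the policy, then the hash comparison. Going from `FeedOffsetV1` to `FeedOffsetV2` therefore requests nothing, while going from `OrdersV1` to `OrdersV2` requests a new version.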
+ +The disaster-recovery branch in [`ProjectionVersionManager.Rebuild`](https://github.com/Elders/Cronus/blob/master/src/Elders.Cronus/Projections/Versioning/ProjectionVersionManager.cs) bypasses the policy entirely — it is reserved for the system projection that owns versioning state. + +## Cross-link + +The full versioning lifecycle, including how the policy is consulted and how the timebox is set, lives next to this page: + +{% content-ref url="versioning.md" %} +[versioning.md](versioning.md) +{% endcontent-ref %} + +## Best Practices + +{% hint style="success" %} +**You can / should / must** + +* a projection **must** be marked `INonVersionableProjection` if its state cannot survive a routine handler-set change — the default behaviour is to wipe and replay +* a projection **can** override `GetHash()` on `INonVersionableProjection` to bump the version explicitly when you really do want a one-off rebuild +* a projection **should** be marked `INonRebuildableProjection` if the projection-store side has no way to reconstruct it — the marker is what operators look for +* the marker decision **should** live in the same file as the projection class so the constraint is visible to every reader +{% endhint %} + +{% hint style="warning" %} +**You should not** + +* a projection **should not** be marked `INonVersionableProjection` to "save time" — versioning is the framework's mechanism for keeping the read model honest +* a projection **should not** depend on `INonRebuildableProjection` having a framework-enforced effect today; treat the marker as documentation backed by your projection-store integration +* a projection **should not** mix the markers without a reason; pick the one that matches the constraint you want to express +{% endhint %} diff --git a/docs/cronus-framework/projections/versioning.md b/docs/cronus-framework/projections/versioning.md index fa9460d4..e1d88c1d 100644 --- a/docs/cronus-framework/projections/versioning.md +++
b/docs/cronus-framework/projections/versioning.md @@ -60,6 +60,7 @@ Next deploy, Cronus hashes the new handler, notices the hash has changed, reques ## Related * [Handlers / Projections](../domain-modeling/handlers/projections.md) — how to write the handler itself. +* [Projection Markers](projection-markers.md) — `INonVersionableProjection` and `INonRebuildableProjection`, the opt-out markers referenced above. * [Jobs](../jobs.md) — the job runner the replay sits on top of. * [Indices](../indices.md) — specifically the `EventToAggregateRootId` index, which the rebuild depends on to locate events efficiently. * [Snapshots](snapshots.md) — note: snapshots are not currently shipped; that page documents the situation. From 09249c72f37c8d3a13dd5ee67050b7bb5a45b5a3 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Tue, 28 Apr 2026 18:41:46 +0300 Subject: [PATCH 07/21] chore: Updates CI pipeline to use .NET 10 SDK - Collapses the dotnet 8 + dotnet 9 test stages into a single net10 test stage - Bumps the UseDotNet task to version 10.x (with includePreviewVersions=false) - Updates dotnet test --framework to net10.0 - Updates DeployStage UseDotNet to 10.x and dependsOn to RunTestsDotnet10 --- ci/azure-pipelines.yml | 46 ++++++++---------------------------------- 1 file changed, 8 insertions(+), 38 deletions(-) diff --git a/ci/azure-pipelines.yml b/ci/azure-pipelines.yml index 861035cc..5795edf8 100644 --- a/ci/azure-pipelines.yml +++ b/ci/azure-pipelines.yml @@ -12,8 +12,8 @@ pool: vmImage: 'ubuntu-22.04' stages: -- stage: RunTestsDotnet9 - displayName: 'Run tests for dotnet 9' +- stage: RunTestsDotnet10 + displayName: 'Run tests for dotnet 10' jobs: - job: run_tests @@ -25,50 +25,20 @@ stages: - task: UseDotNet@2 inputs: packageType: 'sdk' - version: '9.x' - includePreviewVersions: true + version: '10.x' + includePreviewVersions: false - task: DotNetCoreCLI@2 name: test inputs: command: test projects: '**/*Tests.csproj' - arguments: '--framework net9.0' - -- stage: 
RunTestsDotnet8 - displayName: 'Run tests for dotnet 8' - jobs: - - job: run_tests - - steps: - - checkout: self - clean: true - persistCredentials: true - - - task: UseDotNet@2 - inputs: - packageType: 'sdk' - version: '9.x' - includePreviewVersions: true - - - task: UseDotNet@2 - inputs: - packageType: 'sdk' - version: '8.x' - includePreviewVersions: true - - - task: DotNetCoreCLI@2 - name: test - inputs: - command: test - projects: '**/*Tests.csproj' - arguments: '--framework net8.0' + arguments: '--framework net10.0' - stage : DeployStage displayName: 'Deploy stage' dependsOn: - - RunTestsDotnet8 - - RunTestsDotnet9 + - RunTestsDotnet10 jobs: - job: build_pack_publish @@ -81,8 +51,8 @@ stages: - task: UseDotNet@2 inputs: packageType: 'sdk' - version: '9.x' - includePreviewVersions: true + version: '10.x' + includePreviewVersions: false - task: DotNetCoreCLI@2 name: build From 7d1888c815f1553365a991f5b953d1a1b0769649 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 14:24:49 +0300 Subject: [PATCH 08/21] fix: Migrates publish step to bash + elders-nuget group with newVer gate --- ci/azure-pipelines.yml | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/ci/azure-pipelines.yml b/ci/azure-pipelines.yml index 5795edf8..949a7418 100644 --- a/ci/azure-pipelines.yml +++ b/ci/azure-pipelines.yml @@ -1,6 +1,8 @@ --- variables: - PROJECT_DIR: Elders.Cronus + - group: elders-nuget + - name: PROJECT_DIR + value: Elders.Cronus trigger: branches: @@ -76,12 +78,10 @@ stages: echo dotnet nuget `dotnet nuget --version` echo dotnet `dotnet --version` - - task: NuGetCommand@2 + - task: Bash@3 name: publish - enabled: true + displayName: nuget push to nuget.org condition: and(eq(variables['newVer'], 'yes'), succeeded()) inputs: - command: 'push' - packagesToPush: '$(Build.StagingDirectory)/*.nupkg' - nuGetFeedType: 'external' - publishFeedCredentials: 'CI-AzurePipelines' + targetType: 'inline' + script: 'dotnet nuget push 
$(Build.StagingDirectory)/*.nupkg --api-key $(NUGET_KEY) --source https://api.nuget.org/v3/index.json --skip-duplicate' From d22fa6c4b3cee9a1c1aebc740bfde95179ce7360 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 14:30:17 +0300 Subject: [PATCH 09/21] Adds async overload to RetryableOperation --- .../Elders.Cronus.Tests.csproj | 2 + .../RetryableOperationAsyncTests.cs | 61 +++++++++++++++++++ .../Userfull/RetryableOperation.cs | 55 +++++++++++++++++ 3 files changed, 118 insertions(+) create mode 100644 src/Elders.Cronus.Tests/PublisherTests/RetryableOperationAsyncTests.cs diff --git a/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj b/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj index 780906c3..a7a8131c 100644 --- a/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj +++ b/src/Elders.Cronus.Tests/Elders.Cronus.Tests.csproj @@ -10,6 +10,8 @@ + + diff --git a/src/Elders.Cronus.Tests/PublisherTests/RetryableOperationAsyncTests.cs b/src/Elders.Cronus.Tests/PublisherTests/RetryableOperationAsyncTests.cs new file mode 100644 index 00000000..3aca155c --- /dev/null +++ b/src/Elders.Cronus.Tests/PublisherTests/RetryableOperationAsyncTests.cs @@ -0,0 +1,61 @@ +using System; +using System.Threading; +using System.Threading.Tasks; +using Xunit; + +namespace Elders.Cronus.Tests.PublisherTests; + +public class RetryableOperationAsyncTests +{ + [Fact] + public async Task TryExecuteAsync_returns_value_on_first_success() + { + RetryPolicy policy = RetryableOperation.RetryPolicyFactory.CreateLinearRetryPolicy(3, TimeSpan.FromMilliseconds(1)); + + int result = await RetryableOperation.TryExecuteAsync( + _ => Task.FromResult(42), + policy); + + Assert.Equal(42, result); + } + + [Fact] + public async Task TryExecuteAsync_retries_until_success() + { + RetryPolicy policy = RetryableOperation.RetryPolicyFactory.CreateLinearRetryPolicy(5, TimeSpan.FromMilliseconds(1)); + int attempts = 0; + + int result = await RetryableOperation.TryExecuteAsync( + _ => + { + 
attempts++; + if (attempts < 3) throw new InvalidOperationException("transient"); + return Task.FromResult(99); + }, + policy, + getOperationInfo: () => "test-op"); + + Assert.Equal(99, result); + Assert.Equal(3, attempts); + } + + [Fact] + public async Task TryExecuteAsync_propagates_cancellation() + { + RetryPolicy policy = RetryableOperation.RetryPolicyFactory.CreateLinearRetryPolicy(3, TimeSpan.FromSeconds(5)); + using var cts = new CancellationTokenSource(); + cts.CancelAfter(TimeSpan.FromMilliseconds(50)); + + await Assert.ThrowsAnyAsync(async () => + { + await RetryableOperation.TryExecuteAsync( + async ct => + { + await Task.Delay(TimeSpan.FromSeconds(10), ct); + return 1; + }, + policy, + cancellationToken: cts.Token); + }); + } +} diff --git a/src/Elders.Cronus/Userfull/RetryableOperation.cs b/src/Elders.Cronus/Userfull/RetryableOperation.cs index 4c1b59c1..ff79d750 100644 --- a/src/Elders.Cronus/Userfull/RetryableOperation.cs +++ b/src/Elders.Cronus/Userfull/RetryableOperation.cs @@ -1,5 +1,6 @@ using System; using System.Threading; +using System.Threading.Tasks; using Microsoft.Extensions.Logging; namespace Elders.Cronus; @@ -64,6 +65,60 @@ public static T TryExecute(Func operation, RetryPolicy retryPolicy, Func + /// Asynchronously executes the specified operation, retrying on exceptions according to the supplied . + /// + /// The result type of the operation. + /// The asynchronous operation to execute. The provided should be honored by the operation. + /// The retry policy that decides whether to retry and the delay between retries. + /// Optional callback returning a description of the operation, used for diagnostic logging. + /// A token to observe while waiting between retries and to forward to . + /// A task that completes with the operation's result once it succeeds. + /// Thrown when or is null. + /// Thrown when is canceled while waiting between retries or during the operation. 
+ public static async Task TryExecuteAsync(Func> operation, RetryPolicy retryPolicy, Func getOperationInfo = null, CancellationToken cancellationToken = default) + { + ArgumentNullException.ThrowIfNull(operation); + ArgumentNullException.ThrowIfNull(retryPolicy); + + ShouldRetry shouldRetry = retryPolicy(); + int attempt = 0; + + while (true) + { + cancellationToken.ThrowIfCancellationRequested(); + + Exception lastException; + try + { + return await operation(cancellationToken).ConfigureAwait(false); + } + catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) + { + throw; + } + catch (Exception ex) + { + lastException = ex; + } + + if (shouldRetry(attempt, lastException, out TimeSpan delay)) + { + if (logger.IsEnabled(LogLevel.Debug)) + logger.LogDebug("Retry {retryCount} after {delay}ms. Operation Info: {operationInfo}", attempt, delay.TotalMilliseconds, getOperationInfo?.Invoke()); + + await Task.Delay(delay, cancellationToken).ConfigureAwait(false); + attempt++; + } + else + { + if (logger.IsEnabled(LogLevel.Debug)) + logger.LogDebug("Maximum number of retries has been reached."); + throw lastException; + } + } + } + private static T InvokeTryExecuteInternal(Func operation, out Exception exception) { exception = null; From 6e74b7b894281fb918c9aeac83f876e80916491c Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 14:34:30 +0300 Subject: [PATCH 10/21] Refactors Publisher base classes for async --- src/Elders.Cronus/PublisherBase.cs | 60 +++++++++++++++--------------- 1 file changed, 31 insertions(+), 29 deletions(-) diff --git a/src/Elders.Cronus/PublisherBase.cs b/src/Elders.Cronus/PublisherBase.cs index 2f8ce96c..8774c43b 100644 --- a/src/Elders.Cronus/PublisherBase.cs +++ b/src/Elders.Cronus/PublisherBase.cs @@ -1,8 +1,10 @@ -using System; +using System; using System.Collections.Generic; using System.Diagnostics; using System.Linq; using System.Runtime.CompilerServices; +using System.Threading; +using 
System.Threading.Tasks; using Elders.Cronus.Hosting.Heartbeat; using Elders.Cronus.Multitenancy; using Elders.Cronus.Workflow; @@ -77,7 +79,7 @@ bool CanIgnoreLeft_isDelegatingHandlerExecuted(PublishResult left, PublishResult public abstract class PublisherHandler { - protected internal virtual PublishResult PublishInternal(CronusMessage message) + protected internal virtual Task PublishInternalAsync(CronusMessage message, CancellationToken cancellationToken) { throw new NotImplementedException(); } @@ -85,12 +87,12 @@ protected internal virtual PublishResult PublishInternal(CronusMessage message) public abstract class DelegatingPublishHandler : PublisherHandler { - protected internal override PublishResult PublishInternal(CronusMessage message) + protected internal override Task PublishInternalAsync(CronusMessage message, CancellationToken cancellationToken) { if (InnerHandler is null) throw new InvalidOperationException("The inner publisher handler is not set."); - return InnerHandler.PublishInternal(message); + return InnerHandler.PublishInternalAsync(message, cancellationToken); } internal PublisherHandler InnerHandler { get; set; } @@ -107,7 +109,7 @@ public CronusHeadersPublishHandler(ITenantResolver tenantResolver, IOp this.boundedContext = boundedContextOptions.Value; } - protected internal override PublishResult PublishInternal(CronusMessage message) + protected internal override async Task PublishInternalAsync(CronusMessage message, CancellationToken cancellationToken) { Type payloadType = message.GetMessageType(); @@ -131,7 +133,7 @@ protected internal override PublishResult PublishInternal(CronusMessage message) message.Headers.Remove("contract_name"); message.Headers.Add("contract_name", payloadType.GetContractId()); - return new PublishResult(true, false) && base.PublishInternal(message); + return new PublishResult(true, false) && await base.PublishInternalAsync(message, cancellationToken).ConfigureAwait(false); } } @@ -149,13 +151,13 @@ public 
LoggingPublishHandler(ILogger logger) this.logger = logger; } - protected internal override PublishResult PublishInternal(CronusMessage message) + protected internal override async Task PublishInternalAsync(CronusMessage message, CancellationToken cancellationToken) { using (logger.BeginScope(s => s.AddScope(Log.MessageId, message.Id.ToString()))) { try { - PublishResult isPublished = base.PublishInternal(message); + PublishResult isPublished = await base.PublishInternalAsync(message, cancellationToken).ConfigureAwait(false); Type messageType = message.GetMessageType(); @@ -193,7 +195,7 @@ public ActivityPublishHandler(DiagnosticListener diagnosticListener, ActivitySou this.logger = logger; } - protected internal override PublishResult PublishInternal(CronusMessage message) + protected internal override async Task PublishInternalAsync(CronusMessage message, CancellationToken cancellationToken) { Activity activity = StartActivity(message); if (Activity.Current is not null) @@ -202,7 +204,7 @@ protected internal override PublishResult PublishInternal(CronusMessage message) message.Headers.Add(TelemetryTraceParent, Activity.Current.Id); } - PublishResult published = base.PublishInternal(message); + PublishResult published = await base.PublishInternalAsync(message, cancellationToken).ConfigureAwait(false); StopActivity(activity); return new PublishResult(true, false) && published; @@ -277,7 +279,7 @@ public PublisherBase(IEnumerable handlers) this.handlers = handlers.Cast(); } - public virtual bool Publish(TMessage message, Dictionary messageHeaders) + public virtual async Task PublishAsync(TMessage message, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) { if (messageHeaders is null) messageHeaders = new Dictionary(); @@ -287,7 +289,7 @@ public virtual bool Publish(TMessage message, Dictionary message var enumerator = handlers.GetEnumerator(); bool hasHandlers = enumerator.MoveNext(); if (hasHandlers == false) - return 
PublishInternal(cronusMessage); + return await PublishInternalAsync(cronusMessage, cancellationToken).ConfigureAwait(false); while (hasHandlers) { @@ -303,10 +305,23 @@ public virtual bool Publish(TMessage message, Dictionary message } } - return handlers.First().PublishInternal(cronusMessage); + return await handlers.First().PublishInternalAsync(cronusMessage, cancellationToken).ConfigureAwait(false); } - public virtual bool Publish(byte[] messageRaw, Type messageType, string tenant, Dictionary messageHeaders) + public virtual async Task PublishAsync(TMessage message, DateTime publishAt, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) + { + messageHeaders = messageHeaders ?? new Dictionary(); + messageHeaders[MessageHeader.PublishTimestamp] = publishAt.ToFileTimeUtc().ToString(); + return await PublishAsync(message, messageHeaders, cancellationToken).ConfigureAwait(false); + } + + public virtual async Task PublishAsync(TMessage message, TimeSpan publishAfter, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) + { + DateTime publishAt = DateTime.UtcNow.Add(publishAfter); + return await PublishAsync(message, publishAt, messageHeaders, cancellationToken).ConfigureAwait(false); + } + + public virtual async Task PublishAsync(byte[] messageRaw, Type messageType, string tenant, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) { if (messageHeaders is null) messageHeaders = new Dictionary(); @@ -321,7 +336,7 @@ public virtual bool Publish(byte[] messageRaw, Type messageType, string tenant, IEnumerator enumerator = handlers.GetEnumerator(); bool hasHandlers = enumerator.MoveNext(); if (hasHandlers == false) - return PublishInternal(cronusMessage); + return await PublishInternalAsync(cronusMessage, cancellationToken).ConfigureAwait(false); while (hasHandlers) { @@ -337,20 +352,7 @@ public virtual bool Publish(byte[] messageRaw, Type messageType, string tenant, } } - return 
handlers.First().PublishInternal(cronusMessage); - } - - public virtual bool Publish(TMessage message, DateTime publishAt, Dictionary messageHeaders = null) - { - messageHeaders = messageHeaders ?? new Dictionary(); - messageHeaders.Add(MessageHeader.PublishTimestamp, publishAt.ToFileTimeUtc().ToString()); - return Publish(message, messageHeaders); - } - - public bool Publish(TMessage message, TimeSpan publishAfter, Dictionary messageHeaders = null) - { - DateTime publishAt = DateTime.UtcNow.Add(publishAfter); - return Publish(message, publishAt, messageHeaders); + return await handlers.First().PublishInternalAsync(cronusMessage, cancellationToken).ConfigureAwait(false); } private void EnsureValidTenant(string tenant, Dictionary messageHeaders) From 5bfaf989da5648d83c04495b32bca308cd8c36e2 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 14:37:27 +0300 Subject: [PATCH 11/21] Refactors Publisher concrete to async with TryExecuteAsync retry --- src/Elders.Cronus/Publisher.cs | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/src/Elders.Cronus/Publisher.cs b/src/Elders.Cronus/Publisher.cs index efd46b93..64c22762 100644 --- a/src/Elders.Cronus/Publisher.cs +++ b/src/Elders.Cronus/Publisher.cs @@ -1,5 +1,7 @@ -using System; +using System; using System.Collections.Generic; +using System.Threading; +using System.Threading.Tasks; namespace Elders.Cronus; @@ -9,17 +11,18 @@ namespace Elders.Cronus; /// The message to be sent. 
public abstract class Publisher : PublisherBase where TMessage : IMessage { - private RetryPolicy retryPolicy; + private readonly RetryPolicy retryPolicy; public Publisher(IEnumerable handlers) : base(handlers) { - retryPolicy = new RetryPolicy(RetryableOperation.RetryPolicyFactory.CreateLinearRetryPolicy(5, TimeSpan.FromMilliseconds(300))); + retryPolicy = RetryableOperation.RetryPolicyFactory.CreateLinearRetryPolicy(5, TimeSpan.FromMilliseconds(300)); } - public override bool Publish(TMessage message, Dictionary messageHeaders) + public override Task PublishAsync(TMessage message, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) { - bool isPublished = RetryableOperation.TryExecute(() => base.Publish(message, messageHeaders), retryPolicy); - - return isPublished; + return RetryableOperation.TryExecuteAsync( + ct => base.PublishAsync(message, messageHeaders, ct), + retryPolicy, + cancellationToken: cancellationToken); } } From f887d9e4a876f216f49fa551527319ce0c9f3b63 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 14:43:41 +0300 Subject: [PATCH 12/21] Migrates ICronusStartup interfaces and implementers to async --- .../When_adding_event_to_aggregateCommit.cs | 24 ++++++++++--------- src/Elders.Cronus/Hosting/CronusBooter.cs | 10 ++++---- src/Elders.Cronus/Hosting/CronusHost.cs | 2 +- .../Hosting/EventStoreIndicesStartup.cs | 21 +++++++++------- src/Elders.Cronus/Hosting/ICronusStartup.cs | 12 ++++++---- .../Hosting/ProjectionsStartup.cs | 8 ++++--- 6 files changed, 45 insertions(+), 32 deletions(-) diff --git a/src/Elders.Cronus.Tests/Hosting/When_adding_event_to_aggregateCommit.cs b/src/Elders.Cronus.Tests/Hosting/When_adding_event_to_aggregateCommit.cs index dbd849e7..addd67c8 100644 --- a/src/Elders.Cronus.Tests/Hosting/When_adding_event_to_aggregateCommit.cs +++ b/src/Elders.Cronus.Tests/Hosting/When_adding_event_to_aggregateCommit.cs @@ -2,6 +2,8 @@ using System.Collections.Generic; using System.Linq; using 
System; +using System.Threading; +using System.Threading.Tasks; namespace Elders.Cronus.Migrations; @@ -60,15 +62,15 @@ public IEnumerable Scan() yield return typeof(PortsStartup); } - [CronusStartup(Bootstraps.Environment)] public class EnvironmentStartup : ICronusStartup { public void Bootstrap() { } } - [CronusStartup(Bootstraps.ExternalResource)] public class ExternalResourceStartup : ICronusStartup { public void Bootstrap() { } } - [CronusStartup(Bootstraps.Configuration)] public class ConfigurationStartup : ICronusStartup { public void Bootstrap() { } } - [CronusStartup(Bootstraps.Aggregates)] public class AggregatesStartup : ICronusStartup { public void Bootstrap() { } } - [CronusStartup(Bootstraps.Ports)] public class PortsStartup : ICronusStartup { public void Bootstrap() { } } - [CronusStartup(Bootstraps.Sagas)] public class SagasStartup : ICronusStartup { public void Bootstrap() { } } - [CronusStartup(Bootstraps.Projections)] public class ProjectionsStartup : ICronusStartup { public void Bootstrap() { } } - [CronusStartup(Bootstraps.Projections)] public class SecondProjectionsStartup : ICronusStartup { public void Bootstrap() { } } - [CronusStartup(Bootstraps.Gateways)] public class GatewaysStartup : ICronusStartup { public void Bootstrap() { } } - [CronusStartup(Bootstraps.Runtime)] public class RuntimeStartup : ICronusStartup { public void Bootstrap() { } } - public class NoAttributeStartup : ICronusStartup { public void Bootstrap() { } } + [CronusStartup(Bootstraps.Environment)] public class EnvironmentStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + [CronusStartup(Bootstraps.ExternalResource)] public class ExternalResourceStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + [CronusStartup(Bootstraps.Configuration)] public class ConfigurationStartup : ICronusStartup { public Task 
BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + [CronusStartup(Bootstraps.Aggregates)] public class AggregatesStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + [CronusStartup(Bootstraps.Ports)] public class PortsStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + [CronusStartup(Bootstraps.Sagas)] public class SagasStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + [CronusStartup(Bootstraps.Projections)] public class ProjectionsStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + [CronusStartup(Bootstraps.Projections)] public class SecondProjectionsStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + [CronusStartup(Bootstraps.Gateways)] public class GatewaysStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + [CronusStartup(Bootstraps.Runtime)] public class RuntimeStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } + public class NoAttributeStartup : ICronusStartup { public Task BootstrapAsync(CancellationToken cancellationToken = default) => Task.CompletedTask; } } diff --git a/src/Elders.Cronus/Hosting/CronusBooter.cs b/src/Elders.Cronus/Hosting/CronusBooter.cs index 7fe4c953..57cd5a85 100644 --- a/src/Elders.Cronus/Hosting/CronusBooter.cs +++ b/src/Elders.Cronus/Hosting/CronusBooter.cs @@ -1,10 +1,12 @@ -using Elders.Cronus.MessageProcessing; +using Elders.Cronus.MessageProcessing; using Elders.Cronus.Multitenancy; using Microsoft.Extensions.DependencyInjection; using Microsoft.Extensions.Logging; using 
Microsoft.Extensions.Options; using System; using System.Collections.Generic; +using System.Threading; +using System.Threading.Tasks; namespace Elders.Cronus; @@ -23,7 +25,7 @@ public CronusBooter(IServiceProvider serviceProvider, IOptionsMonitor startups = scanner.Scan(); @@ -31,7 +33,7 @@ public void BootstrapCronus() foreach (var startupType in startups) { ICronusStartup startup = (ICronusStartup)serviceProvider.GetRequiredService(startupType); - startup.Bootstrap(); + await startup.BootstrapAsync(cancellationToken).ConfigureAwait(false); } IEnumerable tenantStartups = scanner.ScanForCronusTenantStartups(); @@ -45,7 +47,7 @@ public void BootstrapCronus() var cronusContext = cronusContextFactory.Create(tenant, scopedServiceProvider.ServiceProvider); ICronusTenantStartup tenantStartUp = (ICronusTenantStartup)cronusContext.ServiceProvider.GetRequiredService(tenantStartupType); - tenantStartUp.Bootstrap(); + await tenantStartUp.BootstrapAsync(cancellationToken).ConfigureAwait(false); } } } diff --git a/src/Elders.Cronus/Hosting/CronusHost.cs b/src/Elders.Cronus/Hosting/CronusHost.cs index 7f1df914..b921f56d 100644 --- a/src/Elders.Cronus/Hosting/CronusHost.cs +++ b/src/Elders.Cronus/Hosting/CronusHost.cs @@ -88,7 +88,7 @@ public async Task StartAsync() { CronusLogger.Configure(serviceProvider.GetService()); - booter.BootstrapCronus(); + await booter.BootstrapCronusAsync().ConfigureAwait(false); if (hostOptions.SystemServicesEnabled) { diff --git a/src/Elders.Cronus/Hosting/EventStoreIndicesStartup.cs b/src/Elders.Cronus/Hosting/EventStoreIndicesStartup.cs index da712a64..8d61cd79 100644 --- a/src/Elders.Cronus/Hosting/EventStoreIndicesStartup.cs +++ b/src/Elders.Cronus/Hosting/EventStoreIndicesStartup.cs @@ -1,9 +1,11 @@ -using Elders.Cronus.EventStore.Index; +using Elders.Cronus.EventStore.Index; using Elders.Cronus.Multitenancy; using Microsoft.Extensions.Logging; using Microsoft.Extensions.Options; using System; using System.Linq; +using System.Threading; +using 
System.Threading.Tasks; namespace Elders.Cronus; @@ -25,10 +27,13 @@ public EventStoreIndicesStartup(TypeContainer indexTypeContain this.indexTypeContainer = indexTypeContainer; cronusHostOptions.OnChange(CronusHostOptionsChanged); - tenantsOptions.OnChange(OptionsChangedBootstrapEventStoreIndexForTenant); + tenantsOptions.OnChange(async newOptions => + { + await OptionsChangedBootstrapEventStoreIndexForTenantAsync(newOptions).ConfigureAwait(false); + }); } - public void Bootstrap() + public async Task BootstrapAsync(CancellationToken cancellationToken = default) { if (cronusHostOptions.ApplicationServicesEnabled == false) return; @@ -37,22 +42,22 @@ public void Bootstrap() { foreach (var tenant in tenants.Tenants) { - InitializeIndesForTenant(index, tenant); + await InitializeIndexForTenantAsync(index, tenant, cancellationToken).ConfigureAwait(false); } } } - private void InitializeIndesForTenant(Type index, string tenant) + private async Task InitializeIndexForTenantAsync(Type index, string tenant, CancellationToken cancellationToken = default) { if (cronusHostOptions.ApplicationServicesEnabled == false) return; var id = new EventStoreIndexManagerId(index.GetContractId(), tenant); var command = new RegisterIndex(id); - publisher.Publish(command); + await publisher.PublishAsync(command, cancellationToken: cancellationToken).ConfigureAwait(false); } - private void OptionsChangedBootstrapEventStoreIndexForTenant(TenantsOptions newOptions) + private async Task OptionsChangedBootstrapEventStoreIndexForTenantAsync(TenantsOptions newOptions) { if (tenants.Tenants.SequenceEqual(newOptions.Tenants) == false) // Check for difference between tenants and newOptions { @@ -66,7 +71,7 @@ private void OptionsChangedBootstrapEventStoreIndexForTenant(TenantsOptions newO { foreach (var tenant in newTenants) { - InitializeIndesForTenant(index, tenant); + await InitializeIndexForTenantAsync(index, tenant).ConfigureAwait(false); } } diff --git 
a/src/Elders.Cronus/Hosting/ICronusStartup.cs b/src/Elders.Cronus/Hosting/ICronusStartup.cs index ae17fd39..b0893643 100644 --- a/src/Elders.Cronus/Hosting/ICronusStartup.cs +++ b/src/Elders.Cronus/Hosting/ICronusStartup.cs @@ -1,18 +1,20 @@ -namespace Elders.Cronus; +using System.Threading; +using System.Threading.Tasks; + +namespace Elders.Cronus; /// /// This type of startups are singleton and are executed ONLY once, so use accordingly /// public interface ICronusStartup { - // TODO: Make this async - void Bootstrap(); + Task BootstrapAsync(CancellationToken cancellationToken = default); } /// /// This type of startups are executed X amount of times per tenant, so use accordingly /// -public interface ICronusTenantStartup //TODO: also make this async :) kali +public interface ICronusTenantStartup { - void Bootstrap(); + Task BootstrapAsync(CancellationToken cancellationToken = default); } diff --git a/src/Elders.Cronus/Hosting/ProjectionsStartup.cs b/src/Elders.Cronus/Hosting/ProjectionsStartup.cs index d0654738..132cd0bd 100644 --- a/src/Elders.Cronus/Hosting/ProjectionsStartup.cs +++ b/src/Elders.Cronus/Hosting/ProjectionsStartup.cs @@ -1,4 +1,6 @@ -using Elders.Cronus.Projections; +using System.Threading; +using System.Threading.Tasks; +using Elders.Cronus.Projections; namespace Elders.Cronus; @@ -12,8 +14,8 @@ public ProjectionsStartup(CronusProjectionBootstrapper projectionsBootstrapper) this.projectionsBootstrapper = projectionsBootstrapper; } - public void Bootstrap() + public Task BootstrapAsync(CancellationToken cancellationToken = default) { - projectionsBootstrapper.BootstrapAsync().GetAwaiter().GetResult(); + return projectionsBootstrapper.BootstrapAsync(); } } From 9b48a2c3cc62a0b19f450ea265b2c5f66e18eb34 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 14:49:11 +0300 Subject: [PATCH 13/21] Migrates remaining sync Publish and RequestTimeout call sites to async --- .../AutoUpdates/AutoUpdateSaga.cs | 2 +- 
.../Index/Handlers/EventStoreIndexBuilder.cs | 13 +++++------- .../Players/ReplayPublicEvents_Job.cs | 5 ++--- .../Hosting/Heartbeat/CronusHeartbeat.cs | 4 ++-- .../AggregateCommitPublisher.cs | 6 ++---- .../AggregateRepositoryAndEventPublisher.cs | 6 +++--- .../CronusProjectionBootstrapper.cs | 2 +- .../Projections/Rebuilding/ProgressTracker.cs | 2 +- .../RebuildProjectionSequentially_Job.cs | 6 +++--- .../Rebuilding/RebuildProjection_Job.cs | 6 +++--- .../Versioning/Handlers/ProjectionBuilder.cs | 20 ++++++++----------- 11 files changed, 31 insertions(+), 41 deletions(-) diff --git a/src/Elders.Cronus/AutoUpdates/AutoUpdateSaga.cs b/src/Elders.Cronus/AutoUpdates/AutoUpdateSaga.cs index 74dc27c4..5fb6f1d0 100644 --- a/src/Elders.Cronus/AutoUpdates/AutoUpdateSaga.cs +++ b/src/Elders.Cronus/AutoUpdates/AutoUpdateSaga.cs @@ -27,7 +27,7 @@ public async Task HandleAsync(AutoUpdateTriggered @event) var id = new AutoUpdaterId(@event.BoundedContext, @event.Id.Tenant); var finish = new FinishAutoUpdate(id, @event.Name, DateTimeOffset.UtcNow); - commandPublisher.Publish(finish); + await commandPublisher.PublishAsync(finish).ConfigureAwait(false); } } } diff --git a/src/Elders.Cronus/EventStore/Index/Handlers/EventStoreIndexBuilder.cs b/src/Elders.Cronus/EventStore/Index/Handlers/EventStoreIndexBuilder.cs index caa9b3a6..2b7ee9cc 100644 --- a/src/Elders.Cronus/EventStore/Index/Handlers/EventStoreIndexBuilder.cs +++ b/src/Elders.Cronus/EventStore/Index/Handlers/EventStoreIndexBuilder.cs @@ -23,16 +23,13 @@ public EventStoreIndexBuilder(IPublisher commandPublisher, IPublisher< this.messageCounterJobFactory = messageCounterJobFactory; } - public Task HandleAsync(EventStoreIndexRequested @event) + public async Task HandleAsync(EventStoreIndexRequested @event) { var startRebuildAt = @event.Timebox.RequestStartAt; if (startRebuildAt.AddMinutes(5) > DateTime.UtcNow && @event.Timebox.HasExpired == false) { - RequestTimeout(new RebuildIndexInternal(@event, 
@event.Timebox.RequestStartAt, @event.MaxDegreeOfParallelism)); - //RequestTimeout(new EventStoreIndexRebuildTimedout(@event, @event.Timebox.FinishRequestUntil)); + await RequestTimeoutAsync(new RebuildIndexInternal(@event, @event.Timebox.RequestStartAt, @event.MaxDegreeOfParallelism)).ConfigureAwait(false); } - - return Task.CompletedTask; } public async Task HandleAsync(RebuildIndexInternal sagaTimeout) @@ -54,17 +51,17 @@ public async Task HandleAsync(RebuildIndexInternal sagaTimeout) if (result == JobExecutionStatus.Running) { - RequestTimeout(new RebuildIndexInternal(sagaTimeout.EventStoreIndexRequest, DateTime.UtcNow.AddSeconds(60), sagaTimeout.MaxDegreeOfParallelism)); + await RequestTimeoutAsync(new RebuildIndexInternal(sagaTimeout.EventStoreIndexRequest, DateTime.UtcNow.AddSeconds(60), sagaTimeout.MaxDegreeOfParallelism)).ConfigureAwait(false); } else if (result == JobExecutionStatus.Failed) { // log error - RequestTimeout(new RebuildIndexInternal(sagaTimeout.EventStoreIndexRequest, DateTime.UtcNow.AddSeconds(60), sagaTimeout.MaxDegreeOfParallelism)); + await RequestTimeoutAsync(new RebuildIndexInternal(sagaTimeout.EventStoreIndexRequest, DateTime.UtcNow.AddSeconds(60), sagaTimeout.MaxDegreeOfParallelism)).ConfigureAwait(false); } else if (result == JobExecutionStatus.Completed) { var finalize = new FinalizeEventStoreIndexRequest(sagaTimeout.EventStoreIndexRequest.Id); - commandPublisher.Publish(finalize); + await commandPublisher.PublishAsync(finalize).ConfigureAwait(false); } } diff --git a/src/Elders.Cronus/EventStore/Players/ReplayPublicEvents_Job.cs b/src/Elders.Cronus/EventStore/Players/ReplayPublicEvents_Job.cs index 6e4bcbe1..7180dc15 100644 --- a/src/Elders.Cronus/EventStore/Players/ReplayPublicEvents_Job.cs +++ b/src/Elders.Cronus/EventStore/Players/ReplayPublicEvents_Job.cs @@ -42,7 +42,7 @@ protected override async Task RunJobAsync(IClusterOperations ulong counter = Data.EventTypePaging is null ? 
0 : Data.EventTypePaging.ProcessedCount; PlayerOperator @operator = new PlayerOperator() { - OnLoadAsync = eventRaw => + OnLoadAsync = async eventRaw => { string tenant = contextAccessor.CronusContext.Tenant; //TODO: Document which headers are essential or make another ctor for CronusMessage with byte[] @@ -56,10 +56,9 @@ protected override async Task RunJobAsync(IClusterOperations { "contract_name", Data.SourceEventTypeId } }; - publicEventPublisher.Publish(eventRaw.Data, Data.SourceEventTypeId.GetTypeByContract(), tenant, headers); + await publicEventPublisher.PublishAsync(eventRaw.Data, Data.SourceEventTypeId.GetTypeByContract(), tenant, headers, cancellationToken).ConfigureAwait(false); counter++; - return Task.CompletedTask; }, NotifyProgressAsync = async options => { diff --git a/src/Elders.Cronus/Hosting/Heartbeat/CronusHeartbeat.cs b/src/Elders.Cronus/Hosting/Heartbeat/CronusHeartbeat.cs index 6456f80b..bd22ed81 100644 --- a/src/Elders.Cronus/Hosting/Heartbeat/CronusHeartbeat.cs +++ b/src/Elders.Cronus/Hosting/Heartbeat/CronusHeartbeat.cs @@ -40,9 +40,9 @@ public async Task StartBeatingAsync(CancellationToken stoppingToken) { Dictionary heartbeatHeaders = new Dictionary() { { MessageHeader.TTL, TTL } }; var signal = new HeartbeatSignal(boundedContext.Name, tenants.Tenants.ToList()); - publisher.Publish(signal, heartbeatHeaders); + await publisher.PublishAsync(signal, heartbeatHeaders, stoppingToken).ConfigureAwait(false); - await Task.Delay(TimeSpan.FromSeconds(options.IntervalInSeconds), stoppingToken); + await Task.Delay(TimeSpan.FromSeconds(options.IntervalInSeconds), stoppingToken).ConfigureAwait(false); } catch (Exception ex) when (ex is TaskCanceledException or ObjectDisposedException) { diff --git a/src/Elders.Cronus/MessageProcessing/AggregateCommitPublisher.cs b/src/Elders.Cronus/MessageProcessing/AggregateCommitPublisher.cs index 95cab653..eb7f0f1e 100644 --- a/src/Elders.Cronus/MessageProcessing/AggregateCommitPublisher.cs +++ 
b/src/Elders.Cronus/MessageProcessing/AggregateCommitPublisher.cs @@ -22,18 +22,16 @@ public AggregateCommitPublisher(IPublisher publisher, ICronusCo this.logger = logger; } - public Task OnAppendAsync(AggregateCommit origin) + public async Task OnAppendAsync(AggregateCommit origin) { try { - bool publishResult = publisher.Publish(origin, BuildHeaders(origin)); + bool publishResult = await publisher.PublishAsync(origin, BuildHeaders(origin)).ConfigureAwait(false); if (publishResult == false) logger.LogError("Unable to publish aggregate commit."); } catch (Exception ex) when (True(() => logger.LogError(ex, "Unable to publish aggregate commit."))) { } - - return Task.CompletedTask; } public Task OnAppendingAsync(AggregateCommit origin) => Task.FromResult(origin); diff --git a/src/Elders.Cronus/MessageProcessing/AggregateRepositoryAndEventPublisher.cs b/src/Elders.Cronus/MessageProcessing/AggregateRepositoryAndEventPublisher.cs index 0ed6ee51..19cf457b 100644 --- a/src/Elders.Cronus/MessageProcessing/AggregateRepositoryAndEventPublisher.cs +++ b/src/Elders.Cronus/MessageProcessing/AggregateRepositoryAndEventPublisher.cs @@ -41,17 +41,17 @@ public async Task SaveAsync(AR aggregateRoot) where AR : IAggregateRoot { if (theEvent is EntityEvent entityEvent) { - isEverythingPublished &= eventPublisher.Publish(entityEvent.Event, BuildHeaders(aggregateCommit, aggregateRoot, ++position)); + isEverythingPublished &= await eventPublisher.PublishAsync(entityEvent.Event, BuildHeaders(aggregateCommit, aggregateRoot, ++position)).ConfigureAwait(false); } else { - isEverythingPublished &= eventPublisher.Publish(theEvent, BuildHeaders(aggregateCommit, aggregateRoot, ++position)); + isEverythingPublished &= await eventPublisher.PublishAsync(theEvent, BuildHeaders(aggregateCommit, aggregateRoot, ++position)).ConfigureAwait(false); } } position += 5; foreach (IPublicEvent publicEvent in aggregateRoot.UncommittedPublicEvents) { - isEverythingPublished &= 
publicEventPublisher.Publish(publicEvent, BuildHeaders(aggregateCommit, aggregateRoot, position++)); + isEverythingPublished &= await publicEventPublisher.PublishAsync(publicEvent, BuildHeaders(aggregateCommit, aggregateRoot, position++)).ConfigureAwait(false); } if (isEverythingPublished == false) diff --git a/src/Elders.Cronus/Projections/CronusProjectionBootstrapper.cs b/src/Elders.Cronus/Projections/CronusProjectionBootstrapper.cs index 008b0859..87cb9212 100644 --- a/src/Elders.Cronus/Projections/CronusProjectionBootstrapper.cs +++ b/src/Elders.Cronus/Projections/CronusProjectionBootstrapper.cs @@ -77,7 +77,7 @@ private async Task BootstrapProjectionsForTenantAsync(string tenant) { var id = new ProjectionVersionManagerId(projectionVersion.ProjectionName, tenant); var command = new RegisterProjection(id, projectionVersion.Hash); - publisher.Publish(command); + await publisher.PublishAsync(command).ConfigureAwait(false); } } } diff --git a/src/Elders.Cronus/Projections/Rebuilding/ProgressTracker.cs b/src/Elders.Cronus/Projections/Rebuilding/ProgressTracker.cs index 8607475b..441cf344 100644 --- a/src/Elders.Cronus/Projections/Rebuilding/ProgressTracker.cs +++ b/src/Elders.Cronus/Projections/Rebuilding/ProgressTracker.cs @@ -159,7 +159,7 @@ private void Notify(CancellationToken cancellationToken = default) while (true) { RebuildProjectionProgress progressSignalche = GetProgressSignal(); - signalPublisher.Publish(progressSignalche); + await signalPublisher.PublishAsync(progressSignalche, cancellationToken: cancellationToken).ConfigureAwait(false); if (cancellationToken.IsCancellationRequested) break; diff --git a/src/Elders.Cronus/Projections/Rebuilding/RebuildProjectionSequentially_Job.cs b/src/Elders.Cronus/Projections/Rebuilding/RebuildProjectionSequentially_Job.cs index bbe4727e..daa4cbd9 100644 --- a/src/Elders.Cronus/Projections/Rebuilding/RebuildProjectionSequentially_Job.cs +++ b/src/Elders.Cronus/Projections/Rebuilding/RebuildProjectionSequentially_Job.cs 
@@ -82,7 +82,7 @@ protected override async Task RunJobAsync(IClusterOperations return JobExecutionStatus.Running; var startSignal = progressTracker.GetProgressStartedSignal(); - signalPublisher.Publish(startSignal); + await signalPublisher.PublishAsync(startSignal, cancellationToken: cancellationToken).ConfigureAwait(false); List projectionEventsContractIds = projectionVersionHelper.GetInvolvedEventTypes(projectionType).Select(x => x.GetContractId()).ToList(); @@ -195,7 +195,7 @@ protected override async Task RunJobAsync(IClusterOperations Data = await cluster.PingAsync(Data).ConfigureAwait(false); var finishSignal = progressTracker.GetProgressFinishedSignal(); - signalPublisher.Publish(finishSignal); + await signalPublisher.PublishAsync(finishSignal, cancellationToken: cancellationToken).ConfigureAwait(false); var totalCount = progressTracker.GetTotalProcessedCount(); var avgSpeed = progressTracker.GetProcessedPerSecond(); @@ -220,7 +220,7 @@ private async Task CancelJobAsync(IClusterOperations cluster) Data = await cluster.PingAsync(Data).ConfigureAwait(false); var finishSignal = progressTracker.GetProgressFinishedSignal(); - signalPublisher.Publish(finishSignal); + await signalPublisher.PublishAsync(finishSignal).ConfigureAwait(false); } private bool IsInterested(string eventTypeContract, byte[] data) diff --git a/src/Elders.Cronus/Projections/Rebuilding/RebuildProjection_Job.cs b/src/Elders.Cronus/Projections/Rebuilding/RebuildProjection_Job.cs index ffc1f3cc..5f0efc82 100644 --- a/src/Elders.Cronus/Projections/Rebuilding/RebuildProjection_Job.cs +++ b/src/Elders.Cronus/Projections/Rebuilding/RebuildProjection_Job.cs @@ -85,7 +85,7 @@ protected override async Task RunJobAsync(IClusterOperations return JobExecutionStatus.Running; var startSignal = progressTracker.GetProgressStartedSignal(); - signalPublisher.Publish(startSignal); + await signalPublisher.PublishAsync(startSignal, cancellationToken: cancellationToken).ConfigureAwait(false); List 
projectionHandledEventTypes = projectionVersionHelper.GetInvolvedEventTypes(projectionType).Select(x => x.GetContractId()).ToList(); var projectionInstance = contextAccessor.CronusContext.ServiceProvider.GetRequiredService(projectionType) as IAmEventSourcedProjectionFast; @@ -178,7 +178,7 @@ protected override async Task RunJobAsync(IClusterOperations Data = await cluster.PingAsync(Data).ConfigureAwait(false); var finishSignal = progressTracker.GetProgressFinishedSignal(); - signalPublisher.Publish(finishSignal); + await signalPublisher.PublishAsync(finishSignal, cancellationToken: cancellationToken).ConfigureAwait(false); var totalCount = progressTracker.GetTotalProcessedCount(); var avgSpeed = progressTracker.GetProcessedPerSecond(); @@ -203,6 +203,6 @@ private async Task CancelJobAsync(IClusterOperations cluster) Data = await cluster.PingAsync(Data).ConfigureAwait(false); var finishSignal = progressTracker.GetProgressFinishedSignal(); - signalPublisher.Publish(finishSignal); + await signalPublisher.PublishAsync(finishSignal).ConfigureAwait(false); } } diff --git a/src/Elders.Cronus/Projections/Versioning/Handlers/ProjectionBuilder.cs b/src/Elders.Cronus/Projections/Versioning/Handlers/ProjectionBuilder.cs index c1436486..74acb2f9 100644 --- a/src/Elders.Cronus/Projections/Versioning/Handlers/ProjectionBuilder.cs +++ b/src/Elders.Cronus/Projections/Versioning/Handlers/ProjectionBuilder.cs @@ -40,16 +40,14 @@ public ProjectionBuilder(IPublisher commandPublisher, IPublisher DateTime.UtcNow && @event.Timebox.HasExpired == false) { - RequestTimeout(new CreateNewProjectionVersion(@event, @event.Timebox.RequestStartAt)); - //RequestTimeout(new ProjectionVersionRequestHeartbeat(@event, @event.Timebox.FinishRequestUntil)); + await RequestTimeoutAsync(new CreateNewProjectionVersion(@event, @event.Timebox.RequestStartAt)).ConfigureAwait(false); + //await RequestTimeoutAsync(new ProjectionVersionRequestHeartbeat(@event, 
@event.Timebox.FinishRequestUntil)).ConfigureAwait(false); } - - return Task.CompletedTask; } public Task HandleAsync(ProjectionVersionRequestPaused @event) @@ -87,26 +85,24 @@ public async Task HandleAsync(CreateNewProjectionVersion sagaTimeout) if (result == JobExecutionStatus.Running) { - RequestTimeout(new CreateNewProjectionVersion(sagaTimeout.ProjectionVersionRequest, DateTime.UtcNow.AddSeconds(60))); + await RequestTimeoutAsync(new CreateNewProjectionVersion(sagaTimeout.ProjectionVersionRequest, DateTime.UtcNow.AddSeconds(60))).ConfigureAwait(false); } else if (result == JobExecutionStatus.Failed) { var cancel = new CancelProjectionVersionRequest(sagaTimeout.ProjectionVersionRequest.Id, sagaTimeout.ProjectionVersionRequest.Version, "Failed"); - commandPublisher.Publish(cancel); + await commandPublisher.PublishAsync(cancel).ConfigureAwait(false); } else if (result == JobExecutionStatus.Completed) { var finalize = new FinalizeProjectionVersionRequest(sagaTimeout.ProjectionVersionRequest.Id, sagaTimeout.ProjectionVersionRequest.Version); - commandPublisher.Publish(finalize); + await commandPublisher.PublishAsync(finalize).ConfigureAwait(false); } } - public Task HandleAsync(ProjectionVersionRequestHeartbeat sagaTimeout) + public async Task HandleAsync(ProjectionVersionRequestHeartbeat sagaTimeout) { var timedout = new TimeoutProjectionVersionRequest(sagaTimeout.ProjectionVersionRequest.Id, sagaTimeout.ProjectionVersionRequest.Version, sagaTimeout.ProjectionVersionRequest.Timebox); - commandPublisher.Publish(timedout); - - return Task.CompletedTask; + await commandPublisher.PublishAsync(timedout).ConfigureAwait(false); } private void OptionsForTenantReloaded(TenantsOptions newOptions) From c355496731da6477d03bf5579504c43d61aa2de9 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 14:52:36 +0300 Subject: [PATCH 14/21] Reads ActivitySource version from assembly metadata --- .../Hosting/CronusServiceCollectionExtensions.cs | 9 ++++++++- 1 file 
changed, 8 insertions(+), 1 deletion(-) diff --git a/src/Elders.Cronus/Hosting/CronusServiceCollectionExtensions.cs b/src/Elders.Cronus/Hosting/CronusServiceCollectionExtensions.cs index 52d9a4f0..8e1be164 100644 --- a/src/Elders.Cronus/Hosting/CronusServiceCollectionExtensions.cs +++ b/src/Elders.Cronus/Hosting/CronusServiceCollectionExtensions.cs @@ -1,6 +1,7 @@ using System; using System.Diagnostics; using System.Linq; +using System.Reflection; using Elders.Cronus.Cluster.Job; using Elders.Cronus.DangerZone; using Elders.Cronus.Discoveries; @@ -15,6 +16,12 @@ namespace Elders.Cronus; public static class CronusServiceCollectionExtensions { + private static readonly string _assemblyVersion = + typeof(CronusServiceCollectionExtensions).Assembly + .GetCustomAttribute()?.InformationalVersion + ?? typeof(CronusServiceCollectionExtensions).Assembly.GetName().Version?.ToString() + ?? "0.0.0"; + /// /// // Adds Cronus core services /// @@ -79,7 +86,7 @@ internal static IServiceCollection AddOpenTelemetry(this IServiceCollection serv { services.AddSingleton(new DiagnosticListener("cronus")); - services.AddSingleton(new ActivitySource("Elders.Cronus", "11.0.0")); + services.AddSingleton(new ActivitySource("Elders.Cronus", _assemblyVersion)); } return services; From bc8d25b5adaee4a3d0ac559aa1d67d55469f7a96 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 14:53:15 +0300 Subject: [PATCH 15/21] Adds async stress tests for Publisher --- .../PublisherTests/AsyncStressTests.cs | 80 +++++++++++++++++++ 1 file changed, 80 insertions(+) create mode 100644 src/Elders.Cronus.Tests/PublisherTests/AsyncStressTests.cs diff --git a/src/Elders.Cronus.Tests/PublisherTests/AsyncStressTests.cs b/src/Elders.Cronus.Tests/PublisherTests/AsyncStressTests.cs new file mode 100644 index 00000000..b780a2fb --- /dev/null +++ b/src/Elders.Cronus.Tests/PublisherTests/AsyncStressTests.cs @@ -0,0 +1,80 @@ +using System; +using System.Collections.Generic; +using System.Diagnostics; +using 
System.Linq; +using System.Threading; +using System.Threading.Tasks; +using Xunit; + +namespace Elders.Cronus.Tests.PublisherTests; + +public class AsyncStressTests +{ + private sealed class TestMessage : IMessage + { + public DateTimeOffset Timestamp { get; } = DateTimeOffset.UtcNow; + } + + private sealed class CountingPublisher : IPublisher + { + public int CallCount; + + public Task PublishAsync(TestMessage message, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) + { + Interlocked.Increment(ref CallCount); + return Task.FromResult(true); + } + + public Task PublishAsync(TestMessage message, DateTime publishAt, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) => Task.FromResult(true); + + public Task PublishAsync(TestMessage message, TimeSpan publishAfter, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) => Task.FromResult(true); + + public Task PublishAsync(byte[] messageRaw, Type messageType, string tenant, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) => Task.FromResult(true); + } + + private sealed class SlowPublisher : IPublisher + { + public async Task PublishAsync(TestMessage message, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) + { + await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken).ConfigureAwait(false); + return true; + } + + public Task PublishAsync(TestMessage message, DateTime publishAt, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) => Task.FromResult(true); + + public Task PublishAsync(TestMessage message, TimeSpan publishAfter, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) => Task.FromResult(true); + + public Task PublishAsync(byte[] messageRaw, Type messageType, string tenant, Dictionary messageHeaders = null, CancellationToken cancellationToken = default) => Task.FromResult(true); + } + + [Fact] + 
public async Task ParallelFanout_1000_concurrent_publishes_no_deadlock() + { + var pub = new CountingPublisher(); + var tasks = Enumerable.Range(0, 1000).Select(_ => pub.PublishAsync(new TestMessage())).ToArray(); + var sw = Stopwatch.StartNew(); + var results = await Task.WhenAll(tasks); + sw.Stop(); + + Assert.Equal(1000, pub.CallCount); + Assert.All(results, r => Assert.True(r)); + Assert.True(sw.ElapsedMilliseconds < 5000, $"1000 publishes took {sw.ElapsedMilliseconds}ms (expected <5s)"); + } + + [Fact] + public async Task CancellationToken_propagates_during_in_flight_publish() + { + var slowPub = new SlowPublisher(); + using var cts = new CancellationTokenSource(); + cts.CancelAfter(TimeSpan.FromMilliseconds(50)); + var sw = Stopwatch.StartNew(); + + await Assert.ThrowsAnyAsync(async () => + { + await slowPub.PublishAsync(new TestMessage(), cancellationToken: cts.Token); + }); + + sw.Stop(); + Assert.True(sw.ElapsedMilliseconds < 200, $"Cancellation took {sw.ElapsedMilliseconds}ms (expected <200ms)"); + } +} From 57a5d85201dfa1a6a1e040787492a2958b7b27d7 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 17:04:55 +0300 Subject: [PATCH 16/21] Bumps Cronus.DomainModeling to 12.0.0-preview.8 --- src/Elders.Cronus/Elders.Cronus.csproj | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/Elders.Cronus/Elders.Cronus.csproj b/src/Elders.Cronus/Elders.Cronus.csproj index ad904f82..1c3f790e 100644 --- a/src/Elders.Cronus/Elders.Cronus.csproj +++ b/src/Elders.Cronus/Elders.Cronus.csproj @@ -35,7 +35,7 @@ - + From 2c54f9e7496a6c924d713dca2c62979be9cefb20 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Wed, 29 Apr 2026 17:14:13 +0300 Subject: [PATCH 17/21] major: Fixes lost stack trace when retries are exhausted in TryExecute and TryExecuteAsync --- src/Elders.Cronus/Userfull/RetryableOperation.cs | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/Elders.Cronus/Userfull/RetryableOperation.cs 
b/src/Elders.Cronus/Userfull/RetryableOperation.cs index ff79d750..d42f2d63 100644 --- a/src/Elders.Cronus/Userfull/RetryableOperation.cs +++ b/src/Elders.Cronus/Userfull/RetryableOperation.cs @@ -1,4 +1,5 @@ using System; +using System.Runtime.ExceptionServices; using System.Threading; using System.Threading.Tasks; using Microsoft.Extensions.Logging; @@ -54,7 +55,7 @@ public static T TryExecute(Func operation, RetryPolicy retryPolicy, Func TryExecuteAsync(Func> { if (logger.IsEnabled(LogLevel.Debug)) logger.LogDebug("Maximum number of retries has been reached."); - throw lastException; + ExceptionDispatchInfo.Throw(lastException); } } } From b0f93800ea929b89de17d3a2c39763549250b482 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Mon, 11 May 2026 10:13:22 +0300 Subject: [PATCH 18/21] fix: Fixes projection bootstrap race by synthesizing discovery-time version Fixes a bootstrap-ordering race that left non-system projections stuck in JobExecutionStatus.Running indefinitely. Chain: - RebuildProjection_Job returns Running while ProjectionVersionHelper.ShouldBeRetriedAsync is true. - ShouldBeRetriedAsync returns true while EventStoreIndexStatus reads come back NotPresent. - EventStoreIndexStatus state is updated through ProjectionRepository.SaveAsync(Type, IEvent) via ProjectionIndex.IndexAsync, which goes through GetProjectionVersionsAsync. - During bootstrap the ProjectionVersionsHandler stream for a projection is empty, so RestoreFromHistoryAsync returns default(T) and the read collapses to NotFound. The save loop then iterates zero versions and silently no-ops, so EventStoreIndexStatus is never populated and IndexStatus.IsNotPresent() stays true forever. Fix: when GetProjectionVersionsAsync sees a NotFound (no error, no data), synthesize a discovery-time version (Status=New, Revision=1, hash from ProjectionHasher) so the write path in SaveAsync(Type, IEvent) has a writable version to persist into. 
The synthesized version matches what ProjectionVersionManager produces at first registration, so once the canonical lifecycle eventually transitions the version to Live, reads find the events at the same (name, revision) coordinates the writes used. Status flip alone does not change the storage key in either the Cassandra or Postgres adapters. Considered alternatives: 1. Bootstrap ordering (await ProjectionVersionsHandler rebuild before publishing RegisterProjection for everything else): architecturally cleaner but adds startup latency and still does not handle late-arriving projections. 2. Saga retry tuning: masks the symptom without fixing the lost write. Adds ProjectionRepositoryBootstrapRaceTests covering the regression. --- .../ProjectionRepositoryBootstrapRaceTests.cs | 223 ++++++++++++++++++ .../Projections/ProjectionRepository.cs | 32 ++- 2 files changed, 254 insertions(+), 1 deletion(-) create mode 100644 src/Elders.Cronus.Tests/Projections/ProjectionRepositoryBootstrapRaceTests.cs diff --git a/src/Elders.Cronus.Tests/Projections/ProjectionRepositoryBootstrapRaceTests.cs b/src/Elders.Cronus.Tests/Projections/ProjectionRepositoryBootstrapRaceTests.cs new file mode 100644 index 00000000..08f9f056 --- /dev/null +++ b/src/Elders.Cronus.Tests/Projections/ProjectionRepositoryBootstrapRaceTests.cs @@ -0,0 +1,223 @@ +using System; +using System.Collections.Concurrent; +using System.Collections.Generic; +using System.Linq; +using System.Runtime.Serialization; +using System.Threading; +using System.Threading.Tasks; +using Elders.Cronus.MessageProcessing; +using Elders.Cronus.Projections; +using Elders.Cronus.Projections.Cassandra; +using Elders.Cronus.Projections.Versioning; +using Microsoft.Extensions.DependencyInjection; +using Xunit; + +namespace Elders.Cronus.Tests.Projections; + +/// +/// Regression tests for the bootstrap-ordering race fixed in commit 647f0f74 +/// ("Synthesizes discovery-time projection version when version handler stream is empty"). 
+/// +/// Background: during startup the ProjectionVersionsHandler stream for a given +/// projection name is empty until the version handler itself has been rebuilt. Before +/// the fix, would silently +/// no-op in that window because GetProjectionVersionsAsync reported NotFound and +/// the save loop iterated zero versions. That left writes to system projections like +/// EventStoreIndexStatus dropped, which kept user-projection rebuilds stuck in +/// JobExecutionStatus.Running indefinitely. +/// +/// The fix synthesizes a discovery-time +/// (Status=New, Revision=1, Hash=) whenever the version +/// handler stream is empty, so writes have a writable target version. +/// +public class ProjectionRepositoryBootstrapRaceTests +{ + [Fact] + public async Task GetProjectionVersionsAsync_when_version_handler_stream_is_empty_synthesizes_discovery_time_version() + { + BootstrapTestHarness harness = BootstrapTestHarness.WithEmptyVersionHandlerStream(); + + ReadResult result = await harness.Repository.InvokeGetProjectionVersionsAsync(harness.ProjectionName); + + Assert.True(result.IsSuccess, $"Expected synthesized fallback to produce IsSuccess=true. 
Got: {result}"); + Assert.NotNull(result.Data); + Assert.Equal(1, result.Data.Count); + + ProjectionVersion synthesized = SingleVersion(result.Data); + Assert.Equal(harness.ProjectionName, synthesized.ProjectionName); + Assert.Equal(ProjectionStatus.New, synthesized.Status); + Assert.Equal(1, synthesized.Revision); + Assert.Equal(harness.ExpectedHash, synthesized.Hash); + } + + [Fact] + public async Task SaveAsync_when_version_handler_stream_is_empty_persists_through_synthesized_version() + { + BootstrapTestHarness harness = BootstrapTestHarness.WithEmptyVersionHandlerStream(); + BootstrapTestEvent @event = new BootstrapTestEvent(); + + await harness.Repository.SaveAsync(typeof(BootstrapTestProjection), @event); + + ProjectionCommit commit = Assert.Single(harness.ProjectionStore.SavedCommits); + Assert.Same(@event, commit.Event); + Assert.NotNull(commit.Version); + Assert.Equal(harness.ProjectionName, commit.Version.ProjectionName); + Assert.Equal(ProjectionStatus.New, commit.Version.Status); + Assert.Equal(1, commit.Version.Revision); + Assert.Equal(harness.ExpectedHash, commit.Version.Hash); + } + + [Fact] + public void Synthesized_version_hash_matches_canonical_ProjectionHasher_output() + { + // Guards against drift between ProjectionRepository's fallback and ProjectionVersionManager's + // first-registration path. If the storage key (name, revision, hash) ever diverges, writes + // made through the bootstrap fallback would be invisible once the canonical Live version + // arrives. 
+ ProjectionHasher hasher = new ProjectionHasher(); + string canonicalHash = hasher.CalculateHash(typeof(BootstrapTestProjection)); + + BootstrapTestHarness harness = BootstrapTestHarness.WithEmptyVersionHandlerStream(); + + Assert.Equal(canonicalHash, harness.ExpectedHash); + } + + private static ProjectionVersion SingleVersion(ProjectionVersions versions) + { + ArgumentNullException.ThrowIfNull(versions); + ProjectionVersion only = null; + int count = 0; + using IEnumerator enumerator = versions.GetEnumerator(); + while (enumerator.MoveNext()) + { + only = enumerator.Current; + count++; + } + Assert.Equal(1, count); + return only; + } + + // ---------------- harness ---------------- + + private sealed class BootstrapTestHarness + { + public TestableProjectionRepository Repository { get; } + public CapturingProjectionStore ProjectionStore { get; } + public string ProjectionName { get; } + public string ExpectedHash { get; } + + private BootstrapTestHarness(TestableProjectionRepository repository, CapturingProjectionStore projectionStore, string projectionName, string expectedHash) + { + Repository = repository; + ProjectionStore = projectionStore; + ProjectionName = projectionName; + ExpectedHash = expectedHash; + } + + public static BootstrapTestHarness WithEmptyVersionHandlerStream() + { + // Pre-warm the contract cache so projectionName.GetTypeByContract() inside the fix + // can resolve back to BootstrapTestProjection. 
+ string projectionName = typeof(BootstrapTestProjection).GetContractId(); + _ = typeof(BootstrapTestEvent).GetContractId(); + + ProjectionHasher hasher = new ProjectionHasher(); + string expectedHash = hasher.CalculateHash(typeof(BootstrapTestProjection)); + + CapturingProjectionStore projectionStore = new CapturingProjectionStore(); + ServiceCollection services = new ServiceCollection(); + services.AddTransient(); + ServiceProvider serviceProvider = services.BuildServiceProvider(); + + CronusContextAccessor contextAccessor = new CronusContextAccessor + { + CronusContext = new CronusContext("test-tenant", serviceProvider) + }; + DefaultHandlerFactory handlerFactory = new DefaultHandlerFactory(contextAccessor); + + TestableProjectionRepository repository = new TestableProjectionRepository(contextAccessor, projectionStore, handlerFactory, hasher); + + return new BootstrapTestHarness(repository, projectionStore, projectionName, expectedHash); + } + + private sealed class CronusContextAccessor : ICronusContextAccessor + { + public CronusContext CronusContext { get; set; } + } + } + + /// + /// Subclass that exposes the protected + /// for direct invocation. The method itself is not overridden - we exercise the real fallback logic. + /// + private sealed class TestableProjectionRepository : ProjectionRepository + { + public TestableProjectionRepository(ICronusContextAccessor contextAccessor, IProjectionStore projectionStore, IHandlerFactory handlerFactory, ProjectionHasher projectionHasher) + : base(contextAccessor, projectionStore, handlerFactory, projectionHasher) + { + } + + public Task> InvokeGetProjectionVersionsAsync(string projectionName) + => GetProjectionVersionsAsync(projectionName); + } + + /// + /// Fake that simulates the bootstrap window: every + /// call leaves the operator's stream callback + /// uninvoked, so callers receive an empty . Captures all + /// writes for inspection. 
+ /// + private sealed class CapturingProjectionStore : IProjectionStore + { + private readonly ConcurrentQueue savedCommits = new ConcurrentQueue(); + + public IReadOnlyCollection SavedCommits => savedCommits.ToArray(); + + public Task EnumerateProjectionsAsync(ProjectionsOperator @operator, ProjectionQueryOptions options) + { + // Intentionally do not invoke @operator.OnProjectionStreamLoadedAsync. This mirrors + // the bootstrap state where the ProjectionVersionsHandler stream is empty: callers + // keep their default ProjectionStream.Empty(), RestoreFromHistoryAsync returns + // default(T), and GetProjectionVersionsFromStoreAsync collapses to NotFound. + return Task.CompletedTask; + } + + public Task SaveAsync(ProjectionCommit commit) + { + ArgumentNullException.ThrowIfNull(commit); + savedCommits.Enqueue(commit); + return Task.CompletedTask; + } + } + + // ---------------- test projection / event ---------------- + + [DataContract(Name = "f3a6c1d2-7b48-4a92-9d1e-bootstrapraceevt")] + public sealed class BootstrapTestEvent : IEvent + { + public DateTimeOffset Timestamp { get; } = DateTimeOffset.UtcNow; + } + + [DataContract(Name = "8b2e5a14-9f6d-47c8-a3b0-bootstrapraceprj")] + public sealed class BootstrapTestProjection : ProjectionDefinition, + IEventHandler + { + public BootstrapTestProjection() + { + Subscribe(_ => new BootstrapTestProjectionId("test-tenant", "fixed-id")); + } + + public Task HandleAsync(BootstrapTestEvent @event) => Task.CompletedTask; + } + + public sealed class BootstrapTestProjectionState + { + } + + public sealed class BootstrapTestProjectionId : AggregateRootId + { + private BootstrapTestProjectionId() : base() { } + + public BootstrapTestProjectionId(string tenant, string id) : base(tenant, "bootstrapraceid", id) { } + } +} diff --git a/src/Elders.Cronus/Projections/ProjectionRepository.cs b/src/Elders.Cronus/Projections/ProjectionRepository.cs index e19fcd11..efc076a5 100644 --- a/src/Elders.Cronus/Projections/ProjectionRepository.cs 
+++ b/src/Elders.Cronus/Projections/ProjectionRepository.cs @@ -191,7 +191,37 @@ protected async virtual Task> GetProjectionVersio if (queryResult.IsSuccess) return new ReadResult(queryResult.Data.State.AllVersions); - return ReadResult.WithError(queryResult.Error); + if (queryResult.HasError) + return ReadResult.WithError(queryResult.Error); + + // Bootstrap-ordering fallback: the ProjectionVersionsHandler stream for `projectionName` is empty, + // which happens before the version handler has finished its own rebuild. Synthesize a discovery-time + // version so writes through SaveAsync(Type, IEvent) are not silently dropped during startup. + // Once the version handler is rebuilt and the canonical versions are persisted, this fallback path + // is no longer hit because GetProjectionVersionsFromStoreAsync starts returning IsSuccess=true. + ProjectionVersions fallback = TryBuildDiscoveryTimeVersions(projectionName); + if (fallback is not null) + return new ReadResult(fallback); + + return ReadResult.WithNotFoundHint($"No versions found for projection `{projectionName}` and no discovery-time fallback could be derived."); + } + + private ProjectionVersions TryBuildDiscoveryTimeVersions(string projectionName) + { + try + { + Type projectionType = projectionName.GetTypeByContract(); + if (projectionType is null) + return null; + + string hash = projectionHasher.CalculateHash(projectionType); + ProjectionVersion seed = new ProjectionVersion(projectionName, ProjectionStatus.New, 1, hash); + return new ProjectionVersions(seed); + } + catch (Exception ex) when (ExceptionFilter.True(() => LogProjectionLoadError(log, ex))) + { + return null; + } } private async Task> GetInternalAsync(IBlobId projectionId, Type projectionType) where T : IProjectionDefinition From 801f8728be7d3821c776dc6e58bc0c48ee4948e0 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Mon, 11 May 2026 12:01:37 +0300 Subject: [PATCH 19/21] fix: Awaits projection initialization and injects discovery-time 
versions cache --- .../ProjectionRepositoryBootstrapRaceTests.cs | 36 +++++++++++- .../ProjectionVersionHelperTests.cs | 55 +++++++++++++++++++ .../Discoveries/ProjectionsDiscovery.cs | 2 + .../Projections/DiscoveryTimeVersionsCache.cs | 37 +++++++++++++ .../IDiscoveryTimeVersionsCache.cs | 25 +++++++++ .../Projections/ProjectionRepository.cs | 52 +++++++++++++----- .../Rebuilding/ProjectionVersionHelper.cs | 20 +++++-- 7 files changed, 204 insertions(+), 23 deletions(-) create mode 100644 src/Elders.Cronus.Tests/Projections/Rebuilding/ProjectionVersionHelperTests.cs create mode 100644 src/Elders.Cronus/Projections/DiscoveryTimeVersionsCache.cs create mode 100644 src/Elders.Cronus/Projections/IDiscoveryTimeVersionsCache.cs diff --git a/src/Elders.Cronus.Tests/Projections/ProjectionRepositoryBootstrapRaceTests.cs b/src/Elders.Cronus.Tests/Projections/ProjectionRepositoryBootstrapRaceTests.cs index 08f9f056..98a422eb 100644 --- a/src/Elders.Cronus.Tests/Projections/ProjectionRepositoryBootstrapRaceTests.cs +++ b/src/Elders.Cronus.Tests/Projections/ProjectionRepositoryBootstrapRaceTests.cs @@ -67,6 +67,33 @@ public async Task SaveAsync_when_version_handler_stream_is_empty_persists_throug Assert.Equal(harness.ExpectedHash, commit.Version.Hash); } + [Fact] + public async Task Mutating_synthesized_result_does_not_corrupt_memoization_cache() + { + // The fallback path memoizes ProjectionVersions per (projectionName, tenant). ProjectionVersions + // exposes Add/Remove and can be mutated; if the cache returned the same instance on every call, + // a caller that mutates it (e.g., adds a Canceled version) would poison every subsequent reader. 
+ BootstrapTestHarness harness = BootstrapTestHarness.WithEmptyVersionHandlerStream(); + + ReadResult first = await harness.Repository.InvokeGetProjectionVersionsAsync(harness.ProjectionName); + Assert.True(first.IsSuccess); + ProjectionVersion synthesized = SingleVersion(first.Data); + + // Add a Canceled twin of the synthesized version. ProjectionVersions.Add removes the matching New + // entry and inserts the Canceled one — so a non-defensive cache would now hand out a Canceled + // single-version on the next read. + first.Data.Add(synthesized.WithStatus(ProjectionStatus.Canceled)); + + ReadResult second = await harness.Repository.InvokeGetProjectionVersionsAsync(harness.ProjectionName); + Assert.True(second.IsSuccess); + Assert.NotSame(first.Data, second.Data); + + ProjectionVersion stillSynthesized = SingleVersion(second.Data); + Assert.Equal(ProjectionStatus.New, stillSynthesized.Status); + Assert.Equal(1, stillSynthesized.Revision); + Assert.Equal(harness.ExpectedHash, stillSynthesized.Hash); + } + [Fact] public void Synthesized_version_hash_matches_canonical_ProjectionHasher_output() { @@ -135,7 +162,10 @@ public static BootstrapTestHarness WithEmptyVersionHandlerStream() }; DefaultHandlerFactory handlerFactory = new DefaultHandlerFactory(contextAccessor); - TestableProjectionRepository repository = new TestableProjectionRepository(contextAccessor, projectionStore, handlerFactory, hasher); + // Each harness gets a fresh cache instance; no static state, no cross-test leakage. 
+ IDiscoveryTimeVersionsCache discoveryCache = new DiscoveryTimeVersionsCache(); + + TestableProjectionRepository repository = new TestableProjectionRepository(contextAccessor, projectionStore, handlerFactory, hasher, discoveryCache); return new BootstrapTestHarness(repository, projectionStore, projectionName, expectedHash); } @@ -152,8 +182,8 @@ private sealed class CronusContextAccessor : ICronusContextAccessor /// private sealed class TestableProjectionRepository : ProjectionRepository { - public TestableProjectionRepository(ICronusContextAccessor contextAccessor, IProjectionStore projectionStore, IHandlerFactory handlerFactory, ProjectionHasher projectionHasher) - : base(contextAccessor, projectionStore, handlerFactory, projectionHasher) + public TestableProjectionRepository(ICronusContextAccessor contextAccessor, IProjectionStore projectionStore, IHandlerFactory handlerFactory, ProjectionHasher projectionHasher, IDiscoveryTimeVersionsCache discoveryTimeVersionsCache) + : base(contextAccessor, projectionStore, handlerFactory, projectionHasher, discoveryTimeVersionsCache) { } diff --git a/src/Elders.Cronus.Tests/Projections/Rebuilding/ProjectionVersionHelperTests.cs b/src/Elders.Cronus.Tests/Projections/Rebuilding/ProjectionVersionHelperTests.cs new file mode 100644 index 00000000..1cebbfb4 --- /dev/null +++ b/src/Elders.Cronus.Tests/Projections/Rebuilding/ProjectionVersionHelperTests.cs @@ -0,0 +1,55 @@ +using System; +using System.Threading.Tasks; +using Elders.Cronus.Projections; +using Elders.Cronus.Projections.Rebuilding; +using Elders.Cronus.Projections.Versioning; +using Microsoft.Extensions.Logging.Abstractions; +using Xunit; + +namespace Elders.Cronus.Tests.Projections.Rebuilding; + +public class ProjectionVersionHelperTests +{ + [Fact] + public async Task InitializeNewProjectionVersionAsync_propagates_initializer_exceptions() + { + // Before the await fix, InitializeNewProjectionVersion was void and swallowed every failure + // from 
IInitializableProjectionStore.InitializeAsync. Callers had no way to learn the version + // tracker was not actually initialized and looped on retry. The async variant must surface + // these failures so RebuildProjection_Job transitions to JobExecutionStatus.Failed instead of + // staying silently stuck in Running. + InvalidOperationException expected = new InvalidOperationException("storage adapter exploded"); + ThrowingInitializableProjectionStore initializer = new ThrowingInitializableProjectionStore(expected); + + ProjectionVersionHelper helper = new ProjectionVersionHelper( + contextAccessor: null, + projectionReader: null, + projectionVersionInitializer: initializer, + projectionHasher: new ProjectionHasher(), + logger: NullLogger.Instance); + + InvalidOperationException actual = await Assert.ThrowsAsync( + () => helper.InitializeNewProjectionVersionAsync()); + + Assert.Same(expected, actual); + Assert.Equal(1, initializer.CallCount); + } + + private sealed class ThrowingInitializableProjectionStore : IInitializableProjectionStore + { + private readonly Exception toThrow; + + public ThrowingInitializableProjectionStore(Exception toThrow) + { + this.toThrow = toThrow; + } + + public int CallCount { get; private set; } + + public Task InitializeAsync(ProjectionVersion version) + { + CallCount++; + throw toThrow; + } + } +} diff --git a/src/Elders.Cronus/Discoveries/ProjectionsDiscovery.cs b/src/Elders.Cronus/Discoveries/ProjectionsDiscovery.cs index cb853193..89fac628 100644 --- a/src/Elders.Cronus/Discoveries/ProjectionsDiscovery.cs +++ b/src/Elders.Cronus/Discoveries/ProjectionsDiscovery.cs @@ -49,6 +49,7 @@ IEnumerable GetSupportingModels() yield return new DiscoveredModel(typeof(IProjectionVersioningPolicy), typeof(MarkupInterfaceProjectionVersioningPolicy), ServiceLifetime.Singleton); yield return new DiscoveredModel(typeof(MarkupInterfaceProjectionVersioningPolicy), typeof(MarkupInterfaceProjectionVersioningPolicy), ServiceLifetime.Singleton); yield 
return new DiscoveredModel(typeof(ProjectionHasher), typeof(ProjectionHasher), ServiceLifetime.Singleton); + yield return new DiscoveredModel(typeof(IDiscoveryTimeVersionsCache), typeof(DiscoveryTimeVersionsCache), ServiceLifetime.Singleton); yield return new DiscoveredModel(typeof(LatestProjectionVersionFinder), typeof(LatestProjectionVersionFinder), ServiceLifetime.Transient); @@ -91,6 +92,7 @@ IEnumerable GetSupportingModels() yield return new DiscoveredModel(typeof(IProjectionVersioningPolicy), typeof(MarkupInterfaceProjectionVersioningPolicy), ServiceLifetime.Singleton); yield return new DiscoveredModel(typeof(MarkupInterfaceProjectionVersioningPolicy), typeof(MarkupInterfaceProjectionVersioningPolicy), ServiceLifetime.Singleton); yield return new DiscoveredModel(typeof(ProjectionHasher), typeof(ProjectionHasher), ServiceLifetime.Singleton); + yield return new DiscoveredModel(typeof(IDiscoveryTimeVersionsCache), typeof(DiscoveryTimeVersionsCache), ServiceLifetime.Singleton); } IEnumerable RegisterMissingModels() diff --git a/src/Elders.Cronus/Projections/DiscoveryTimeVersionsCache.cs b/src/Elders.Cronus/Projections/DiscoveryTimeVersionsCache.cs new file mode 100644 index 00000000..84bb62b5 --- /dev/null +++ b/src/Elders.Cronus/Projections/DiscoveryTimeVersionsCache.cs @@ -0,0 +1,37 @@ +using System; +using System.Collections.Concurrent; + +namespace Elders.Cronus.Projections; + +/// +/// Singleton-scoped in-memory implementation of . +/// Each entry is keyed by (projectionName, tenant) and lives for the lifetime of the +/// host. The cache is bypassed in production once +/// ProjectionRepository.GetProjectionVersionsFromStoreAsync returns IsSuccess=true, +/// so stale entries are never observed. 
+/// +public sealed class DiscoveryTimeVersionsCache : IDiscoveryTimeVersionsCache +{ + private readonly ConcurrentDictionary<(string ProjectionName, string Tenant), ProjectionVersions> entries + = new ConcurrentDictionary<(string ProjectionName, string Tenant), ProjectionVersions>(); + + public ProjectionVersions GetOrAdd(string projectionName, string tenant, Func factory) + { + ArgumentNullException.ThrowIfNull(projectionName); + ArgumentNullException.ThrowIfNull(tenant); + ArgumentNullException.ThrowIfNull(factory); + + var key = (projectionName, tenant); + + if (entries.TryGetValue(key, out ProjectionVersions cached) == false) + { + cached = factory(); + if (cached is null) + return null; + + entries.TryAdd(key, cached); + } + + return new ProjectionVersions([.. cached]); + } +} diff --git a/src/Elders.Cronus/Projections/IDiscoveryTimeVersionsCache.cs b/src/Elders.Cronus/Projections/IDiscoveryTimeVersionsCache.cs new file mode 100644 index 00000000..e2332ac6 --- /dev/null +++ b/src/Elders.Cronus/Projections/IDiscoveryTimeVersionsCache.cs @@ -0,0 +1,25 @@ +using System; + +namespace Elders.Cronus.Projections; + +/// +/// Memoizes the discovery-time synthesized by +/// when the canonical ProjectionVersionsHandler +/// stream is empty for a given (projectionName, tenant). Every GetOrAdd +/// returns a fresh clone, so callers cannot mutate the cached entry. +/// +public interface IDiscoveryTimeVersionsCache +{ + /// + /// Returns a clone of the cached for + /// and . On first call invokes + /// and stores the result; subsequent calls return clones of + /// the stored value. If the factory returns null, nothing is cached and + /// null is returned. + /// + /// The projection contract name. + /// The Cronus tenant the cache entry belongs to. + /// Builds the canonical on cache miss. + /// A fresh clone of the cached versions, or null when the factory returned null. 
+ ProjectionVersions GetOrAdd(string projectionName, string tenant, Func factory); +} diff --git a/src/Elders.Cronus/Projections/ProjectionRepository.cs b/src/Elders.Cronus/Projections/ProjectionRepository.cs index efc076a5..3856f35f 100644 --- a/src/Elders.Cronus/Projections/ProjectionRepository.cs +++ b/src/Elders.Cronus/Projections/ProjectionRepository.cs @@ -23,16 +23,29 @@ public partial class ProjectionRepository : IProjectionWriter, IProjectionReader readonly IProjectionStore projectionStore; private readonly IHandlerFactory handlerFactory; private readonly ProjectionHasher projectionHasher; - - public ProjectionRepository(ICronusContextAccessor contextAccessor, IProjectionStore projectionStore, IHandlerFactory handlerFactory, ProjectionHasher projectionHasher) + private readonly IDiscoveryTimeVersionsCache discoveryTimeVersionsCache; + + /// + /// Read- and write-side projection repository. Synthesized versions for projections whose + /// canonical ProjectionVersionsHandler stream is empty are memoized through + /// . + /// + /// Tenant + request context accessor. + /// Backing projection store (Cassandra, Postgres, etc.). + /// Builds projection instances for dispatch. + /// Computes the content-derived hash that goes into a projection version. + /// Singleton cache for the bootstrap-fallback path; injected to avoid static state and to make it replaceable in tests. 
+ public ProjectionRepository(ICronusContextAccessor contextAccessor, IProjectionStore projectionStore, IHandlerFactory handlerFactory, ProjectionHasher projectionHasher, IDiscoveryTimeVersionsCache discoveryTimeVersionsCache) { if (contextAccessor is null) throw new ArgumentException(nameof(contextAccessor)); if (projectionStore is null) throw new ArgumentException(nameof(projectionStore)); + ArgumentNullException.ThrowIfNull(discoveryTimeVersionsCache); this.contextAccessor = contextAccessor; this.projectionStore = projectionStore; this.handlerFactory = handlerFactory; this.projectionHasher = projectionHasher; + this.discoveryTimeVersionsCache = discoveryTimeVersionsCache; } public async Task SaveAsync(Type projectionType, IEvent @event) @@ -206,22 +219,33 @@ protected async virtual Task> GetProjectionVersio return ReadResult.WithNotFoundHint($"No versions found for projection `{projectionName}` and no discovery-time fallback could be derived."); } + /// + /// Returns a discovery-time for the given projection name when + /// the canonical ProjectionVersionsHandler stream is empty. The injected + /// memoizes the synthesized value per + /// (projectionName, tenant) and hands out a fresh clone on every call. 
+ /// private ProjectionVersions TryBuildDiscoveryTimeVersions(string projectionName) { - try - { - Type projectionType = projectionName.GetTypeByContract(); - if (projectionType is null) - return null; + string tenant = contextAccessor.CronusContext.Tenant; - string hash = projectionHasher.CalculateHash(projectionType); - ProjectionVersion seed = new ProjectionVersion(projectionName, ProjectionStatus.New, 1, hash); - return new ProjectionVersions(seed); - } - catch (Exception ex) when (ExceptionFilter.True(() => LogProjectionLoadError(log, ex))) + return discoveryTimeVersionsCache.GetOrAdd(projectionName, tenant, () => { - return null; - } + try + { + Type projectionType = projectionName.GetTypeByContract(); + if (projectionType is null) + return null; + + string hash = projectionHasher.CalculateHash(projectionType); + ProjectionVersion seed = new ProjectionVersion(projectionName, ProjectionStatus.New, 1, hash); + return new ProjectionVersions(seed); + } + catch (Exception ex) when (ExceptionFilter.True(() => LogProjectionLoadError(log, ex))) + { + return null; + } + }); } private async Task> GetInternalAsync(IBlobId projectionId, Type projectionType) where T : IProjectionDefinition diff --git a/src/Elders.Cronus/Projections/Rebuilding/ProjectionVersionHelper.cs b/src/Elders.Cronus/Projections/Rebuilding/ProjectionVersionHelper.cs index 5f7b77d5..0d5158f1 100644 --- a/src/Elders.Cronus/Projections/Rebuilding/ProjectionVersionHelper.cs +++ b/src/Elders.Cronus/Projections/Rebuilding/ProjectionVersionHelper.cs @@ -28,14 +28,22 @@ public ProjectionVersionHelper(ICronusContextAccessor contextAccessor, IProjecti } /// - /// Initializing new projection version if needed + /// Initializes the persistent projection store so that + /// version-tracking reads return an empty stream instead of a storage-missing error. Idempotent. /// - /// - /// - public void InitializeNewProjectionVersion() + /// A task that completes when the underlying initializer has finished. 
Rethrows any initializer exception so callers can surface the failure. + public async Task InitializeNewProjectionVersionAsync() { ProjectionVersion newPersistentVersion = GetNewProjectionVersion(); - projectionVersionInitializer.InitializeAsync(newPersistentVersion); + try + { + await projectionVersionInitializer.InitializeAsync(newPersistentVersion).ConfigureAwait(false); + } + catch (Exception ex) + { + logger.LogError(ex, "Failed to initialize projection version {cronus_ProjectionVersion}.", newPersistentVersion); + throw; + } } public async Task ShouldBeRetriedAsync(ProjectionVersion version) @@ -43,7 +51,7 @@ public async Task ShouldBeRetriedAsync(ProjectionVersion version) bool isVersionTrackerMissing = await IsVersionTrackerMissingAsync().ConfigureAwait(false); if (isVersionTrackerMissing) { - InitializeNewProjectionVersion(); + await InitializeNewProjectionVersionAsync().ConfigureAwait(false); if (version.ProjectionName.Equals(ProjectionVersionsHandler.ContractId, StringComparison.OrdinalIgnoreCase) == false) return true; From 117bf472cb1bec5a80ca44bf8a30a54b3da7d0d3 Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Mon, 11 May 2026 12:30:33 +0300 Subject: [PATCH 20/21] fix: Disables PR-validation builds in Azure Pipelines --- ci/azure-pipelines.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ci/azure-pipelines.yml b/ci/azure-pipelines.yml index 949a7418..4539f720 100644 --- a/ci/azure-pipelines.yml +++ b/ci/azure-pipelines.yml @@ -10,6 +10,8 @@ trigger: paths: exclude: [CHANGELOG.md] +pr: none + pool: vmImage: 'ubuntu-22.04' From 2c860b6f3669a6de45c6c8a68e031e73c1a36b8b Mon Sep 17 00:00:00 2001 From: Kalin Venkov Date: Mon, 11 May 2026 14:17:52 +0300 Subject: [PATCH 21/21] fix: Force-refreshes tags before semantic-release in Azure Pipelines --- ci/azure-pipelines.yml | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/ci/azure-pipelines.yml b/ci/azure-pipelines.yml index 4539f720..ff164b49 100644 --- a/ci/azure-pipelines.yml +++ 
b/ci/azure-pipelines.yml @@ -72,6 +72,12 @@ stages: inputs: targetType: 'inline' script: | + # Force-refresh tags from origin. git fetch's default tag policy does NOT + # update tags that already exist locally, which leaves the agent's workspace + # stuck on stale tag positions across builds (specifically when a tag was + # force-moved on origin). semantic-release then mis-computes the next + # version. --force --tags overwrites local tags with origin's positions (it does not prune tags deleted on origin). + git fetch --tags --force origin time curl -L https://github.com/Elders/blob/releases/download/SemRel-01/node_modules.tar.gz | tar mx -I pigz time npx semantic-release --no-ci # few commands for debugging purposes
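The comment in the pipeline script says that a plain `git fetch` leaves a force-moved tag stale on a long-lived agent. The following is a minimal sketch of that situation built from throwaway local repositories. It assumes a POSIX shell and git 2.20 or later on `PATH`. The repository names `origin` and `agent`, the tag `v1.0.0`, and the identity `ci@example.invalid` are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: reproduce the stale-tag problem in throwaway repositories.
# Assumes a POSIX shell and git >= 2.20; all names below are illustrative.
set -eu

work=$(mktemp -d)
cd "$work"

# "origin" plays the remote: one commit plus a lightweight tag.
git init -q origin
git -C origin config user.email ci@example.invalid
git -C origin config user.name ci
git -C origin commit -q --allow-empty -m first
git -C origin tag v1.0.0

# "agent" plays the build agent's long-lived workspace.
git clone -q origin agent

# Force-move the tag on origin to a new commit.
git -C origin commit -q --allow-empty -m second
git -C origin tag -f v1.0.0 >/dev/null
expected=$(git -C origin rev-parse v1.0.0)

# Default fetch: tag auto-following skips tags that already exist locally.
git -C agent fetch -q origin
stale=$(git -C agent rev-parse v1.0.0)

# Forced tag fetch: refs/tags/* are overwritten with origin's positions.
git -C agent fetch -q --tags --force origin
fresh=$(git -C agent rev-parse v1.0.0)

cd /
rm -rf "$work"

if [ "$stale" != "$expected" ]; then
    echo "plain fetch: v1.0.0 is still stale"
fi
if [ "$fresh" = "$expected" ]; then
    echo "fetch --tags --force: v1.0.0 matches origin"
fi
```

Running the sketch should print both messages. If `--force` is removed from the second fetch, git 2.20+ should instead reject the tag update with a "would clobber existing tag" error, and `set -e` would then stop the script.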