From c29d1c7ffd2944b76a56dc2a1841d6e65018cc61 Mon Sep 17 00:00:00 2001
From: Francesco Cislaghi
Date: Thu, 26 Mar 2026 19:14:01 +0100
Subject: [PATCH] docs: write runtime vs engine contract

---
 docs/README.md                     |   1 +
 docs/runtime-vs-engine-contract.md | 162 +++++++++++++++++++++++++++++
 docs/status.md                     |   2 +-
 3 files changed, 164 insertions(+), 1 deletion(-)
 create mode 100644 docs/runtime-vs-engine-contract.md

diff --git a/docs/README.md b/docs/README.md
index a4403af..0f7f9bd 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -29,6 +29,7 @@
 - [Reservation Engine Semantics](./reservation-semantics.md)
 - [Reservation Runtime Seam Evaluation](./reservation-runtime-seam-evaluation.md)
 - [Runtime Extraction Roadmap](./runtime-extraction-roadmap.md)
+- [Runtime vs Engine Contract](./runtime-vs-engine-contract.md)
 - [Snapshot File Seam Evaluation](./snapshot-file-seam-evaluation.md)
 - [Revoke Safety Slice](./revoke-safety-slice.md)
 - [Operator Runbook](./operator-runbook.md)
diff --git a/docs/runtime-vs-engine-contract.md b/docs/runtime-vs-engine-contract.md
new file mode 100644
index 0000000..ef98ee8
--- /dev/null
+++ b/docs/runtime-vs-engine-contract.md
@@ -0,0 +1,162 @@
+# Runtime vs Engine Contract
+
+## Purpose
+
+This document is the focused internal contract for `M13-T01`.
+
+Use it when deciding whether new code belongs in the shared runtime substrate or inside one
+engine.
+
+The rule is simple:
+
+- if the code only preserves bounded durable execution discipline, it may belong in shared runtime
+- if the code defines domain meaning, it belongs in the engine
+
+## Shared Runtime Contract
+
+The shared runtime exists to preserve trusted substrate behavior across engines.
+
+It owns:
+
+- bounded retirement bookkeeping
+- WAL frame encoding, validation, checksum, and torn-tail detection
+- append-only WAL file mechanics
+- rewrite and truncation file mechanics
+
+It currently maps to:
+
+- `allocdb-retire-queue`
+- `allocdb-wal-frame`
+- `allocdb-wal-file`
+
+### Shared runtime may know about
+
+- bytes
+- lengths
+- checksums
+- frame versions
+- file descriptors and paths
+- bounded queue behavior
+- truncation and rewrite discipline
+
+### Shared runtime must not know about
+
+- commands
+- result codes
+- resources, buckets, pools, holds, reservations, or leases
+- snapshot schemas
+- engine invariants
+- replay semantics above raw framing
+
+## Engine Contract
+
+Each engine owns the database-specific meaning layered on top of the substrate.
+
+It owns:
+
+- command surfaces
+- domain config
+- state-machine invariants
+- snapshot schemas
+- recovery semantics
+- read models and result surfaces
+
+Today that means each engine keeps local ownership of:
+
+- command enums and codecs above raw frame bytes
+- snapshot encode/decode
+- snapshot file wrappers while formats still differ
+- top-level recovery entry points
+- logical-slot behavior such as refill, expiry, revoke, reclaim, and fencing
+
+## Placement Rules
+
+When adding new code, apply these rules in order.
+
+### Rule 1
+
+Start engine-local by default.
+
+Do not begin from "how can this be shared?" Begin from "what engine behavior am I expressing?"
+
+### Rule 2
+
+Move code into shared runtime only if the seam is already proven.
+
+That means at least one of:
+
+- the code is mechanically identical across engines
+- the same fix is being repeated in multiple engines
+- a new engine slice would clearly avoid copy-paste by using the shared layer
+
+### Rule 3
+
+Keep extraction below the semantic line.
+
+Good shared-runtime candidates:
+
+- durable bytes-on-disk framing
+- bounded retirement structures
+- file rewrite and truncation helpers
+
+Bad shared-runtime candidates:
+
+- generic state-machine traits
+- generic reserve/confirm/release APIs
+- generic snapshot schemas
+- generic engine config layers
+
+### Rule 4
+
+If an extraction needs engine-specific switches, it is not ready.
+
+Examples of bad signals:
+
+- feature flags that mirror engine names
+- runtime branches on allocator/quota/reservation semantics
+- generic types that only one engine can meaningfully use
+
+## Current Map
+
+### Shared now
+
+- `allocdb-retire-queue`
+- `allocdb-wal-frame`
+- `allocdb-wal-file`
+
+### Deferred
+
+- `snapshot_file`
+  - only clean inside the `quota-core` / `reservation-core` pair
+- bounded collections beyond `retire_queue`
+  - still need stable multi-engine shape
+- recovery helpers above frame/file mechanics
+  - still coupled to engine-local replay contracts
+
+### Explicitly engine-local
+
+- `allocdb-core` lease and fencing semantics
+- `quota-core` debit and refill semantics
+- `reservation-core` hold and expiry semantics
+
+## Authoring Checklist
+
+Before extracting any new module, answer these questions:
+
+1. Is this code below the semantic line?
+2. Is the shape already proven across multiple engines?
+3. Would extraction reduce copy-paste immediately?
+4. Can the shared module avoid engine-specific branches?
+
+If any answer is "no", keep the code local.
+
+## Practical Use
+
+When writing a new engine or engine slice:
+
+1. use the shared runtime only for already-extracted substrate
+2. implement new semantics locally
+3. copy new runtime-adjacent code locally if the seam is still uncertain
+4. extract later only under demonstrated pressure
+
+That keeps the repository honest and keeps future library claims evidence-based.
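
The boundary this contract describes (shared runtime knows bytes, lengths, versions, and checksums; the engine owns command enums and codecs above raw frame bytes) could be sketched roughly as follows. Everything here is an assumption made for illustration: the module names, `encode_frame`, `decode_frame`, `FrameError`, `QuotaCommand`, the frame layout, and the toy checksum are hypothetical and are not taken from `allocdb-wal-frame` or `quota-core`.

```rust
// Hypothetical shared-runtime layer: it sees only bytes, lengths, a frame
// version, and a checksum. It has no notion of commands or buckets.
mod runtime_frame {
    pub const FRAME_VERSION: u8 = 1;
    const HEADER_LEN: usize = 9; // version (1) + payload length (4) + checksum (4)

    #[derive(Debug, PartialEq)]
    pub enum FrameError {
        TornTail,
        BadVersion(u8),
        BadChecksum,
    }

    // Placeholder checksum for the sketch only; a real WAL would use a proper CRC.
    fn checksum(payload: &[u8]) -> u32 {
        payload
            .iter()
            .fold(0u32, |acc, b| acc.wrapping_mul(31).wrapping_add(*b as u32))
    }

    pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
        let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
        out.push(FRAME_VERSION);
        out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        out.extend_from_slice(&checksum(payload).to_le_bytes());
        out.extend_from_slice(payload);
        out
    }

    // Returns the payload slice and the number of bytes consumed from `buf`.
    pub fn decode_frame(buf: &[u8]) -> Result<(&[u8], usize), FrameError> {
        if buf.len() < HEADER_LEN {
            return Err(FrameError::TornTail);
        }
        if buf[0] != FRAME_VERSION {
            return Err(FrameError::BadVersion(buf[0]));
        }
        let len = u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
        let sum = u32::from_le_bytes([buf[5], buf[6], buf[7], buf[8]]);
        let end = HEADER_LEN.checked_add(len).ok_or(FrameError::TornTail)?;
        if buf.len() < end {
            return Err(FrameError::TornTail); // payload cut short
        }
        let payload = &buf[HEADER_LEN..end];
        if checksum(payload) != sum {
            return Err(FrameError::BadChecksum);
        }
        Ok((payload, end))
    }
}

// Hypothetical engine-local layer: the command enum and its codec live here,
// above raw frame bytes, and never leak into the runtime module.
mod quota_engine {
    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
    pub enum QuotaCommand {
        Debit { bucket: u32, amount: u64 },
        Refill { bucket: u32 },
    }

    impl QuotaCommand {
        pub fn to_bytes(self) -> Vec<u8> {
            let mut out = Vec::new();
            match self {
                QuotaCommand::Debit { bucket, amount } => {
                    out.push(1);
                    out.extend_from_slice(&bucket.to_le_bytes());
                    out.extend_from_slice(&amount.to_le_bytes());
                }
                QuotaCommand::Refill { bucket } => {
                    out.push(2);
                    out.extend_from_slice(&bucket.to_le_bytes());
                }
            }
            out
        }

        pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
            let (&tag, rest) = bytes.split_first()?;
            match (tag, rest.len()) {
                (1, 12) => {
                    let mut b = [0u8; 4];
                    b.copy_from_slice(&rest[..4]);
                    let mut a = [0u8; 8];
                    a.copy_from_slice(&rest[4..]);
                    Some(QuotaCommand::Debit {
                        bucket: u32::from_le_bytes(b),
                        amount: u64::from_le_bytes(a),
                    })
                }
                (2, 4) => {
                    let mut b = [0u8; 4];
                    b.copy_from_slice(rest);
                    Some(QuotaCommand::Refill { bucket: u32::from_le_bytes(b) })
                }
                _ => None,
            }
        }
    }
}

// Engine-side round trip: the engine chooses the meaning, the runtime only frames it.
pub fn roundtrip(cmd: quota_engine::QuotaCommand) -> Option<quota_engine::QuotaCommand> {
    let frame = runtime_frame::encode_frame(&cmd.to_bytes());
    let (payload, _used) = runtime_frame::decode_frame(&frame).ok()?;
    quota_engine::QuotaCommand::from_bytes(payload)
}
```

In this sketch `runtime_frame` would compile unchanged for a second engine with a different command enum, which is the "below the semantic line" property from Rule 3. If `decode_frame` ever needed a branch on the command tag, that would be the kind of engine-specific switch Rule 4 treats as a sign the extraction is not ready.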
diff --git a/docs/status.md b/docs/status.md
index 5c1f972..daa4a7f 100644
--- a/docs/status.md
+++ b/docs/status.md
@@ -217,4 +217,4 @@
 - the next recommended step remains downstream real-cluster e2e work such as `gpu_control_plane`, not more unplanned lease-kernel semantics work; the current deployment slice covers a first in-cluster `StatefulSet` shape, but bootstrap-primary routing, failover/rejoin orchestration, and background maintenance remain operator work, and the current staging unblock path is to publish `skel84/allocdb` from GitHub Actions rather than relying on the local Docker engine
 - PR `#107` merged the `M10` quota-engine proof on `main`, and PRs `#116`, `#117`, and `#118` merged the full `M11` reservation-core chain on `main`: the repository now has a second and third deterministic engine with bounded command sets, logical-slot refill/expiry, and snapshot/WAL recovery proofs
 - PRs `#132`, `#133`, and `#134` merged the first `M12` runtime extractions on `main`: `retire_queue`, `wal`, and `wal_file` are now shared internal substrate instead of copied engine-local modules, while `M12-T04` closed as a defer decision because `snapshot_file` is still only a clean seam inside the `quota-core` / `reservation-core` pair and `allocdb-core` keeps the simpler file format
-- the next roadmap step is now `M13`: define the internal engine authoring boundary in `runtime-extraction-roadmap.md` and stop extraction pressure until that contract is written down; the authoring rule is to keep shared runtime below the semantic line and keep command surfaces, snapshot schemas, recovery entry points, and state-machine meaning engine-local
+- the next roadmap step is now `M13`: define the internal engine authoring boundary in `runtime-extraction-roadmap.md` and stop extraction pressure until that contract is written down; the authoring rule is to keep shared runtime below the semantic line and keep command surfaces, snapshot schemas, recovery entry points, and state-machine meaning engine-local, then publish the focused `runtime-vs-engine-contract` note as the shorter authoring reference for future engine work