Merged
1 change: 1 addition & 0 deletions docs/README.md
@@ -27,6 +27,7 @@
- [Reservation Engine Plan](./reservation-engine-plan.md)
- [Reservation Engine Semantics](./reservation-semantics.md)
- [Reservation Runtime Seam Evaluation](./reservation-runtime-seam-evaluation.md)
- [Runtime Extraction Roadmap](./runtime-extraction-roadmap.md)
- [Revoke Safety Slice](./revoke-safety-slice.md)
- [Operator Runbook](./operator-runbook.md)
- [KubeVirt Jepsen Report](./kubevirt-jepsen-report.md)
148 changes: 148 additions & 0 deletions docs/runtime-extraction-roadmap.md
@@ -0,0 +1,148 @@
# Runtime Extraction Roadmap

## Purpose

This document defines the path from the current engine family to something that can honestly be
called a general internal DB-building library.

The current state is:

- `allocdb-core`, `quota-core`, and `reservation-core` all exist on `main`
- the engine thesis is proven strongly enough
- a broad shared runtime is still premature
- `retire_queue` is the first justified micro-extraction candidate

The goal is not to market a framework early. The goal is to extract only the runtime substrate that
has actually stabilized under multiple engines.

## End State

We should only call this a general internal DB-building library when all of the following are true:

- more than one runtime module is shared cleanly across engines
- the shared-vs-domain boundary is explicit and stable
- a new engine or engine slice can be built with materially less copy-paste
- extraction reduces maintenance cost more than it adds abstraction cost

Until then, the honest description remains:

- multiple deterministic engines
- emerging shared runtime

## Milestone Shape

### M12: First Internal Runtime Extractions

Goal:

- extract the smallest runtime pieces that are already mechanically shared

Scope:

- `retire_queue`
- `wal`
- `wal_file`
- `snapshot_file` only if the file-level discipline stays separable from snapshot schemas

Non-goals:

- no public framework story
- no snapshot schema extraction
- no recovery API extraction
- no state-machine trait layer

Exit criteria:

- extracted modules are used by all applicable engines
- behavior is unchanged
- tests stay green without new abstraction leaks
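A minimal sketch of what an engine-agnostic `retire_queue` could look like once lifted out of a single engine. This document does not describe the real module's API or semantics, so everything below is an assumption made for illustration: the `RetireQueue` type, the `PushError` enum, the logical-slot ordering rule, and all method names are hypothetical. The point is only that the queue is generic over its item type and carries no engine-specific knowledge.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded queue of items that become retirable at a logical slot.
/// Illustrative only; not the repository's actual `retire_queue`.
#[derive(Debug)]
pub struct RetireQueue<T> {
    /// Entries as (retire_at_slot, item), kept in non-decreasing slot order.
    entries: VecDeque<(u64, T)>,
    /// Hard upper bound on the number of queued entries.
    capacity: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PushError {
    /// The queue already holds `capacity` entries.
    Full,
    /// The new entry would retire before the current tail entry.
    OutOfOrder,
}

impl<T> RetireQueue<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            entries: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Enqueue `item` to retire at `retire_at_slot`; rejects instead of growing.
    pub fn push(&mut self, retire_at_slot: u64, item: T) -> Result<(), PushError> {
        if self.entries.len() >= self.capacity {
            return Err(PushError::Full);
        }
        if let Some((last_slot, _)) = self.entries.back() {
            if retire_at_slot < *last_slot {
                return Err(PushError::OutOfOrder);
            }
        }
        self.entries.push_back((retire_at_slot, item));
        Ok(())
    }

    /// Pop the head entry if its slot is at or before `current_slot`.
    pub fn pop_due(&mut self, current_slot: u64) -> Option<T> {
        let due = matches!(self.entries.front(), Some((slot, _)) if *slot <= current_slot);
        if due {
            self.entries.pop_front().map(|(_, item)| item)
        } else {
            None
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }
}
```

Under these assumptions, each engine would instantiate the queue with its own item type (for example a resource id or a hold id) and keep the decision of what retirement means entirely engine-local.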

### M13: Internal Engine Authoring Contract

Goal:

- define the stable boundary between shared runtime and engine-local semantics

Scope:

- one internal runtime contract note
- explicit ownership of:
  - bounded collections
  - durable frame/file helpers
  - snapshot-file discipline
  - recovery helper seams, if any
- explicit non-ownership of:
  - command schemas
  - result surfaces
  - snapshot schemas
  - state-machine semantics

Exit criteria:

- the contract is clear enough that another engine authoring pass is constrained by it
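The ownership split above can be sketched in code. This is an illustration under stated assumptions, not the repository's actual `wal` layout: the frame format (length, checksum, payload), the toy checksum, and the names `encode_frame`, `decode_frame`, and `DemoCommand` are all hypothetical. The shared side only ever sees opaque bytes; the command schema stays with the engine.

```rust
// Shared runtime side (hypothetical): durable frame helpers.
// Assumed frame layout: [len: u32 LE][checksum: u32 LE][payload bytes].

/// Wrap an opaque payload in a frame. Assumes payloads are smaller than 4 GiB.
pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
    debug_assert!(payload.len() <= u32::MAX as usize);
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Decode one frame from the front of `buf`.
/// Returns the payload and the remaining bytes, or `None` if the frame is
/// truncated or fails its checksum.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 8 {
        return None;
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let sum = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let end = 8usize.checked_add(len)?;
    let payload = buf.get(8..end)?;
    if checksum(payload) != sum {
        return None;
    }
    Some((payload, &buf[end..]))
}

/// Toy rotate-and-xor checksum, a stand-in for whatever the real code uses.
fn checksum(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, b| acc.rotate_left(5) ^ u32::from(*b))
}

// Engine-local side (hypothetical): the command schema never enters the runtime.

#[derive(Debug, PartialEq, Eq)]
pub enum DemoCommand {
    Debit { bucket: u32, amount: u32 },
}

impl DemoCommand {
    /// Engine-owned encoding: bucket then amount, both little-endian.
    pub fn encode(&self) -> Vec<u8> {
        match self {
            DemoCommand::Debit { bucket, amount } => {
                let mut out = Vec::with_capacity(8);
                out.extend_from_slice(&bucket.to_le_bytes());
                out.extend_from_slice(&amount.to_le_bytes());
                out
            }
        }
    }

    /// Engine-owned decoding; rejects payloads of the wrong length.
    pub fn decode(payload: &[u8]) -> Option<Self> {
        if payload.len() != 8 {
            return None;
        }
        let bucket = u32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]);
        let amount = u32::from_le_bytes([payload[4], payload[5], payload[6], payload[7]]);
        Some(DemoCommand::Debit { bucket, amount })
    }
}
```

In this shape the frame helpers could move into a shared module without pulling any command, result, or snapshot schema with them, which is the boundary the contract note is meant to pin down.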

### M14: Fourth-Engine Or Reduced-Copy Proof

Goal:

- prove that the extracted substrate lowers authoring cost rather than only moving code around

Acceptable proof shapes:

- build a fourth engine against the extracted substrate, or
- retrofit one substantial new engine slice against the extracted substrate with clearly reduced
copy-paste and no correctness regression

Exit criteria:

- one new engine or engine slice uses the extracted substrate directly
- the reduction in duplicated runtime code is obvious
- the authoring contract survives contact with real implementation work

## Recommended Issue Shape

### M12

- `M12`: Extract the first internal shared runtime substrate from the three-engine family
- `M12-T01`: Extract shared `retire_queue`
- `M12-T02`: Extract shared `wal`
- `M12-T03`: Extract shared `wal_file`
- `M12-T04`: Evaluate and, if still clean, extract shared `snapshot_file`

### M13

- `M13`: Define the internal engine authoring boundary after the first extractions
- `M13-T01`: Write the internal runtime-vs-engine contract
- `M13-T02`: Reassess whether a fourth-engine proof is still required or whether the extracted
substrate already lowered authoring cost enough

### M14

- `M14`: Prove the extracted substrate lowers engine-authoring cost
- `M14-T01`: Build one new engine or engine slice against the extracted substrate
- `M14-T02`: Re-evaluate whether the repository can now honestly claim an internal DB-building
library

## Execution Rules

- extract smallest-first
- after each micro-extraction, stop and verify before continuing
- if one extraction introduces awkward generic plumbing, stop and reassess rather than force the
sequence
- keep domain logic local even if runtime discipline is shared

## Current Recommendation

Do this next:

1. `M12-T01` shared `retire_queue`
2. `M12-T02` shared `wal`
3. `M12-T03` shared `wal_file`
4. only then decide whether `snapshot_file` is still clean enough to extract

Do not do this next:

- public framework branding
- generic state-machine APIs
- generic snapshot schemas
- extracting recovery entry points before the lower layers stabilize
12 changes: 6 additions & 6 deletions docs/status.md
@@ -1,6 +1,6 @@
# AllocDB Status
## Current State
- Phase: replicated implementation with external Jepsen gate closed, M9 lease-kernel follow-on live-validated, M10 second-engine proof merged, and M11 third-engine proof merged
- Phase: replicated implementation with external Jepsen gate closed, M9 lease-kernel follow-on live-validated, M10 second-engine proof merged, M11 third-engine proof merged, and M12 runtime-extraction roadmap staged
- Planning IDs: tasks use `M#-T#`; spikes use `M#-S#`
- Current milestone status:
  - `M0` semantics freeze: complete enough for core work
@@ -16,6 +16,7 @@
  - `M9` generic lease-kernel follow-on: implementation merged on `main`
  - `M10` second-engine proof: merged on `main`; shared runtime extraction deferred
  - `M11` third-engine proof: merged on `main`; broad shared runtime still deferred, first micro-extraction now justified
  - `M12` first internal runtime extractions: planned
- Latest completed implementation chunks:
  - `4156a80` `Bootstrap AllocDB core and docs`
  - `f84a641` `Add WAL file and snapshot recovery primitives`
@@ -212,9 +213,8 @@
simulation coverage are now all in the mainline implementation
- PR `#97` merged issue `#96`, extending Jepsen history generation and analysis for bundle
reserve, revoke/reclaim, and stale-holder lease paths, then closing the loop with live KubeVirt
`lease_safety-control` and full `1800s` `lease_safety-crash-restart` evidence on `allocdb-a`,
both with `blockers=0`
`lease_safety-control` and full `1800s` `lease_safety-crash-restart` evidence on `allocdb-a` with `blockers=0`
- the next recommended step remains downstream real-cluster e2e work such as `gpu_control_plane`, not more unplanned lease-kernel semantics work; the current deployment slice covers a first in-cluster `StatefulSet` shape, but bootstrap-primary routing, failover/rejoin orchestration, and background maintenance remain operator work, and the current staging unblock path is to publish `skel84/allocdb` from GitHub Actions rather than relying on the local Docker engine
- PR `#107` merged the `M10` quota-engine proof on `main`: `quota-core` now proves a second deterministic engine in-repo with bounded `CreateBucket` / `Debit`, logical-slot refill, and snapshot/WAL recovery; the `M10-T05` seam evaluation still concludes that shared runtime extraction is premature, with `retire_queue` the closest candidate and the rest still engine-local
- PRs `#116`, `#117`, and `#118` merged the full `M11` reservation-core chain on `main`: scaffold, deterministic hold lifecycle, logical-slot overdue expiry, and expiry/recovery proof are now in the mainline implementation
- PR `#118` also closes the third-engine readout: `retire_queue` is now the first justified internal extraction candidate across all three engines, while a broad `dsm-runtime` or public DB-building library is still premature; `wal`, `wal_file`, and `snapshot_file` are the next likely internal seams only after that micro-extraction lands
- PR `#107` merged the `M10` quota-engine proof on `main`, and PRs `#116`, `#117`, and `#118` merged the full `M11` reservation-core chain on `main`: the repository now has a second and third deterministic engine with bounded command sets, logical-slot refill/expiry, and snapshot/WAL recovery proofs
- the `M10-T05` and `M11-T05` readouts still defer broad shared-runtime extraction: `retire_queue` is the first justified internal extraction candidate, while `wal`, `wal_file`, and `snapshot_file` remain the next likely seams only after that micro-extraction lands
- the next roadmap is now explicit in `runtime-extraction-roadmap.md`: start with `retire_queue`, then `wal`, then `wal_file`, and only then decide whether `snapshot_file` is still clean enough to extract before defining the internal authoring contract and asking for a fourth-engine or reduced-copy proof