
🤖 feat: add kind dev cluster scaffolding and k9s tooling #27

Merged
ThomasK33 merged 2 commits into main from operator-testing-fby9
Feb 10, 2026

Conversation

@ThomasK33 ThomasK33 commented Feb 10, 2026

Member

Summary

This PR adds a repeatable KIND-based local dev/demo workflow for the controller and k9s-driven cluster inspection.

Background

The repo already uses KIND for CI smoke tests, but local setup previously required manual steps and separate tool installation. This change makes the workflow repo-native and easier to run in Coder workspaces.

Implementation

  • Added Kubernetes demo tools (kubectl, kind, k9s) to the Nix devshell.
  • Added hack/kind-dev.sh with commands for:
    • up, ctx, load-image, k9s, status, down
  • Added Makefile wrappers for all kind-dev script commands.
  • Added README documentation for the KIND + k9s development loop.
  • Added .mux/skills/kind-dev/SKILL.md for per-workspace agent usage.

Validation

  • bash -n hack/kind-dev.sh
  • make -n kind-dev-up kind-dev-ctx kind-dev-load-image kind-dev-status kind-dev-k9s kind-dev-down
  • make build
  • make lint
  • make test
  • make verify-vendor

Risks

Low risk. Changes are additive and primarily developer tooling/documentation. Potential impact is limited to local dev ergonomics and optional scripts.


📋 Implementation Plan

Add k9s to Nix devshell + KIND dev-cluster scaffolding

Context / Why

You’re developing coder-k8s from a Coder workspace and want a “real” Kubernetes cluster so you can demo the operator with tools like k9s (instead of only envtest). The repo already has a minimal KIND-based smoke test in CI; the goal is to make that workflow easy and repeatable locally by:

  1. adding k9s to the Nix devshell, and
  2. adding small repo-native scripts (in hack/) that spin up a KIND cluster and install prerequisites (CRDs/RBAC + optional namespace/SA) so you can start the controller yourself and then demo it with k9s.

Evidence (repo reality)

  • flake.nix devshell currently includes Go/tooling but does not include k9s.
  • CI’s .github/workflows/ci.yaml has an e2e-kind job that:
    • builds a Linux binary named coder-k8s in the repo root (go build -o coder-k8s ./ with GOOS=linux, GOARCH=amd64, CGO_ENABLED=0)
    • builds a distroless image via Dockerfile.goreleaser
    • loads it into KIND and applies config/crd/bases/, config/rbac/, and config/e2e/, then applies the sample CR.
  • Dockerfile.goreleaser expects the binary at the repo root: COPY coder-k8s /coder-k8s.
  • config/e2e/deployment.yaml deploys the controller into namespace coder-system as deployment coder-k8s with image ghcr.io/coder/coder-k8s:e2e and imagePullPolicy: Never (so kind load is required).
  • Makefile’s build target runs go build ./... and does not produce the ./coder-k8s binary needed by Dockerfile.goreleaser.
  • hack/ currently contains standalone bash scripts (update-*.sh) and is the best-fit location for “dev loop” helpers.

Note on what the controller does today

internal/controller/codercontrolplane_controller.go is currently a no-op skeleton: it fetches the CR and logs at verbosity V(1), but does not create any dependent objects or update Status. The dev-cluster scaffolding below will still be useful (you can demo CRD install + controller reconcile triggers/logs), but a future small enhancement could set status.phase or emit an Event to make k9s demos more visually interesting.


Implementation plan

1) Add k9s, kubectl, and kind to the Nix devshell

File: flake.nix

Add k9s, kubectl, and kind to the packages list of the default devShell.

# flake.nix (devShells.default)
packages = with pkgs; [
  go
  gnumake
  git

  # Kubernetes dev/demo tools
  kubectl
  kind
  k9s

  goreleaser
  actionlint
  zizmor
  golangci-lint
  govulncheck

  docsPython
];

This makes nix develop sufficient for spinning up a KIND cluster and running k9s demos without ad-hoc installs.
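
A quick way to confirm that the devshell really provides all three tools is a PATH check along these lines. This is only a sketch and not part of the PR; `missing_tools` is a hypothetical helper name.

```bash
# Hypothetical helper (not in the repo): print each named command that is
# not on PATH, one per line. Prints nothing when all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Inside `nix develop`, this should print nothing.
missing_tools kubectl kind k9s
```

Running it outside the devshell lists whatever is still missing, which makes the difference between the two environments easy to see.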


2) Add a KIND dev-cluster setup script: hack/kind-dev.sh

File: hack/kind-dev.sh (new, executable)

Create one entrypoint script with subcommands to avoid duplicating bash boilerplate across multiple files.

Goals

  • Mirror CI for the bootstrap pieces (CRDs/RBAC + the coder-system namespace/ServiceAccount), but do not start the controller for you.
  • Be idempotent where practical (safe to re-run up).
  • Print clear “next steps” commands so you can start the controller yourself (out-of-cluster or in-cluster) and then demo with k9s.

Interface

  • ./hack/kind-dev.sh up — create cluster (if needed) + install CRDs/RBAC + create coder-system namespace/ServiceAccount/Binding (does not deploy the controller) + set current kubectl context to the cluster
  • ./hack/kind-dev.sh ctx — set current kubectl context to the cluster (short for “context”; useful if you overrode CLUSTER_NAME or switched away)
  • ./hack/kind-dev.sh load-image — build a Linux ./coder-k8s binary, build the container image, and kind load it (pre-req for in-cluster deployment)
  • ./hack/kind-dev.sh k9s — open k9s on the cluster context
  • ./hack/kind-dev.sh status — print useful kubectl status output
  • ./hack/kind-dev.sh down — delete the cluster

Defaults / config knobs (env vars)

  • CLUSTER_NAME default: coder-k8s-dev (if MUX_WORKSPACE_NAME is set, default to coder-k8s-${MUX_WORKSPACE_NAME} to avoid collisions); override via CLUSTER_NAME=...
  • KUBE_CONTEXT derived: kind-${CLUSTER_NAME}
  • NAMESPACE default: coder-system
  • DEPLOYMENT default: coder-k8s
  • IMAGE default: ghcr.io/coder/coder-k8s:e2e (must match config/e2e/deployment.yaml unless you also patch manifests)
  • GOARCH default: $(go env GOARCH) (with GOOS=linux, CGO_ENABLED=0)
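
The name precedence above (explicit `CLUSTER_NAME`, then `MUX_WORKSPACE_NAME`, then the static default) can be illustrated in isolation. The function names below are hypothetical and exist only for this sketch; the script itself inlines the same logic.

```bash
# Hypothetical helper mirroring the precedence described above:
# explicit CLUSTER_NAME, then MUX_WORKSPACE_NAME, then the static default.
resolve_cluster_name() {
  if [ -n "${CLUSTER_NAME:-}" ]; then
    printf '%s\n' "${CLUSTER_NAME}"
  elif [ -n "${MUX_WORKSPACE_NAME:-}" ]; then
    printf 'coder-k8s-%s\n' "${MUX_WORKSPACE_NAME}"
  else
    printf 'coder-k8s-dev\n'
  fi
}

# kind names the kubectl context "kind-<cluster name>".
resolve_kube_context() {
  printf 'kind-%s\n' "$(resolve_cluster_name)"
}
```

For example, with `MUX_WORKSPACE_NAME=ws1` and no `CLUSTER_NAME`, the cluster resolves to `coder-k8s-ws1` and the context to `kind-coder-k8s-ws1`.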

Script shape (sketch)

#!/usr/bin/env bash
set -euo pipefail

ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "${ROOT}"

DEFAULT_CLUSTER_NAME="coder-k8s-dev"
if [[ -n "${MUX_WORKSPACE_NAME:-}" ]]; then
  DEFAULT_CLUSTER_NAME="coder-k8s-${MUX_WORKSPACE_NAME}"
fi

CLUSTER_NAME=${CLUSTER_NAME:-"${DEFAULT_CLUSTER_NAME}"}
KUBE_CONTEXT="kind-${CLUSTER_NAME}"
NAMESPACE=${NAMESPACE:-coder-system}
DEPLOYMENT=${DEPLOYMENT:-coder-k8s}
IMAGE=${IMAGE:-ghcr.io/coder/coder-k8s:e2e}
GOARCH=${GOARCH:-"$(go env GOARCH)"}

require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "assertion failed: missing required command: $1" >&2
    exit 1
  }
}

kubectl_ctx() {
  kubectl --context "${KUBE_CONTEXT}" "$@"
}

ensure_cluster() {
  if ! kind get clusters | grep -qx "${CLUSTER_NAME}"; then
    echo "assertion failed: kind cluster ${CLUSTER_NAME} does not exist (run: $0 up)" >&2
    exit 1
  fi
}

build_binary() {
  GOFLAGS=-mod=vendor CGO_ENABLED=0 GOOS=linux GOARCH="${GOARCH}" \
    go build -o coder-k8s ./
}

build_and_load_image() {
  docker build -f Dockerfile.goreleaser -t "${IMAGE}" .
  kind load docker-image "${IMAGE}" --name "${CLUSTER_NAME}"
}

cmd_up() {
  require_cmd kind
  require_cmd kubectl

  if ! kind get clusters | grep -qx "${CLUSTER_NAME}"; then
    kind create cluster --name "${CLUSTER_NAME}"
  fi

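  # Export kubeconfig first so the kind-${CLUSTER_NAME} context exists for the
  # kubectl calls below (matters when the cluster already existed or KUBECONFIG changed).
  kind export kubeconfig --name "${CLUSTER_NAME}" >/dev/null

  # Wait for all nodes to report Ready before applying manifests.
  kubectl_ctx wait --for=condition=Ready node --all --timeout=120s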

  # Install CRDs + RBAC.
  kubectl_ctx apply -f config/crd/bases/
  kubectl_ctx apply -f config/rbac/

  # Prepare the namespace + ServiceAccount/Binding used by the in-cluster controller manifests.
  kubectl_ctx apply -f config/e2e/namespace.yaml
  kubectl_ctx apply -f config/e2e/serviceaccount.yaml
  kubectl_ctx apply -f config/e2e/clusterrole-binding.yaml

  # Convenience: switch kubectl's current context to this cluster.
  cmd_ctx

  echo
  echo "KIND cluster bootstrapped. Next steps:"
  echo
  echo "Run controller locally (out-of-cluster):"
  echo "  GOFLAGS=-mod=vendor go run . --app=controller"
  echo
  echo "OR deploy controller in-cluster:"
  echo "  $0 load-image"
  echo "  kubectl apply -f config/e2e/deployment.yaml"
  echo "  kubectl wait --for=condition=Available deploy/${DEPLOYMENT} -n ${NAMESPACE} --timeout=120s"
  echo
  echo "Then demo with k9s:"
  echo "  $0 k9s  # or: k9s"
}


cmd_ctx() {
  require_cmd kind
  require_cmd kubectl
  ensure_cluster

  # Ensure kubeconfig includes this kind cluster (helpful if KUBECONFIG changed).
  kind export kubeconfig --name "${CLUSTER_NAME}" >/dev/null

  kubectl config use-context "${KUBE_CONTEXT}" >/dev/null
  echo "Using kubectl context: ${KUBE_CONTEXT} (switch later with: $0 ctx)"
}

cmd_load_image() {
  require_cmd kind
  require_cmd docker
  require_cmd go
  ensure_cluster

  build_binary
  build_and_load_image

  echo "Loaded ${IMAGE} into kind cluster ${CLUSTER_NAME}."
}

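cmd_k9s() {
  # ensure_cluster shells out to kind, so check for it explicitly.
  require_cmd kind
  require_cmd k9s
  ensure_cluster
  exec k9s --context "${KUBE_CONTEXT}"
}

cmd_status() {
  require_cmd kind
  require_cmd kubectl
  ensure_cluster
  kubectl_ctx get nodes -o wide
  # The CRD or namespace may not exist yet; keep status output best-effort.
  kubectl_ctx get codercontrolplanes -A || true
  kubectl_ctx -n "${NAMESPACE}" get deploy,pods -o wide || true
}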

cmd_down() {
  require_cmd kind
  kind delete cluster --name "${CLUSTER_NAME}"
}

case "${1:-}" in
  up) cmd_up ;;
  ctx|context|use-context) cmd_ctx ;;
  load-image) cmd_load_image ;;
  k9s) cmd_k9s ;;
  status) cmd_status ;;
  down) cmd_down ;;
  *)
    echo "usage: $0 {up|ctx|load-image|k9s|status|down}" >&2
    exit 2
    ;;
esac

Notes:

  • Keep the script consistent with existing repo patterns: #!/usr/bin/env bash, set -euo pipefail, repo-root resolution, and “assertion failed:” wording for missing prerequisites.
  • Use explicit go build -o coder-k8s ./ (CI-compatible) rather than make build, because the Dockerfile requires the root binary.
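
Because `Dockerfile.goreleaser` copies `coder-k8s` from the repo root, a missing binary only surfaces as a `docker build` failure. A guard like the following could fail earlier with a clearer message. It is a sketch, not part of the PR, and `require_root_binary` is a hypothetical name.

```bash
# Hypothetical guard (not in the PR): fail early if the root-level binary
# that Dockerfile.goreleaser copies (COPY coder-k8s /coder-k8s) is missing.
require_root_binary() {
  bin="${1:-./coder-k8s}"
  if [ ! -f "$bin" ] || [ ! -x "$bin" ]; then
    echo "assertion failed: ${bin} not found or not executable (run: go build -o coder-k8s ./)" >&2
    return 1
  fi
}
```

Called between `build_binary` and `docker build`, it keeps the "assertion failed:" wording the other checks use.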

3) (Recommended) Add Makefile wrappers for discoverability

File: Makefile

Add phony targets that call the script:

.PHONY: kind-dev-up kind-dev-ctx kind-dev-load-image kind-dev-down kind-dev-k9s kind-dev-status

kind-dev-up:
	./hack/kind-dev.sh up

kind-dev-ctx:
	./hack/kind-dev.sh ctx

kind-dev-load-image:
	./hack/kind-dev.sh load-image

kind-dev-status:
	./hack/kind-dev.sh status

kind-dev-k9s:
	./hack/kind-dev.sh k9s

kind-dev-down:
	./hack/kind-dev.sh down

This makes the KIND dev-cluster workflow discoverable via make help / tab completion and keeps the “entrypoint” stable.
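
To keep the wrappers from drifting out of sync with the script's subcommands, a small check along these lines could be run by hand or in CI. It is an illustrative sketch only; `missing_wrappers` is a hypothetical helper and assumes the `kind-dev-<subcommand>` naming used above.

```bash
# Hypothetical check (not in the PR): print every subcommand that has no
# matching kind-dev-<subcommand> target in the given Makefile.
missing_wrappers() {
  makefile="$1"
  shift
  for sub in "$@"; do
    grep -q "^kind-dev-${sub}:" "$makefile" || printf '%s\n' "$sub"
  done
}
```

Usage: `missing_wrappers Makefile up ctx load-image k9s status down` prints nothing when every subcommand has a wrapper.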


4) (Recommended) Document the KIND dev loop in README

File: README.md

Add a short section like:

## KIND development cluster (for k9s demos)

Bootstrap a KIND cluster and install CRDs/RBAC (**this also switches your current kubectl context**):

    make kind-dev-up

> Tip: to run multiple clusters in parallel, override the name:
>
>     CLUSTER_NAME=my-cluster make kind-dev-up

If you need to switch your kubectl context later:

    make kind-dev-ctx
    # or: CLUSTER_NAME=my-cluster make kind-dev-ctx

Start the controller (pick one):

- Out-of-cluster (fast iteration):

        GOFLAGS=-mod=vendor go run . --app=controller

- In-cluster (closer to CI):

        make kind-dev-load-image
        kubectl apply -f config/e2e/deployment.yaml
        kubectl -n coder-system wait --for=condition=Available deploy/coder-k8s --timeout=120s

Demo:

    make kind-dev-k9s

Cleanup:

    make kind-dev-down

Mux users: there is an optional agent skill (`kind-dev`) under `.mux/skills/` with agent-oriented instructions for running per-workspace KIND clusters.

5) Add a Mux agent skill for per-workspace KIND clusters

File: .mux/skills/kind-dev/SKILL.md

Create a lightweight Mux skill (docs-only, no bundled references) that agents can load on demand to run the KIND dev loop in parallel across workspaces.

The skill should emphasize:

  • Unique CLUSTER_NAME per workspace (recommend coder-k8s-${MUX_WORKSPACE_NAME})
  • Using the repo’s bootstrap script (./hack/kind-dev.sh) rather than re-encoding CI steps in agent prompts
  • Always using explicit contexts (kubectl --context kind-${CLUSTER_NAME} ...) to avoid acting on the wrong cluster
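
One way to make the explicit-context rule hard to forget is a thin wrapper that refuses to run without a cluster name. This is a sketch under that assumption; `kctx` is a hypothetical name and is not defined anywhere in the PR.

```bash
# Hypothetical wrapper (not in the PR): always pass an explicit context and
# refuse to run when CLUSTER_NAME is unset or empty.
kctx() {
  if [ -z "${CLUSTER_NAME:-}" ]; then
    echo "assertion failed: CLUSTER_NAME is not set" >&2
    return 1
  fi
  kubectl --context "kind-${CLUSTER_NAME}" "$@"
}
```

With `CLUSTER_NAME=coder-k8s-dev`, `kctx get nodes` runs `kubectl --context kind-coder-k8s-dev get nodes`, so an agent never depends on whatever the current context happens to be.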

Suggested SKILL.md skeleton:

---
name: kind-dev
description: Per-workspace KIND clusters for coder-k8s dev + demos.
---

# KIND dev clusters (coder-k8s)

Load this skill only when you need a real Kubernetes cluster (KIND) during development or demos.

## Unique cluster names (parallel agents)

Prefer a per-workspace name to avoid collisions:

    export CLUSTER_NAME="coder-k8s-${MUX_WORKSPACE_NAME:-dev}"

## Bootstrap

    ./hack/kind-dev.sh up
    kubectl --context kind-${CLUSTER_NAME} get nodes

## Start controller (out-of-cluster)

    # Ensure your current kubectl context points at the cluster (up already does this).
    ./hack/kind-dev.sh ctx

    GOFLAGS=-mod=vendor go run . --app=controller

## (Optional) In-cluster controller

    ./hack/kind-dev.sh load-image
    kubectl --context kind-${CLUSTER_NAME} apply -f config/e2e/deployment.yaml
    kubectl --context kind-${CLUSTER_NAME} -n coder-system wait --for=condition=Available deploy/coder-k8s --timeout=120s

## Demo with k9s

    k9s --context kind-${CLUSTER_NAME}

## Cleanup

    ./hack/kind-dev.sh down

Validation (when implementing)

  1. Nix tools available:
    • nix develop -c kubectl version --client
    • nix develop -c kind version
    • nix develop -c k9s version
  2. Script sanity: bash -n hack/kind-dev.sh
  3. Bootstrap smoke test (use a deterministic cluster name):
    • CLUSTER_NAME=coder-k8s-dev ./hack/kind-dev.sh up
    • test "$(kubectl config current-context)" = "kind-coder-k8s-dev"
    • kubectl --context kind-coder-k8s-dev get crd codercontrolplanes.coder.com
    • kubectl --context kind-coder-k8s-dev get clusterrole manager-role
    • kubectl --context kind-coder-k8s-dev -n coder-system get sa coder-k8s
  4. Optional in-cluster controller smoke test:
    • CLUSTER_NAME=coder-k8s-dev ./hack/kind-dev.sh load-image
    • kubectl --context kind-coder-k8s-dev apply -f config/e2e/deployment.yaml
    • kubectl --context kind-coder-k8s-dev -n coder-system wait --for=condition=Available deploy/coder-k8s --timeout=120s
  5. Skill sanity:
    • test -f .mux/skills/kind-dev/SKILL.md
  6. Cleanup:
    • CLUSTER_NAME=coder-k8s-dev ./hack/kind-dev.sh down
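
The bootstrap checks in step 3 can be scripted roughly as follows. This is a sketch rather than something the PR ships; `check` and `bootstrap_smoke` are hypothetical names, and the resource names are the ones listed above.

```bash
# Kubectl context for the deterministic cluster name used in the steps above.
CTX="kind-${CLUSTER_NAME:-coder-k8s-dev}"

# Run a command quietly and report whether it succeeded.
check() {
  if "$@" >/dev/null 2>&1; then
    echo "ok: $*"
  else
    echo "FAIL: $*"
    return 1
  fi
}

# Run every bootstrap check, even after a failure; return non-zero if any failed.
bootstrap_smoke() {
  failed=0
  check test "$(kubectl config current-context 2>/dev/null)" = "$CTX" || failed=1
  check kubectl --context "$CTX" get crd codercontrolplanes.coder.com || failed=1
  check kubectl --context "$CTX" get clusterrole manager-role || failed=1
  check kubectl --context "$CTX" -n coder-system get sa coder-k8s || failed=1
  return "$failed"
}
```

Run `bootstrap_smoke` after `./hack/kind-dev.sh up`; it prints one line per check, so a partial bootstrap shows exactly which resource is missing.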

Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $0.68

@ThomasK33

Member Author

@codex review

Please review this PR for correctness and readiness.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 68c079f361

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hack/kind-dev.sh
@ThomasK33

Member Author

@codex review

Addressed feedback: exported kubeconfig/context before first kubectl call in kind-dev.sh up.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Bravo.

@ThomasK33 ThomasK33 added this pull request to the merge queue Feb 10, 2026
Merged via the queue into main with commit aa29d8b Feb 10, 2026
7 checks passed
@ThomasK33 ThomasK33 deleted the operator-testing-fby9 branch February 10, 2026 10:31