feat: Introduce v1alpha2 version of LlamaStackDistribution CRD#253

Open
VaishnaviHire wants to merge 11 commits into llamastack:main from VaishnaviHire:implement_run_config_schema

Conversation


@VaishnaviHire VaishnaviHire commented Feb 23, 2026

This PR introduces the v1alpha2 API version for the LlamaStackDistribution CRD, enabling declarative, Kubernetes-native configuration of LlamaStack servers. Instead of requiring users to manually craft and supply a config.yaml via ConfigMap (as in v1alpha1), the operator now generates the server configuration automatically from structured CR fields (providers, resources, storage, networking). Both API versions are served concurrently with full conversion webhook support.

v1alpha2 Example

The v1alpha2 API replaces environment-variable-driven configuration with structured, declarative fields. All provider fields use typed []ProviderConfig slices with CEL validation.

Basic: a single Ollama provider:

apiVersion: llamastack.io/v1alpha2
kind: LlamaStackDistribution
metadata:
  name: llamastackdistribution-v1alpha2-sample
spec:
  distribution:
    name: starter
  providers:
    inference:
      - provider: ollama
        endpoint: http://ollama-server-service.ollama-dist.svc.cluster.local:11434/v1
  resources:
    models:
      - name: "llama3.2:1b"
  networking:
    port: 8321
  workload:
    replicas: 1

Advanced: vLLM with secret refs, PostgreSQL storage, and pgvector:

apiVersion: llamastack.io/v1alpha2
kind: LlamaStackDistribution
metadata:
  name: llamastack-vllm-pg
spec:
  distribution:
    name: starter
  providers:
    inference:
      - provider: vllm
        endpoint: http://vllm-service.vllm.svc.cluster.local:8000/v1
        secretRefs:
          api_key:
            name: vllm-creds
            key: token
    vectorIo:
      - provider: pgvector
        secretRefs:
          host:
            name: pg-credentials
            key: host
        settings:
          port: 5432
          db: llamastack
  resources:
    models:
      - name: llama3.2-8b
  storage:
    kv:
      type: redis
      endpoint: redis://redis-service.redis.svc.cluster.local:6379
    sql:
      type: postgres
      connectionString:
        name: pg-credentials
        key: dsn
  disabled:
    - safety
  workload:
    replicas: 2

Review Guide

This is a large PR (86 files, ~18k lines). The sections below group changes by area matching the commit structure. Each section is self-contained and can be reviewed independently.

1. v1alpha2 CRD Schema & Conversion (commit: e7cc5c2)

File What to review
api/v1alpha2/llamastackdistribution_types.go New spec/status types. Key design: typed []ProviderConfig slices with CEL validation for provider ID uniqueness. OverrideConfig is mutually exclusive with providers/resources/storage/disabled.
api/v1alpha2/zz_generated.deepcopy.go Auto-generated
api/v1alpha1/llamastackdistribution_conversion.go Bidirectional v1alpha1 ↔ v1alpha2 conversion. Uses JSON blob annotations (annV1Alpha1Extras, annV1Alpha2Extras) for lossless round-trips in both directions.
api/v1alpha1/llamastackdistribution_conversion_test.go Round-trip tests: providers, resources, storage, disabled, TLS, expose hostname, status fields
config/crd/bases/llamastack.io_llamastackdistributions.yaml Generated CRD YAML with OpenAPI and CEL rules

2. Validating Webhook (commit: a45c9f6)

File What to review
api/v1alpha2/llamastackdistribution_webhook.go Validating webhook: provider ID uniqueness across all API types, distribution name validation, model provider reference checks
api/v1alpha2/llamastackdistribution_webhook_test.go Unit tests: cross-slice collision detection, deriveProviderID behavior, unknown distribution rejection, edge cases
config/webhook/* Webhook service, manifests, kustomize config
config/certmanager/* Certificate and issuer for vanilla Kubernetes
config/crd/kustomization.yaml Enabled webhook/cert-manager patches
config/default/kustomization.yaml Enabled webhook and cert-manager components
config/default/manager_webhook_patch.yaml Webhook port and TLS cert volume mount
main.go Webhook registration

3. Config Generation Pipeline (commit: 3feb572)

File What to review
pkg/config/config.go Pipeline: resolve base config → expand providers → expand resources → apply storage → apply disabled APIs → clean registered_resources → override port → render YAML. Key: deep-copy safety, deterministic output
pkg/config/provider.go Provider expansion: remote:: prefix, endpoint → base_url, sorted secret ref iteration, settings merge with override protection
pkg/config/resource.go Model/tool/shield expansion with default provider resolution and provider existence validation
pkg/config/storage.go KV (sqlite/redis) and SQL (sqlite/postgres) with secret env var mapping
pkg/config/secret_resolver.go Resolves secretRefs maps to env vars (LLSD_<PROVIDER_ID>_<KEY>) and ${env.VAR_NAME} substitutions
pkg/config/resolver.go Base config resolver: resolves embedded configs by distribution name
pkg/config/version.go Config version detection (supports versions 1-2)
pkg/config/types.go Shared types: BaseConfig, ProviderEntry, GeneratedConfig
pkg/config/config_test.go Unit tests: determinism, provider/resource expansion, storage, secret resolution, disabled API cleanup, deep-copy safety
pkg/config/configs/*/config.yaml Embedded base configs for starter, starter-gpu, postgres-demo distributions
distributions.json Distribution metadata
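The secret-ref env var naming described above (LLSD_<PROVIDER_ID>_<KEY>, referenced in the rendered config as ${env.VAR_NAME}) can be sketched as follows. This is a hypothetical reconstruction for illustration — the actual GenerateEnvVarName in pkg/config/secret_resolver.go may sanitize differently:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonEnvChar matches characters not valid in an env var name (after uppercasing).
var nonEnvChar = regexp.MustCompile(`[^A-Z0-9_]`)

// generateEnvVarName builds the LLSD_<PROVIDER_ID>_<KEY> env var name,
// uppercasing the parts and replacing invalid characters with underscores.
func generateEnvVarName(providerID, key string) string {
	name := strings.ToUpper("LLSD_" + providerID + "_" + key)
	return nonEnvChar.ReplaceAllString(name, "_")
}

func main() {
	// A pgvector provider's "host" secret ref becomes:
	name := generateEnvVarName("pgvector", "host")
	fmt.Println(name)                // LLSD_PGVECTOR_HOST
	fmt.Println("${env." + name + "}") // how it appears in config.yaml
}
```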

4. Controller Integration (commit: bf90527)

File What to review
controllers/v1alpha2_config.go v1alpha2 config handling: determines config source (override / generated / default), creates immutable ConfigMaps with content-hash naming, validates secret/ConfigMap refs, injects secret-backed env vars into pod spec, cleans up old ConfigMaps
controllers/llamastackdistribution_controller.go Integration: calls handleV1Alpha2NativeConfig before standard reconcile. Dual status update path for v1alpha2 CRs
controllers/kubebuilder_rbac.go RBAC markers: added secrets (get/list/watch) and configmaps (delete)
controllers/resource_helper.go Deprecated startupScript. Sets RUN_CONFIG_PATH env var; image's built-in entrypoint.sh handles startup
controllers/resource_helper_test.go Updated assertions: RUN_CONFIG_PATH instead of command/args overrides
controllers/suite_test.go Envtest setup with webhook server
controllers/testing_support_test.go Test constants and helpers
controllers/llamastackdistribution_controller_test.go Envtest integration tests: config generation, ConfigMap creation, secret env var injection, status updates
config/rbac/role.yaml Generated ClusterRole with secrets and configmaps-delete permissions

5. OpenShift Webhook Overlay (commit: e3f405b)

File What to review
config/openshift/kustomization.yaml OpenShift overlay: replaces cert-manager with service-serving certificates
config/openshift/crd_ca_patch.yaml CRD CA injection annotation
config/openshift/manager_webhook_patch.yaml Manager cert volume mount
config/openshift/webhook_ca_patch.yaml Webhook CA injection annotation

6. E2E Tests (commit: c6d24e8)

File What to review
tests/e2e/creation_v1alpha2_test.go v1alpha2 CR creation, ConfigMap generation, Ready phase, secret env var injection into Deployment
tests/e2e/conversion_test.go Cross-version read (v1alpha1 as v1alpha2 and vice versa)
tests/e2e/webhook_validation_test.go Webhook rejects: missing distribution, duplicate provider IDs, invalid provider references
tests/e2e/validation_test.go CRD structure, webhook service/TLS, operator readiness
tests/e2e/creation_test.go Updated v1alpha1 creation tests
tests/e2e/e2e_test.go Test suite registration
tests/e2e/test_utils.go Test helpers
.github/workflows/run-e2e-test.yml CI workflow updates for v1alpha2 targets

7. Documentation & Samples (commit: 3b6972b)

File What to review
docs/migration-v1alpha1-to-v1alpha2.md Migration guide: field mapping tables, before/after examples, step-by-step migration
docs/api-overview.md Full API reference covering both versions
README.md Updated quick start with v1alpha2 examples using secretRefs and list syntax
config/samples/v1alpha1/* Existing samples moved into versioned subdirectory
config/samples/v1alpha2/* New v1alpha2 samples: basic, HA, vLLM+Postgres, networking
specs/002-operator-generated-config/* Updated spec contracts and data model

8. Build Tooling & Release (commit: b5a2df7)

File What to review
Makefile Build target updates for v1alpha2
.gitignore Ignore patterns for generated artifacts
go.mod / go.sum Dependency updates
release/operator.yaml Regenerated release manifest with all v1alpha2 resources

@VaishnaviHire VaishnaviHire marked this pull request as draft February 23, 2026 14:51
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch 4 times, most recently from 28c7f4a to afec277 Compare February 27, 2026 08:03
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from b760e03 to 1843bd3 Compare March 3, 2026 14:33
@VaishnaviHire VaishnaviHire changed the title [DRAFT] Implement run config schema feat: Introduce v1alpha2 version of LlamaStackDistribution CRD Mar 3, 2026
@VaishnaviHire VaishnaviHire marked this pull request as ready for review March 3, 2026 14:42
@VaishnaviHire
Collaborator Author

@Mergifyio rebase

@mergify

mergify bot commented Mar 5, 2026

rebase

✅ Branch has been successfully rebased

@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 1843bd3 to 31e0e3c Compare March 5, 2026 15:08
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch 2 times, most recently from 3fde13b to 673d4e7 Compare March 6, 2026 17:21
@mfleader mfleader self-requested a review March 6, 2026 19:26
logger := log.FromContext(ctx)

// Handle v1alpha2 native config generation before standard reconciliation.
v1a2Result, v1a2Err := r.handleV1Alpha2NativeConfig(ctx, key, instance)
Collaborator


Missing test coverage for FR-097 (preserve running Deployment on config generation failure).

Collaborator


Where is this covered? I don't see a test that creates a Deployment first, then fails config generation, then checks the Deployment is unchanged.

@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 673d4e7 to ac0683d Compare March 9, 2026 09:35
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from a38bd8f to fc77468 Compare March 20, 2026 06:08
Copy link
Copy Markdown
Collaborator

@eoinfennessy eoinfennessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final few comments on the API.

Comment on lines +90 to +97
// SecretRefs is a map of named secret references for provider-specific
// connection fields (e.g., host, password). Each key becomes the env var
// field suffix and maps to config.<key> with env var substitution:
// ${env.LLSD_<PROVIDER_ID>_<KEY>}. Use this instead of embedding
// secretKeyRef inside settings.
// +optional
// +kubebuilder:validation:MinProperties=1
SecretRefs map[string]SecretKeyRef `json:"secretRefs,omitempty"`
Collaborator


Just a thought. Should we remove the ApiKey field if it is possible for users to supply API_KEY here?

Collaborator Author


This one was shorthand for user convenience, since the API key is one of the most common secrets. I can remove it.

Comment on lines +110 to +112
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:XValidation:rule="self.size() <= 1 || self.all(p, has(p.id))",message="each provider must have an explicit id when multiple providers are specified"
Inference []ProviderConfig `json:"inference,omitempty"`
Collaborator


Is it also possible to add a CEL check to ensure each ID is unique when multiple providers are specified?

Collaborator


Never mind. I see this is handled in webhook validation.
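The cross-slice uniqueness check discussed here could look roughly like the sketch below. The names (providerConfig, deriveProviderID, findDuplicateIDs) are hypothetical stand-ins for the webhook's actual helpers:

```go
package main

import (
	"fmt"
	"sort"
)

// providerConfig loosely mirrors the ProviderConfig shape (hypothetical stub).
type providerConfig struct {
	ID       string
	Provider string
}

// deriveProviderID falls back to the provider type when no explicit ID is
// set, matching the single-provider shorthand.
func deriveProviderID(pc providerConfig) string {
	if pc.ID != "" {
		return pc.ID
	}
	return pc.Provider
}

// findDuplicateIDs returns provider IDs appearing more than once across all
// API slices (inference, vectorIo, ...), sorted for deterministic messages.
func findDuplicateIDs(slices map[string][]providerConfig) []string {
	seen := map[string]int{}
	for _, providers := range slices {
		for _, pc := range providers {
			seen[deriveProviderID(pc)]++
		}
	}
	var dups []string
	for id, n := range seen {
		if n > 1 {
			dups = append(dups, id)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	dups := findDuplicateIDs(map[string][]providerConfig{
		"inference": {{Provider: "vllm"}}, // derived ID: "vllm"
		"vectorIo":  {{ID: "vllm"}},       // collides across slices
	})
	fmt.Println(dups) // [vllm]
}
```

CEL's uniqueness rules only see one slice at a time, which is why a cross-slice check like this has to live in the webhook.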

Comment on lines +248 to +251
// Enabled activates external access via Ingress/Route.
// nil = not specified (no Ingress), false = explicitly disabled, true = create Ingress.
// +optional
Enabled *bool `json:"enabled,omitempty"`
Collaborator


I'm unsure why this is a pointer. Are we differentiating the behaviour of false and nil? Should this just be a bool?

The specs seem to suggest that the presence of a non-nil expose object enables ingress. Maybe the Enabled field is actually unnecessary?

  • expose omitted → Expose is nil → no Ingress
  • expose: {} → Expose is non-nil → create Ingress (with defaults)
  • expose: {hostname: "foo.example.com"} → create Ingress with that hostname
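The tri-state semantics under discussion (nil pointer vs. explicit false vs. presence-implies-enabled) can be sketched in a few lines. Type and function names here are illustrative, not the PR's actual API:

```go
package main

import "fmt"

// exposeConfig is a hypothetical stand-in for the Expose struct.
type exposeConfig struct {
	Enabled *bool // nil = unset, which is distinct from an explicit false
}

// shouldCreateIngress implements "presence of a non-nil expose object
// enables ingress" with an optional explicit override.
func shouldCreateIngress(e *exposeConfig) bool {
	if e == nil {
		return false // expose omitted → no Ingress
	}
	if e.Enabled == nil {
		return true // expose: {} → presence implies intent
	}
	return *e.Enabled
}

func main() {
	off := false
	fmt.Println(shouldCreateIngress(nil))                          // false
	fmt.Println(shouldCreateIngress(&exposeConfig{}))              // true
	fmt.Println(shouldCreateIngress(&exposeConfig{Enabled: &off})) // false
}
```

If presence of the struct alone carries the intent, the Enabled pointer (and its nil/false distinction) becomes redundant, which is the reviewer's point.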

Comment on lines +229 to +243
// TLSSpec configures TLS for the LlamaStack server.
// +kubebuilder:validation:XValidation:rule="!self.enabled || has(self.secretName)",message="secretName is required when TLS is enabled"
// +kubebuilder:validation:XValidation:rule="!has(self.secretName) || self.enabled",message="secretName is only valid when TLS is enabled"
// +kubebuilder:validation:XValidation:rule="!has(self.caBundle) || self.enabled",message="caBundle is only valid when TLS is enabled"
type TLSSpec struct {
// Enabled enables TLS on the server.
// +optional
Enabled bool `json:"enabled,omitempty"`
// SecretName references a Kubernetes TLS Secret. Required when enabled is true.
// +optional
SecretName string `json:"secretName,omitempty"`
// CABundle configures custom CA certificates via ConfigMap reference.
// +optional
CABundle *CABundleConfig `json:"caBundle,omitempty"`
}
Collaborator


Should we follow a similar pattern to what I described above for ExposeConfig? I.e. remove Enabled from this struct and make SecretName strictly required?

The presence or absence of tls will indicate whether or not it is enabled.

Collaborator Author


We don't need SecretName here. I will remove it from the spec. The only required field is the ca-bundle ConfigMap.

Collaborator

@eoinfennessy eoinfennessy Mar 20, 2026


I think it's no harm keeping SecretName here if it is our intent to serve LLS with TLS. It seems from the specs that this is the case.

Thinking more about this, TLSSpec is effectively configuring both incoming (SecretName) and outgoing (CABundle) TLS config. For that reason the Enabled boolean is overloaded here. I think we should rework this. How about the following? (@rhuss, hi, would appreciate your thoughts too if you have time)

# Before (conflates server and client; `enabled` is overloaded)
networking:
  tls:
    enabled: true
    secretName: llama-tls
    caBundle:
      configMapName: custom-ca

# After (separates server and client. Removes redundant `enabled`)
# If a CA bundle is provided, client-side TLS is enabled
# If TLS config is provided, server-side TLS is enabled
networking:
  tls:
    secretName: llama-tls
  caBundle:
    configMapName: custom-ca

Collaborator Author


Regarding SecretName, I can open a follow-up PR, since it will need additional verification downstream. I'll keep the ConfigMap to continue supporting v1alpha1 features.

Collaborator


I had a good conversation with Claude about this and came up with the suggestion below:

Suggestion: Separate server TLS from CA trust configuration

The current TLSSpec conflates two distinct concerns:

  1. Server TLS (serving) — the cert/key the LlamaStack server presents to incoming clients
  2. CA trust (outbound) — custom CA certificates the server trusts when connecting to external services (provider endpoints, etc.)

These have different lifecycles, different audiences, and different security implications. As-is, adding a serving certificate secret to this struct would mix both under a single tls field in NetworkingSpec, making the API harder to reason about as it grows.

Proposed change:

  1. Move caBundle to the top level of LlamaStackDistributionSpec. CA trust is a cross-cutting runtime concern (not a networking topology one), and caBundle already follows the dominant Kubernetes naming convention used by core webhooks, APIService, CRD conversion webhooks, and cert-manager.
  2. Replace TLSSpec with a server TLS struct inside NetworkingSpec that holds the serving certificate secret reference. This is where server-side TLS naturally belongs.
# Before
spec:
  networking:
    tls:
      caBundle:
        configMapName: my-ca-bundle
# After
spec:
  caBundle:
    configMapName: my-ca-bundle
  networking:
    tls:
      secretName: my-serving-cert

This gives us clear semantics (each field does one thing), independent lifecycle (CA trust without server TLS and vice versa), and aligns with how the broader Kubernetes ecosystem models these concepts (e.g., OpenShift separates trustedCA from route TLS termination; Istio separates caCertificates from server gateway TLS).

@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from fc77468 to 958b6f3 Compare March 20, 2026 11:56
Collaborator

@eoinfennessy eoinfennessy left a comment


Review of config gen pipeline. There are some critical issues that need to be addressed.

Comment on lines +136 to +138
for k, v := range settingsMap {
cfg[k] = v
}
Collaborator


It's possible that values from the settings map can override the endpoint, secret refs, and the API key.

Should we skip adding items that are already in cfg? And log a warning?

Collaborator


Or maybe add settings to the cfg map first and then add fields like base_url secret_refs and api_key?

We need to be careful that secret_refs can't override api_key too (which it can currently).
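The merge order the reviewer suggests — settings first, then secret-ref substitutions, then reserved fields last so nothing can clobber them — might look like this sketch (function name and signature are hypothetical, not the PR's actual provider.go code):

```go
package main

import "fmt"

// buildProviderConfig merges in increasing order of precedence: user
// settings first, then secret-ref substitutions, then reserved fields
// (base_url, api_key) last, so settings and secret refs can never
// override them.
func buildProviderConfig(settings map[string]any, secretRefs map[string]string, endpoint, apiKeyEnv string) map[string]any {
	cfg := map[string]any{}
	// 1. Settings go in first...
	for k, v := range settings {
		cfg[k] = v
	}
	// 2. ...then secret-ref env var substitutions...
	for key, envVar := range secretRefs {
		cfg[key] = "${env." + envVar + "}"
	}
	// 3. ...and finally the reserved fields, which always win.
	if endpoint != "" {
		cfg["base_url"] = endpoint
	}
	if apiKeyEnv != "" {
		cfg["api_key"] = "${env." + apiKeyEnv + "}"
	}
	return cfg
}

func main() {
	cfg := buildProviderConfig(
		map[string]any{"base_url": "http://evil.example"}, // attempted override via settings
		map[string]string{"api_key": "LLSD_VLLM_API_KEY"},
		"http://vllm:8000/v1",
		"LLSD_VLLM_API_KEY",
	)
	fmt.Println(cfg["base_url"]) // http://vllm:8000/v1 — the endpoint wins
}
```

Writing reserved fields last also addresses the second concern: secret_refs can no longer override api_key, regardless of key names.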

Comment on lines +122 to +130
for key := range pc.SecretRefs {
ident := providerID + ":" + key
if sub, ok := substitutions[ident]; ok {
cfg[key] = sub
} else {
envName := GenerateEnvVarName(providerID, key)
cfg[key] = "${env." + envName + "}"
}
}
Collaborator


The order of iteration is non-deterministic. This could potentially cause unnecessary Deployment updates.

We should sort the keys before iterating.
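The standard fix is to collect and sort the keys before iterating, since Go randomizes map iteration order. A minimal sketch (helper name is illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns a map's keys in stable sorted order, so every reconcile
// renders byte-identical YAML and avoids spurious ConfigMap-hash changes
// (and hence unnecessary Deployment rollouts).
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	secretRefs := map[string]string{"password": "pg-credentials", "host": "pg-credentials", "port": "pg-credentials"}
	for _, k := range sortedKeys(secretRefs) {
		fmt.Println(k) // host, password, port — always in this order
	}
}
```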

Comment on lines +101 to +107
for key, ref := range pc.SecretRefs {
addSecretToResolution(resolution, secretRefEntry{
ProviderID: providerID,
Field: key,
SecretName: ref.Name,
SecretKey: ref.Key,
})
Collaborator


The order of iteration is non-deterministic. This could potentially cause unnecessary Deployment updates.

We should sort the keys before iterating.

Comment on lines +236 to +251
// apiNameToConfigKey maps CRD-style camelCase API names to config.yaml snake_case keys.
var apiNameToConfigKey = map[string]string{
"vectorIo": "vector_io",
"toolRuntime": "tool_runtime",
"postTraining": "post_training",
"datasetIo": "datasetio",
}

// normalizeAPIName converts a CRD-style camelCase API name to the config.yaml
// snake_case key. Names already in snake_case pass through unchanged.
func normalizeAPIName(api string) string {
if mapped, ok := apiNameToConfigKey[api]; ok {
return mapped
}
return api
}
Collaborator


This is all effectively dead code because the disabled enum already specifies snake-case values.

Comment on lines +371 to +382
// RenderConfigYAML serializes the config to deterministic YAML.
func RenderConfigYAML(config *BaseConfig) (string, error) {
// Build an ordered map for deterministic output
out := buildOrderedConfig(config)

data, err := yaml.Marshal(out)
if err != nil {
return "", fmt.Errorf("failed to marshal config YAML: %w", err)
}

return string(data), nil
}
Collaborator


This function mutates the provided config, which is unconventional and unexpected for a render function.

Consider having buildOrderedConfig write to out["registered_resources"] instead of config.Extra["registered_resources"].
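The non-mutating variant could be sketched like this — derived keys such as registered_resources go into the output map, never back into the input struct. Types here are simplified stand-ins for BaseConfig:

```go
package main

import "fmt"

// baseConfig is a simplified stand-in for the real BaseConfig type.
type baseConfig struct {
	Version string
	Extra   map[string]any
}

// buildOrderedConfig copies fields into a fresh output map and writes derived
// keys (like registered_resources) only to out, leaving the caller's config
// untouched — so rendering has no side effects.
func buildOrderedConfig(config *baseConfig, registered map[string]any) map[string]any {
	out := map[string]any{"version": config.Version}
	for k, v := range config.Extra {
		out[k] = v
	}
	if len(registered) > 0 {
		out["registered_resources"] = registered // written to out, not config.Extra
	}
	return out
}

func main() {
	cfg := &baseConfig{Version: "2", Extra: map[string]any{}}
	out := buildOrderedConfig(cfg, map[string]any{"models": []string{"llama3.2:1b"}})
	fmt.Println(len(cfg.Extra))  // 0 — the input config was not mutated
	fmt.Println(out["version"])  // 2
}
```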

EvalStore map[string]interface{} `json:"eval_store,omitempty" yaml:"eval_store,omitempty"`
DatasetIOStore map[string]interface{} `json:"datasetio_store,omitempty" yaml:"datasetio_store,omitempty"`
Server map[string]interface{} `json:"server,omitempty" yaml:"server,omitempty"`
ExternalProviders map[string]interface{} `json:"external_providers,omitempty" yaml:"external_providers,omitempty"`
Collaborator


This field is effectively unused because we never use it in buildOrderedConfig.

We should delete it to avoid confusion.

}

if mc.ContextLength != nil && *mc.ContextLength > 0 {
if entry["provider_model_id"] == nil {
Collaborator


This check is not needed; it is always true.

Comment on lines +77 to +80
provider := mc.Provider
if provider == "" {
provider = defaultProvider
}
Collaborator


We have no validation that the model's provider ID actually exists.

The model can be registered with a non-existent provider and no error is returned. The llama-stack server would fail at startup with a confusing error about an unknown provider.

We should consider adding validation for this at the admission layer if not too complex. Otherwise, we can validate here and return an error so the CR's status can reflect the issue to users.

Collaborator

@eoinfennessy eoinfennessy left a comment


Review of webhooks:

We need to comprehensively test all validation logic (webhook and CEL). Currently we don't do any testing of validation logic.

There are some problems with data loss and stale data in the conversion logic. I added a suggestion to fix this.

@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 958b6f3 to 4021d2f Compare March 23, 2026 07:00
@VaishnaviHire
Collaborator Author

@eoinfennessy I have addressed the comments and updated the commits. Please take a look.

@eoinfennessy
Collaborator

eoinfennessy commented Mar 23, 2026

@VaishnaviHire, thanks for addressing the comments. In future review cycles on this PR, please avoid squashing and force-pushing. Instead, please add new commits for each change. This makes it easier for me to review the changes that have been made between PR reviews, which is especially tricky in such a large PR.

@mergify

mergify bot commented Mar 27, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @VaishnaviHire please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 27, 2026
Collaborator

@eoinfennessy eoinfennessy left a comment


I re-reviewed after the previous suggestions. Thanks for addressing these. Couple of things left from these reviews:

  1. CEL validation tests: Let's add envtest tests to ensure all of our complex CEL validation is actually working.
  2. Split TLSSpec for server and client: see my latest comment in the thread above discussing this
  3. Small bug remaining in resource.go (see below)

if provider == "" {
return nil, fmt.Errorf("failed to expand model %q: no provider specified and no default inference provider found", mc.Name)
}
if mc.Provider != "" && !providerExists(provider, userProviders, base) {
Collaborator


The check mc.Provider != "" means: "only validate if the user explicitly set a provider." But this creates a blind spot — if the user omits the provider and the default is used, no existence check happens at all. The default provider could be stale or wrong, and the error would only surface at LlamaStack server startup with a confusing message about an unknown provider.

The fix is simply:

if !providerExists(provider, userProviders, base) {

This validates the provider regardless of whether it came from the user or from the default, which is what you'd want — if we resolved a provider name, we should verify it exists.

Collaborator

@eoinfennessy eoinfennessy left a comment


Review focusing on /controller. Mostly minor suggestions, but a couple of major things related to surfacing errors and status.

Comment on lines +195 to +199
// Handle v1alpha2 native config generation before standard reconciliation.
v1a2Result, v1a2Err := r.handleV1Alpha2NativeConfig(ctx, key, instance)
if v1a2Err != nil {
logger.Error(v1a2Err, "failed to handle v1alpha2 native config")
}
Collaborator


FR-097 states:

If config generation or validation fails during a CR update, the operator MUST preserve the current running Deployment (image, ConfigMap, env vars) unchanged and set status condition ConfigGenerated=False with the failure reason. The running instance MUST NOT be disrupted.

Two gaps:

  1. ConfigGenerated=False is never set. When handleV1Alpha2NativeConfig fails, v1a2Result is nil, so finalizeReconciliation takes the else branch and calls updateStatus — which only writes v1alpha1 status fields. SetV1Alpha2Condition is only called inside persistV1Alpha2Status, which is only reached on success. The constant ReasonConfigGenFailed is declared but unused. The failure is logged to operator stdout but never surfaces in .status.conditions.

  2. No structural guarantee the Deployment is preserved. handleV1Alpha2NativeConfig mutates v1Instance in-place (setting UserConfig and appending env vars) after all fallible operations, then reconcileResources uses the same pointer to reconcile the Deployment. Today the mutation ordering is safe — mutations happen last, after all fallible steps. But this is an implicit invariant: if a fallible step is later added after the UserConfig assignment, reconcileResources would reconcile against a half-modified spec, potentially pointing the Deployment at a ConfigMap that doesn't exist.

Suggested approach: On handleV1Alpha2NativeConfig error, skip reconcileResources, persist ConfigGenerated=False with the failure reason, and return the error to requeue. This satisfies both halves of FR-097: the Deployment is untouched and the failure is visible in status.

Collaborator


There's also no test asserting that .status.conditions contains ConfigGenerated=False after a failed config generation.

// v1alpha2 Condition reasons.
const (
ReasonConfigGenSucceeded = "ConfigGenerationSucceeded"
ReasonConfigGenFailed = "ConfigGenerationFailed"
Collaborator


This is unused

Comment on lines +232 to +234
if err := r.persistV1Alpha2Status(ctx, key, instance, v1a2Result); err != nil {
logger.Error(err, "failed to update v1alpha2 status")
}
Collaborator


We should return the status update error to match v1alpha1 behaviour in the else block, and ensure the status is eventually consistent.


// updateStatus refreshes the LlamaStack status.
func (r *LlamaStackDistributionReconciler) updateStatus(ctx context.Context, instance *llamav1alpha1.LlamaStackDistribution, reconcileErr error) error {
// computeStatus computes all status fields on the in-memory v1alpha1 instance
Collaborator


We should remove the stale updateStatus comment above this line

Comment on lines +314 to +336
for _, envVar := range resolution.EnvVars {
if envVar.ValueFrom == nil || envVar.ValueFrom.SecretKeyRef == nil {
continue
}

secretName := envVar.ValueFrom.SecretKeyRef.Name
secretKey := envVar.ValueFrom.SecretKeyRef.Key

secret := &corev1.Secret{}
if err := r.Get(ctx, types.NamespacedName{
Name: secretName,
Namespace: namespace,
}, secret); err != nil {
if k8serrors.IsNotFound(err) {
return fmt.Errorf("failed to find Secret %q in namespace %q (referenced by env var %s)", secretName, namespace, envVar.Name)
}
return fmt.Errorf("failed to get Secret %q: %w", secretName, err)
}

if _, ok := secret.Data[secretKey]; !ok {
return fmt.Errorf("failed to find key %q in Secret %q in namespace %q", secretKey, secretName, namespace)
}
}
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider aggregating the errors here to provide a better UX.
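A minimal pure-Go sketch of that aggregation pattern (hypothetical names and standalone types, not the operator's actual API; the real loop would keep the `r.Get` calls): collect every failure and combine them with `errors.Join` from Go 1.20+, so a user sees all missing secrets in one reconcile instead of one per attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// validateSecretRefs is a hypothetical stand-in for the secret-ref check:
// instead of returning on the first missing Secret, it records each failure
// and reports them together. errors.Join returns nil when no errors were
// collected, so the happy path is unchanged.
func validateSecretRefs(refs map[string]string, existing map[string]bool) error {
	var errs []error
	for envVar, secretName := range refs {
		if !existing[secretName] {
			errs = append(errs, fmt.Errorf(
				"failed to find Secret %q (referenced by env var %s)", secretName, envVar))
		}
	}
	return errors.Join(errs...)
}

func main() {
	refs := map[string]string{"API_KEY": "vllm-creds", "DB_PASS": "pg-creds"}
	existing := map[string]bool{"pg-creds": true}
	if err := validateSecretRefs(refs, existing); err != nil {
		fmt.Println(err)
	}
}
```

The joined error's `Error()` output lists each wrapped failure on its own line, which maps cleanly onto a single condition message.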

Comment on lines +386 to +395
```go
for i, mc := range spec.Resources.Models {
	if mc.Provider != "" {
		if _, ok := providerIDs[mc.Provider]; !ok {
			return fmt.Errorf(
				"resources.models[%d].provider: provider ID %q not found; available providers: %s",
				i, mc.Provider, strings.Join(sortedKeys(providerIDs), ", "),
			)
		}
	}
}
```
Collaborator

We could aggregate errors here too.

Comment on lines +495 to +498
```go
} else {
	status.Conditions[i].Reason = reason
	status.Conditions[i].Message = message
}
```
Collaborator

There's no `ObservedGeneration` set on the condition. Without it, a client can't distinguish whether a `ConfigGenerated=True` condition was set for the current spec generation or a previous one. Consider setting `condition.ObservedGeneration = instance.Generation` to match the convention used by most Kubernetes controllers.
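As a sketch of that convention (using a simplified local stand-in for `metav1.Condition`; in the real controller, `meta.SetStatusCondition` from `k8s.io/apimachinery` handles this when the condition's `ObservedGeneration` is populated from `instance.Generation`):

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition, used only to
// illustrate the convention: stamp the spec generation the condition was
// computed against so clients can tell fresh conditions from stale ones.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	ObservedGeneration int64
}

// setCondition updates an existing condition of the same Type or appends a
// new one, recording the generation it was observed at (a hypothetical
// helper mirroring what meta.SetStatusCondition does).
func setCondition(conds []Condition, c Condition, generation int64) []Condition {
	c.ObservedGeneration = generation
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{
		Type: "ConfigGenerated", Status: "True", Reason: "ConfigGenerationSucceeded",
	}, 3)
	fmt.Println(conds[0].ObservedGeneration) // 3
}
```

A client watching the CR can then ignore any `ConfigGenerated` condition whose `ObservedGeneration` is behind `metadata.generation`.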

Comment on lines +991 to +1003
```go
namespace := createTestNamespace(t, "test-v1alpha2-secret-ref")
operatorNamespace := createTestNamespace(t, "test-v1alpha2-secret-op")
t.Setenv("OPERATOR_NAMESPACE", operatorNamespace.Name)

// Create operator config ConfigMap (required by NewLlamaStackDistributionReconciler)
opConfig := &corev1.ConfigMap{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "llama-stack-operator-config",
		Namespace: operatorNamespace.Name,
	},
	Data: map[string]string{},
}
require.NoError(t, k8sClient.Create(t.Context(), opConfig))
```
Collaborator

This is repeated 6 times. Consider writing a helper:

```go
func setupV1Alpha2Env(t *testing.T, prefix string) (ns *corev1.Namespace, opNs *corev1.Namespace)
```

Comment on lines +1030 to +1037
```go
clusterInfo := &cluster.ClusterInfo{
	OperatorNamespace:  operatorNamespace.Name,
	DistributionImages: map[string]string{"starter": testImage},
}
reconciler, err := controllers.NewLlamaStackDistributionReconciler(
	t.Context(), k8sClient, scheme.Scheme, clusterInfo,
)
require.NoError(t, err)
```
Collaborator

This is repeated 5 times. Consider a helper:

```go
func newV1Alpha2Reconciler(t *testing.T, opNamespace string) *controllers.LlamaStackDistributionReconciler
```

Comment on lines +185 to +187
```yaml
agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
```
Collaborator

The `agents` API has been renamed to `responses`: llamastack/llama-stack#5195

We probably need to update this in all embedded configs.

@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 4021d2f to 43e2ce6 Compare March 30, 2026 18:06
@VaishnaviHire
Collaborator Author

@eoinfennessy I have addressed the comments. Additionally, I added a deploy-time feature flag for v1alpha2: a v1alpha1-only overlay that deploys only the v1alpha1 CRD.

Add typed v1alpha2 API (ProvidersSpec, ModelConfig, ExposeConfig,
StorageSpec) with kubebuilder validation markers and CEL rules.
Implement lossless v1alpha1<->v1alpha2 conversion via JSON-blob
annotations for fields that have no v1alpha1 equivalent.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Implement admission webhook that validates distribution names against
the embedded registry, enforces unique provider IDs per category,
and checks model provider references. Wire up cert-manager and
webhook kustomize overlays.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Build the config generation pipeline that renders a complete
config.yaml from v1alpha2 spec fields (providers, resources,
storage). Includes distribution registry, provider expansion,
model/tool/shield resource resolution, storage configuration,
secret-ref placeholder injection, and disabled-API pruning.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Wire the config generation pipeline into the reconciliation loop.
Adds v1alpha2 config source detection, ConfigMap creation with
generated config.yaml, secret env-var injection into pod spec,
RBAC permissions for secrets and configmap deletion, and
controller-level integration tests.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Add kustomize overlay for OpenShift deployments that patches the
webhook configuration to use the service-serving-cert-signer CA
instead of cert-manager, along with SCC-compatible manager patches.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Add end-to-end tests covering v1alpha2 CR creation, conversion
round-trips, webhook validation rejection, secret env-var injection,
and TLS configuration. Refactor existing e2e tests into focused
test files with shared utilities.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Reorganize sample CRs into v1alpha1/ and v1alpha2/ subdirectories.
Add v1alpha2 sample CRs (vLLM+Postgres, HA, networking), API
overview, and v1alpha1-to-v1alpha2 migration guide. Update README
with v1alpha2 quick-start examples.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Update Makefile with webhook cert-manager targets, add go module
dependencies for the config pipeline and webhook infrastructure,
and regenerate the release operator manifest.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Add v1alpha1-only overlay for deployment. This allows the v1alpha2 api to
be incrementally enabled for GA releases.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
Remove support for eval, safety, and related APIs.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by: claude-4.6-opus
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 43e2ce6 to 2de6d1e Compare March 30, 2026 18:44
@mergify mergify bot removed the needs-rebase label Mar 30, 2026