
feat(api): v1alpha2 CRD types and config generation pipeline#266

Open
rhuss wants to merge 11 commits into llamastack:feature/sdd-v1alpha2 from rhuss:002-v1alpha2-types-and-config-pipeline

Conversation


rhuss commented Mar 11, 2026

Summary

Adds complete v1alpha2 CRD types and the config generation pipeline (Phase 1 + Phase 2 of the operator-generated config spec). Also updates the provider merge strategy from replace_by_api_type to overlay_by_provider_id across spec artifacts (per review feedback from @VaishnaviHire on PR #263).

Spec changes

  • Provider merge strategy changed to overlay_by_provider_id in contracts, spec.md (FR-035a), research.md, REVIEW.md, plan.md, tasks.md
  • Semantics: matching IDs replaced, unmatched user IDs appended, unmatched base IDs preserved
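
To make the overlay semantics concrete, here is a minimal, self-contained sketch of provider-ID-based merging over generic map entries. The `provider_id` key mirrors the run.yaml provider shape; the function and variable names are illustrative, not the operator's actual code:

```go
package main

import "fmt"

// overlayByProviderID merges a user provider list over a base list:
// matching IDs are replaced in place, unmatched user IDs are appended,
// and unmatched base IDs are preserved.
func overlayByProviderID(base, user []map[string]any) []map[string]any {
	merged := make([]map[string]any, 0, len(base)+len(user))
	seen := make(map[string]int) // provider_id -> index in merged

	// Start from the base list so unmatched base IDs survive the merge.
	for _, p := range base {
		id, _ := p["provider_id"].(string)
		seen[id] = len(merged)
		merged = append(merged, p)
	}
	for _, p := range user {
		id, _ := p["provider_id"].(string)
		if i, ok := seen[id]; ok {
			merged[i] = p // matching ID: replace
		} else {
			seen[id] = len(merged)
			merged = append(merged, p) // new ID: append
		}
	}
	return merged
}

func main() {
	base := []map[string]any{
		{"provider_id": "vllm", "url": "http://base"},
		{"provider_id": "ollama", "url": "http://ollama"},
	}
	user := []map[string]any{
		{"provider_id": "vllm", "url": "http://user"}, // replaces the base entry
		{"provider_id": "tgi", "url": "http://tgi"},   // appended
	}
	for _, p := range overlayByProviderID(base, user) {
		fmt.Println(p["provider_id"], p["url"])
	}
}
```

Replacing in place (rather than delete-and-append) keeps the base ordering stable, which matters for deterministic output.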

Phase 1: v1alpha2 CRD Types (api/v1alpha2/)

Complete type hierarchy with CEL validation:

  • DistributionSpec, ProvidersSpec, ProviderConfig (typed []ProviderConfig slices)
  • ResourcesSpec, ModelConfig, StorageSpec (kv/sql), NetworkingSpec, WorkloadSpec
  • Status types: ResolvedDistributionStatus, ConfigGenerationStatus, condition constants
  • CEL validation: mutual exclusivity, provider ID enforcement, TLS/storage conditionals, disabled+provider conflict, PDB mutual exclusivity
  • Defense-in-depth validation (from @eoinfennessy's review on PR #264, feat(api): v1alpha2 CRD types with typed provider slices, Phase 1): MinItems=1 on all optional slices, MinLength=1 on required strings, CEL non-empty guards on optional strings, reverse storage/TLS validation, port range, autoscaling constraints

Phase 2: Config Generation Engine (pkg/config/)

Pure Go pipeline with no Kubernetes API dependencies:

  • ConfigResolver interface with EmbeddedConfigResolver (go:embed for 4 distributions: starter, remote-vllm, meta-reference-gpu, postgres-demo)
  • Provider expansion: auto-ID, remote:: prefix, endpoint mapping, settings merge, secretRef env var placeholders
  • Resource expansion: model/tool/shield registration with provider fallback to base config
  • Storage config: sqlite/redis/postgres with secretKeyRef env var substitution
  • overlay_by_provider_id merge: matching IDs replaced, new IDs appended, unmatched base IDs preserved
  • Disabled API subtractive removal
  • GenerateConfig orchestrator producing deterministic YAML + SHA256 content hash
  • Secret reference collection from apiKey, secretRefs, and storage fields
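
The determinism-plus-hashing step can be sketched with the standard library alone. Note that this sketch hashes JSON rather than the YAML the real pipeline emits: encoding/json sorts map keys, which keeps the example dependency-free while still demonstrating content-addressed hashing:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// contentHash returns a deterministic SHA-256 over a generated config.
// encoding/json sorts map keys, so equal configs hash identically
// regardless of map insertion order.
func contentHash(config map[string]any) (string, error) {
	raw, err := json.Marshal(config)
	if err != nil {
		return "", fmt.Errorf("failed to marshal config: %w", err)
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	h1, _ := contentHash(map[string]any{"version": "2", "apis": []string{"inference"}})
	h2, _ := contentHash(map[string]any{"apis": []string{"inference"}, "version": "2"})
	fmt.Println(h1 == h2) // same content, same hash, regardless of insertion order
}
```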

Generated artifacts

  • docs/api-overview.md: API reference for v1alpha2 types
  • release/operator.yaml: Updated installer manifest with v1alpha2 CRD
  • config/crd/bases/: Updated CRD with v1alpha2 schema + CEL rules

Tests

53 unit tests covering all pipeline components, all passing:

  • Resolver, provider expansion, resource expansion, secret collection, storage, merge, generator (end-to-end with real embedded configs, determinism, overlay preservation, error cases)

Supersedes

Test plan

  • All 53 pkg/config/ unit tests pass
  • Full project builds (go build ./...)
  • CRD manifests generate successfully (make manifests)
  • Deepcopy generates successfully (make generate)
  • All pre-commit hooks pass (linters, API docs, installer, error messages, DCO)
  • Reviewer validates embedded configs match upstream distribution defaults

Assisted-By: 🤖 Claude Code


rhuss commented Mar 11, 2026

Addressed review feedback from @eoinfennessy on PR #264

The latest commit (8fdf89e) incorporates all review comments from #264:

Empty collection guards (MinItems=1):

  • All provider slices (inference, safety, vectorIo, toolRuntime, telemetry)
  • Resource slices (models, tools, shields)
  • Workload slices (env, command, args, volumes, volumeMounts, topologySpreadConstraints, configMapKeys, namespaces, labels)
  • External providers inference slice
  • MinProperties=1 on secretRefs map

Empty string guards:

  • MinLength=1 on required strings: provider, name, configMapName, providerId, image, SecretKeyRef name/key
  • CEL != '' on optional strings: id, provider (ModelConfig), modelType, quantization, hostname, serviceAccountName, mountPath, configMapNamespace

Tighter storage validation:

  • endpoint and password only valid when KV type is redis
  • connectionString only valid when SQL type is postgres

Tighter TLS validation:

  • secretName only valid when TLS is enabled
  • caBundle only valid when TLS is enabled

Range and constraint validation:

  • Port: Minimum=1, Maximum=65535
  • Replicas: Minimum=0
  • Autoscaling: minReplicas and maxReplicas Minimum=1, percentages 1-100, CEL maxReplicas >= minReplicas
  • PDB: minAvailable and maxUnavailable mutually exclusive, at least one required
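
As a rough illustration, constraints like these are typically declared as kubebuilder markers on the API types. The struct and field names below are simplified stand-ins, not the exact v1alpha2 types:

```go
package main

import "fmt"

// Illustrative stand-in for the autoscaling constraints: per-field minimums
// plus a cross-field CEL rule guarded with has() for optional fields.
// +kubebuilder:validation:XValidation:rule="!has(self.minReplicas) || !has(self.maxReplicas) || self.maxReplicas >= self.minReplicas",message="maxReplicas must be >= minReplicas"
type AutoscalingSpec struct {
	// +kubebuilder:validation:Minimum=1
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	// +kubebuilder:validation:Minimum=1
	MaxReplicas *int32 `json:"maxReplicas,omitempty"`
}

// Illustrative stand-in for the PDB rules: mutual exclusivity plus
// at-least-one-required, both expressed as CEL XValidation markers.
// +kubebuilder:validation:XValidation:rule="!(has(self.minAvailable) && has(self.maxUnavailable))",message="minAvailable and maxUnavailable are mutually exclusive"
// +kubebuilder:validation:XValidation:rule="has(self.minAvailable) || has(self.maxUnavailable)",message="one of minAvailable or maxUnavailable is required"
type PDBSpec struct {
	MinAvailable   *int32 `json:"minAvailable,omitempty"`
	MaxUnavailable *int32 `json:"maxUnavailable,omitempty"`
}

func main() {
	one := int32(1)
	spec := PDBSpec{MinAvailable: &one}
	fmt.Println(spec.MaxUnavailable == nil) // only one of the pair is set
}
```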

ExposeConfig.Enabled (*bool): Kept as pointer. The three-state semantics are intentional: nil = not specified (no Ingress created), false = explicitly disabled, true = create Ingress/Route. Added a comment documenting this.
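
The three-state semantics can be sketched as follows (the helper names are hypothetical, only the nil/false/true behavior comes from the PR):

```go
package main

import "fmt"

// shouldCreateIngress models ExposeConfig.Enabled (*bool): nil and false
// both mean "no Ingress"; only an explicit true opts in.
func shouldCreateIngress(enabled *bool) bool {
	return enabled != nil && *enabled
}

// describe spells out the three states documented in the PR.
func describe(enabled *bool) string {
	switch {
	case enabled == nil:
		return "not specified (no Ingress created)"
	case *enabled:
		return "create Ingress/Route"
	default:
		return "explicitly disabled"
	}
}

func main() {
	t, f := true, false
	fmt.Println(describe(nil))
	fmt.Println(describe(&f))
	fmt.Println(describe(&t))
}
```

A plain `bool` would collapse "unset" and "false" into one state, which is why the pointer is kept.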

rhuss added 5 commits March 11, 2026 07:57
Replace the `replace_by_api_type` provider merge strategy with
`overlay_by_provider_id` across all spec artifacts. The previous
strategy would silently drop base providers that users didn't
explicitly re-declare, breaking distributions that ship multiple
default providers per API type.

The overlay strategy uses provider-ID-based matching: matching IDs
are replaced, unmatched user IDs are appended, and unmatched base
IDs are preserved. This keeps distributions intact when users only
need to override a single provider.

Addresses review feedback from @VaishnaviHire on PR llamastack#263 and aligns
with the merge semantics already implemented in PR llamastack#253.

Changes:
- contracts/config-generation.yaml: strategy + examples
- spec.md: add FR-035a for overlay merge semantics
- research.md: update R3 decision and rationale
- REVIEW.md: update Decision 3
- tasks.md: update T018 and T029 descriptions

Assisted-By: 🤖 Claude Code

Signed-off-by: Roland Huß <rhuss@redhat.com>
…ovider_id

Aligns plan.md with the overlay_by_provider_id change made in the
config-generation contract and other spec artifacts.

Assisted-By: 🤖 Claude Code

Signed-off-by: Roland Huß <rhuss@redhat.com>
…validation

Implements Phase 1 (T001-T013) of the operator-generated config spec:

- v1alpha2 package scaffolding (doc.go, groupversion_info.go)
- Complete type hierarchy: DistributionSpec, ProvidersSpec, ProviderConfig,
  ResourcesSpec, ModelConfig, StorageSpec (kv/sql), NetworkingSpec, TLSSpec,
  ExposeConfig, WorkloadSpec, WorkloadOverrides, OverrideConfigSpec
- Status types: ResolvedDistributionStatus, ConfigGenerationStatus,
  condition type/reason constants, phase enum
- CEL validation rules:
  - Mutual exclusivity (overrideConfig vs providers/resources/storage/disabled)
  - Distribution name/image mutual exclusivity
  - Provider ID required when multiple providers per API type
  - TLS secretName required when enabled
  - Redis endpoint required, Postgres connectionString required
  - Disabled + provider conflict detection
- Generated deepcopy and CRD manifests

Assisted-By: 🤖 Claude Code

Signed-off-by: Roland Huß <rhuss@redhat.com>
…e 2)

Implements Phase 2 (T014-T030) of the operator-generated config spec:

Config generation engine (pkg/config/):
- ConfigResolver interface with EmbeddedConfigResolver (go:embed)
- Embedded base configs for all 4 distributions (starter, remote-vllm,
  meta-reference-gpu, postgres-demo)
- Config version detection and validation
- Provider expansion: auto-ID generation, remote:: prefix, endpoint
  mapping, settings merge, secretRef env var placeholders
- Resource expansion: model registration with default/explicit provider
  assignment, tool group and shield registration with provider fallback
- Storage config: KV (sqlite/redis) and SQL (sqlite/postgres) backends
  with secretKeyRef env var substitution
- Merge logic with overlay_by_provider_id strategy: matching IDs
  replaced, unmatched user IDs appended, unmatched base IDs preserved
- Disabled API subtractive removal
- GenerateConfig orchestrator producing deterministic YAML + SHA256 hash
- Secret reference collection from apiKey, secretRefs, and storage fields

Tests (53 tests, all passing):
- EmbeddedConfigResolver: all distributions + unknown error
- Provider expansion: single/multi, auto-ID, settings, apiKey, secretRefs
- Resource expansion: models, tools, shields, provider fallback, errors
- Secret collection: apiKey, secretRefs, storage, env var normalization
- Storage: sqlite/redis/postgres, independent subsection merge
- Merge: overlay match/append/preserve, disabled, base immutability
- Generator: end-to-end with real embedded configs, determinism,
  overlay preservation, error cases

Assisted-By: 🤖 Claude Code

Signed-off-by: Roland Huß <rhuss@redhat.com>
…alidation

Incorporates all review feedback from @eoinfennessy on PR llamastack#264:

- Add MinItems=1 on all optional slices to reject explicitly empty lists
  (provider slices, models, tools, shields, env, command, args, volumes,
  volumeMounts, topologySpreadConstraints, configMapKeys, namespaces, labels)
- Add MinProperties=1 on secretRefs map
- Add MinLength=1 on required string fields (provider, name, configMapName,
  providerId, image, secretKeyRef name/key)
- Add CEL non-empty guards on optional strings (id, provider, modelType,
  quantization, hostname, serviceAccountName, mountPath, configMapNamespace)
- Add reverse validation on storage types: endpoint/password only valid for
  redis, connectionString only valid for postgres
- Add TLS reverse validation: secretName and caBundle only valid when enabled
- Add port range validation (1-65535)
- Add replicas Minimum=0
- Add autoscaling: minReplicas/maxReplicas Minimum=1, percentage range 1-100,
  CEL rule maxReplicas >= minReplicas
- Add PDB mutual exclusivity: minAvailable and maxUnavailable cannot both be
  set, at least one required
- Document *bool semantics on ExposeConfig.Enabled (nil vs false vs true)

Assisted-By: 🤖 Claude Code

Signed-off-by: Roland Huß <rhuss@redhat.com>
@rhuss force-pushed the 002-v1alpha2-types-and-config-pipeline branch from 4a77086 to 62acaf5 on March 11, 2026 06:57
rhuss added 2 commits March 11, 2026 08:06
Fixes all pre-commit lint failures:

- cyclop: split findProviderForAPI into smaller functions
- errchkjson: check json.Marshal error in test helper
- forcetypeassert: use checked type assertions via test helpers
- goconst: extract storageTypeSQLite constant
- godot: add periods to comment blocks
- govet: fix shadowed err variable in generator.go
- lll: add nolint:lll for long kubebuilder CEL markers
- perfsprint: use errors.New and hex.EncodeToString
- testifylint: use assert.Positive and assert.InDelta

Assisted-By: 🤖 Claude Code
Signed-off-by: Roland Huß <rhuss@redhat.com>
- Prefix all error messages with "failed to" per project convention
- Regenerate API docs (docs/api-overview.md) for v1alpha2 types
- Regenerate installer manifest (release/operator.yaml)

Assisted-By: 🤖 Claude Code
Signed-off-by: Roland Huß <rhuss@redhat.com>

eoinfennessy left a comment


Added a few comments on the config gen pipeline, and a few more validation suggestions that became apparent while reviewing it.

```go
	spec *v1alpha2.LlamaStackDistributionSpec,
) (map[string]interface{}, error) {
	// Start with a copy of the base config
	merged := shallowCopyMap(baseConfig)
```
@eoinfennessy:

shallowCopyMap only copies top-level keys. Nested maps (like providers, apis) are still shared references. overlayProviders mutates the providersMap in-place. This means the baseConfig's providers map is mutated.

The test TestMergeConfig_DoesNotMutateBase passes only because applyDisabled replaces config["apis"] with a new slice rather than mutating it in-place.

We should consider a recursive deep copy.

@rhuss:

Good catch. Replaced shallowCopyMap with deepCopyMap that uses a JSON round-trip to ensure nested maps and slices are fully independent copies. Also strengthened TestMergeConfig_DoesNotMutateBase to verify that base providers aren't mutated either (not just apis).
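
The JSON round-trip approach described here can be sketched as follows (the helper name comes from the comment; the error-wrapping style is an assumption):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deepCopyMap marshals the config and unmarshals it into a fresh map, so
// nested maps and slices are fully independent copies of the original.
func deepCopyMap(in map[string]any) (map[string]any, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal config: %w", err)
	}
	var out map[string]any
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, fmt.Errorf("failed to unmarshal config: %w", err)
	}
	return out, nil
}

func main() {
	base := map[string]any{"providers": map[string]any{"inference": []any{"vllm"}}}
	copied, _ := deepCopyMap(base)
	// Mutating the copy's nested map must not touch the base config.
	copied["providers"].(map[string]any)["inference"] = []any{"mutated"}
	fmt.Println(base["providers"].(map[string]any)["inference"]) // [vllm]
}
```

One caveat of the round-trip: JSON numbers come back as float64, which is harmless for config maps destined for YAML serialization but worth knowing.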

Comment on lines +98 to +106
```go
func resolveProviderID(p v1alpha2.ProviderConfig, isSingle bool) string {
	if p.ID != "" {
		return p.ID
	}
	if isSingle {
		return p.Provider
	}
	return p.Provider
}
```
@eoinfennessy:

isSingle is effectively unused because p.Provider is returned either way.

@rhuss:

Right, dead logic. Removed the isSingle parameter entirely. Also extracted the ID resolution into a shared ResolveProviderID helper in provider.go (addresses comment #11 about duplication too).
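
The extracted helper is small enough to sketch in full; the ProviderConfig stub below carries only the two fields involved:

```go
package main

import "fmt"

// ProviderConfig stub with the two fields relevant to ID resolution.
type ProviderConfig struct {
	ID       string // optional explicit provider ID
	Provider string // provider name, e.g. "vllm"
}

// ResolveProviderID: an explicit ID wins; otherwise the provider name
// doubles as the ID. No isSingle parameter, since it changed nothing.
func ResolveProviderID(p ProviderConfig) string {
	if p.ID != "" {
		return p.ID
	}
	return p.Provider
}

func main() {
	fmt.Println(ResolveProviderID(ProviderConfig{Provider: "vllm"}))                     // vllm
	fmt.Println(ResolveProviderID(ProviderConfig{ID: "vllm-primary", Provider: "vllm"})) // vllm-primary
}
```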

Comment on lines +128 to +148
```go
func countMergedProviders(config map[string]interface{}) int {
	providersMap, ok := config["providers"].(map[string]interface{})
	if !ok {
		return 0
	}

	count := 0
	// Sort keys for determinism
	keys := make([]string, 0, len(providersMap))
	for k := range providersMap {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		if providers, ok := providersMap[k].([]interface{}); ok {
			count += len(providers)
		}
	}
	return count
}
```
@eoinfennessy:

Sorting is not needed here, and neither is building the keys slice.

@rhuss:

Fixed. Removed the keys slice and sorting. The function just sums counts, so iteration order is irrelevant.
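
The simplified function then reduces to a plain sum over the map, a sketch based on the snippet above with the keys slice and sort removed:

```go
package main

import "fmt"

// countMergedProviders sums the per-API provider slice lengths. Map
// iteration order is irrelevant because addition is order-independent.
func countMergedProviders(config map[string]any) int {
	providersMap, ok := config["providers"].(map[string]any)
	if !ok {
		return 0
	}
	count := 0
	for _, v := range providersMap {
		if providers, ok := v.([]any); ok {
			count += len(providers)
		}
	}
	return count
}

func main() {
	cfg := map[string]any{"providers": map[string]any{
		"inference": []any{"a", "b"},
		"safety":    []any{"c"},
	}}
	fmt.Println(countMergedProviders(cfg)) // 3
}
```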

Comment on lines +93 to +96
```go
	if model.Provider != "" {
		// Explicit provider assignment (FR-041)
		providerID = model.Provider
		providerType = "remote::" + providerID
```
@eoinfennessy:

If the user sets provider: "my-vllm-provider" (a custom ID, not a provider name), this produces provider_type: "remote::my-vllm-provider". Is this the expected behavior? Should the provider type come from the actual provider's Provider field instead of its ID?

@rhuss:

Good catch, this was a bug. When model.Provider is a custom ID like vllm-primary, the code was producing provider_type: remote::vllm-primary instead of remote::vllm. Added lookupProviderType() that searches user providers and base config to resolve the actual provider name. Test updated to verify provider_type is correct.
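
A sketch of the lookup described here, over stub types. Only the "resolve a custom ID to the actual provider name" behavior comes from the comment; this sketch covers the user-provider search with a name fallback, and the real function's signature and base-config search are not shown:

```go
package main

import "fmt"

// ProviderConfig stub with the fields relevant to the lookup.
type ProviderConfig struct {
	ID       string
	Provider string
}

// lookupProviderType resolves a model's provider reference (which may be a
// custom ID like "vllm-primary") to a provider type. Matching the ID yields
// "remote::vllm" rather than the buggy "remote::vllm-primary".
func lookupProviderType(ref string, userProviders []ProviderConfig) string {
	for _, p := range userProviders {
		id := p.ID
		if id == "" {
			id = p.Provider // providers without an explicit ID use their name
		}
		if id == ref {
			return "remote::" + p.Provider
		}
	}
	// Fall back to treating the reference itself as a provider name.
	return "remote::" + ref
}

func main() {
	providers := []ProviderConfig{{ID: "vllm-primary", Provider: "vllm"}}
	fmt.Println(lookupProviderType("vllm-primary", providers)) // remote::vllm
	fmt.Println(lookupProviderType("tgi", nil))                // remote::tgi
}
```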

```go
		id = p.Provider
	}

	providerType := "remote::" + p.Provider
```
@eoinfennessy:

All providers will have the remote:: prefix. Are inline:: providers not possible to configure with the CRD? If this is intentional we should ensure it is documented.

@rhuss:

This is intentional. The CRD only supports configuring remote providers. Inline providers from the base config are preserved during merge but can't be added or configured through the CRD. Added documentation on ExpandProviders and an inline comment at the remote:: prefix line to make this explicit.

Comment on lines +85 to +95
```go
// CountProviders counts total providers across all API types in the spec.
func CountProviders(providers *v1alpha2.ProvidersSpec) int {
	if providers == nil {
		return 0
	}
	return len(providers.Inference) +
		len(providers.Safety) +
		len(providers.VectorIo) +
		len(providers.ToolRuntime) +
		len(providers.Telemetry)
}
```
@eoinfennessy:

Is this unused?

@rhuss:

Yes, CountProviders was only used in its own tests, not in production code. Removed it along with the test.

Comment on lines +103 to +126
```go
// AllAPIProviders returns all configured API types and their providers in deterministic order.
func AllAPIProviders(providers *v1alpha2.ProvidersSpec) []apiProviders {
	if providers == nil {
		return nil
	}

	var result []apiProviders
	if len(providers.Inference) > 0 {
		result = append(result, apiProviders{APIType: "inference", Providers: providers.Inference})
	}
	if len(providers.Safety) > 0 {
		result = append(result, apiProviders{APIType: "safety", Providers: providers.Safety})
	}
	if len(providers.VectorIo) > 0 {
		result = append(result, apiProviders{APIType: "vector_io", Providers: providers.VectorIo})
	}
	if len(providers.ToolRuntime) > 0 {
		result = append(result, apiProviders{APIType: "tool_runtime", Providers: providers.ToolRuntime})
	}
	if len(providers.Telemetry) > 0 {
		result = append(result, apiProviders{APIType: "telemetry", Providers: providers.Telemetry})
	}
	return result
}
```
@eoinfennessy:

Should this include agents, datasetio, eval, scoring?

@rhuss:

These API types (agents, datasetio, eval, scoring) don't have corresponding fields in ProvidersSpec, so they can't be configured through the CRD. This is intentional since these are typically handled by inline providers from the base config. The AllAPIProviders function only returns API types that have CRD fields. This ties into the remote:: only design discussed above.

Comment on lines +130 to +151
```go
	disabledSet := make(map[string]bool, len(disabled))
	for _, d := range disabled {
		disabledSet[d] = true
	}

	// Remove from apis list
	if apis, ok := config["apis"].([]interface{}); ok {
		filtered := make([]interface{}, 0, len(apis))
		for _, api := range apis {
			if apiStr, ok := api.(string); ok && !disabledSet[apiStr] {
				filtered = append(filtered, api)
			}
		}
		config["apis"] = filtered
	}

	// Remove from providers map
	if providersMap, ok := config["providers"].(map[string]interface{}); ok {
		for _, d := range disabled {
			delete(providersMap, d)
		}
	}
```
@eoinfennessy:

Disabling an API removes it from apis and providers, but doesn't clean up any registered_resources entries that reference providers for that API type. For example, disabling safety leaves any shield registrations pointing to a now-removed provider. This could cause Llama Stack to fail at startup with a confusing error about a missing provider rather than a clear "disabled" message.

@rhuss:

Good catch, this was a real bug. applyDisabled now collects the provider IDs being removed before deleting them from the map, then filters registered_resources to remove entries whose provider_id matches. Added TestApplyDisabled_CleansUpRegisteredResources to verify.
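
The fix can be sketched over the same map shapes used in the snippets above. This is a simplified applyDisabled that omits the apis-list filtering shown earlier and focuses on the new cleanup step:

```go
package main

import "fmt"

// applyDisabled collects the provider IDs of each disabled API before
// deleting it, then drops registered_resources entries that reference a
// removed provider, so no orphaned registrations remain.
func applyDisabled(config map[string]any, disabled []string) {
	removedIDs := make(map[string]bool)
	if providersMap, ok := config["providers"].(map[string]any); ok {
		for _, api := range disabled {
			if providers, ok := providersMap[api].([]any); ok {
				for _, p := range providers {
					if pm, ok := p.(map[string]any); ok {
						if id, ok := pm["provider_id"].(string); ok {
							removedIDs[id] = true
						}
					}
				}
			}
			delete(providersMap, api)
		}
	}
	if rr, ok := config["registered_resources"].([]any); ok {
		filtered := make([]any, 0, len(rr))
		for _, r := range rr {
			if rm, ok := r.(map[string]any); ok {
				if prov, ok := rm["provider"].(map[string]any); ok {
					if id, ok := prov["provider_id"].(string); ok && removedIDs[id] {
						continue // orphaned: its provider was just removed
					}
				}
			}
			filtered = append(filtered, r)
		}
		config["registered_resources"] = filtered
	}
}

func main() {
	cfg := map[string]any{
		"providers": map[string]any{
			"safety": []any{map[string]any{"provider_id": "llama-guard"}},
		},
		"registered_resources": []any{
			map[string]any{"resource_type": "shield",
				"provider": map[string]any{"provider_id": "llama-guard"}},
		},
	}
	applyDisabled(cfg, []string{"safety"})
	fmt.Println(len(cfg["registered_resources"].([]any))) // 0
}
```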

Comment on lines +137 to +148
```go
	// Check user providers first
	if id, pt, ok := findUserProvider(apiType, userProviders); ok {
		return id, pt, nil
	}

	// Fall back to base config
	if id, pt, ok := findBaseConfigProvider(apiType, baseConfig); ok {
		return id, pt, nil
	}

	return "", "", fmt.Errorf("failed to find %s provider", apiType)
}
```
@eoinfennessy:

There's duplication of provider ID resolution logic between here and the ExpandProviders function in provider.go.

We should consider creating a shared helper function to ensure logic doesn't diverge.

@rhuss:

Agreed. Extracted a shared ResolveProviderID(p ProviderConfig) string helper in provider.go. Both findUserProvider and CollectSecretRefs now use it, eliminating the duplication risk.

Comment on lines +110 to +124
```go
	var existing []interface{}
	if rr, ok := config["registered_resources"].([]interface{}); ok {
		existing = rr
	}

	for _, r := range resources {
		existing = append(existing, map[string]interface{}{
			"resource_type": r.ResourceType,
			"provider": map[string]interface{}{
				"provider_id":   r.Provider.ProviderID,
				"provider_type": r.Provider.ProviderType,
			},
			"params": r.Params,
		})
	}
```
@eoinfennessy:

If the base config already registers a model (say llama3.2-8b) and the user's CRD also specifies llama3.2-8b in resources.models, both entries end up in the final registered_resources list. Llama Stack would see two registrations for the same model ID.

@rhuss:

Fixed. appendResources now builds a set of existing resource keys (resource_type/id) and skips duplicates. Added TestGenerateConfig_DeduplicatesResources to verify that a model already in the base config isn't added again.
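
A sketch of the deduplication over generic maps; the resource_type/id key format is an assumption, as is the flat "id" field:

```go
package main

import "fmt"

// dedupKey builds a resource identity from its type and ID (assumed shape).
func dedupKey(r map[string]any) string {
	t, _ := r["resource_type"].(string)
	id, _ := r["id"].(string)
	return t + "/" + id
}

// appendResources tracks existing (resource_type, id) keys and skips user
// entries that the base config already registers.
func appendResources(config map[string]any, resources []map[string]any) {
	var existing []any
	if rr, ok := config["registered_resources"].([]any); ok {
		existing = rr
	}
	seen := make(map[string]bool)
	for _, r := range existing {
		if rm, ok := r.(map[string]any); ok {
			seen[dedupKey(rm)] = true
		}
	}
	for _, r := range resources {
		if seen[dedupKey(r)] {
			continue // already registered by the base config
		}
		seen[dedupKey(r)] = true
		existing = append(existing, r)
	}
	config["registered_resources"] = existing
}

func main() {
	cfg := map[string]any{"registered_resources": []any{
		map[string]any{"resource_type": "model", "id": "llama3.2-8b"},
	}}
	appendResources(cfg, []map[string]any{
		{"resource_type": "model", "id": "llama3.2-8b"}, // duplicate, skipped
		{"resource_type": "shield", "id": "llama-guard"},
	})
	fmt.Println(len(cfg["registered_resources"].([]any))) // 2
}
```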

rhuss added 3 commits March 21, 2026 10:26
- Replace shallowCopyMap with deepCopyMap (JSON round-trip) to prevent
  base config mutation through shared nested references
- Fix wrong provider_type when model.Provider is a custom ID by adding
  lookupProviderType() to resolve actual provider names
- Clean up registered_resources when disabling APIs to prevent orphaned
  entries that could cause startup failures
- Deduplicate registered_resources in appendResources to avoid duplicate
  model/shield/tool registrations from base config overlap
- Extract shared ResolveProviderID helper to eliminate duplicated
  provider ID resolution logic between resource.go and secret.go
- Remove dead isSingle parameter from resolveProviderID
- Remove unnecessary sorting in countMergedProviders
- Remove unused CountProviders function and its tests
- Add CEL validation for disabled API names (enum of 9 valid values)
- Add CEL uniqueness rule for provider IDs in all provider slices
- Document that CRD only supports remote:: providers (inline:: preserved
  from base config but not configurable via CRD)

Assisted-By: 🤖 Claude Code
Signed-off-by: Roland Huß <rhuss@redhat.com>
Regenerated by pre-commit hooks after adding CEL validation rules
for disabled API enum and provider ID uniqueness.

Assisted-By: 🤖 Claude Code
Signed-off-by: Roland Huß <rhuss@redhat.com>
Remove spaces in CEL expression and shorten message to stay under
the 180-character line length limit enforced by golangci-lint.

Assisted-By: 🤖 Claude Code
Signed-off-by: Roland Huß <rhuss@redhat.com>

rhuss commented Mar 21, 2026

Hey @eoinfennessy, thanks a ton for the review. All your comments are very legit and highly appreciated. I also used them to harden the quality gates of the plugin, to hopefully reduce these kinds of (admittedly, partly really stupid) errors. Btw, the responses to your comments were auto-generated (you guessed that anyway, I guess :)

Sorry for the delay, but I'm in the middle of two very distracting weeks (PTO and KubeCon :-). I'm still dedicated to getting this PR through and learning how to improve the workflow. I'm especially unhappy with REVIEWERS.md, as it's not really useful as guidance for human reviewers; I will work on that. In the meantime, the cc-sdd plugin has evolved with some new traits (and dropped beads, since it doesn't really add any extra value to the plugin, and beads has introduced heftier dependencies like dolt).

Thanks again for your support!

Assisted-By: 🤖 Claude Code
Signed-off-by: Roland Huß <rhuss@redhat.com>
@rhuss force-pushed the 002-v1alpha2-types-and-config-pipeline branch from 6856c54 to e80ebf0 on March 21, 2026 11:28
@eoinfennessy

@rhuss thanks for the review! Having acted as HITL for this PR and #253, there are two major things I'd like to see introduced in future SDD workflows:

  1. Agentic review loop: Before seeking review from a human, I think it would be helpful to have rounds of reviews with agents. My agent-assisted reviews on the PRs picked up many, many apparent issues, both small and critical. I think we could introduce an agentic review loop to identify and fix many of these issues before having humans take time to review the code.

    My idea would be to have many different sub-agents review the code, each with a particular area of review focus (e.g. security, maintainability, UX, testing, etc. etc.). They might compile a doc of actionable review comments / suggestions. Implementation agent(s) can then work on addressing each item in the review doc. When implementation is finished, another round of reviews begins to verify the changes have been implemented and identify more issues. We could have a gating mechanism to stop this loop once all the review comments are minor nits, and / or we could have a set maximum number of review loops.

  2. Smaller PRs: as a reviewer, I definitely prefer the phased approach. The review process is less overwhelming, and I am more confident that I have thoroughly reviewed everything to my standards. I think the size of the PR I am commenting on is probably the limit of what we should ask humans to review. PR #253 (feat: Introduce v1alpha2 version of LlamaStackDistribution CRD), with over 10k lines of non-generated changes, is far too big for most humans.

@eoinfennessy

P.S. I will probably focus review efforts on PR #253 in the coming days, but I definitely intend to come back to this soon. Have fun at KubeCon :)


rhuss commented Mar 28, 2026

Thanks @eoinfennessy, that feedback is very helpful and will improve the methodology.

My main takeaways:

  • More thorough reviews of the generated code, ideally coming from different angles and maybe different agents. I think tools like Copilot or CodeRabbit could be a great help here
  • Multiple smaller PRs rather than one single monster PR for the implementation. The flow then looks like: (initial spec + plan + review guide) -> (phase 1 implementation) -> ... -> (phase N implementation)

No worries about the reviews; I will merge this one and work on the next phase with a better code review process.
