HYPERFLEET-896 - doc: CRD-Driven Generic Resource API Design#122
HYPERFLEET-896 - doc: CRD-Driven Generic Resource API Design#122rh-amarin wants to merge 1 commit intoopenshift-hyperfleet:mainfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
WalkthroughA new design document, "Generic Resource Registry," was added describing a descriptor-driven, contract-first API that consolidates per-entity endpoints into a generic Sequence Diagram(s)sequenceDiagram
participant Client
participant Router
participant ResourceHandler
participant ResourceService
participant ResourceDao
participant DB
Client->>Router: HTTP request (e.g., POST /clusters)
Router->>ResourceHandler: route to handler (based on descriptor)
ResourceHandler->>ResourceService: validate, enforce descriptor rules
ResourceService->>ResourceDao: create/read/update/delete by type
ResourceDao->>DB: persist/query from resources + assoc tables
DB-->>ResourceDao: result
ResourceDao-->>ResourceService: result
ResourceService-->>ResourceHandler: response payload
ResourceHandler-->>Router: HTTP response
Router-->>Client: HTTP response
sequenceDiagram
participant Startup
participant DescriptorLoader
participant RouteRegistrar
participant Router
Startup->>DescriptorLoader: load EntityDescriptors
DescriptorLoader-->>RouteRegistrar: descriptors
RouteRegistrar->>Router: register top-level and nested routes (CRUD, /statuses)
Router-->>Startup: routes ready
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
99bdd5a to
c4aa85d
Compare
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
hyperfleet/docs/generic-resource-registry-design.md (1)
703-706: Add error handling for JSON unmarshaling in PresentResource.The code silently ignores the error from
json.Unmarshal(r.Spec, &spec). While theSpecfield is stored as GORMdatatypes.JSONand should contain valid JSON, defensive error handling would improve robustness and make debugging easier if data corruption occurs.Consider logging the error or returning a fallback value:
var spec map[string]interface{} if err := json.Unmarshal(r.Spec, &spec); err != nil { log.Warnf("failed to unmarshal spec for resource %s: %v", r.ID, err) spec = make(map[string]interface{}) // or return error to caller }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/docs/generic-resource-registry-design.md` around lines 703 - 706, In PresentResource, handle the error returned by json.Unmarshal(r.Spec, &spec) instead of ignoring it: after calling json.Unmarshal on r.Spec, check the error and either log a warning/error that includes the resource identifier (e.g., r.ID) and the error details, then set spec = make(map[string]interface{}) as a safe fallback, or propagate/return the error to the caller; update the PresentResource function to perform this check and logging around json.Unmarshal to make failures observable and prevent nil/spec panics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hyperfleet/docs/generic-resource-registry-design.md`:
- Line 9: Change the misspelled word "altternatives" to "alternatives" in the
introduction of the Generic Resource Registry proposal (look for the line
containing "This document contains the proposal for the Generic Resource
Registry and altternatives" in the generic-resource-registry-design.md file and
correct the typo).
- Around line 1309-1396: The doc exposes a security risk: authorization is
opt-in via EntityDescriptor.Authz (EntityAuthzConfig) so new entities default to
no checks; update the design by (1) adding a note in §11 documenting
default-allow risk and audit burden, (2) specifying a registry validation step
that inspects registrations (registry.Register/EntityDescriptor) and emits a
warning or fails if Authz == nil for critical types, and (3) defining a
configuration toggle (e.g., EnforceAuthForAllEntities) that, when enabled, makes
the registry validation error instead of warn and requires OperationPermissions
be present (ResourceCheck optional).
- Around line 406-438: The Replace method in sqlResourceDao reads
existing.Generation then writes back without checking the row hasn't changed,
allowing lost increments under concurrent Replace calls; modify the Save step in
Replace to perform an optimistic-locking conditional update by adding a
Where("generation = ?", existing.Generation) before
Omit("Labels","Conditions").Save(resource) and detect a failed update (e.g.,
gorm.ErrRecordNotFound or RowsAffected == 0) to mark the transaction for
rollback and return a conflict error (e.g., errors.Conflict("resource was
modified concurrently")), keeping the existing db.MarkForRollback(ctx, err)
behavior for other errors.
- Around line 1240-1258: Add three new risk subsections after "R2 — Loss of
domain modeling flexibility": create "R3 — Schema evolution and backward
compatibility" documenting how to add/remove EntityDescriptor fields,
deprecation/versioning strategy, and schema migration guidance; "R4 — Security
considerations" listing checks for SpecSchemaName existence, label validation,
authorization gaps, and audit logging requirements; and "R5 — Debugging and
observability" covering entity-specific error messages, metrics/logging tagging,
and tracing. For each new risk include a short remediation paragraph (e.g.,
validation checks, descriptor versioning, RBAC/audit policy, and observability
tags) mirroring the style of the existing "R1" and "R2" sections so they
integrate consistently into the Risks section.
- Around line 572-574: registry.Validate() currently only checks ParentType
references but must also ensure descriptor.Plural values are unique to prevent
ambiguous routing in RegisterEntityRoutes (which uses descriptor.Plural as the
path segment). Update Validate() to iterate over All() descriptors, track seen
plurals in a map[string]string, and panic with a descriptive message including
both descriptor.Type values when a duplicate Plural is found (e.g., "duplicate
Plural %q: used by both %q and %q"). Ensure the new check runs alongside the
existing ParentType validation.
- Around line 918-940: The Delete method ignores errors from s.dao.CountByOwner
and s.dao.FindByTypeAndOwner which can hide DB failures and cause incorrect
delete behavior; update sqlResourceService.Delete to capture the returned error
from CountByOwner and if non-nil return it (wrap via handleDeleteError or
convert to *errors.ServiceError) instead of assuming count==0, and likewise
capture and return any error from FindByTypeAndOwner before iterating children
so cascade deletion only runs when the query succeeded; keep the existing call
to handleDeleteError(s.dao.Delete(...)) for the final delete.
- Around line 607-641: Update the design text to explicitly state
owner-reference immutability and how Replace behaves: specify that OwnerID,
OwnerType, and OwnerHref are immutable once created and that resources.href is
computed once from id, Plural and the resolved parent chain and will not change;
clarify that ResourcePatchRequest intentionally excludes owner fields and that
Replace cannot modify owner references (it updates other mutable resource fields
but must reject any incoming OwnerID/OwnerType/OwnerHref changes), and note that
the service will validate and return an error if an update attempts to change
owner fields to preserve href correctness and enforce the constraint described
in §10.6.
---
Nitpick comments:
In `@hyperfleet/docs/generic-resource-registry-design.md`:
- Around line 703-706: In PresentResource, handle the error returned by
json.Unmarshal(r.Spec, &spec) instead of ignoring it: after calling
json.Unmarshal on r.Spec, check the error and either log a warning/error that
includes the resource identifier (e.g., r.ID) and the error details, then set
spec = make(map[string]interface{}) as a safe fallback, or propagate/return the
error to the caller; update the PresentResource function to perform this check
and logging around json.Unmarshal to make failures observable and prevent
nil/spec panics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6e1cf04f-6d26-4d8a-8415-3a107600bc1c
📒 Files selected for processing (1)
hyperfleet/docs/generic-resource-registry-design.md
| // Validate checks all ParentType references resolve. Called at startup before | ||
| // the server accepts requests. Panics with a descriptive message on failure. | ||
| func Validate() |
There was a problem hiding this comment.
Add Plural uniqueness validation to registry.Validate().
The RegisterEntityRoutes function (lines 840-876) registers routes using descriptor.Plural as the path segment. If two descriptors have the same Plural value, route registration would succeed but behavior would be unpredictable (gorilla/mux would match one arbitrarily).
The registry.Validate() function should verify that all registered descriptors have unique Plural values:
func Validate() {
// existing ParentType validation...
plurals := make(map[string]string)
for _, d := range All() {
if existing, ok := plurals[d.Plural]; ok {
panic(fmt.Sprintf("duplicate Plural %q: used by both %q and %q",
d.Plural, existing, d.Type))
}
plurals[d.Plural] = d.Type
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hyperfleet/docs/generic-resource-registry-design.md` around lines 572 - 574,
registry.Validate() currently only checks ParentType references but must also
ensure descriptor.Plural values are unique to prevent ambiguous routing in
RegisterEntityRoutes (which uses descriptor.Plural as the path segment). Update
Validate() to iterate over All() descriptors, track seen plurals in a
map[string]string, and panic with a descriptive message including both
descriptor.Type values when a duplicate Plural is found (e.g., "duplicate Plural
%q: used by both %q and %q"). Ensure the new check runs alongside the existing
ParentType validation.
There was a problem hiding this comment.
this will be left to the implementation
There was a problem hiding this comment.
@rh-amarin, understood — that makes sense to handle during the implementation phase rather than the design doc.
| func (s *sqlResourceService) Delete(ctx context.Context, resourceType, id string) *errors.ServiceError { | ||
| for _, child := range registry.ChildrenOf(resourceType) { | ||
| count, _ := s.dao.CountByOwner(ctx, child.Type, id) | ||
| if count == 0 { | ||
| continue | ||
| } | ||
| switch child.OnParentDelete { | ||
| case registry.OnParentDeleteCascade: | ||
| children, _ := s.dao.FindByTypeAndOwner(ctx, child.Type, id) | ||
| for _, c := range children { | ||
| if err := s.Delete(ctx, c.Type, c.ID); err != nil { | ||
| return err | ||
| } | ||
| } | ||
| default: // OnParentDeleteRestrict | ||
| return errors.Conflict("HYPERFLEET-CNF-001", | ||
| "cannot delete %s %q: has %d active %s(s)", | ||
| resourceType, id, count, child.Type) | ||
| } | ||
| } | ||
| return handleDeleteError(resourceType, s.dao.Delete(ctx, resourceType, id)) | ||
| } | ||
| ``` |
There was a problem hiding this comment.
Fix error handling in the Delete implementation.
The implementation silently ignores errors from CountByOwner (line 920) and FindByTypeAndOwner (line 926). If these queries fail due to database errors, the function would proceed with incorrect assumptions:
- If
CountByOwnerfails,countis 0, and the function skips processing children that actually exist. - If
FindByTypeAndOwnerfails,childrenis empty, and the cascade loop doesn't execute, leaving orphaned children.
Both scenarios violate the intended delete semantics and could lead to data integrity issues.
🐛 Proposed fix to handle errors properly
func (s *sqlResourceService) Delete(ctx context.Context, resourceType, id string) *errors.ServiceError {
for _, child := range registry.ChildrenOf(resourceType) {
- count, _ := s.dao.CountByOwner(ctx, child.Type, id)
+ count, err := s.dao.CountByOwner(ctx, child.Type, id)
+ if err != nil {
+ return errors.GeneralError("failed to count child %s: %v", child.Type, err)
+ }
if count == 0 {
continue
}
switch child.OnParentDelete {
case registry.OnParentDeleteCascade:
- children, _ := s.dao.FindByTypeAndOwner(ctx, child.Type, id)
+ children, err := s.dao.FindByTypeAndOwner(ctx, child.Type, id)
+ if err != nil {
+ return errors.GeneralError("failed to find child %s: %v", child.Type, err)
+ }
for _, c := range children {
if err := s.Delete(ctx, c.Type, c.ID); err != nil {
return err
}
}
default: // OnParentDeleteRestrict
return errors.Conflict("HYPERFLEET-CNF-001",
"cannot delete %s %q: has %d active %s(s)",
resourceType, id, count, child.Type)
}
}
return handleDeleteError(resourceType, s.dao.Delete(ctx, resourceType, id))
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (s *sqlResourceService) Delete(ctx context.Context, resourceType, id string) *errors.ServiceError { | |
| for _, child := range registry.ChildrenOf(resourceType) { | |
| count, _ := s.dao.CountByOwner(ctx, child.Type, id) | |
| if count == 0 { | |
| continue | |
| } | |
| switch child.OnParentDelete { | |
| case registry.OnParentDeleteCascade: | |
| children, _ := s.dao.FindByTypeAndOwner(ctx, child.Type, id) | |
| for _, c := range children { | |
| if err := s.Delete(ctx, c.Type, c.ID); err != nil { | |
| return err | |
| } | |
| } | |
| default: // OnParentDeleteRestrict | |
| return errors.Conflict("HYPERFLEET-CNF-001", | |
| "cannot delete %s %q: has %d active %s(s)", | |
| resourceType, id, count, child.Type) | |
| } | |
| } | |
| return handleDeleteError(resourceType, s.dao.Delete(ctx, resourceType, id)) | |
| } | |
| ``` | |
| func (s *sqlResourceService) Delete(ctx context.Context, resourceType, id string) *errors.ServiceError { | |
| for _, child := range registry.ChildrenOf(resourceType) { | |
| count, err := s.dao.CountByOwner(ctx, child.Type, id) | |
| if err != nil { | |
| return errors.GeneralError("failed to count child %s: %v", child.Type, err) | |
| } | |
| if count == 0 { | |
| continue | |
| } | |
| switch child.OnParentDelete { | |
| case registry.OnParentDeleteCascade: | |
| children, err := s.dao.FindByTypeAndOwner(ctx, child.Type, id) | |
| if err != nil { | |
| return errors.GeneralError("failed to find child %s: %v", child.Type, err) | |
| } | |
| for _, c := range children { | |
| if err := s.Delete(ctx, c.Type, c.ID); err != nil { | |
| return err | |
| } | |
| } | |
| default: // OnParentDeleteRestrict | |
| return errors.Conflict("HYPERFLEET-CNF-001", | |
| "cannot delete %s %q: has %d active %s(s)", | |
| resourceType, id, count, child.Type) | |
| } | |
| } | |
| return handleDeleteError(resourceType, s.dao.Delete(ctx, resourceType, id)) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hyperfleet/docs/generic-resource-registry-design.md` around lines 918 - 940,
The Delete method ignores errors from s.dao.CountByOwner and
s.dao.FindByTypeAndOwner which can hide DB failures and cause incorrect delete
behavior; update sqlResourceService.Delete to capture the returned error from
CountByOwner and if non-nil return it (wrap via handleDeleteError or convert to
*errors.ServiceError) instead of assuming count==0, and likewise capture and
return any error from FindByTypeAndOwner before iterating children so cascade
deletion only runs when the query succeeded; keep the existing call to
handleDeleteError(s.dao.Delete(...)) for the final delete.
There was a problem hiding this comment.
left for the real implementation
There was a problem hiding this comment.
@rh-amarin, understood — the code snippets in this design doc are illustrative, so deferring proper error handling to the actual implementation makes sense. Just make sure the fix is applied when the Delete service method gets written.
c4aa85d to
5e30455
Compare
Summary
hyperfleet/docs/generic-resource-registry-design.md)Resourcetype + declarativeEntityDescriptorregistryresourcestable), spec validation via JSON Schema, authorization config, delete semantics, and trade-offsContext
HYPERFLEET-896. Currently adding a new entity type (e.g. NodePool) requires duplicated handler, DAO, and route code plus a coordinated deploy. This design eliminates that by letting teams register new entities via a descriptor at startup — routes, validation, status aggregation, and delete behavior are derived automatically.
Summary by CodeRabbit