Skip to content

HYPERFLEET-896 - doc: CRD-Driven Generic Resource API Design#122

Open
rh-amarin wants to merge 1 commit intoopenshift-hyperfleet:mainfrom
rh-amarin:HYPERFLEET-896
Open

HYPERFLEET-896 - doc: CRD-Driven Generic Resource API Design#122
rh-amarin wants to merge 1 commit intoopenshift-hyperfleet:mainfrom
rh-amarin:HYPERFLEET-896

Conversation

@rh-amarin
Copy link
Copy Markdown
Collaborator

@rh-amarin rh-amarin commented Apr 9, 2026

Summary

  • Adds design document for the CRD-Driven Generic Resource API (hyperfleet/docs/generic-resource-registry-design.md)
  • Proposes replacing per-entity handler code with a single Resource type + declarative EntityDescriptor registry
  • Covers OpenAPI contract, data layer (GORM + single resources table), spec validation via JSON Schema, authorization config, delete semantics, and trade-offs

Context

HYPERFLEET-896. Currently adding a new entity type (e.g. NodePool) requires duplicated handler, DAO, and route code plus a coordinated deploy. This design eliminates that by letting teams register new entities via a descriptor at startup — routes, validation, status aggregation, and delete behavior are derived automatically.

Summary by CodeRabbit

  • Documentation
    • Added a draft architecture design for a Generic Resource Registry: contract-first OpenAPI with a generic resource schema and type discrimination; descriptor-driven top-level and nested routing with CRUD and status endpoints; unified data-layer approach using a type discriminator; consolidated handler/route registration patterns; documented delete semantics, constraints, alternatives and performance considerations.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 9, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign pnguyen44 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 57c228dd-b359-403f-bdfe-a9ff2afd1b03

📥 Commits

Reviewing files that changed from the base of the PR and between c4aa85d and 5e30455.

📒 Files selected for processing (1)
  • hyperfleet/docs/generic-resource-registry-design.md
✅ Files skipped from review due to trivial changes (1)
  • hyperfleet/docs/generic-resource-registry-design.md

Walkthrough

A new design document, "Generic Resource Registry," was added describing a descriptor-driven, contract-first API that consolidates per-entity endpoints into a generic Resource model (typed via a type discriminator with untyped spec at the API layer and separate OpenAPI component schemas for validation). It defines autogenerated top-level and nested routes from EntityDescriptor configs, a single resources data table with type discriminator plus association tables for labels and status conditions, consolidated handlers/services/DAOs, and descriptor-driven delete semantics (restrict vs cascade) with recursion for owned resources.

Sequence Diagram(s)

sequenceDiagram
participant Client
participant Router
participant ResourceHandler
participant ResourceService
participant ResourceDao
participant DB
Client->>Router: HTTP request (e.g., POST /clusters)
Router->>ResourceHandler: route to handler (based on descriptor)
ResourceHandler->>ResourceService: validate, enforce descriptor rules
ResourceService->>ResourceDao: create/read/update/delete by type
ResourceDao->>DB: persist/query from resources + assoc tables
DB-->>ResourceDao: result
ResourceDao-->>ResourceService: result
ResourceService-->>ResourceHandler: response payload
ResourceHandler-->>Router: HTTP response
Router-->>Client: HTTP response
Loading
sequenceDiagram
participant Startup
participant DescriptorLoader
participant RouteRegistrar
participant Router
Startup->>DescriptorLoader: load EntityDescriptors
DescriptorLoader-->>RouteRegistrar: descriptors
RouteRegistrar->>Router: register top-level and nested routes (CRUD, /statuses)
Router-->>Startup: routes ready
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: adding a design document for a CRD-Driven Generic Resource API. It is concise and directly summarizes the changeset's primary objective.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@rh-amarin rh-amarin force-pushed the HYPERFLEET-896 branch 2 times, most recently from 99bdd5a to c4aa85d Compare April 10, 2026 06:22
@rh-amarin rh-amarin marked this pull request as ready for review April 10, 2026 06:23
@openshift-ci openshift-ci bot requested review from aredenba-rh and pnguyen44 April 10, 2026 06:23
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (1)
hyperfleet/docs/generic-resource-registry-design.md (1)

703-706: Add error handling for JSON unmarshaling in PresentResource.

The code silently ignores the error from json.Unmarshal(r.Spec, &spec). While the Spec field is stored as GORM datatypes.JSON and should contain valid JSON, defensive error handling would improve robustness and make debugging easier if data corruption occurs.

Consider logging the error or returning a fallback value:

var spec map[string]interface{}
if err := json.Unmarshal(r.Spec, &spec); err != nil {
    log.Warnf("failed to unmarshal spec for resource %s: %v", r.ID, err)
    spec = make(map[string]interface{}) // or return error to caller
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/docs/generic-resource-registry-design.md` around lines 703 - 706,
In PresentResource, handle the error returned by json.Unmarshal(r.Spec, &spec)
instead of ignoring it: after calling json.Unmarshal on r.Spec, check the error
and either log a warning/error that includes the resource identifier (e.g.,
r.ID) and the error details, then set spec = make(map[string]interface{}) as a
safe fallback, or propagate/return the error to the caller; update the
PresentResource function to perform this check and logging around json.Unmarshal
to make failures observable and prevent nil/spec panics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/docs/generic-resource-registry-design.md`:
- Line 9: Change the misspelled word "altternatives" to "alternatives" in the
introduction of the Generic Resource Registry proposal (look for the line
containing "This document contains the proposal for the Generic Resource
Registry and altternatives" in the generic-resource-registry-design.md file and
correct the typo).
- Around line 1309-1396: The doc exposes a security risk: authorization is
opt-in via EntityDescriptor.Authz (EntityAuthzConfig) so new entities default to
no checks; update the design by (1) adding a note in §11 documenting
default-allow risk and audit burden, (2) specifying a registry validation step
that inspects registrations (registry.Register/EntityDescriptor) and emits a
warning or fails if Authz == nil for critical types, and (3) defining a
configuration toggle (e.g., EnforceAuthForAllEntities) that, when enabled, makes
the registry validation error instead of warn and requires OperationPermissions
be present (ResourceCheck optional).
- Around line 406-438: The Replace method in sqlResourceDao reads
existing.Generation then writes back without checking the row hasn't changed,
allowing lost increments under concurrent Replace calls; modify the Save step in
Replace to perform an optimistic-locking conditional update by adding a
Where("generation = ?", existing.Generation) before
Omit("Labels","Conditions").Save(resource) and detect a failed update (e.g.,
gorm.ErrRecordNotFound or RowsAffected == 0) to mark the transaction for
rollback and return a conflict error (e.g., errors.Conflict("resource was
modified concurrently")), keeping the existing db.MarkForRollback(ctx, err)
behavior for other errors.
- Around line 1240-1258: Add three new risk subsections after "R2 — Loss of
domain modeling flexibility": create "R3 — Schema evolution and backward
compatibility" documenting how to add/remove EntityDescriptor fields,
deprecation/versioning strategy, and schema migration guidance; "R4 — Security
considerations" listing checks for SpecSchemaName existence, label validation,
authorization gaps, and audit logging requirements; and "R5 — Debugging and
observability" covering entity-specific error messages, metrics/logging tagging,
and tracing. For each new risk include a short remediation paragraph (e.g.,
validation checks, descriptor versioning, RBAC/audit policy, and observability
tags) mirroring the style of the existing "R1" and "R2" sections so they
integrate consistently into the Risks section.
- Around line 572-574: registry.Validate() currently only checks ParentType
references but must also ensure descriptor.Plural values are unique to prevent
ambiguous routing in RegisterEntityRoutes (which uses descriptor.Plural as the
path segment). Update Validate() to iterate over All() descriptors, track seen
plurals in a map[string]string, and panic with a descriptive message including
both descriptor.Type values when a duplicate Plural is found (e.g., "duplicate
Plural %q: used by both %q and %q"). Ensure the new check runs alongside the
existing ParentType validation.
- Around line 918-940: The Delete method ignores errors from s.dao.CountByOwner
and s.dao.FindByTypeAndOwner which can hide DB failures and cause incorrect
delete behavior; update sqlResourceService.Delete to capture the returned error
from CountByOwner and if non-nil return it (wrap via handleDeleteError or
convert to *errors.ServiceError) instead of assuming count==0, and likewise
capture and return any error from FindByTypeAndOwner before iterating children
so cascade deletion only runs when the query succeeded; keep the existing call
to handleDeleteError(s.dao.Delete(...)) for the final delete.
- Around line 607-641: Update the design text to explicitly state
owner-reference immutability and how Replace behaves: specify that OwnerID,
OwnerType, and OwnerHref are immutable once created and that resources.href is
computed once from id, Plural and the resolved parent chain and will not change;
clarify that ResourcePatchRequest intentionally excludes owner fields and that
Replace cannot modify owner references (it updates other mutable resource fields
but must reject any incoming OwnerID/OwnerType/OwnerHref changes), and note that
the service will validate and return an error if an update attempts to change
owner fields to preserve href correctness and enforce the constraint described
in §10.6.

---

Nitpick comments:
In `@hyperfleet/docs/generic-resource-registry-design.md`:
- Around line 703-706: In PresentResource, handle the error returned by
json.Unmarshal(r.Spec, &spec) instead of ignoring it: after calling
json.Unmarshal on r.Spec, check the error and either log a warning/error that
includes the resource identifier (e.g., r.ID) and the error details, then set
spec = make(map[string]interface{}) as a safe fallback, or propagate/return the
error to the caller; update the PresentResource function to perform this check
and logging around json.Unmarshal to make failures observable and prevent
nil/spec panics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6e1cf04f-6d26-4d8a-8415-3a107600bc1c

📥 Commits

Reviewing files that changed from the base of the PR and between 967e5eb and c4aa85d.

📒 Files selected for processing (1)
  • hyperfleet/docs/generic-resource-registry-design.md

Comment on lines +572 to +574
// Validate checks all ParentType references resolve. Called at startup before
// the server accepts requests. Panics with a descriptive message on failure.
func Validate()
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add Plural uniqueness validation to registry.Validate().

The RegisterEntityRoutes function (lines 840-876) registers routes using descriptor.Plural as the path segment. If two descriptors have the same Plural value, route registration would succeed but behavior would be unpredictable (gorilla/mux would match one arbitrarily).

The registry.Validate() function should verify that all registered descriptors have unique Plural values:

func Validate() {
    // existing ParentType validation...
    
    plurals := make(map[string]string)
    for _, d := range All() {
        if existing, ok := plurals[d.Plural]; ok {
            panic(fmt.Sprintf("duplicate Plural %q: used by both %q and %q", 
                d.Plural, existing, d.Type))
        }
        plurals[d.Plural] = d.Type
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/docs/generic-resource-registry-design.md` around lines 572 - 574,
registry.Validate() currently only checks ParentType references but must also
ensure descriptor.Plural values are unique to prevent ambiguous routing in
RegisterEntityRoutes (which uses descriptor.Plural as the path segment). Update
Validate() to iterate over All() descriptors, track seen plurals in a
map[string]string, and panic with a descriptive message including both
descriptor.Type values when a duplicate Plural is found (e.g., "duplicate Plural
%q: used by both %q and %q"). Ensure the new check runs alongside the existing
ParentType validation.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be left to the implementation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rh-amarin, understood — that makes sense to handle during the implementation phase rather than the design doc.

Comment on lines +918 to +940
func (s *sqlResourceService) Delete(ctx context.Context, resourceType, id string) *errors.ServiceError {
for _, child := range registry.ChildrenOf(resourceType) {
count, _ := s.dao.CountByOwner(ctx, child.Type, id)
if count == 0 {
continue
}
switch child.OnParentDelete {
case registry.OnParentDeleteCascade:
children, _ := s.dao.FindByTypeAndOwner(ctx, child.Type, id)
for _, c := range children {
if err := s.Delete(ctx, c.Type, c.ID); err != nil {
return err
}
}
default: // OnParentDeleteRestrict
return errors.Conflict("HYPERFLEET-CNF-001",
"cannot delete %s %q: has %d active %s(s)",
resourceType, id, count, child.Type)
}
}
return handleDeleteError(resourceType, s.dao.Delete(ctx, resourceType, id))
}
```
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix error handling in the Delete implementation.

The implementation silently ignores errors from CountByOwner (line 920) and FindByTypeAndOwner (line 926). If these queries fail due to database errors, the function would proceed with incorrect assumptions:

  1. If CountByOwner fails, count is 0, and the function skips processing children that actually exist.
  2. If FindByTypeAndOwner fails, children is empty, and the cascade loop doesn't execute, leaving orphaned children.

Both scenarios violate the intended delete semantics and could lead to data integrity issues.

🐛 Proposed fix to handle errors properly
 func (s *sqlResourceService) Delete(ctx context.Context, resourceType, id string) *errors.ServiceError {
     for _, child := range registry.ChildrenOf(resourceType) {
-        count, _ := s.dao.CountByOwner(ctx, child.Type, id)
+        count, err := s.dao.CountByOwner(ctx, child.Type, id)
+        if err != nil {
+            return errors.GeneralError("failed to count child %s: %v", child.Type, err)
+        }
         if count == 0 {
             continue
         }
         switch child.OnParentDelete {
         case registry.OnParentDeleteCascade:
-            children, _ := s.dao.FindByTypeAndOwner(ctx, child.Type, id)
+            children, err := s.dao.FindByTypeAndOwner(ctx, child.Type, id)
+            if err != nil {
+                return errors.GeneralError("failed to find child %s: %v", child.Type, err)
+            }
             for _, c := range children {
                 if err := s.Delete(ctx, c.Type, c.ID); err != nil {
                     return err
                 }
             }
         default: // OnParentDeleteRestrict
             return errors.Conflict("HYPERFLEET-CNF-001",
                 "cannot delete %s %q: has %d active %s(s)",
                 resourceType, id, count, child.Type)
         }
     }
     return handleDeleteError(resourceType, s.dao.Delete(ctx, resourceType, id))
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s *sqlResourceService) Delete(ctx context.Context, resourceType, id string) *errors.ServiceError {
for _, child := range registry.ChildrenOf(resourceType) {
count, _ := s.dao.CountByOwner(ctx, child.Type, id)
if count == 0 {
continue
}
switch child.OnParentDelete {
case registry.OnParentDeleteCascade:
children, _ := s.dao.FindByTypeAndOwner(ctx, child.Type, id)
for _, c := range children {
if err := s.Delete(ctx, c.Type, c.ID); err != nil {
return err
}
}
default: // OnParentDeleteRestrict
return errors.Conflict("HYPERFLEET-CNF-001",
"cannot delete %s %q: has %d active %s(s)",
resourceType, id, count, child.Type)
}
}
return handleDeleteError(resourceType, s.dao.Delete(ctx, resourceType, id))
}
```
func (s *sqlResourceService) Delete(ctx context.Context, resourceType, id string) *errors.ServiceError {
for _, child := range registry.ChildrenOf(resourceType) {
count, err := s.dao.CountByOwner(ctx, child.Type, id)
if err != nil {
return errors.GeneralError("failed to count child %s: %v", child.Type, err)
}
if count == 0 {
continue
}
switch child.OnParentDelete {
case registry.OnParentDeleteCascade:
children, err := s.dao.FindByTypeAndOwner(ctx, child.Type, id)
if err != nil {
return errors.GeneralError("failed to find child %s: %v", child.Type, err)
}
for _, c := range children {
if err := s.Delete(ctx, c.Type, c.ID); err != nil {
return err
}
}
default: // OnParentDeleteRestrict
return errors.Conflict("HYPERFLEET-CNF-001",
"cannot delete %s %q: has %d active %s(s)",
resourceType, id, count, child.Type)
}
}
return handleDeleteError(resourceType, s.dao.Delete(ctx, resourceType, id))
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/docs/generic-resource-registry-design.md` around lines 918 - 940,
The Delete method ignores errors from s.dao.CountByOwner and
s.dao.FindByTypeAndOwner which can hide DB failures and cause incorrect delete
behavior; update sqlResourceService.Delete to capture the returned error from
CountByOwner and if non-nil return it (wrap via handleDeleteError or convert to
*errors.ServiceError) instead of assuming count==0, and likewise capture and
return any error from FindByTypeAndOwner before iterating children so cascade
deletion only runs when the query succeeded; keep the existing call to
handleDeleteError(s.dao.Delete(...)) for the final delete.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left for the real implementation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rh-amarin, understood — the code snippets in this design doc are illustrative, so deferring proper error handling to the actual implementation makes sense. Just make sure the fix is applied when the Delete service method gets written.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant