# Addon Dependency Management

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [website](https://github.com/open-cluster-management/website/)

## Summary

This proposal introduces a dependency management mechanism for managed cluster addons, allowing addon authors to declare dependencies between addons. On each managed cluster, the system validates that an addon's dependencies are installed and available, and reflects any unsatisfied dependency in the addon's health status.

## Motivation

Currently, there is no way to declare or enforce dependencies between addons in Open Cluster Management. This creates several challenges:

1. **Manual coordination required**: Administrators must manually ensure that prerequisite addons are installed before installing dependent addons.

2. **Silent failures**: When a prerequisite addon is missing, the addons that rely on it may fail at runtime with unclear error messages, making troubleshooting difficult.

3. **Configuration complexity**: Some addons rely on Custom Resource Definitions (CRDs) or other resources provided by other addons (e.g., the Managed Service Account addon provides the `ManagedServiceAccount` API that other addons use). Without dependency tracking, these relationships are implicit and undocumented.

4. **Installation ordering**: There is no automated way to ensure correct installation ordering when multiple interdependent addons are deployed simultaneously.

### Goals

- Provide a declarative way for addon authors to specify dependencies on other addons
- Automatically validate whether all dependencies are satisfied and reflect the result in the addon's health status
- Provide clear status feedback when dependencies are not met
- Maintain backward compatibility with existing addons that have no dependencies

### Non-Goals

- Automatic installation of dependencies (users still need to explicitly install required addons)
- Dependency resolution and ordering during installation
- Support for circular dependencies
- Dependency management across different managed clusters (dependencies are validated per cluster)
- Version constraints for dependencies (the addon API does not currently track version information)

## Proposal

We propose to extend the `ClusterManagementAddOn` API to include a `dependencies` field that allows addon authors to declare dependencies on other addons. The addon-manager component will validate these dependencies on each managed cluster and set appropriate status conditions on the `ManagedClusterAddOn` resource.

### User Stories

#### Story 1: Addon depending on Managed Service Account

As an addon author, I want to declare a dependency on the Managed Service Account addon because my addon uses the `ManagedServiceAccount` API to provision service accounts on managed clusters. When the Managed Service Account addon is not installed, I want users to see a clear error message in the addon status.

## Design Details

### API Changes

#### ClusterManagementAddOn

Add a new `dependencies` field to the `ClusterManagementAddOnSpec`:

```go
// ClusterManagementAddOnSpec provides information for the add-on.
type ClusterManagementAddOnSpec struct {
	// ... existing fields ...

	// dependencies is a list of add-ons that this add-on depends on.
	// On each managed cluster, the add-on-manager checks that every dependency is
	// installed and available, and reports unsatisfied dependencies through the
	// Degraded condition of the ManagedClusterAddOn. Whether an unsatisfied
	// dependency also affects the Available condition depends on its type.
	// An empty list means the add-on has no dependencies.
	// The default is an empty list.
	// +optional
	Dependencies []AddonDependency `json:"dependencies,omitempty"`
}

// AddonDependency represents a dependency on another add-on.
type AddonDependency struct {
	// name is the name of the add-on being depended on.
	// It should match the name of a ClusterManagementAddOn resource.
	// +required
	// +kubebuilder:validation:Required
	// +kubebuilder:validation:MinLength:=1
	Name string `json:"name"`

	// type specifies the type of the dependency.
	// Valid values are:
	// - "Optional" (default): The add-on can work with reduced functionality without this dependency.
	//   The Degraded condition is set with reason DependencyNotSatisfied when the dependency is not satisfied,
	//   but the Available condition may remain True if the add-on is otherwise functional.
	// - "Required": The add-on cannot function without this dependency.
	//   The Degraded condition is set with reason RequiredDependencyNotSatisfied, and the Available
	//   condition is set to False when the dependency is not satisfied.
	// +optional
	// +kubebuilder:validation:Enum=Optional;Required
	// +kubebuilder:default=Optional
	Type DependencyType `json:"type,omitempty"`
}

// DependencyType describes the type of dependency.
// +kubebuilder:validation:Enum=Optional;Required
type DependencyType string

const (
	// DependencyTypeOptional indicates the add-on can work with reduced functionality without this dependency.
	DependencyTypeOptional DependencyType = "Optional"
	// DependencyTypeRequired indicates the add-on cannot function without this dependency.
	DependencyTypeRequired DependencyType = "Required"
)
```

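The schema markers on `AddonDependency` reject empty names and unknown types, and they default `type` to `Optional`. The Go sketch below mirrors that behavior for checks done in a controller or webhook. The function name `normalizeDependencies` is hypothetical. The extra checks for duplicate names and self-dependency are assumptions of this sketch and are not part of the proposed API.

```go
package main

import "fmt"

// DependencyType and AddonDependency mirror the proposed API types,
// without JSON tags or kubebuilder markers.
type DependencyType string

const (
	DependencyTypeOptional DependencyType = "Optional"
	DependencyTypeRequired DependencyType = "Required"
)

type AddonDependency struct {
	Name string
	Type DependencyType
}

// normalizeDependencies applies the documented default (an empty type means
// Optional) and rejects entries the schema markers would reject. The
// duplicate-name and self-dependency checks are assumptions of this sketch.
func normalizeDependencies(self string, deps []AddonDependency) ([]AddonDependency, error) {
	seen := make(map[string]bool, len(deps))
	out := make([]AddonDependency, 0, len(deps))
	for i, d := range deps {
		switch {
		case d.Name == "":
			return nil, fmt.Errorf("dependencies[%d].name must not be empty", i)
		case d.Name == self:
			// The simplest circular dependency: an add-on depending on itself.
			return nil, fmt.Errorf("dependencies[%d]: add-on %q cannot depend on itself", i, self)
		case seen[d.Name]:
			return nil, fmt.Errorf("dependencies[%d]: duplicate dependency %q", i, d.Name)
		}
		seen[d.Name] = true

		switch d.Type {
		case "":
			d.Type = DependencyTypeOptional // mirrors +kubebuilder:default=Optional
		case DependencyTypeOptional, DependencyTypeRequired:
			// already valid
		default:
			return nil, fmt.Errorf("dependencies[%d].type %q must be Optional or Required", i, d.Type)
		}
		out = append(out, d) // d is a copy, so the caller's slice is not mutated
	}
	return out, nil
}

func main() {
	deps, err := normalizeDependencies("my-addon", []AddonDependency{
		{Name: "managed-serviceaccount"},
	})
	fmt.Println(deps, err) // prints [{managed-serviceaccount Optional}] <nil>
}
```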
#### ManagedClusterAddOn Status

Add two new reasons for the existing `Degraded` condition, one per dependency type:

```go
// the reasons of condition ManagedClusterAddOnConditionDegraded
const (
	// AddonDegradedReasonDependencyNotSatisfied is the reason of condition Degraded indicating that one or more
	// soft (optional) dependencies of the addon are not satisfied (not installed or not available).
	// The addon may still be Available with reduced functionality.
	AddonDegradedReasonDependencyNotSatisfied = "DependencyNotSatisfied"

	// AddonDegradedReasonRequiredDependencyNotSatisfied is the reason of condition Degraded indicating that one or more
	// hard (required) dependencies of the addon are not satisfied (not installed or not available).
	// The Available condition should also be set to False as the addon cannot function.
	AddonDegradedReasonRequiredDependencyNotSatisfied = "RequiredDependencyNotSatisfied"
)
```

#### Dependency Types

There are two types of dependencies:

- **Optional dependencies (default, type=Optional)**: The addon can still function with reduced functionality when the dependency is missing. When an optional dependency is not satisfied, the addon-manager will set `Degraded=True` with reason `DependencyNotSatisfied`, but the klusterlet-agent will not modify the `Available` condition, allowing the addon to remain available if its health checks pass.

- **Required dependencies (type=Required)**: The addon cannot function at all without the dependency. When a required dependency is not satisfied, the addon-manager will set `Degraded=True` with reason `RequiredDependencyNotSatisfied`, and the klusterlet-agent will detect this specific reason and set `Available=False`.

### Implementation Details

#### Dependency Validation

**Addon-Manager (on Hub):**

The addon-manager component will implement dependency validation with the following logic:

1. **Read dependencies**: When reconciling a `ManagedClusterAddOn`, read the corresponding `ClusterManagementAddOn` to get the list of dependencies.

2. **Check each dependency**: A dependency is satisfied only if both of the following hold:
   - A `ManagedClusterAddOn` whose name matches the dependency's `name` exists in the same namespace (the managed cluster namespace)
   - That `ManagedClusterAddOn`'s `Available` condition is `True`

3. **Set the Degraded condition based on dependency type**:
   - **If all dependencies are satisfied**:
     - Ensure the `Degraded` condition does not carry reason `DependencyNotSatisfied` or `RequiredDependencyNotSatisfied`, clearing any such condition set by an earlier reconcile

   - **If any optional dependency is not satisfied** (type=Optional) and all required dependencies are satisfied:
     - Set the `Degraded` condition to `True` with:
       - Reason: `DependencyNotSatisfied`
       - Message: Clear description of which dependencies are missing (e.g., "Optional addon 'managed-serviceaccount' is not installed or not available")
     - Do NOT modify the `Available` condition

   - **If any required dependency is not satisfied** (type=Required):
     - Set the `Degraded` condition to `True` with:
       - Reason: `RequiredDependencyNotSatisfied`. This reason takes precedence over `DependencyNotSatisfied` when both kinds are unsatisfied, because a condition carries a single reason and the klusterlet-agent keys on this one
       - Message: Clear description of which dependencies are missing (e.g., "Required addon 'managed-serviceaccount' is not installed or not available")
     - Do NOT modify the `Available` condition (the klusterlet-agent does this)

**Klusterlet-Agent (on Managed Cluster):**

The klusterlet-agent component will check the `Degraded` condition when determining addon availability:

1. **When reconciling addon health**: After computing availability from the lease or probe health checks, inspect the `Degraded` condition.
2. **Check for required dependency failures**:
   - If `Degraded=True` with reason `RequiredDependencyNotSatisfied`, set `Available=False`
   - If `Degraded=True` with reason `DependencyNotSatisfied` (optional dependency), do not modify `Available`; the addon's own health checks determine availability

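The klusterlet-agent rule above can be sketched in Go as a pure function. `Condition` is a trimmed-down stand-in for `metav1.Condition`, and `decideAvailable` is a hypothetical name. The sketch assumes the health-check result has already been computed from the lease or probe and is passed in as a ready-made `Available` condition.

```go
package main

import "fmt"

const ReasonRequiredDependencyNotSatisfied = "RequiredDependencyNotSatisfied"

// Condition is a trimmed-down stand-in for metav1.Condition.
type Condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Reason  string
	Message string
}

// findCondition returns a pointer to the condition of the given type, or nil.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// decideAvailable starts from the Available condition computed by the health
// checks and overrides it only when the hub reports an unsatisfied required dependency.
func decideAvailable(healthAvailable Condition, conds []Condition) Condition {
	degraded := findCondition(conds, "Degraded")
	if degraded != nil && degraded.Status == "True" && degraded.Reason == ReasonRequiredDependencyNotSatisfied {
		return Condition{Type: "Available", Status: "False", Reason: degraded.Reason, Message: degraded.Message}
	}
	// Optional dependency failures (DependencyNotSatisfied) leave the health-check result untouched.
	return healthAvailable
}

func main() {
	health := Condition{Type: "Available", Status: "True", Reason: "AddonAvailable", Message: "Addon is available"}
	conds := []Condition{{
		Type:    "Degraded",
		Status:  "True",
		Reason:  ReasonRequiredDependencyNotSatisfied,
		Message: "Required addon 'managed-serviceaccount' is not installed or not available",
	}}
	fmt.Println(decideAvailable(health, conds).Status) // prints False
}
```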
This design ensures clear component ownership and avoids conflicts:
- **addon-manager** owns dependency validation and sets `Degraded` with the appropriate reason
- **klusterlet-agent** owns `Available` and takes dependency information into account when deciding availability

**Key Design Benefits:**
1. The klusterlet-agent does not need access to the `ClusterManagementAddOn` API: all dependency type information is encoded in the `Degraded` condition reason
2. Component ownership is clear, which prevents components from fighting over the same condition
3. Different dependency types (Optional vs Required) result in different status behaviors automatically

### Example Usage

#### Example 1: Optional Dependency (Default)

An addon that can work with reduced functionality without the dependency:

```yaml
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ClusterManagementAddOn
metadata:
  name: my-addon
spec:
  addOnMeta:
    displayName: "My Addon"
    description: "An addon that optionally uses ManagedServiceAccount API"
  dependencies:
    - name: managed-serviceaccount
      # type: Optional is the default, can be omitted
  installStrategy:
    type: Manual
```

When the Managed Service Account addon is not installed, the `ManagedClusterAddOn` status would show:

```yaml
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ManagedClusterAddOn
metadata:
  name: my-addon
  namespace: cluster1
status:
  conditions:
    - type: Available
      status: "True" # Addon is still available with reduced functionality
      reason: AddonAvailable
      message: "Addon is available"
      lastTransitionTime: "2025-10-22T10:00:00Z"
    - type: Degraded
      status: "True"
      reason: DependencyNotSatisfied
      message: "Optional addon 'managed-serviceaccount' is not installed or not available"
      lastTransitionTime: "2025-10-22T10:00:00Z"
```

#### Example 2: Required Dependency

An addon that cannot function without the dependency:

```yaml
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ClusterManagementAddOn
metadata:
  name: my-critical-addon
spec:
  addOnMeta:
    displayName: "My Critical Addon"
    description: "An addon that requires ManagedServiceAccount API"
  dependencies:
    - name: managed-serviceaccount
      type: Required # Required dependency
  installStrategy:
    type: Manual
```

When the Managed Service Account addon is not installed, the `ManagedClusterAddOn` status would show:

```yaml
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ManagedClusterAddOn
metadata:
  name: my-critical-addon
  namespace: cluster1
status:
  conditions:
    - type: Available
      status: "False" # Set by klusterlet-agent
      reason: RequiredDependencyNotSatisfied
      message: "Required addon 'managed-serviceaccount' is not installed or not available"
      lastTransitionTime: "2025-10-22T10:00:00Z"
    - type: Degraded
      status: "True" # Set by addon-manager
      reason: RequiredDependencyNotSatisfied
      message: "Required addon 'managed-serviceaccount' is not installed or not available"
      lastTransitionTime: "2025-10-22T10:00:00Z"
```

### Risks and Mitigations

#### Risk: Circular Dependencies

**Risk**: Users might accidentally create circular dependencies (A depends on B, B depends on A).

**Mitigation**:
- Document that circular dependencies are not supported and can leave both addons marked as degraded
- Consider adding a validating webhook to detect and reject circular dependencies
- Future enhancement: Add a validation controller that detects circular dependencies

#### Risk: Dependency Chain Complexity

**Risk**: Long dependency chains might make troubleshooting difficult.

**Mitigation**:
- Provide clear error messages that list all unsatisfied dependencies
- Document best practices for keeping dependency chains shallow
- Consider adding a status field that shows the full dependency tree

### Test Plan

#### Unit Tests

- Test dependency validation logic with various scenarios:
  - No dependencies
  - Single dependency (satisfied/unsatisfied)
  - Multiple dependencies (all satisfied, some unsatisfied, none satisfied)
  - Optional and Required dependency types, including an unsatisfied Optional and an unsatisfied Required dependency at the same time

#### Integration Tests

- Test complete workflow:
  1. Install addon A with a dependency on addon B (B not installed): verify that A's `Degraded` condition is `True` with the reason matching the dependency type
  2. Install addon B: once B is `Available`, verify that A's dependency-related `Degraded` condition is cleared and, for a Required dependency, that A becomes `Available`
  3. Delete addon B: verify that A becomes `Degraded` again

#### E2E Tests

- Deploy real addons with dependencies on a test cluster
- Verify status conditions are correctly set
- Verify addon behavior when dependencies are not met

### Graduation Criteria

#### Beta (v1beta1)

- API changes implemented in `open-cluster-management.io/api` addon v1beta1
- Dependency validation fully supported by addon-manager component
- Klusterlet-agent updated to handle `RequiredDependencyNotSatisfied` reason
- Unit and integration tests passing
- E2E tests with real addons
- At least 2 real addons using dependency declarations
- Metrics for dependency validation failures
- Documentation of the feature and API

#### GA (v1)

- Proven stability over at least 2 releases in v1beta1
- Comprehensive documentation including troubleshooting guides
- No critical bugs reported
- Widely adopted by addon authors (at least 5 addons using dependencies in production)
- Performance validated at scale (tested with 1000+ clusters)

### Upgrade / Downgrade Strategy

#### Upgrade

- **From version without dependency support to version with dependency support**:
  - Existing addons without dependencies continue to work unchanged
  - New or updated addons can add the `dependencies` field
  - No migration required

#### Downgrade

- **From version with dependency support to version without dependency support**:
  - The `dependencies` field will be ignored by older controllers
  - Addons will function as if they have no dependencies
  - Status conditions related to dependencies will no longer be updated

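### Version Skew Strategy

- Dependency validation happens on the hub in the addon-manager component, while setting `Available=False` for unsatisfied required dependencies happens in the klusterlet-agent, so the two components may run at different versions
- With a newer hub and an older klusterlet-agent, unsatisfied dependencies are still reported through `Degraded`, but `Available` is not forced to `False` for `RequiredDependencyNotSatisfied`; required dependencies effectively behave like optional ones until the agent is upgraded
- With an older hub and a newer klusterlet-agent, no dependency reasons are set, so the agent's availability logic behaves as it does without this feature
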
## Implementation History

- 2025-10-22: Initial KEP draft
- TBD: Beta implementation
- TBD: GA promotion

## Drawbacks

- Adds complexity to the addon API
- Requires addon authors to maintain accurate dependency information
- Does not automatically install dependencies (users must still manually install them)

## Alternatives

### Alternative 1: Only Set Degraded Condition

Instead of adding a `type` field, the addon-manager could only set the `Degraded` condition when dependencies are not satisfied, and dependency state would never affect the `Available` condition.

**Pros**:
- Simpler implementation: no need for the `type` field or for klusterlet-agent changes
- Addon controllers have full control over their own availability status
- Cleaner separation of concerns

**Cons**:
- Less clear semantics: users cannot declare hard dependencies in the API
- Addon controllers must implement their own logic to set `Available=False` when dependencies are missing
- More work for addon authors who want hard dependency behavior

**Decision**: Not chosen as the primary approach because the `type` field provides clearer semantics and reduces boilerplate for addon authors. However, this remains a valid alternative if the `type` field proves too complex in practice.

## Infrastructure Needed

- No new infrastructure required
- Existing CI/CD pipelines can be used for testing
- Documentation updates needed in open-cluster-management website