fix minor issues found during KServe demo #127
Conversation
📝 Walkthrough
The pull request introduces Kind cluster support and extract-job log-level configuration across the GKM stack. Changes span Makefile build targets, Kubernetes manifests, API type definitions, and controller initialization logic. The CRD makes StorageClassName optional, with a fallback to the cluster default. Controllers now propagate the KIND_CLUSTER and EXTRACT_GO_LOG environment variables, and the PV-provisioning logic for Kind environments switches from NO_GPU to KIND_CLUSTER.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
internal/controller/gkm-agent/common.go (1)
830-933: ⚠️ Potential issue | 🟠 Major
The KIND/manual-PV path does not implement the documented default StorageClass fallback.
The CRD field `storageClassName` is optional and documents that "If not provided, then default Storage Class will be used." However, `GetStorageClassName()` returns the raw field value without any defaulting logic, and both `CreatePv()` and `CreatePvc()` assign this value directly, including empty strings, to the PV/PVC spec. In Kubernetes, explicitly setting `storageClassName: ""` differs from omitting the field: the former disables default selection, while the latter enables it. As a result, an omitted `storageClassName` in the CR breaks the documented contract by writing explicit empty strings to the PV and PVC, preventing the cluster default storage class from being used. Implement either:
- logic in `GetStorageClassName()` or the KIND-path callers to skip setting `StorageClassName` when it is empty, or
- a default-resolution step before calling `CreatePv()` and `CreatePvc()` that looks up and assigns the cluster default when the CR field is omitted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/gkm-agent/common.go` around lines 830 - 933, the KIND/manual-PV path passes an explicit empty string for storageClassName (via (*gkmCache).GetStorageClassName()) into CreatePv and CreatePvc, which prevents Kubernetes from using the cluster default. Change the call site or helper so empty means "omit the field": capture sc := (*gkmCache).GetStorageClassName() and, if sc == "", pass a nil/omitted value (or a pointer set to nil) into common.CreatePv and common.CreatePvc, and/or update common.CreatePv/common.CreatePvc to treat an empty storageClassName string by not setting Spec.StorageClassName (leave it nil) instead of writing "" to the PV/PVC. GetStorageClassName need not change if you implement the omit-in-caller or omit-in-Create* behavior.
cmd/main.go (1)
239-250: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Wire `ExtractLogLevel` into the cluster-scoped reconciler too.
`commonCl` never receives `extractLogLevel`, so `ClusterGKMCache` extraction jobs will still run with the zero-value log level even though the env/default was resolved above. That leaves half of the new log-level plumbing inactive.
Suggested fix
 commonCl := gkmOperator.ReconcilerCommonOperator[
 	gkmv1alpha1.ClusterGKMCache,
 	gkmv1alpha1.ClusterGKMCacheList,
 	gkmv1alpha1.ClusterGKMCacheNode,
 	gkmv1alpha1.ClusterGKMCacheNodeList,
 ]{
 	Client:          mgr.GetClient(),
 	Scheme:          mgr.GetScheme(),
 	NoGpu:           noGpu,
 	KindCluster:     kindCluster,
+	ExtractLogLevel: extractLogLevel,
 	ExtractImage:    extractImage,
 	CrdCacheStr:     "ClusterGKMCache",
 	CrdCacheNodeStr: "ClusterGKMCacheNode",
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/main.go` around lines 239 - 250, the cluster-scoped reconciler's commonCl initializer is missing the ExtractLogLevel field, so ClusterGKMCache extraction uses the zero value. Update the commonCl composite literal in cmd/main.go to include ExtractLogLevel: extractLogLevel (the same variable resolved earlier) so that the gkmOperator.ReconcilerCommonOperator for ClusterGKMCache receives the log level; ensure the field name matches the struct (ExtractLogLevel) and rebuild to propagate the log level into the ClusterGKMCache reconciler.
internal/controller/gkm-operator/common.go (1)
767-784: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
`KIND_CLUSTER` is only wired into half of the operator path.
This `LaunchJob` call now uses `r.KindCluster`, but `managePVandPVC()` still decides whether to manually create the hostPath PV from `r.NoGpu` on line 614. Non-KIND `NO_GPU=true` deployments will still take the KIND-only PV path, so the new flag does not actually separate the two behaviors yet.
Suggested fix
-	if r.NoGpu {
+	if r.KindCluster {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/gkm-operator/common.go` around lines 767 - 784, managePVandPVC is still using r.NoGpu to decide on creating the hostPath PV, so non-KIND deployments with NO_GPU=true incorrectly follow the KIND path. Update managePVandPVC to check the cluster type (r.KindCluster or equivalent) when deciding to create hostPath PVs and/or when calling the KIND-specific helper so that the hostPath PV is created only when r.KindCluster is true; ensure any logic branches or callers (e.g., where managePVandPVC is invoked and where LaunchJob is called) are consistent and reference r.KindCluster alongside or instead of r.NoGpu to separate KIND behavior from the NO_GPU flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/v1alpha1/shared_types.go`:
- Around line 80-84: The spec documents that omitting StorageClassName should
allow the cluster default, but CreatePvc currently always sets StorageClassName
by taking a pointer to the (possibly empty) string; change CreatePvc (the
function that builds the PVC spec) to only set the PVC's StorageClassName
pointer when the input StorageClassName is non-empty (e.g., check if
spec.StorageClassName != "" after trimming whitespace) and leave the field nil
otherwise so Kubernetes will select the default StorageClass; update any helper
that constructs &storageClass to follow the same conditional logic.
In `@config/configMap/configMap.yaml`:
- Line 15: The constant ConfigMapIndexKindCluster is defined with the wrong key
string ("gkm.kind.cluster") causing lookups to miss the YAML key; update the
value of ConfigMapIndexKindCluster to "gkm.kindcluster" so it matches the keys
used in the YAML files (e.g., configMap entries) and ensure any reference to
ConfigMapIndexKindCluster continues to compile and read the corrected key.
- Around line 8-10: The operator Deployment is not injecting EXTRACT_GO_LOG from
the ConfigMap so gkm.extract.log.level is ignored; update the operator manifest
(config/operator/operator.yaml) to add an environment variable named
EXTRACT_GO_LOG in the operator container that sources its valueFrom the
ConfigMap key gkm.extract.log.level (use configMapKeyRef, similar to
NO_GPU/KIND_CLUSTER entries), and remove the dead code in
internal/controller/gkm-operator/configmap.go that reads the extractLogLevel
variable only to log it (delete the unused extractLogLevel lookup and its log
statement) so the value is injected via the manifest instead of being a no-op in
code.
In `@config/configMap/kustomization.yaml`:
- Around line 11-12: Remove the hard-coded gkm.kindcluster=true from the shared
base kustomization (config/configMap/kustomization.yaml) and instead add that
patch/setting to the KIND overlay kustomization
(config/kind-gpu/kustomization.yaml); locate the base configMap entry that
currently sets gkm.kindcluster and delete it so the base uses the default in
configMap.yaml (false), then add a kustomizeConfigMapGenerator/patch or overlay
entry in config/kind-gpu/kustomization.yaml to set gkm.kindcluster=true so only
the KIND/demo overlay enables kind-specific behavior.
In `@Makefile`:
- Around line 299-314: The Makefile uses ifdef for KIND_CLUSTER and NO_GPU which
treats any non-empty value as true; replace those conditionals with explicit
value checks (ifeq ($(KIND_CLUSTER),true) and ifeq ($(NO_GPU),true)) and adjust
the sed invocations so you always generate kustomization.yaml but insert the
correct literal values based on the true/false check (set gkm.kindcluster=true
when KIND_CLUSTER == true else gkm.kindcluster=false; set gkm.nogpu=true when
NO_GPU == true else gkm.nogpu=false), keeping the existing AGENT_IMG and
EXTRACT_IMG substitutions (AGENT_IMG, EXTRACT_IMG, kustomization.yaml.env,
kustomization.yaml) intact.
---
Outside diff comments:
In `@cmd/main.go`:
- Around line 239-250: The Cluster-scoped reconciler's commonCl initializer is
missing the ExtractLogLevel field, so ClusterGKMCache extraction uses the
zero-value; update the commonCl composite literal in cmd/main.go to include
ExtractLogLevel: extractLogLevel (the same variable resolved earlier) so that
the gkmOperator.ReconcilerCommonOperator for ClusterGKMCache receives the log
level; ensure the field name matches the struct (ExtractLogLevel) and rebuild to
propagate the log level into the ClusterGKMCache reconciler.
In `@internal/controller/gkm-agent/common.go`:
- Around line 830-933: The KIND/manual-PV path is passing an explicit empty
string for storageClassName (via (*gkmCache).GetStorageClassName()) into
CreatePv and CreatePvc, which prevents Kubernetes from using the cluster
default; change the call/site or helper so empty means "omit the field": update
the callers here to capture sc := (*gkmCache).GetStorageClassName() and if sc ==
"" pass a nil/omitted value (or a pointer set to nil) into common.CreatePv and
common.CreatePvc, and/or update common.CreatePv/common.CreatePvc to treat an
empty storageClassName string by not setting Spec.StorageClassName (leave it
nil) instead of writing "" to the PV/PVC; ensure GetStorageClassName need not
change if you implement the omit-in-caller or omit-in-Create* behavior.
In `@internal/controller/gkm-operator/common.go`:
- Around line 767-784: managePVandPVC is still using r.NoGpu to decide creating
the hostPath PV so NON-KIND deployments with NO_GPU=true incorrectly follow the
KIND path; update managePVandPVC to check the cluster type (r.KindCluster or
equivalent) when deciding to create hostPath PVs and/or when calling the
KIND-specific helper so that only when r.KindCluster is true will the hostPath
PV be created; ensure any logic branches or callers (e.g., where managePVandPVC
is invoked and where LaunchJob is called) are consistent and reference
r.KindCluster alongside/ instead of r.NoGpu to separate KIND behavior from the
NO_GPU flag.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 7213e779-877f-4c9f-bdbd-ea53d8893f64
📒 Files selected for processing (14)
Makefile
agent/main.go
api/v1alpha1/shared_types.go
cmd/main.go
config/agent/gkm-agent.yaml
config/configMap/configMap.yaml
config/configMap/kustomization.yaml
config/crd/bases/gkm.io_clustergkmcaches.yaml
config/crd/bases/gkm.io_gkmcaches.yaml
config/operator/operator.yaml
internal/controller/gkm-agent/common.go
internal/controller/gkm-operator/common.go
pkg/common/k8s.go
pkg/utils/contants.go
 // storageClassName contains the name of the Kubernetes Storage Class, which
 // will be used for the PersistentVolume and PersistentVolumeClaim the GKM will
-// create in order to store the extract GPU Kernel Cache.
-// +required
-// +kubebuilder:validation:Required
-StorageClassName string `json:"storageClassName"`
+// create in order to store the extract GPU Kernel Cache. If not provided, then
+// default Storage Class will be used.
+StorageClassName string `json:"storageClassName,omitempty"`
🧩 Analysis chain
🌐 Web query:
In Kubernetes PersistentVolumeClaimSpec, what is the behavior difference between omitting `storageClassName` (or leaving it null) and setting `storageClassName: ""`? Which form uses the cluster's default StorageClass?
💡 Result:
Omitting storageClassName (or leaving it null) in a Kubernetes PersistentVolumeClaimSpec allows the PVC to use the cluster's default StorageClass if one is configured. The control plane will set storageClassName to the default for new PVCs without it, and retroactively update existing unbound PVCs lacking the field or with empty value to the default when a default becomes available. Setting storageClassName: "" explicitly requests a PV with no class (storageClassName unset or ""), preventing use of any default StorageClass. PVCs with storageClassName: "" will not be updated to the default even if one is later configured, and they only bind to classless PVs. The form that uses the cluster's default StorageClass is omitting storageClassName (or null).
Citations:
- 1: https://www.k8s.io/docs/concepts/storage/storage-classes
- 2: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
- 3: https://kubernetes.io/docs/concepts/storage/storage-classes/
- 4: https://www.k8s.io/docs/concepts/storage/persistent-volumes/
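To make the distinction concrete, here is a minimal pair of PVC manifests (names and sizes are illustrative, not from this repo):

```yaml
# PVC A: storageClassName omitted -> the cluster's default StorageClass
# is applied by the control plane when one is configured.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-default-class
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
# PVC B: storageClassName: "" -> explicitly classless; binds only to PVs
# with no class and is never retroactively updated to the default.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-no-class
spec:
  storageClassName: ""
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```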
🏁 Script executed:
# First, check the StorageClassName field definition and its doc comment
echo "=== File: api/v1alpha1/shared_types.go (lines 75-90) ==="
sed -n '75,90p' api/v1alpha1/shared_types.go
echo ""
echo "=== File: pkg/common/k8s.go (lines 471-539) ==="
sed -n '471,539p' pkg/common/k8s.go
Repository: redhat-et/GKM
Length of output: 2866
🏁 Script executed:
# Search for calls to CreatePvc to see how storageClass argument is passed
echo "=== Searching for CreatePvc calls ==="
rg -n "CreatePvc" --type go -A 5 -B 2 | head -100
Repository: redhat-et/GKM
Length of output: 1546
🏁 Script executed:
# Get full context for the CreatePvc calls
echo "=== Call 1: internal/controller/gkm-operator/common.go ==="
sed -n '698,730p' internal/controller/gkm-operator/common.go
echo ""
echo "=== Call 2: internal/controller/gkm-agent/common.go ==="
sed -n '920,952p' internal/controller/gkm-agent/common.go
Repository: redhat-et/GKM
Length of output: 1480
🏁 Script executed:
# Find GetStorageClassName method definition
echo "=== Searching for GetStorageClassName definition ==="
rg -n "GetStorageClassName" --type go -A 3 -B 1 | head -50
Repository: redhat-et/GKM
Length of output: 3299
🏁 Script executed:
# Get the implementation of GetStorageClassName
echo "=== GetStorageClassName implementation ==="
sed -n '93,110p' api/v1alpha1/gkmcache_types.go
Repository: redhat-et/GKM
Length of output: 561
The optional storageClassName fallback is not implemented end-to-end.
The schema now documents that "if not provided, default Storage Class will be used," but when StorageClassName is omitted from the spec, it unmarshals to an empty string "" in Go. This empty string is then passed to CreatePvc (at pkg/common/k8s.go:514), which unconditionally sets StorageClassName: &storageClass. In Kubernetes, a pointer to an empty string is semantically equivalent to explicitly setting storageClassName: "", which prevents the default StorageClass from being selected. To honor the documented behavior, CreatePvc must treat empty/zero-value StorageClassName specially by omitting the field from the PVC spec entirely (or using a nil pointer).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/v1alpha1/shared_types.go` around lines 80 - 84, The spec documents that
omitting StorageClassName should allow the cluster default, but CreatePvc
currently always sets StorageClassName by taking a pointer to the (possibly
empty) string; change CreatePvc (the function that builds the PVC spec) to only
set the PVC's StorageClassName pointer when the input StorageClassName is
non-empty (e.g., check if spec.StorageClassName != "" after trimming whitespace)
and leave the field nil otherwise so Kubernetes will select the default
StorageClass; update any helper that constructs &storageClass to follow the same
conditional logic.
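A minimal sketch of the omit-when-empty fix suggested above. The struct and function names here are illustrative, not the actual GKM helpers; the real change would live in `CreatePvc` in pkg/common/k8s.go:

```go
package main

import "fmt"

// pvcSpec mimics the relevant corner of corev1.PersistentVolumeClaimSpec:
// a nil StorageClassName lets Kubernetes apply the default StorageClass,
// while a pointer to "" explicitly requests a classless PV.
type pvcSpec struct {
	StorageClassName *string
}

// storageClassPointer returns nil for an empty name, so an omitted
// storageClassName in the CR is also omitted from the generated PVC spec
// instead of being written out as an explicit "".
func storageClassPointer(name string) *string {
	if name == "" {
		return nil
	}
	return &name
}

func main() {
	omitted := pvcSpec{StorageClassName: storageClassPointer("")}
	named := pvcSpec{StorageClassName: storageClassPointer("fast-ssd")}
	fmt.Println(omitted.StorageClassName == nil) // true: cluster default applies
	fmt.Println(*named.StorageClassName)         // fast-ssd
}
```

The same conditional would apply anywhere a helper currently does `&storageClass` unconditionally.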
gkm.operator.log.level: debug
gkm.agent.log.level: debug
gkm.extract.log.level: debug
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C2 'ExtractLogLevel|EXTRACT_GO_LOG|gkm\.extract\.log\.level' --glob '*.go' --glob '*.yaml' --glob '*.yml'Repository: redhat-et/GKM
Length of output: 3919
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Checking configmap.go around line 86 to see what happens with extractLogLevel ==="
sed -n '84,120p' internal/controller/gkm-operator/configmap.go
echo ""
echo "=== Searching for where extractLogLevel is used within configmap.go ==="
rg -n 'extractLogLevel' internal/controller/gkm-operator/configmap.go -A5 -B2
echo ""
echo "=== Checking if pod templates set EXTRACT_GO_LOG env var from config ==="
rg -n 'EXTRACT_GO_LOG|extractLogLevel' internal/controller/gkm-operator/common.go | head -20
echo ""
echo "=== Checking GetPodTemplate implementation to see env var injection ==="
rg -n 'GetPodTemplate' internal/controller/gkm-operator/common.go -A 100 | grep -A 100 'func.*GetPodTemplate' | head -80
Repository: redhat-et/GKM
Length of output: 2253
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Full ReconcileGKMConfigMap function ==="
rg -n 'func.*ReconcileGKMConfigMap' internal/controller/gkm-operator/configmap.go -A 200 | head -150
echo ""
echo "=== Check if extractLogLevel or any ConfigMap values are stored/passed to a struct ==="
rg -n 'extractLogLevel|agentLogLevel' internal/controller/gkm-operator/ -B2 -A2
echo ""
echo "=== Look for where ExtractLogLevel struct field is populated in operator's common.go ==="
rg -n 'ExtractLogLevel.*=' internal/controller/gkm-operator/common.go -B3 -A3
Repository: redhat-et/GKM
Length of output: 3306
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search for where ExtractLogLevel is assigned/set in reconcilers ==="
rg -n 'ExtractLogLevel\s*[:=]' internal/controller/ -B5 -A3
echo ""
echo "=== Look for ReconcilerCommonAgent or ReconcilerCommonOperator initialization ==="
rg -n 'ReconcilerCommon.*{' internal/controller/gkm-operator/common.go -A 50 | head -100
echo ""
echo "=== Check how operator pod specs inject env vars ==="
rg -n 'Env.*=' internal/controller/gkm-operator/common.go | grep -i 'log\|extract'
Repository: redhat-et/GKM
Length of output: 129
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search broadly for ReconcilerCommon initialization in operator ==="
rg -n 'ReconcilerCommon' internal/controller/gkm-operator/common.go -B5 -A15 | head -150
echo ""
echo "=== Look for operator main setup/initialization ==="
rg -n 'func main' cmd/main.go -A 200 | head -200
echo ""
echo "=== Check if operator reads ConfigMap and populates ExtractLogLevel ==="
rg -n 'configMapKey\|ConfigMapIndexExtractLogLevel\|EXTRACT_GO_LOG' cmd/ internal/controller/gkm-operator/ -B3 -A3
Repository: redhat-et/GKM
Length of output: 14452
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Look for operator Deployment or manifest that sets EXTRACT_GO_LOG env var ==="
find . -type f \( -name "*.yaml" -o -name "*.yml" \) | xargs grep -l 'EXTRACT_GO_LOG\|operator' | head -20
echo ""
echo "=== Check operator Deployment manifest specifically ==="
fd -t f 'operator.*deployment\|deployment.*operator' config/ --iglob '*.yaml' --iglob '*.yml'
echo ""
echo "=== Look for references to EXTRACT_GO_LOG in manifests ==="
rg -n 'EXTRACT_GO_LOG' config/ --glob '*.yaml' --glob '*.yml'
echo ""
echo "=== Check if operator pod spec generation injects env vars from ConfigMap ==="
rg -n 'ValueFrom.*ConfigMap\|configMapRef\|EXTRACT' internal/controller/gkm-operator/common.go -B5 -A5 | head -80
Repository: redhat-et/GKM
Length of output: 1249
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check operator.yaml for Deployment and env vars ==="
cat ./config/operator/operator.yaml
echo ""
echo "=== Search for any EXTRACT_GO_LOG in all yaml files ==="
rg -n 'EXTRACT_GO_LOG' config/ -t yaml
Repository: redhat-et/GKM
Length of output: 4875
Add EXTRACT_GO_LOG environment variable to operator Deployment manifest.
The operator Deployment currently does not inject EXTRACT_GO_LOG from the ConfigMap, so gkm.extract.log.level in the ConfigMap has no effect. Extract jobs always use the hardcoded default log level ("info"). Inject EXTRACT_GO_LOG from gkm.extract.log.level to the operator container environment in config/operator/operator.yaml, similar to how NO_GPU, KIND_CLUSTER, and other values are currently injected. Additionally, remove the dead code in internal/controller/gkm-operator/configmap.go that reads extractLogLevel but only logs it without using the value.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/configMap/configMap.yaml` around lines 8 - 10, The operator Deployment
is not injecting EXTRACT_GO_LOG from the ConfigMap so gkm.extract.log.level is
ignored; update the operator manifest (config/operator/operator.yaml) to add an
environment variable named EXTRACT_GO_LOG in the operator container that sources
its valueFrom the ConfigMap key gkm.extract.log.level (use configMapKeyRef,
similar to NO_GPU/KIND_CLUSTER entries), and remove the dead code in
internal/controller/gkm-operator/configmap.go that reads the extractLogLevel
variable only to log it (delete the unused extractLogLevel lookup and its log
statement) so the value is injected via the manifest instead of being a no-op in
code.
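One possible shape for the manifest change is sketched below. The ConfigMap name (`gkm-config`) and the surrounding container layout are assumptions based on the existing NO_GPU/KIND_CLUSTER entries; the real names live in config/operator/operator.yaml:

```yaml
# Assumed fragment of the operator container's env list; only the
# EXTRACT_GO_LOG entry is new, mirroring the NO_GPU/KIND_CLUSTER pattern.
env:
  - name: EXTRACT_GO_LOG
    valueFrom:
      configMapKeyRef:
        name: gkm-config            # assumed ConfigMap name
        key: gkm.extract.log.level
        optional: true              # fall back to the in-code default if absent
```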
gkm.agent.image: quay.io/gkm/agent:latest
gkm.extract.image: quay.io/gkm/gkm-extract:latest
gkm.nogpu: false
gkm.kindcluster: false
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C2 'ConfigMapIndexKindCluster|gkm\.kindcluster|gkm\.kind\.cluster' --glob '*.go' --glob '*.yaml' --glob '*.yml'Repository: redhat-et/GKM
Length of output: 2240
Fix the key mismatch: update the constant to match the YAML keys in use.
The Go code uses ConfigMapIndexKindCluster from pkg/utils/contants.go (defined as "gkm.kind.cluster") to read this setting, but all YAML configuration files use gkm.kindcluster. This mismatch will cause the runtime lookup to fail silently, and the setting will not be applied. Update the constant in pkg/utils/contants.go line 107 from "gkm.kind.cluster" to "gkm.kindcluster" to match the keys used across config/configMap/configMap.yaml, config/configMap/kustomization.yaml, config/operator/operator.yaml, and config/agent/gkm-agent.yaml.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/configMap/configMap.yaml` at line 15, The constant
ConfigMapIndexKindCluster is defined with the wrong key string
("gkm.kind.cluster") causing lookups to miss the YAML key; update the value of
ConfigMapIndexKindCluster to "gkm.kindcluster" so it matches the keys used in
the YAML files (e.g., configMap entries) and ensure any reference to
ConfigMapIndexKindCluster continues to compile and read the corrected key.
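The failure mode is easy to demonstrate: a ConfigMap lookup with a mismatched key silently returns the zero value. The map below stands in for the ConfigMap's Data field; the constant name is lowercased for this illustration:

```go
package main

import "fmt"

// configMapIndexKindCluster must match the key used in configMap.yaml
// exactly, or the lookup misses and the setting is silently ignored.
const configMapIndexKindCluster = "gkm.kindcluster" // corrected key

func lookupKindCluster(data map[string]string) (string, bool) {
	v, ok := data[configMapIndexKindCluster]
	return v, ok
}

func main() {
	data := map[string]string{"gkm.kindcluster": "false"} // as in configMap.yaml
	v, ok := lookupKindCluster(data)
	fmt.Println(ok, v) // true false

	// With the old constant value "gkm.kind.cluster" the same lookup misses:
	_, ok = data["gkm.kind.cluster"]
	fmt.Println(ok) // false
}
```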
- gkm.nogpu=true
- gkm.kindcluster=true
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C2 'config/configMap|gkm\.kindcluster|gkm\.nogpu' --iglob 'kustomization.yaml'
Repository: redhat-et/GKM
Length of output: 462
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find all kustomization.yaml files
echo "=== All kustomization.yaml files ==="
fd 'kustomization.yaml' --type f | sort
echo -e "\n=== References to config/configMap in kustomization files ==="
rg -l 'config/configMap' --type yaml | xargs -I {} sh -c 'echo "File: {}"; grep -n "config/configMap" "{}"'
echo -e "\n=== Search for gkm.kindcluster overrides in all yaml files ==="
rg 'gkm\.kindcluster' --type yaml -n
echo -e "\n=== Directory structure of overlays (if present) ==="
if [ -d "config" ]; then
find config -type d | sort
fi
Repository: redhat-et/GKM
Length of output: 762
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== config/default/kustomization.yaml ==="
cat config/default/kustomization.yaml
echo -e "\n=== config/kind-gpu/kustomization.yaml ==="
cat config/kind-gpu/kustomization.yaml
echo -e "\n=== config/configMap/kustomization.yaml structure ==="
cat config/configMap/kustomization.yaml
echo -e "\n=== Search for configMap references (without full path) ==="
rg 'configMap' config/default/kustomization.yaml config/kind-gpu/kustomization.yaml 2>/dev/null || true
Repository: redhat-et/GKM
Length of output: 6047
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check kind-gpu overlay for any configMap overrides ==="
cat config/kind-gpu/agent-patch.yaml 2>/dev/null || echo "No agent-patch.yaml"
echo -e "\n=== Look for all overlay directories (directories with kustomization.yaml) ==="
find config -name kustomization.yaml -type f | xargs dirname | sort
echo -e "\n=== Check configMap.yaml to see default values ==="
cat config/configMap/configMap.yaml
echo -e "\n=== Search for any patches or replacements that might override gkm.kindcluster ==="
rg 'gkm\.kindcluster|kindcluster' --type yaml -A 2 -B 2
Repository: redhat-et/GKM
Length of output: 2690
Move gkm.kindcluster=true to the KIND/demo overlay, not the shared base configMap.
The hard-coded value in config/configMap/kustomization.yaml lines 11-12 applies to all deployments via config/default, including production runs. Only config/kind-gpu/kustomization.yaml should set gkm.kindcluster=true; the base config/configMap/configMap.yaml correctly defaults to false. Without an override mechanism in production overlays, this forces KIND-specific behavior (manual PV/init-container) everywhere and defeats the new separation from NO_GPU.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/configMap/kustomization.yaml` around lines 11 - 12, Remove the
hard-coded gkm.kindcluster=true from the shared base kustomization
(config/configMap/kustomization.yaml) and instead add that patch/setting to the
KIND overlay kustomization (config/kind-gpu/kustomization.yaml); locate the base
configMap entry that currently sets gkm.kindcluster and delete it so the base
uses the default in configMap.yaml (false), then add a
configMapGenerator patch or overlay entry in
config/kind-gpu/kustomization.yaml to set gkm.kindcluster=true so only the
KIND/demo overlay enables kind-specific behavior.
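A possible shape for the overlay change is sketched below. The generator name and overlay structure are assumed from typical kustomize layouts, not verified against this repo:

```yaml
# config/kind-gpu/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../default
configMapGenerator:
  - name: gkm-config            # assumed: must match the base ConfigMap/generator name
    behavior: merge             # merge onto the base instead of replacing it
    literals:
      - gkm.kindcluster=true    # KIND-only behavior enabled solely in this overlay
```

With `behavior: merge`, the base keeps its default `gkm.kindcluster: false` and only this overlay flips it.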
ifdef KIND_CLUSTER
	cd config/configMap && \
	$(SED) \
		-e '/literals:/a\ - gkm.nogpu=true' \
		-e '/literals:/a\ - gkm.kindcluster=true' \
		-e 's@gkm\.agent\.image=.*@gkm.agent.image=$(AGENT_IMG)@' \
		-e 's@gkm\.extract\.image=.*@gkm.extract.image=$(EXTRACT_IMG)@' \
		kustomization.yaml.env > kustomization.yaml
else ifdef NO_GPU
	cd config/configMap && \
	$(SED) \
		-e '/literals:/a\ - gkm.nogpu=true' \
		-e '/literals:/a\ - gkm.kindcluster=false' \
		-e 's@gkm\.agent\.image=.*@gkm.agent.image=$(AGENT_IMG)@' \
		-e 's@gkm\.extract\.image=.*@gkm.extract.image=$(EXTRACT_IMG)@' \
		kustomization.yaml.env > kustomization.yaml
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
out="$(make -n prepare-deploy KIND_CLUSTER=false AGENT_IMG=test/agent EXTRACT_IMG=test/extract 2>/dev/null || true)"
printf '%s\n' "$out" | rg -n 'gkm\.(kindcluster|nogpu)'
Repository: redhat-et/GKM
Length of output: 39
🏁 Script executed:
# Check if the file exists and its size
wc -l Makefile
# View lines 299-314 to confirm the code snippet
sed -n '299,314p' Makefile
# Search for prepare-deploy target definition
rg -n 'prepare-deploy' Makefile
Repository: redhat-et/GKM
Length of output: 963
🏁 Script executed:
# Get full context around prepare-deploy target
sed -n '290,320p' Makefile | cat -n
# Try running prepare-deploy without dry-run to see what happens
make prepare-deploy KIND_CLUSTER=false AGENT_IMG=test/agent EXTRACT_IMG=test/extract 2>&1 | head -20 || true
# Check if SED variable is defined
rg -n '^\s*SED\s*=' Makefile
Repository: redhat-et/GKM
Length of output: 1736
🏁 Script executed:
# Verify GNU Make ifdef behavior with a test Makefile
cat > /tmp/test_ifdef.mk << 'EOF'
# Test ifdef behavior
ifdef MY_VAR
$(info ifdef MY_VAR: true)
else
$(info ifdef MY_VAR: false)
endif
ifeq ($(MY_VAR),true)
$(info ifeq true: true)
else
$(info ifeq true: false)
endif
EOF
# Document what ifdef does
cat > /tmp/check_ifdef.txt << 'EOF'
GNU Make ifdef syntax:
- ifdef VAR: checks if VAR is defined (non-empty)
- endif
Therefore:
- make -f /tmp/test_ifdef.mk MY_VAR=false → ifdef branch WILL execute (treats "false" as non-empty)
- make -f /tmp/test_ifdef.mk MY_VAR=true → ifdef branch WILL execute
- make -f /tmp/test_ifdef.mk → ifdef branch will NOT execute (undefined)
This is a well-known GNU Make behavior that can cause bugs.
EOF
cat /tmp/check_ifdef.txt
Repository: redhat-et/GKM
Length of output: 453
Use value checks instead of `ifdef` for boolean flags.
`ifdef KIND_CLUSTER` and `ifdef NO_GPU` treat any non-empty value as true, so `make deploy KIND_CLUSTER=false` will still generate `gkm.kindcluster=true` and `gkm.nogpu=true` in the ConfigMap. Use `ifeq ($(KIND_CLUSTER),true)` instead to correctly distinguish between explicit values.
Suggested fix
-ifdef KIND_CLUSTER
+ifeq ($(KIND_CLUSTER),true)
cd config/configMap && \
$(SED) \
-e '/literals:/a\ - gkm.nogpu=true' \
-e '/literals:/a\ - gkm.kindcluster=true' \
-e 's@gkm\.agent\.image=.*@gkm.agent.image=$(AGENT_IMG)@' \
-e 's@gkm\.extract\.image=.*@gkm.extract.image=$(EXTRACT_IMG)@' \
kustomization.yaml.env > kustomization.yaml
-else ifdef NO_GPU
+else ifeq ($(NO_GPU),true)
cd config/configMap && \
$(SED) \
-e '/literals:/a\ - gkm.nogpu=true' \
-e '/literals:/a\ - gkm.kindcluster=false' \
-e 's@gkm\.agent\.image=.*@gkm.agent.image=$(AGENT_IMG)@' \
-e 's@gkm\.extract\.image=.*@gkm.extract.image=$(EXTRACT_IMG)@' \
 kustomization.yaml.env > kustomization.yaml
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
ifeq ($(KIND_CLUSTER),true)
	cd config/configMap && \
	$(SED) \
		-e '/literals:/a\ - gkm.nogpu=true' \
		-e '/literals:/a\ - gkm.kindcluster=true' \
		-e 's@gkm\.agent\.image=.*@gkm.agent.image=$(AGENT_IMG)@' \
		-e 's@gkm\.extract\.image=.*@gkm.extract.image=$(EXTRACT_IMG)@' \
		kustomization.yaml.env > kustomization.yaml
else ifeq ($(NO_GPU),true)
	cd config/configMap && \
	$(SED) \
		-e '/literals:/a\ - gkm.nogpu=true' \
		-e '/literals:/a\ - gkm.kindcluster=false' \
		-e 's@gkm\.agent\.image=.*@gkm.agent.image=$(AGENT_IMG)@' \
		-e 's@gkm\.extract\.image=.*@gkm.extract.image=$(EXTRACT_IMG)@' \
		kustomization.yaml.env > kustomization.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 299 - 314, The Makefile uses ifdef for KIND_CLUSTER
and NO_GPU which treats any non-empty value as true; replace those conditionals
with explicit value checks (ifeq ($(KIND_CLUSTER),true) and ifeq
($(NO_GPU),true)) and adjust the sed invocations so you always generate
kustomization.yaml but insert the correct literal values based on the true/false
check (set gkm.kindcluster=true when KIND_CLUSTER == true else
gkm.kindcluster=false; set gkm.nogpu=true when NO_GPU == true else
gkm.nogpu=false), keeping the existing AGENT_IMG and EXTRACT_IMG substitutions
(AGENT_IMG, EXTRACT_IMG, kustomization.yaml.env, kustomization.yaml) intact.
Filippe found a few issues while trying to run GKM and KServe in the same cluster.
- Requested that agents print the node they are running on in the logs.
- The initContainer that gets added to the Job to extract the cache is only supposed to be there for KIND clusters, but is triggering off the noGpu flag. Add a KIND Cluster flag that distinguishes between NoGPU and KIND clusters.
- The StorageClassName in the GKMCache and ClusterGKMCache was required when it should have been optional. Move it to optional.
- To help with the demo, it would be easier to manage if the PVC associated with ReadWriteOnce did not have the hash appended, which was done so each node has a unique PVC name. Add a "HACK" so that if the AccessMode is ReadWriteOncePod, the Operator treats the PVC as if ReadOnlyMany was entered and creates only one PVC for the cluster, but creates it with an AccessMode of ReadWriteOnce. This will not work where pods across multiple nodes want to share a PVC, but will "probably" work on single-node systems.
Signed-off-by: Billy McFall <22157057+Billy99@users.noreply.github.com>
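The access-mode "HACK" described above can be sketched as a naming/mode decision. Everything here is illustrative: the function, constants, and the per-node hash suffix are stand-ins for the actual GKM operator code, not its real API:

```go
package main

import "fmt"

// Illustrative access-mode strings (Kubernetes defines these on PVC specs).
const (
	ReadWriteOnce    = "ReadWriteOnce"
	ReadWriteOncePod = "ReadWriteOncePod"
	ReadOnlyMany     = "ReadOnlyMany"
)

// pvcNameAndMode sketches the demo hack: a ReadWriteOncePod request is
// treated like ReadOnlyMany for naming (one cluster-wide PVC, no per-node
// hash suffix) but the PVC itself is created with ReadWriteOnce. Other
// modes keep the per-node hash so names stay unique across nodes.
func pvcNameAndMode(cacheName, nodeHash, requestedMode string) (string, string) {
	switch requestedMode {
	case ReadWriteOncePod:
		return cacheName, ReadWriteOnce // single shared PVC, demo-only shortcut
	case ReadOnlyMany:
		return cacheName, ReadOnlyMany // already one PVC for the cluster
	default:
		return cacheName + "-" + nodeHash, requestedMode // per-node PVC
	}
}

func main() {
	name, mode := pvcNameAndMode("kserve-cache", "ab12", ReadWriteOncePod)
	fmt.Println(name, mode) // kserve-cache ReadWriteOnce
}
```

As the description warns, this only "probably" works on single-node systems, since a ReadWriteOnce PVC cannot be shared by pods on different nodes.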