fix: wait for DataScienceCluster CRD before applying DSC #1187
@@ -374,6 +374,22 @@ jobs:
           oc wait csv "$CSV" -n redhat-ods-operator \
             --for=jsonpath='{.status.phase}'=Succeeded --timeout=600s
 
+      - name: Wait for DataScienceCluster CRD to be available
+        run: |
+          echo "Waiting for DataScienceCluster CRD to be registered..."
+          for i in $(seq 1 60); do
+            if oc get crd datascienceclusters.datasciencecluster.opendatahub.io &>/dev/null; then
+              echo "DataScienceCluster CRD is available"
+              break
+            fi
+            if [ "$i" -eq 60 ]; then
+              echo "::error::DataScienceCluster CRD did not become available within timeout"
+              exit 1
+            fi
+            echo "Attempt $i/60 - CRD not yet available, waiting 10s..."
+            sleep 10
+          done
+
Comment on lines +377 to +392
Contributor
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the workflow file
find . -name "prod-release-deploy.yaml" -type f
Repository: ambient-code/platform
Length of output: 109
🏁 Script executed:
# Once found, let's check the relevant lines
cat -n .github/workflows/prod-release-deploy.yaml | sed -n '370,410p'
Repository: ambient-code/platform
Length of output: 2089
🏁 Script executed:
# Also search for references to dsci.yaml and datasciencecluster.yaml to understand what they do
rg "dsci\.yaml|datasciencecluster\.yaml" --type yaml -B 2 -A 2
Repository: ambient-code/platform
Length of output: 477
🏁 Script executed:
# Check if there are other workflow files with similar patterns
find .github/workflows -name "*.yaml" -o -name "*.yml" | head -20
Repository: ambient-code/platform
Length of output: 818
🏁 Script executed:
cat -n components/manifests/components/openshift-ai/dsci.yaml
Repository: ambient-code/platform
Length of output: 414
🏁 Script executed:
# Also check if dsci.yaml is a template or has any conditionals
file components/manifests/components/openshift-ai/dsci.yaml
wc -l components/manifests/components/openshift-ai/dsci.yaml
Repository: ambient-code/platform
Length of output: 182
🏁 Script executed:
# Search for any DSCInitialization references in the codebase
rg "DSCInitialization|dscinitialization" --type yaml -i
Repository: ambient-code/platform
Length of output: 246
Wait for both DSCInitialization and DataScienceCluster CRDs before applying manifests.
Line 395 applies
Suggested patch
-      - name: Wait for DataScienceCluster CRD to be available
+      - name: Wait for required OpenDataHub CRDs to be available
         run: |
-          echo "Waiting for DataScienceCluster CRD to be registered..."
-          for i in $(seq 1 60); do
-            if oc get crd datascienceclusters.datasciencecluster.opendatahub.io &>/dev/null; then
-              echo "DataScienceCluster CRD is available"
-              break
-            fi
-            if [ "$i" -eq 60 ]; then
-              echo "::error::DataScienceCluster CRD did not become available within timeout"
-              exit 1
-            fi
-            echo "Attempt $i/60 - CRD not yet available, waiting 10s..."
-            sleep 10
-          done
+          for crd in \
+            dscinitializations.dscinitialization.opendatahub.io \
+            datascienceclusters.datasciencecluster.opendatahub.io; do
+            echo "Waiting for ${crd} CRD to be registered..."
+            for i in $(seq 1 60); do
+              if oc get crd "$crd" &>/dev/null; then
+                echo "${crd} CRD is available"
+                break
+              fi
+              if [ "$i" -eq 60 ]; then
+                echo "::error::${crd} CRD did not become available within timeout"
+                exit 1
+              fi
+              echo "Attempt $i/60 - ${crd} CRD not yet available, waiting 10s..."
+              sleep 10
+            done
+          done
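A possible refinement of the polling loop, offered as a sketch rather than as the workflow's actual code: the retry logic can be factored into a small helper so the attempt count and delay are stated once, and each CRD can additionally be checked for the Established condition, since a CRD that exists is not necessarily ready to accept resources yet. The function names `retry_until` and `wait_for_crds` are hypothetical; the CRD names, the 60 attempts, and the 10 second delay are taken from the step under review. The `oc wait` call assumes a logged-in `oc` session.

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only sleep when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      echo "Attempt $i/$attempts failed, retrying in ${delay}s..." >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# wait_for_crds CRD [CRD...]
# Hypothetical wrapper: waits for each CRD to be registered, then established.
wait_for_crds() {
  for crd in "$@"; do
    echo "Waiting for ${crd} CRD to be registered..."
    if ! retry_until 60 10 oc get crd "$crd"; then
      echo "::error::${crd} CRD did not become available within timeout"
      return 1
    fi
    # `oc wait` errors out if the resource does not exist yet, so it is
    # only called once the polling above has seen the CRD.
    oc wait --for=condition=Established "crd/${crd}" --timeout=60s || return 1
    echo "${crd} CRD is available"
  done
}
```

In the workflow step this would be called as `wait_for_crds dscinitializations.dscinitialization.opendatahub.io datascienceclusters.datasciencecluster.opendatahub.io`. Unlike the loop in the patch, the helper does not sleep after the final failed attempt.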
       - name: Apply DSCInitialization and DataScienceCluster
         run: |
           oc apply -f components/manifests/components/openshift-ai/dsci.yaml
@@ -395,6 +411,14 @@ jobs:
             sleep 10
           done
 
+      - name: Ensure mlflow database exists in PostgreSQL
+        run: |
+          oc exec -n ambient-code deploy/postgresql -- \
+            psql -U postgres -tAc \
+            "SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1 \
+            || oc exec -n ambient-code deploy/postgresql -- \
+            psql -U postgres -c "CREATE DATABASE mlflow"
+
Comment on lines +414 to +421
Contributor
🧩 Analysis chain
🏁 Script executed:
# First, locate and read the specific file and lines
head -n 425 .github/workflows/prod-release-deploy.yaml | tail -n 20
Repository: ambient-code/platform
Length of output: 961
🏁 Script executed:
# Also verify the file exists and get context around those lines
wc -l .github/workflows/prod-release-deploy.yaml
Repository: ambient-code/platform
Length of output: 111
🏁 Script executed:
# Let's check the exact content at lines 414-421
sed -n '410,425p' .github/workflows/prod-release-deploy.yaml
Repository: ambient-code/platform
Length of output: 807
🏁 Script executed:
# Check if there are other similar database creation patterns in workflows
rg "SELECT.*pg_database" .github/workflows/ -A 3 -B 1
Repository: ambient-code/platform
Length of output: 978
🏁 Script executed:
# Check if there are any ACTION versions that need pinning in this file
rg "uses:" .github/workflows/prod-release-deploy.yaml
Repository: ambient-code/platform
Length of output: 665
Harden
The current logic has a TOCTOU race: if another process creates the database between the
Suggested patch
       - name: Ensure mlflow database exists in PostgreSQL
         run: |
-          oc exec -n ambient-code deploy/postgresql -- \
-            psql -U postgres -tAc \
-            "SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1 \
-            || oc exec -n ambient-code deploy/postgresql -- \
-            psql -U postgres -c "CREATE DATABASE mlflow"
+          set -euo pipefail
+          if ! oc exec -n ambient-code deploy/postgresql -- \
+            psql -U postgres -d postgres -tAc \
+            "SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1; then
+            oc exec -n ambient-code deploy/postgresql -- \
+              psql -U postgres -d postgres -v ON_ERROR_STOP=1 -c "CREATE DATABASE mlflow" \
+              || oc exec -n ambient-code deploy/postgresql -- \
+              psql -U postgres -d postgres -tAc \
+              "SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1
+          fi
Same pattern exists in
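The check, create, re-check sequence in the suggested patch may be easier to read and to exercise in isolation when split into small functions. The following is a minimal sketch under stated assumptions, not the workflow's code: `run_psql`, `db_exists`, `db_create`, and `ensure_database` are hypothetical names, the `oc exec` target and `psql` flags are copied from the patch above, and the database name is assumed to be a trusted, simple identifier because it is interpolated into SQL unquoted.

```shell
# Hypothetical wrapper around the `oc exec ... psql` call used in the workflow.
run_psql() {
  oc exec -n ambient-code deploy/postgresql -- \
    psql -U postgres -d postgres -v ON_ERROR_STOP=1 "$@"
}

# Succeeds only if a database with the given name is listed in pg_database.
db_exists() {
  run_psql -tAc "SELECT 1 FROM pg_database WHERE datname = '$1'" | grep -qx 1
}

# Attempts to create the database; fails if it already exists.
db_create() {
  run_psql -c "CREATE DATABASE $1"
}

# ensure_database NAME
# Returns 0 if the database exists when the function finishes, whether it was
# already there, was created here, or was created concurrently by another run.
ensure_database() {
  name="$1"
  if db_exists "$name"; then
    return 0
  fi
  if db_create "$name"; then
    return 0
  fi
  # CREATE failed: accept that only if the database now exists, which is
  # the case when a concurrent run won the race.
  db_exists "$name"
}
```

A step would then call `ensure_database mlflow`. Because a failed create is tolerated only when the follow-up check succeeds, a genuine failure such as missing privileges still fails the step.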
       - name: Verify mlflow-db-credentials secret exists
         run: |
           if ! oc get secret mlflow-db-credentials -n redhat-ods-applications &>/dev/null; then
@@ -3,7 +3,7 @@ kind: MLflow
 metadata:
   name: mlflow
 spec:
-  replicas: 2
+  replicas: 1
 
   resources:
     requests:
Make DB creation idempotent to avoid race-condition failures.
The check-then-create pattern fails when another concurrent run creates the database between the check and create steps. The CREATE DATABASE command will error with "already exists", causing the step to fail even though the desired end state (database exists) is correct. The suggested patch adds explicit error handling and a verification fallback to ensure idempotence.