Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 24 additions & 0 deletions .github/workflows/components-build-deploy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,22 @@ jobs:
oc wait csv "$CSV" -n redhat-ods-operator \
--for=jsonpath='{.status.phase}'=Succeeded --timeout=600s

- name: Wait for DataScienceCluster CRD to be available
run: |
echo "Waiting for DataScienceCluster CRD to be registered..."
for i in $(seq 1 60); do
if oc get crd datascienceclusters.datasciencecluster.opendatahub.io &>/dev/null; then
echo "DataScienceCluster CRD is available"
break
fi
if [ "$i" -eq 60 ]; then
echo "::error::DataScienceCluster CRD did not become available within timeout"
exit 1
fi
echo "Attempt $i/60 - CRD not yet available, waiting 10s..."
sleep 10
done

- name: Apply DSCInitialization and DataScienceCluster
run: |
oc apply -f components/manifests/components/openshift-ai/dsci.yaml
Expand All @@ -256,6 +272,14 @@ jobs:
sleep 10
done

- name: Ensure mlflow database exists in PostgreSQL
run: |
oc exec -n ambient-code deploy/postgresql -- \
psql -U postgres -tAc \
"SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1 \
|| oc exec -n ambient-code deploy/postgresql -- \
psql -U postgres -c "CREATE DATABASE mlflow"

Comment on lines +275 to +282
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and read the workflow file
find . -path "*/.github/workflows/components-build-deploy.yml" -type f

Repository: ambient-code/platform

Length of output: 112


🏁 Script executed:

# Read the workflow file, particularly around lines 275-282
if [ -f ".github/workflows/components-build-deploy.yml" ]; then
  sed -n '270,290p' ".github/workflows/components-build-deploy.yml" | cat -n
else
  echo "File not found"
fi

Repository: ambient-code/platform

Length of output: 1176


🏁 Script executed:

# Get broader context - check how many steps/jobs exist and what the workflow does
if [ -f ".github/workflows/components-build-deploy.yml" ]; then
  wc -l ".github/workflows/components-build-deploy.yml"
  # Check for action version pinning in the file
  grep -n "uses:" ".github/workflows/components-build-deploy.yml" | head -20
fi

Repository: ambient-code/platform

Length of output: 810


🏁 Script executed:

# Check if there are any secrets referenced in the step or nearby context
if [ -f ".github/workflows/components-build-deploy.yml" ]; then
  sed -n '260,295p' ".github/workflows/components-build-deploy.yml" | grep -i "secret\|credential\|password\|token" || echo "No secrets found in context"
fi

Repository: ambient-code/platform

Length of output: 412


Make DB creation idempotent to avoid race-condition failures.

The check-then-create pattern fails when another concurrent run creates the database between the check and create steps. The CREATE DATABASE command will error with "already exists", causing the step to fail even though the desired end state (database exists) is correct. The suggested patch adds explicit error handling and a verification fallback to ensure idempotence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/components-build-deploy.yml around lines 275 - 282,
Replace the current check-then-create sequence with an idempotent create that
tolerates a concurrent creator: run oc exec ... psql -U postgres -c "CREATE
DATABASE mlflow" and if that command fails, run the SELECT "SELECT 1 FROM
pg_database WHERE datname = 'mlflow'" (the same query currently used) and treat
the step as successful if the SELECT returns 1, otherwise fail. Update the
workflow step named "Ensure mlflow database exists in PostgreSQL" to implement
this retry/fallback logic so the CREATE failure due to "already exists" is
handled gracefully.

- name: Verify mlflow-db-credentials secret exists
run: |
if ! oc get secret mlflow-db-credentials -n redhat-ods-applications &>/dev/null; then
Expand Down
24 changes: 24 additions & 0 deletions .github/workflows/prod-release-deploy.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -374,6 +374,22 @@ jobs:
oc wait csv "$CSV" -n redhat-ods-operator \
--for=jsonpath='{.status.phase}'=Succeeded --timeout=600s

- name: Wait for DataScienceCluster CRD to be available
run: |
echo "Waiting for DataScienceCluster CRD to be registered..."
for i in $(seq 1 60); do
if oc get crd datascienceclusters.datasciencecluster.opendatahub.io &>/dev/null; then
echo "DataScienceCluster CRD is available"
break
fi
if [ "$i" -eq 60 ]; then
echo "::error::DataScienceCluster CRD did not become available within timeout"
exit 1
fi
echo "Attempt $i/60 - CRD not yet available, waiting 10s..."
sleep 10
done

Comment on lines +377 to +392
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the workflow file
find . -name "prod-release-deploy.yaml" -type f

Repository: ambient-code/platform

Length of output: 109


🏁 Script executed:

# Once found, let's check the relevant lines
cat -n .github/workflows/prod-release-deploy.yaml | sed -n '370,410p'

Repository: ambient-code/platform

Length of output: 2089


🏁 Script executed:

# Also search for references to dsci.yaml and datasciencecluster.yaml to understand what they do
rg "dsci\.yaml|datasciencecluster\.yaml" --type yaml -B 2 -A 2

Repository: ambient-code/platform

Length of output: 477


🏁 Script executed:

# Check if there are other workflow files with similar patterns
find .github/workflows -name "*.yaml" -o -name "*.yml" | head -20

Repository: ambient-code/platform

Length of output: 818


🏁 Script executed:

cat -n components/manifests/components/openshift-ai/dsci.yaml

Repository: ambient-code/platform

Length of output: 414


🏁 Script executed:

# Also check if dsci.yaml is a template or has any conditionals
file components/manifests/components/openshift-ai/dsci.yaml
wc -l components/manifests/components/openshift-ai/dsci.yaml

Repository: ambient-code/platform

Length of output: 182


🏁 Script executed:

# Search for any DSCInitialization references in the codebase
rg "DSCInitialization|dscinitialization" --type yaml -i

Repository: ambient-code/platform

Length of output: 246


Wait for both DSCInitialization and DataScienceCluster CRDs before applying manifests.

Line 395 applies dsci.yaml, which requires the dscinitializations.dscinitialization.opendatahub.io CRD. The current wait (lines 377-392) only checks datascienceclusters.datasciencecluster.opendatahub.io, leaving a race condition where dsci.yaml could fail if its CRD hasn't registered yet.

Suggested patch
-      - name: Wait for DataScienceCluster CRD to be available
+      - name: Wait for required OpenDataHub CRDs to be available
         run: |
-          echo "Waiting for DataScienceCluster CRD to be registered..."
-          for i in $(seq 1 60); do
-            if oc get crd datascienceclusters.datasciencecluster.opendatahub.io &>/dev/null; then
-              echo "DataScienceCluster CRD is available"
-              break
-            fi
-            if [ "$i" -eq 60 ]; then
-              echo "::error::DataScienceCluster CRD did not become available within timeout"
-              exit 1
-            fi
-            echo "Attempt $i/60 - CRD not yet available, waiting 10s..."
-            sleep 10
-          done
+          for crd in \
+            dscinitializations.dscinitialization.opendatahub.io \
+            datascienceclusters.datasciencecluster.opendatahub.io; do
+            echo "Waiting for ${crd} CRD to be registered..."
+            for i in $(seq 1 60); do
+              if oc get crd "$crd" &>/dev/null; then
+                echo "${crd} CRD is available"
+                break
+              fi
+              if [ "$i" -eq 60 ]; then
+                echo "::error::${crd} CRD did not become available within timeout"
+                exit 1
+              fi
+              echo "Attempt $i/60 - ${crd} CRD not yet available, waiting 10s..."
+              sleep 10
+            done
+          done
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Wait for DataScienceCluster CRD to be available
run: |
echo "Waiting for DataScienceCluster CRD to be registered..."
for i in $(seq 1 60); do
if oc get crd datascienceclusters.datasciencecluster.opendatahub.io &>/dev/null; then
echo "DataScienceCluster CRD is available"
break
fi
if [ "$i" -eq 60 ]; then
echo "::error::DataScienceCluster CRD did not become available within timeout"
exit 1
fi
echo "Attempt $i/60 - CRD not yet available, waiting 10s..."
sleep 10
done
- name: Wait for required OpenDataHub CRDs to be available
run: |
for crd in \
dscinitializations.dscinitialization.opendatahub.io \
datascienceclusters.datasciencecluster.opendatahub.io; do
echo "Waiting for ${crd} CRD to be registered..."
for i in $(seq 1 60); do
if oc get crd "$crd" &>/dev/null; then
echo "${crd} CRD is available"
break
fi
if [ "$i" -eq 60 ]; then
echo "::error::${crd} CRD did not become available within timeout"
exit 1
fi
echo "Attempt $i/60 - ${crd} CRD not yet available, waiting 10s..."
sleep 10
done
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/prod-release-deploy.yaml around lines 377 - 392, Update
the CRD wait step so it waits for both CRDs before proceeding: instead of only
checking datascienceclusters.datasciencecluster.opendatahub.io in the step
currently named "Wait for DataScienceCluster CRD to be available", extend the
loop to verify both datascienceclusters.datasciencecluster.opendatahub.io and
dscinitializations.dscinitialization.opendatahub.io are registered (break only
when both succeed), update the status messages to reflect which CRD(s) are still
pending, and keep the same timeout/exit-on-failure behavior to avoid applying
dsci.yaml before its CRD exists.

- name: Apply DSCInitialization and DataScienceCluster
run: |
oc apply -f components/manifests/components/openshift-ai/dsci.yaml
Expand All @@ -395,6 +411,14 @@ jobs:
sleep 10
done

- name: Ensure mlflow database exists in PostgreSQL
run: |
oc exec -n ambient-code deploy/postgresql -- \
psql -U postgres -tAc \
"SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1 \
|| oc exec -n ambient-code deploy/postgresql -- \
psql -U postgres -c "CREATE DATABASE mlflow"

Comment on lines +414 to +421
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and read the specific file and lines
head -n 425 .github/workflows/prod-release-deploy.yaml | tail -n 20

Repository: ambient-code/platform

Length of output: 961


🏁 Script executed:

# Also verify the file exists and get context around those lines
wc -l .github/workflows/prod-release-deploy.yaml

Repository: ambient-code/platform

Length of output: 111


🏁 Script executed:

# Let's check the exact content at lines 414-421
sed -n '410,425p' .github/workflows/prod-release-deploy.yaml

Repository: ambient-code/platform

Length of output: 807


🏁 Script executed:

# Check if there are other similar database creation patterns in workflows
rg "SELECT.*pg_database" .github/workflows/ -A 3 -B 1

Repository: ambient-code/platform

Length of output: 978


🏁 Script executed:

# Check if there are any ACTION versions that need pinning in this file
rg "uses:" .github/workflows/prod-release-deploy.yaml

Repository: ambient-code/platform

Length of output: 665


Harden mlflow DB creation against check-then-create races.

The current logic has a TOCTOU race: if another process creates the database between the SELECT check and CREATE statement, the latter fails, causing unnecessary release failures despite correct final state.

Suggested patch
       - name: Ensure mlflow database exists in PostgreSQL
         run: |
-          oc exec -n ambient-code deploy/postgresql -- \
-            psql -U postgres -tAc \
-            "SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1 \
-          || oc exec -n ambient-code deploy/postgresql -- \
-            psql -U postgres -c "CREATE DATABASE mlflow"
+          set -euo pipefail
+          if ! oc exec -n ambient-code deploy/postgresql -- \
+            psql -U postgres -d postgres -tAc \
+            "SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1; then
+            oc exec -n ambient-code deploy/postgresql -- \
+              psql -U postgres -d postgres -v ON_ERROR_STOP=1 -c "CREATE DATABASE mlflow" \
+            || oc exec -n ambient-code deploy/postgresql -- \
+              psql -U postgres -d postgres -tAc \
+              "SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1
+          fi

Same pattern exists in components-build-deploy.yml.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Ensure mlflow database exists in PostgreSQL
run: |
oc exec -n ambient-code deploy/postgresql -- \
psql -U postgres -tAc \
"SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1 \
|| oc exec -n ambient-code deploy/postgresql -- \
psql -U postgres -c "CREATE DATABASE mlflow"
- name: Ensure mlflow database exists in PostgreSQL
run: |
set -euo pipefail
if ! oc exec -n ambient-code deploy/postgresql -- \
psql -U postgres -d postgres -tAc \
"SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1; then
oc exec -n ambient-code deploy/postgresql -- \
psql -U postgres -d postgres -v ON_ERROR_STOP=1 -c "CREATE DATABASE mlflow" \
|| oc exec -n ambient-code deploy/postgresql -- \
psql -U postgres -d postgres -tAc \
"SELECT 1 FROM pg_database WHERE datname = 'mlflow'" | grep -q 1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/prod-release-deploy.yaml around lines 414 - 421, Replace
the TOCTOU check-then-create pattern in the "Ensure mlflow database exists in
PostgreSQL" step (the oc exec ... psql invocation) with a single idempotent psql
statement that attempts to create the database and swallows the
duplicate-database error (use a PL/pgSQL DO block that runs CREATE DATABASE
mlflow and catches duplicate_database to do nothing); apply the same replacement
to the corresponding step in components-build-deploy.yml so both workflows use
the exception-safe create approach instead of separate SELECT and CREATE
commands.

- name: Verify mlflow-db-credentials secret exists
run: |
if ! oc get secret mlflow-db-credentials -n redhat-ods-applications &>/dev/null; then
Expand Down
2 changes: 1 addition & 1 deletion components/manifests/components/openshift-ai/mlflow.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ kind: MLflow
metadata:
name: mlflow
spec:
replicas: 2
replicas: 1

resources:
requests:
Expand Down
Loading