feat (infra): Added probes for UI and Proxy containers. #112

Merged
KaranJagtiani merged 1 commit into skyflo-ai:main from tarunpandey23:feature/add-ui-proxy-probes
Feb 25, 2026
@tarunpandey23
Contributor

Description

Please include a summary of the changes and the motivation behind them.

Added readiness/liveness probes for UI and proxy, add /health for proxy

  • Added HTTP readiness and liveness probes to the UI (port 3000, path /api/health) and the proxy (port 80, path /health) in install.yaml and local.install.yaml
  • Added location /health in nginx.conf so the proxy answers health checks directly and avoids a 502 from proxying to an unready backend

Output after the change:

kubectl get pods -n default
NAME                                    READY   STATUS    RESTARTS       AGE
skyflo-ai-controller-84d4bcb79b-wrpvp   1/1     Running   0              5m35s
skyflo-ai-engine-9dd448c85-f5whw        1/1     Running   1 (2m9s ago)   5m35s
skyflo-ai-mcp-7d86745958-74h2p          1/1     Running   0              5m35s
skyflo-ai-postgres-0                    1/1     Running   0              5m35s
skyflo-ai-redis-0                       1/1     Running   0              5m35s
skyflo-ai-ui-7cffc646cb-4mpq9           2/2     Running   0              5m35s

kubectl exec skyflo-ai-ui-7cffc646cb-4mpq9 -c proxy -- kill 1

kubectl get pods
NAME                                    READY   STATUS     RESTARTS        AGE
skyflo-ai-controller-84d4bcb79b-wrpvp   1/1     Running    0               5m51s
skyflo-ai-engine-9dd448c85-f5whw        1/1     Running    1 (2m25s ago)   5m51s
skyflo-ai-mcp-7d86745958-74h2p          1/1     Running    0               5m51s
skyflo-ai-postgres-0                    1/1     Running    0               5m51s
skyflo-ai-redis-0                       1/1     Running    0               5m51s
skyflo-ai-ui-7cffc646cb-4mpq9           1/2     NotReady   0               5m51s
kubectl get pods
NAME                                    READY   STATUS    RESTARTS        AGE
skyflo-ai-controller-84d4bcb79b-wrpvp   1/1     Running   0               6m12s
skyflo-ai-engine-9dd448c85-f5whw        1/1     Running   1 (2m46s ago)   6m12s
skyflo-ai-mcp-7d86745958-74h2p          1/1     Running   0               6m12s
skyflo-ai-postgres-0                    1/1     Running   0               6m12s
skyflo-ai-redis-0                       1/1     Running   0               6m12s
skyflo-ai-ui-7cffc646cb-4mpq9           2/2     Running   1 (22s ago)     6m12s
kubectl port-forward -n default deployment/skyflo-ai-ui 3000:80
Forwarding from 127.0.0.1:3000 -> 80
Forwarding from [::1]:3000 -> 80
Handling connection for 3000
curl -s http://localhost:3000/api/health

{"ok":true}%
kubectl describe pod -n default -l app=skyflo-ai-ui | grep -A 8 "Liveness:\|Readiness:"
    Liveness:   http-get http://:3000/api/health delay=15s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:3000/api/health delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment Variables from:
      skyflo-ui-config  ConfigMap  Optional: false
    Environment:
      APP_VERSION:  v0.5.0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kdh9f (ro)
  proxy:
    Container ID:   docker://f99ee5195ea62df5b0d167109af804ab87f261cfa3326906345aa9b6abf5c222
--
    Liveness:   http-get http://:80/health delay=15s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:80/health delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment:
      APP_VERSION:  v0.5.0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kdh9f (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True

Related Issue(s)

Fixes #93

Type of Change

  • Feature (new functionality)
  • Bug fix (fixes an issue)
  • Documentation update
  • Code refactor
  • Performance improvement
  • Tests
  • Infrastructure/build changes
  • Other (please describe):

Testing

Please describe the tests you've added/performed to verify your changes.

Checklist

Before Requesting Review

  • I have tested my changes locally
  • My code follows the coding standards
  • I have added/updated necessary documentation
  • I have checked for and resolved any merge conflicts
  • I have linked this PR to relevant issue(s)

Code Quality

  • No debug print statements or console.log calls
  • No package-lock.json (we use yarn only for the UI)
  • No redundant or self-explanatory comments
  • Error handling does not expose internal details to users

Screenshots (if applicable)

Additional Notes

Why we changed deployment/ui/nginx.conf
The proxy’s readiness and liveness probes originally used path / on port 80. Nginx serves / by proxying to the UI through the Service at http://skyflo-ai-ui:3000. A pod is added to the Service’s endpoints only once it is Ready, and the pod becomes Ready only once the proxy’s probe gets a 2xx response. The result was a loop: the probe hit the proxy, the proxy called the Service, the Service had no Ready pods (including this one), so there was no backend and nginx returned 502. That kept the proxy, and therefore the pod, from ever becoming Ready.
We added a location = /health block that nginx answers itself (no proxy_pass), returning 200 "OK". The proxy probes now use path /health, so they no longer depend on the UI or the Service. That removes the circular dependency and lets the proxy become Ready and stay healthy.
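
The loop described above can be illustrated with a toy model. This is only a sketch for reasoning about the dependency, not Kubernetes or nginx code: the names `proxyProbeStatus` and `becomesReady` are hypothetical, the pod is treated as a single unit, and a single-replica Deployment is assumed.

```typescript
// Toy model of the readiness loop; all names here are illustrative assumptions.
type ProbePath = "/" | "/health";

// Status the proxy would return to the kubelet, given how many Service
// endpoints are currently Ready.
function proxyProbeStatus(path: ProbePath, readyEndpoints: number): number {
  if (path === "/health") return 200;    // answered by nginx itself, no proxy_pass
  return readyEndpoints > 0 ? 200 : 502; // "/" is proxied to the Service backend
}

// Simulate a few probe rounds for one pod that starts NotReady.
function becomesReady(path: ProbePath, rounds = 5): boolean {
  let ready = false;
  for (let i = 0; i < rounds; i++) {
    const status = proxyProbeStatus(path, ready ? 1 : 0);
    // Kubernetes treats HTTP status codes from 200 to 399 as probe success.
    ready = status >= 200 && status < 400;
  }
  return ready;
}

console.log(becomesReady("/"));       // false: the pod waits on itself
console.log(becomesReady("/health")); // true: the probe has no backend dependency
```

With path / the pod can never enter the Ready set because its own readiness is a precondition for the probe to succeed; with /health that precondition disappears.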

Without the nginx.conf change, the proxy container failed its probes with 502 and was restarted repeatedly:

  Normal   Scheduled  2m31s                default-scheduler  Successfully assigned default/skyflo-ai-ui-687d8495c9-c8fxx to docker-desktop
  Normal   Pulling    2m31s                kubelet            Pulling image "skyfloaiagent/ui:v0.5.0"
  Normal   Pulled     2m26s                kubelet            Successfully pulled image "skyfloaiagent/ui:v0.5.0" in 2.439s (4.873s including waiting). Image size: 275730359 bytes.
  Normal   Created    2m26s                kubelet            Created container: ui
  Normal   Started    2m26s                kubelet            Started container ui
  Normal   Pulled     2m21s                kubelet            Successfully pulled image "skyfloaiagent/proxy:v0.5.0" in 2.518s (4.926s including waiting). Image size: 49697197 bytes.
  Normal   Pulled     108s                 kubelet            Successfully pulled image "skyfloaiagent/proxy:v0.5.0" in 2.468s (2.468s including waiting). Image size: 49697197 bytes.
  Normal   Pulled     68s                  kubelet            Successfully pulled image "skyfloaiagent/proxy:v0.5.0" in 2.408s (2.408s including waiting). Image size: 49697197 bytes.
  Normal   Pulling    31s (x4 over 2m26s)  kubelet            Pulling image "skyfloaiagent/proxy:v0.5.0"
  Normal   Killing    31s (x3 over 111s)   kubelet            Container proxy failed liveness probe, will be restarted
  Normal   Created    28s (x4 over 2m21s)  kubelet            Created container: proxy
  Normal   Started    28s (x4 over 2m21s)  kubelet            Started container proxy
  Normal   Pulled     28s                  kubelet            Successfully pulled image "skyfloaiagent/proxy:v0.5.0" in 2.702s (2.702s including waiting). Image size: 49697197 bytes.
  Warning  Unhealthy  5s (x14 over 2m8s)   kubelet            Readiness probe failed: HTTP probe failed with statuscode: 502
  Warning  Unhealthy  1s (x11 over 2m11s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 502

Why we added UI /api/health
The UI probes originally used path / on port 3000. That triggers full page rendering and is heavier and less reliable for frequent probes. We added a minimal Next.js API route at /api/health that returns { "ok": true } with no rendering or external calls, so readiness/liveness checks are cheap and stable. Probes were updated to use /api/health instead of /.
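
A minimal sketch of what such a route could look like, assuming the Next.js App Router and the path ui/src/app/api/health/route.ts mentioned elsewhere in this PR. The actual file in the repository may differ (for example, the review walkthrough describes a `{ status: "ok" }` body, while the curl output above shows `{"ok":true}`). It uses the standard Web `Response` rather than any Next.js helper so the snippet stays self-contained.

```typescript
// Hypothetical sketch of a health route handler; not copied from the repository.

// Route segment config: ask Next.js to run the handler on every request
// instead of caching a response at build time.
export const dynamic = "force-dynamic";

// No rendering, no database or upstream calls: the handler only reports that
// the Node.js server is up and able to answer HTTP requests.
export function GET(): Response {
  return new Response(JSON.stringify({ ok: true }), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}
```

Keeping the handler free of external calls is the point of the design: a probe that depends on other services can fail for reasons unrelated to the container being probed.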

Why we need to rebuild and push new images
Proxy: The proxy container is built from deployment/ui/proxy.Dockerfile, which copies deployment/ui/nginx.conf into the image. The new /health behaviour only exists in the cluster after that updated nginx config is inside the image. You must rebuild the proxy image (so it includes the new nginx.conf), push it to your registry with the tag your manifests use (e.g. skyfloaiagent/proxy:v0.5.0), and redeploy (e.g. kubectl rollout restart deployment/skyflo-ai-ui) so pods use the new image. Until then, the running proxy will keep proxying /health to the UI and probes can keep getting 502.
UI: The UI image must include the new /api/health route (e.g. ui/src/app/api/health/route.ts). Rebuild the UI image, push it with the tag used in the manifests (e.g. skyfloaiagent/ui:v0.5.0), and redeploy so probes to /api/health succeed.

@coderabbitai

coderabbitai bot commented Feb 22, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Chores
    • Added automated health monitoring and recovery mechanisms across core system components to improve reliability and uptime.

Walkthrough

Adds HTTP readiness and liveness probes to manager, UI, and proxy containers in Kubernetes manifests, adds a /health nginx route, and adds a Next.js /api/health GET route that returns JSON { status: "ok" }.

Changes

Cohort / File(s) | Summary
Kubernetes Deployment Manifests: deployment/install.yaml, deployment/local.install.yaml | Added HTTP readiness and liveness probes across containers: manager (manager and proxy containers in install.yaml), UI, and proxy. Probes use HTTP GET to /api/health (app) or /health (proxy) with ports http / http-proxy, timings: readiness initialDelaySeconds 10, liveness initialDelaySeconds 15, periodSeconds 10, timeoutSeconds 5, failureThreshold 3.
Nginx health endpoint: deployment/ui/nginx.conf | Added location /health returning HTTP 200 body ok as text/plain and access_log off, placed before existing /api/v1/ proxy rules.
Next.js health API: ui/src/app/api/health/route.ts | Added GET handler returning JSON { status: "ok" } and exported dynamic = "force-dynamic" for app-level health check.
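
As a rough aid for reading the timings quoted above, the following sketch estimates when a probe that fails on every attempt would reach its failureThreshold. It assumes the first attempt happens at initialDelaySeconds and later attempts follow every periodSeconds; kubelet jitter and scheduling delays are ignored, and `failureWindowSeconds` is a hypothetical helper, not a Kubernetes API.

```typescript
// Probe settings in seconds, as quoted in the summary above.
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  timeoutSeconds: number;
  failureThreshold: number;
}

// Approximate window, measured from container start, in which a probe that
// always fails hits failureThreshold. "earliest" assumes each attempt fails
// immediately (e.g. a 502); "latest" assumes the final attempt times out.
function failureWindowSeconds(p: ProbeTiming): { earliest: number; latest: number } {
  const lastAttempt = p.initialDelaySeconds + (p.failureThreshold - 1) * p.periodSeconds;
  return { earliest: lastAttempt, latest: lastAttempt + p.timeoutSeconds };
}

const liveness: ProbeTiming = {
  initialDelaySeconds: 15,
  periodSeconds: 10,
  timeoutSeconds: 5,
  failureThreshold: 3,
};
const readiness: ProbeTiming = { ...liveness, initialDelaySeconds: 10 };

console.log(failureWindowSeconds(liveness));  // { earliest: 35, latest: 40 }
console.log(failureWindowSeconds(readiness)); // { earliest: 30, latest: 35 }
```

Under these assumptions a container that never answers its liveness probe is restarted roughly 35 to 40 seconds after it starts.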

Sequence Diagram(s)

sequenceDiagram
  participant Kubelet as Kubelet (probe)
  participant Nginx as Nginx (proxy)
  participant App as Next.js app (/api/health)

  Kubelet->>Nginx: HTTP GET /health
  alt Nginx responds directly
    Nginx-->>Kubelet: 200 "ok"
  else Nginx proxies to app
    Nginx->>App: HTTP GET /api/health
    App-->>Nginx: 200 {"status":"ok"}
    Nginx-->>Kubelet: 200 "ok"
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • KaranJagtiani

Poem

🐇 I hopped to /health in the cool cluster light,
I whispered JSON at /api/health — all right,
Probes chimed in chorus, pods stretched and woke,
A nibble, a hop, and no more smoke! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title accurately describes the main change: adding readiness and liveness probes to UI and proxy containers in the infrastructure deployment files.
Description check | ✅ Passed | The description comprehensively explains the changes, motivation, and includes test outputs demonstrating that the implementation works as intended.
Linked Issues check | ✅ Passed | The PR implementation addresses all coding requirements from issue #93: probes added to UI and proxy containers in both manifests with correct timing parameters, and health endpoints implemented to support the probes.
Out of Scope Changes check | ✅ Passed | All changes are directly related to the linked issue #93 objectives: deployment manifests updated with probes, nginx.conf modified to support proxy health checks, and a Next.js health API route added for UI probes.


@coderabbitai coderabbitai bot requested a review from KaranJagtiani February 22, 2026 19:05
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ui/src/app/api/health/route.ts`:
- Around line 1-4: Add an explicit dynamic export and remove the unnecessary
async: in the route handler file update the GET handler by adding export const
dynamic = "force-dynamic" at top-level to prevent static build-time caching, and
change the function signature from export async function GET() to export
function GET() (remove async since there is no await) so the handler is forced
to run dynamically and the signature matches usage.

---

Duplicate comments:
In `@deployment/local.install.yaml`:
- Around line 692-739: Update the review summary to correctly reference the
modified Deployment and avoid duplicate notes: the readiness/liveness probes
shown belong to the skyflo-ai-ui Deployment (the UI container using
/api/health:3000 and the proxy container using /health:80 with imagePullPolicy:
Never), not the skyflo-ai-controller; remove the duplicate comment flag and
ensure the summary explicitly mentions "skyflo-ai-ui" and the proxy container
when describing the probe additions so the review matches the actual changes in
the manifest.

@tarunpandey23 tarunpandey23 force-pushed the feature/add-ui-proxy-probes branch from 5467622 to d05109a Compare February 22, 2026 19:18
@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
deployment/install.yaml (1)

646-693: 🧹 Nitpick | 🔵 Trivial

Prefer named ports over numeric literals in probe specs for consistency.

Same pattern as local.install.yaml: use port: http (UI, port 3000) and port: http-proxy (proxy, port 80) to match the convention used by all other HTTP probes in this file (engine → http, mcp → http, manager → health).

♻️ Suggested refactor
        readinessProbe:
          httpGet:
            path: /api/health
-           port: 3000
+           port: http
...
        livenessProbe:
          httpGet:
            path: /api/health
-           port: 3000
+           port: http
...
        readinessProbe:
          httpGet:
            path: /health
-           port: 80
+           port: http-proxy
...
        livenessProbe:
          httpGet:
            path: /health
-           port: 80
+           port: http-proxy
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployment/install.yaml` around lines 646 - 693, Replace numeric port
literals in the readinessProbe and livenessProbe specs with the corresponding
named ports for consistency: for the UI container probes use port: http (the
container exposing port 3000), and for the proxy container (name: proxy) use
port: http-proxy (the container exposing port 80); update the probe blocks that
currently use port: 3000 and port: 80 to reference these names so they match the
convention used by other containers (engine → http, mcp → http, manager →
health).
deployment/local.install.yaml (1)

692-739: 🧹 Nitpick | 🔵 Trivial

Prefer named ports over numeric literals in probe specs for consistency.

The UI probe uses port: 3000 and the proxy probe uses port: 80. Every other HTTP probe in this file references the named port (engine → port: http, mcp → port: http, manager → port: health). Using named ports (http and http-proxy) is more resilient to port-number changes.

♻️ Suggested refactor
        readinessProbe:
          httpGet:
            path: /api/health
-           port: 3000
+           port: http
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /api/health
-           port: 3000
+           port: http
          initialDelaySeconds: 15
...
        readinessProbe:
          httpGet:
            path: /health
-           port: 80
+           port: http-proxy
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health
-           port: 80
+           port: http-proxy
          initialDelaySeconds: 15
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployment/local.install.yaml` around lines 692 - 739, Replace numeric port
literals in the readinessProbe and livenessProbe specs with the named ports used
elsewhere: change the probes that currently use port: 3000 to use the named port
"http" and change the proxy probes that use port: 80 to use the named port
"http-proxy"; update both readinessProbe and livenessProbe entries for the UI
probe (the block with path /api/health) and the proxy container (image
skyfloaiagent/proxy:${VERSION}) so they reference port: http and port:
http-proxy respectively.

@tarunpandey23 tarunpandey23 force-pushed the feature/add-ui-proxy-probes branch from d05109a to 4d909a1 Compare February 24, 2026 16:01
@tarunpandey23 tarunpandey23 force-pushed the feature/add-ui-proxy-probes branch 2 times, most recently from 02743b5 to d818f4d Compare February 24, 2026 16:12
Member

@KaranJagtiani KaranJagtiani left a comment

Fix the commit message to follow the repo's conventional commits format (imperative mood, lowercase): feat (infra): add probes for UI and proxy containers

@tarunpandey23 tarunpandey23 force-pushed the feature/add-ui-proxy-probes branch from eac8b2a to 63459c0 Compare February 25, 2026 12:02
@tarunpandey23
Contributor Author

feat (infra): add probes for UI and proxy containers

Done.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deployment/install.yaml`:
- Around line 646-661: Add a startupProbe entry alongside the existing
readinessProbe and livenessProbe to prevent liveness checks from killing the pod
during slow initialization: under the same container spec where readinessProbe
and livenessProbe are defined, add startupProbe with an httpGet to path
/api/health and port http, and configure conservative timings (higher
initialDelaySeconds than readiness/liveness, e.g., significantly larger
initialDelaySeconds, with periodSeconds and timeoutSeconds similar to the
existing probes and an appropriate failureThreshold) so the app/proxy can fully
start before readiness/liveness take effect.

In `@deployment/local.install.yaml`:
- Around line 692-707: Add a startupProbe to the deployments that currently
define readinessProbe and livenessProbe (e.g., the UI and proxy pods which
contain the readinessProbe:/api/health and livenessProbe:/api/health blocks) so
the pod has a gate for cold starts before liveness checks begin; implement a
startupProbe.httpGet to /api/health on the same port with a longer
initialDelaySeconds and periodSeconds (for example significantly larger than
readiness initialDelaySeconds) and an appropriate failureThreshold to allow slow
image startup, ensuring it sits alongside the existing readinessProbe and
livenessProbe entries for the same container definitions.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02743b5 and 63459c0.

📒 Files selected for processing (4)
  • deployment/install.yaml
  • deployment/local.install.yaml
  • deployment/ui/nginx.conf
  • ui/src/app/api/health/route.ts

Comment on lines +646 to +661
        readinessProbe:
          httpGet:
            path: /api/health
            port: http
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /api/health
            port: http
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3

🧹 Nitpick | 🔵 Trivial

Add startupProbe here as well to mirror robust startup behavior in production installs.

This keeps liveness from acting before app/proxy initialization completes under slower startup conditions.

Suggested patch
       - name: ui
         image: skyfloaiagent/ui:${VERSION}
@@
         securityContext:
           allowPrivilegeEscalation: false
           runAsUser: 1002
           runAsGroup: 1002
           capabilities:
             drop:
             - ALL
+        startupProbe:
+          httpGet:
+            path: /api/health
+            port: http
+          periodSeconds: 5
+          timeoutSeconds: 5
+          failureThreshold: 12
         readinessProbe:
           httpGet:
             path: /api/health
             port: http
@@
       - name: proxy
         image: skyfloaiagent/proxy:${VERSION}
@@
         resources:
           limits:
             cpu: 200m
             memory: 256Mi
           requests:
             cpu: 100m
             memory: 128Mi
+        startupProbe:
+          httpGet:
+            path: /health
+            port: http-proxy
+          periodSeconds: 5
+          timeoutSeconds: 5
+          failureThreshold: 12
         readinessProbe:
           httpGet:
             path: /health
             port: http-proxy

Also applies to: 678-693
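
To make the suggested startupProbe values concrete, here is a small sketch of the startup budget they imply, using the usual rule of thumb that a startup probe allows about failureThreshold × periodSeconds before the container is restarted. `startupBudgetSeconds` is a hypothetical helper, and the result is an approximation that ignores probe timeouts and kubelet jitter.

```typescript
// Approximate time a container is given to pass its startup probe before the
// kubelet restarts it. initialDelaySeconds defaults to 0, matching the
// suggested patch, which does not set it.
function startupBudgetSeconds(
  periodSeconds: number,
  failureThreshold: number,
  initialDelaySeconds = 0,
): number {
  return initialDelaySeconds + periodSeconds * failureThreshold;
}

// Values from the suggested patch: periodSeconds 5, failureThreshold 12.
console.log(startupBudgetSeconds(5, 12)); // 60
```

While a startup probe is still pending, liveness and readiness checks are held back, so with these values the UI and proxy would get about a minute to come up before the liveness timings apply.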

Comment on lines +692 to +707
        readinessProbe:
          httpGet:
            path: /api/health
            port: http
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /api/health
            port: http
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3

🧹 Nitpick | 🔵 Trivial

Consider adding startupProbe for UI and proxy to improve cold-start resilience.

Current probes are valid, but startupProbe prevents premature liveness failures on slower nodes/images and cleanly gates when liveness/readiness begin.

Suggested patch
       - name: ui
         image: skyfloaiagent/ui:${VERSION}
@@
         securityContext:
           allowPrivilegeEscalation: false
           runAsUser: 1002
           runAsGroup: 1002
           capabilities:
             drop:
             - ALL
+        startupProbe:
+          httpGet:
+            path: /api/health
+            port: http
+          periodSeconds: 5
+          timeoutSeconds: 5
+          failureThreshold: 12
         readinessProbe:
           httpGet:
             path: /api/health
             port: http
@@
       - name: proxy
         image: skyfloaiagent/proxy:${VERSION}
@@
         resources:
           limits:
             cpu: 200m
             memory: 256Mi
           requests:
             cpu: 100m
             memory: 128Mi
+        startupProbe:
+          httpGet:
+            path: /health
+            port: http-proxy
+          periodSeconds: 5
+          timeoutSeconds: 5
+          failureThreshold: 12
         readinessProbe:
           httpGet:
             path: /health
             port: http-proxy

Also applies to: 724-739

Member

@KaranJagtiani KaranJagtiani left a comment

Good to go.

@KaranJagtiani KaranJagtiani merged commit e5a1838 into skyflo-ai:main Feb 25, 2026
4 checks passed
Linked issue: Add probes for UI and Proxy containers