fix: Increase the failure threshold for k8s dsr1 trtllm wideep deploy.yaml #4557
nit: Could you mention https://nvbugspro.nvidia.com/bug/5685145 in the PR overview?
Walkthrough
This configuration update raises the failureThreshold value from 500 to 600 in the startup probe blocks of both the prefill and decode containers in a Kubernetes deployment YAML file.
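For context, a Kubernetes startupProbe's `failureThreshold` is the number of consecutive failed probes tolerated before the kubelet kills the container, so the effective startup budget is roughly `failureThreshold × periodSeconds` (plus any `initialDelaySeconds`). The fragment below is a minimal sketch of that shape, assuming an HTTP probe. The endpoint, port, and `periodSeconds` are hypothetical placeholders, not values taken from the actual deploy.yaml.

```yaml
# Hypothetical sketch: path, port, and periodSeconds are illustrative
# assumptions; only the failureThreshold change (500 -> 600) is from this PR.
startupProbe:
  httpGet:
    path: /health        # hypothetical health endpoint
    port: 8000           # hypothetical container port
  periodSeconds: 10      # hypothetical: probe once every 10 s
  failureThreshold: 600  # raised from 500 in this PR
  # Approximate worst-case startup window = failureThreshold * periodSeconds
  # With the hypothetical 10 s period: 600 * 10 s = 6000 s (was 500 * 10 s = 5000 s)
```

Under these assumptions, adding 100 to the threshold buys one extra `100 × periodSeconds` of time for the model to finish loading before the container is restarted.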
CI appears to be blocked by an unrelated failure: #4561
….yaml (#4557) Signed-off-by: Dan Gil <[email protected]>
Overview:
QA found that a failure threshold of 500 startup probe attempts might not be sufficient. They suggest adding 100 more attempts (600 in total) to give model loading more time and improve stability.
Check https://nvbugspro.nvidia.com/bug/5685145 for more details.