Conversation


@dittops dittops commented Jan 25, 2026

Summary

  • Add ScaleToZeroConfig to CRD with enabled, activationScale, and gracePeriod fields
  • Add ZeroDemandSince status field for grace period tracking
  • Update algorithms (BudScaler, KPA) to support scaling to 0 replicas
  • Handle zero-pod scenarios in metrics collector (external sources work without pods)
  • Track zero demand state with configurable grace period before scaling to zero

Usage Example

apiVersion: scaler.bud.studio/v1alpha1
kind: BudAIScaler
metadata:
  name: llm-scaler
spec:
  scaleTargetRef:
    kind: Deployment
    name: llm-inference
  minReplicas: 0
  maxReplicas: 10

  scaleToZeroConfig:
    enabled: true
    activationScale: 2      # Scale to 2 replicas when waking
    gracePeriod: 5m         # Wait 5 min at zero demand before scaling down

  metricsSources:
    - metricSourceType: prometheus
      targetMetric: request_queue_depth
      targetValue: "10"
      endpoint: "http://prometheus:9090"
      promQL: 'sum(pending_requests{service="llm-inference"})'

Test plan

  • Unit tests pass (go test ./...)
  • CRD generation succeeds (make generate && make manifests)
  • Build succeeds (go build ./...)
  • Deploy with scale-to-zero config and verify grace period behavior
  • Verify scale from 0 uses activation scale
  • Verify external metrics can trigger scale-up from 0

🤖 Generated with Claude Code

Add ability to scale workloads down to 0 replicas when there is no demand,
and scale back up when demand is detected via external metrics.

Changes:
- Add ScaleToZeroConfig to CRD with enabled, activationScale, and gracePeriod fields
- Add ZeroDemandSince status field for grace period tracking
- Extend ScalingContext and ScalingContextProvider interfaces
- Update metrics collector to handle zero-pod scenarios (external sources work without pods)
- Update BudScaler and KPA algorithms to respect scale-to-zero config
- Add grace period logic to prevent premature scale-to-zero
- Add activation scale logic when waking from zero
- Track zero demand state in controller

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@gemini-code-assist

Summary of Changes

Hello @dittops, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the BudAIScaler by introducing comprehensive support for scaling services down to zero replicas. It extends the Custom Resource Definition (CRD) to provide configurable options for this behavior, such as a grace period before scaling down and an activation scale for efficient wake-up. The core scaling algorithms have been adapted to respect these new configurations, ensuring that services can dynamically adjust their replica count, including complete shutdown during periods of inactivity, and then scale back up effectively when demand reappears. The changes also refine metric collection to handle zero-pod states, making the autoscaler more robust and resource-efficient.

Highlights

  • CRD Extension for Scale-to-Zero: The BudAIScalerSpec now includes a ScaleToZeroConfig field, allowing users to enable scale-to-zero, define an ActivationScale for waking up, and set a GracePeriod before scaling down to zero. The BudAIScalerStatus also gains a ZeroDemandSince field to track the start of a zero-demand state.
  • Algorithm Updates for Zero Replicas: Both the BudScaler and KPA scaling algorithms have been updated to incorporate the new scale-to-zero logic. This includes respecting the configured grace period before scaling to zero and applying the ActivationScale when scaling up from zero replicas.
  • Metric Collector Enhancements: The metric collector has been improved to gracefully handle scenarios where no pods are running. For external metric sources, collection will proceed as usual, while for pod-based sources, an empty snapshot is returned, indicating zero demand without erroring out.
  • Scaling Context and Annotation Support: The internal ScalingContext has been extended to manage the new scale-to-zero parameters, including the grace period. A new annotation, ScaleToZeroGraceAnnotation, has also been introduced to allow configuration via annotations, though CRD settings take precedence.
  • Controller Logic for Status Tracking: The BudAIScalerReconciler now actively tracks the ZeroDemandSince timestamp in the scaler's status. This timestamp is set when zero demand is first detected and cleared when demand returns, facilitating the grace period calculation.



@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces comprehensive scale-to-zero functionality for the BudAIScaler, allowing workloads to scale down to zero replicas and efficiently scale back up when demand returns. The changes include adding ScaleToZeroConfig to the CRD, tracking ZeroDemandSince in the status, and updating both BudScaler and KPA algorithms to incorporate grace periods and activation scales. The metrics collector has also been enhanced to handle zero-pod scenarios gracefully. The implementation is consistent across the codebase and includes appropriate CRD definitions, deepcopy logic, and context propagation. The test cases for scaling policies have been updated to reflect the new scale-to-zero capability, which is a good practice.

activationScale := sctx.GetActivationScale()
if activationScale > rec.DesiredReplicas {
    rec.DesiredReplicas = activationScale
    rec.Reason += fmt.Sprintf(" (activation scale: %d)", activationScale)
}


Severity: medium

The reason string for activation scale is appended to the existing reason. While functional, consider if it would be clearer to construct a more descriptive reason that fully explains the decision, rather than just appending. For example, if the original reason was "Zero demand detected, starting grace period", appending " (activation scale: 2)" might be slightly less clear than a new reason like "Scaling from zero with activation scale: 2". This is a minor readability suggestion.

Suggested change:
- rec.Reason += fmt.Sprintf(" (activation scale: %d)", activationScale)
+ rec.Reason = fmt.Sprintf("Scaling from zero with activation scale: %d (original reason: %s)", activationScale, rec.Reason)

Comment on lines 152 to 153
    rec.DesiredReplicas = activationScale
}


Severity: medium

Similar to the BudScaler algorithm, consider adding a more explicit reason for applying the activation scale. While the current implementation correctly sets the desired replicas, a clearer reason would improve observability and debugging for users trying to understand why their workload scaled to a specific number from zero.

Suggested change:
  rec.DesiredReplicas = activationScale
+ rec.Reason = fmt.Sprintf("Scaling from zero with activation scale: %d", activationScale)
  }

Comment on lines 303 to 312
if result.Recommendation != nil {
    if result.Recommendation.DesiredReplicas == 0 && result.CurrentReplicas > 0 {
        // Entering zero demand state
        if scaler.Status.ZeroDemandSince == nil {
            scaler.Status.ZeroDemandSince = &now
        }
    } else if result.Recommendation.DesiredReplicas > 0 {
        // Exiting zero demand state
        scaler.Status.ZeroDemandSince = nil
    }
}


Severity: medium

The logic for updating `ZeroDemandSince` is correct. However, `now` is defined once at the beginning of the `reconcileCustomScaler` function. If there's a significant delay between the start of reconciliation and this specific status update, `now` might not accurately reflect the exact moment the zero-demand state was detected. For higher precision, consider moving the `now := metav1.Now()` assignment directly before `scaler.Status.ZeroDemandSince = &now` when setting the timestamp.

Suggested change:
  if result.Recommendation != nil {
      if result.Recommendation.DesiredReplicas == 0 && result.CurrentReplicas > 0 {
          // Entering zero demand state
          if scaler.Status.ZeroDemandSince == nil {
-             scaler.Status.ZeroDemandSince = &now
+             currentTimestamp := metav1.Now()
+             scaler.Status.ZeroDemandSince = &currentTimestamp
          }
      } else if result.Recommendation.DesiredReplicas > 0 {
          // Exiting zero demand state
          scaler.Status.ZeroDemandSince = nil
      }
  }

- Improve activation scale reason strings in budscaler and kpa algorithms
- Use fresh timestamp for ZeroDemandSince instead of reusing `now` variable

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@dittops dittops merged commit c6f376d into main Jan 25, 2026
5 checks passed
