Conversation


@dittops dittops commented Jan 25, 2026

Summary

  • Add ScaleToZeroConfig to CRD with enabled, activationScale, and gracePeriod fields
  • Add ZeroDemandSince status field for grace period tracking
  • Update algorithms (BudScaler, KPA) to support scaling to 0 replicas
  • Handle zero-pod scenarios in metrics collector (external sources work without pods)
  • Track zero demand state with configurable grace period before scaling to zero

Usage Example

apiVersion: scaler.bud.studio/v1alpha1
kind: BudAIScaler
metadata:
  name: llm-scaler
spec:
  scaleTargetRef:
    kind: Deployment
    name: llm-inference
  minReplicas: 0
  maxReplicas: 10

  scaleToZeroConfig:
    enabled: true
    activationScale: 2      # Scale to 2 replicas when waking
    gracePeriod: 5m         # Wait 5 min at zero demand before scaling down

  metricsSources:
    - metricSourceType: prometheus
      targetMetric: request_queue_depth
      targetValue: "10"
      endpoint: "http://prometheus:9090"
      promQL: 'sum(pending_requests{service="llm-inference"})'

Test plan

  • Unit tests pass (go test ./...)
  • CRD generation succeeds (make generate && make manifests)
  • Build succeeds (go build ./...)
  • Deploy with scale-to-zero config and verify grace period behavior
  • Verify scale from 0 uses activation scale
  • Verify external metrics can trigger scale-up from 0

🤖 Generated with Claude Code

Add ability to scale workloads down to 0 replicas when there is no demand,
and scale back up when demand is detected via external metrics.

Changes:
- Add ScaleToZeroConfig to CRD with enabled, activationScale, and gracePeriod fields
- Add ZeroDemandSince status field for grace period tracking
- Extend ScalingContext and ScalingContextProvider interfaces
- Update metrics collector to handle zero-pod scenarios (external sources work without pods)
- Update BudScaler and KPA algorithms to respect scale-to-zero config
- Add grace period logic to prevent premature scale-to-zero
- Add activation scale logic when waking from zero
- Track zero demand state in controller

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@gemini-code-assist

Summary of Changes

Hello @dittops, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the BudAIScaler by introducing comprehensive support for scaling services down to zero replicas. It extends the Custom Resource Definition (CRD) to provide configurable options for this behavior, such as a grace period before scaling down and an activation scale for efficient wake-up. The core scaling algorithms have been adapted to respect these new configurations, ensuring that services can dynamically adjust their replica count, including complete shutdown during periods of inactivity, and then scale back up effectively when demand reappears. The changes also refine metric collection to handle zero-pod states, making the autoscaler more robust and resource-efficient.

Highlights

  • CRD Extension for Scale-to-Zero: The BudAIScalerSpec now includes a ScaleToZeroConfig field, allowing users to enable scale-to-zero, define an ActivationScale for waking up, and set a GracePeriod before scaling down to zero. The BudAIScalerStatus also gains a ZeroDemandSince field to track the start of a zero-demand state.
  • Algorithm Updates for Zero Replicas: Both the BudScaler and KPA scaling algorithms have been updated to incorporate the new scale-to-zero logic. This includes respecting the configured grace period before scaling to zero and applying the ActivationScale when scaling up from zero replicas.
  • Metric Collector Enhancements: The metric collector has been improved to gracefully handle scenarios where no pods are running. For external metric sources, collection will proceed as usual, while for pod-based sources, an empty snapshot is returned, indicating zero demand without erroring out.
  • Scaling Context and Annotation Support: The internal ScalingContext has been extended to manage the new scale-to-zero parameters, including the grace period. A new annotation, ScaleToZeroGraceAnnotation, has also been introduced to allow configuration via annotations, though CRD settings take precedence.
  • Controller Logic for Status Tracking: The BudAIScalerReconciler now actively tracks the ZeroDemandSince timestamp in the scaler's status. This timestamp is set when zero demand is first detected and cleared when demand returns, facilitating the grace period calculation.



@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces comprehensive scale-to-zero functionality for the BudAIScaler, allowing workloads to scale down to zero replicas and efficiently scale back up when demand returns. The changes include adding ScaleToZeroConfig to the CRD, tracking ZeroDemandSince in the status, and updating both BudScaler and KPA algorithms to incorporate grace periods and activation scales. The metrics collector has also been enhanced to handle zero-pod scenarios gracefully. The implementation is consistent across the codebase and includes appropriate CRD definitions, deepcopy logic, and context propagation. The test cases for scaling policies have been updated to reflect the new scale-to-zero capability, which is a good practice.

activationScale := sctx.GetActivationScale()
if activationScale > rec.DesiredReplicas {
    rec.DesiredReplicas = activationScale
    rec.Reason += fmt.Sprintf(" (activation scale: %d)", activationScale)
}


Severity: medium

The reason string for activation scale is appended to the existing reason. While functional, consider if it would be clearer to construct a more descriptive reason that fully explains the decision, rather than just appending. For example, if the original reason was "Zero demand detected, starting grace period", appending " (activation scale: 2)" might be slightly less clear than a new reason like "Scaling from zero with activation scale: 2". This is a minor readability suggestion.

Suggested change:
- rec.Reason += fmt.Sprintf(" (activation scale: %d)", activationScale)
+ rec.Reason = fmt.Sprintf("Scaling from zero with activation scale: %d (original reason: %s)", activationScale, rec.Reason)

Comment on lines 152 to 153
    rec.DesiredReplicas = activationScale
}


Severity: medium

Similar to the BudScaler algorithm, consider adding a more explicit reason for applying the activation scale. While the current implementation correctly sets the desired replicas, a clearer reason would improve observability and debugging for users trying to understand why their workload scaled to a specific number from zero.

Suggested change:
  rec.DesiredReplicas = activationScale
+ rec.Reason = fmt.Sprintf("Scaling from zero with activation scale: %d", activationScale)
  }

Comment on lines 303 to 312
if result.Recommendation != nil {
    if result.Recommendation.DesiredReplicas == 0 && result.CurrentReplicas > 0 {
        // Entering zero demand state
        if scaler.Status.ZeroDemandSince == nil {
            scaler.Status.ZeroDemandSince = &now
        }
    } else if result.Recommendation.DesiredReplicas > 0 {
        // Exiting zero demand state
        scaler.Status.ZeroDemandSince = nil
    }
}


Severity: medium

The logic for updating `ZeroDemandSince` is correct. However, `now` is defined once at the beginning of the `reconcileCustomScaler` function. If there's a significant delay between the start of reconciliation and this specific status update, `now` might not accurately reflect the exact moment the zero-demand state was detected. For higher precision, consider moving the `now := metav1.Now()` assignment directly before `scaler.Status.ZeroDemandSince = &now` when setting the timestamp.

Suggested change:
  if result.Recommendation != nil {
      if result.Recommendation.DesiredReplicas == 0 && result.CurrentReplicas > 0 {
          // Entering zero demand state
          if scaler.Status.ZeroDemandSince == nil {
-             scaler.Status.ZeroDemandSince = &now
+             currentTimestamp := metav1.Now()
+             scaler.Status.ZeroDemandSince = &currentTimestamp
          }
      } else if result.Recommendation.DesiredReplicas > 0 {
          // Exiting zero demand state
          scaler.Status.ZeroDemandSince = nil
      }
  }

- Improve activation scale reason strings in budscaler and kpa algorithms
- Use fresh timestamp for ZeroDemandSince instead of reusing `now` variable

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@dittops dittops merged commit c6f376d into main Jan 25, 2026
5 checks passed
