Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-cluster host scaling with Typha #3817

Merged
merged 10 commits into from
Mar 24, 2025

Conversation

hjiawei
Copy link
Contributor

@hjiawei hjiawei commented Mar 18, 2025

Description

This changeset introduces support for non-cluster Felix auto-scaling with Typha. It implements a dedicated Typha deployment for non-cluster hosts. The newly added Thpha auto-scaler monitors the total number of non-cluster hosts and dynamically scales Typha deployment accordingly. Additionally, the Tigera operator CSR controller has been enhanced to accept CSRs from both Pods and non-cluster HostEndpoints.

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

@marvin-tigera marvin-tigera added this to the v1.38.0 milestone Mar 18, 2025
@hjiawei hjiawei force-pushed the noncluster-typha-scaling branch from 0a38df7 to bf7ee6b Compare March 18, 2025 03:10
@hjiawei hjiawei marked this pull request as ready for review March 18, 2025 03:38
@hjiawei hjiawei requested a review from a team as a code owner March 18, 2025 03:38
@@ -72,16 +71,10 @@ func Add(mgr manager.Manager, opts options.AddOptions) error {

// Established deferred watches against the v3 API that should succeed after the Enterprise API Server becomes available.
if opts.EnterpriseCRDExists {
k8sClient, err := kubernetes.NewForConfig(mgr.GetConfig())
Copy link
Contributor Author

@hjiawei hjiawei Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code to create a new k8s clientset is repeated across almost all controllers. It has now been refactored to be initialized once in the main function and passed as part of opts.

@hjiawei hjiawei force-pushed the noncluster-typha-scaling branch from 9d21afc to 9f3fbe7 Compare March 20, 2025 06:21
// Create a Typha autoscaler for non-cluster hosts
var typhaAutoscalerNonClusterHost *typhaAutoscaler
restConfig := mgr.GetConfig()
nonclusterhosts, err := utils.GetNonClusterHostDynamic(restConfig)
Copy link
Member

@rene-dekker rene-dekker Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that if there are no NCHs at boot time for the operator, that the core controller won't be watching heps going forward? Maybe we should just always watch them if there are enterprise CRDs in the cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the non-cluster host Typha auto-scaler initialization code to the Reconcile function. It will be initialized/started when the NonClusterHost resource exists and the scaler pointer is nil.

@hjiawei hjiawei force-pushed the noncluster-typha-scaling branch from 851f17b to b7b9e12 Compare March 20, 2025 19:32
}
if len(hepList.Items) == 0 {
return nil, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If for some reason the filter returns more items add a warning.

Suggested change
}
if len(hepList.Items) > 1 {
log.WithFields(log.Fields{
"hostname": hostname,
"count": len(hepList.Items),
}).Warn("Multiple HostEndpoints found for node; returning the first one")
}

@hjiawei hjiawei force-pushed the noncluster-typha-scaling branch 2 times, most recently from 4cc6578 to 16fdc27 Compare March 21, 2025 17:51
@hjiawei hjiawei force-pushed the noncluster-typha-scaling branch from 16fdc27 to de9eecc Compare March 21, 2025 21:05
@danudey danudey modified the milestones: v1.38.0, v1.39.0 Mar 21, 2025
@rene-dekker rene-dekker merged commit 725783c into tigera:master Mar 24, 2025
5 checks passed
@hjiawei hjiawei deleted the noncluster-typha-scaling branch March 25, 2025 17:52
hjiawei added a commit to hjiawei/operator that referenced this pull request Mar 25, 2025
* Support non-cluster host scaling with Typha

* Create Kubernetes Clientset from config only once

* Allow certificate signing request from non-cluster hosts

* Add and fix unit tests

* Create a separate key pair for non-cluster host Typha

* Start the non-cluster host Typha autoscaler for enterprise only

* Perform SubjectAccessReview for non-cluster host CSRs

* Start Typha auto-scaler when NonClusterHost resource exists

* Add a 10 second timeout when performing access reviews

* Rerun make generate
rene-dekker pushed a commit that referenced this pull request Mar 26, 2025
* Support non-cluster host scaling with Typha (#3817)

* Support non-cluster host scaling with Typha

* Create Kubernetes Clientset from config only once

* Allow certificate signing request from non-cluster hosts

* Add and fix unit tests

* Create a separate key pair for non-cluster host Typha

* Start the non-cluster host Typha autoscaler for enterprise only

* Perform SubjectAccessReview for non-cluster host CSRs

* Start Typha auto-scaler when NonClusterHost resource exists

* Add a 10 second timeout when performing access reviews

* Rerun make generate

* Render CSR roles when NonClusterHost feature is enabled (#3834)

This change adds the NonClusterHost resource check to render
CSR roles for the CSR controller. It is required to validate and
sign CSRs generated from non-cluster hosts.

* FV test with Calico OSS release-v3.30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants