Skip to content

Conversation

@golgeek
Copy link
Contributor

@golgeek golgeek commented Oct 7, 2025

This patch starts using the roachprod centralized client from the roachprod library when properly configured via environment variables.

Epic: none
Release note: None

@blathers-crl
Copy link

blathers-crl bot commented Oct 7, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@golgeek golgeek force-pushed the ludo/rp_centralized_use branch 13 times, most recently from 64e04b0 to a51c902 Compare October 14, 2025 17:26
@golgeek golgeek force-pushed the ludo/rp_centralized_use branch 15 times, most recently from 2631ee0 to 40af10b Compare October 21, 2025 22:28
This patch prepares the ground work for roachprod-centralized by
enriching the vm.Provider interface with two new methods:
- `String()`: returns a human-readable (and map key) for the provider in
  its current configuration (e.g. `{ProviderName}-{project}`,
`{ProviderName}-{AccountNumber}`, ...)
- `IsLocalProvider()`: returns whether or not the provider is local

These two new methods will ease the work of spawning multiple instances
of the same provider, in different configurations.

Epic: none
Release note: None
This patch refactorizes the different providers to bring:
- a uniform way of instantiating a provider
- a better isolation of providers configurations to allow multiple
  providers to be instantiated in parallel

Epic: none
Release note: None
This patch adds provider management for the AWS SST credentials.

This allows each provider instance to generate its own short lived
credentials and pass them as environment variables to the aws CLI
without impacting other provider instances.

Epic: none
Release note: None
In order for roachprod-centralized to use the roachprod library,
and for the roachprod library to use the roachprod-centralized client,
it is required to move the `cloud.Cluster` and `cloud.Clusters` types
in their own package to avoid import cycles. These two types are moved
under `cloud/types` and all imports are aliased as `cloudcluster` to
ensure clarity in the code.

This patch also refactors `cloud.Cloud` so that `ListCloud` can be
executed against a pre-defined set of providers instead of the global
`vm.AllProviderNames()`. This is achieved by storing two providers
slices (one for remote, one for local) in the `cloud.Cloud` type, and
having a `ListCloud` method that runs against these providers. The
public `ListCloud` helper function has been rewritten to init a new
`cloud.Cloud` with the global providers and use the object's method to
avoid duplicating code.

Epic: none
Release note: None
Prior to this patch, only the GCE cloud provider had support for VMs
public DNS at creation/deletion time. All other Cloud Providers were
relying on the `Sync` operation to create DNS entries for all synced
clusters.

This patch brings support for instantiating all Cloud Providers with a
DNS Provider, and populates the required `vm.VM` fields to make DNS
behave the same across all Cloud Providers. The vm.DNSProvider interface
is extended to also contain the DNS methods of the InfraProvider
interface, which will allow a better separation of concerns.

Epic: none
Release note: None
Prior to this patch the GCE provider was 100% built around executing
gcloud commands. While this works fine with the local roachprod binary,
it creates two major issues when using it as a library:
- the output of gcloud commands can be huge and is cached by the kernel
  in the page cache and on top of this, Go has to reserve more or less
  the same memory allocations for json deserialization. The combination
  leads to memory exhaustion on low memory runners.
- using the auto-magical `glcoud dns record-sets import` command can
  fail due to inconsistencies between the state when the bind file is
  created and when the command is executed if other systems update the
  DNS zone at the same time.

This patch introduces the use of the GCP Golang SDK to list instances
(and instance templates) by patching the GCE provider, and manage DNS
records by creating a new SDK based DNS provider implementation. The new
GCE SDK DNS provider is also named `gce` to be retro-compatible with the
`glcoud` based provider, as they have the exact same capabilities. The
`gcloud` DNS provider will be deprecated.

The introduction of the new SDK DNS provider also ensures a better
independence between the GCE provider and the DNS provider by fully
using an interface in the VM provider. This greatly improves stability
in long running operations and improves the overall quality of the code.

Epic: none
Release note: None
Prior to this patch, ListCloud() and SyncDNS() paths in the roachprod
library did not provide a way to pass a context.Context. This was
preventing roachprod-centralized to respect tasks timeouts.

This patch adds a context in the new `cloud.Cloud.ListCloud()` function
and a `SyncDNSWithContext()` function in the `vm.DNSProvider` task. All
non-context versions call the context version by passing an argument
`context.Background()`.

With the exception of the `gcloud` version of the `gce` provider, the
`provider.List()` path has been updated to respect the provided context.

Epic: none
Release note: None
@golgeek golgeek force-pushed the ludo/rp_centralized_use branch from 40af10b to a7dbded Compare November 13, 2025 22:01
@github-actions
Copy link

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Nov 13, 2025
@golgeek golgeek force-pushed the ludo/rp_centralized_use branch 3 times, most recently from 703df55 to 2d08927 Compare November 14, 2025 16:51
This patch is the initial commit for roachprod-centralized.

Epic: none
Release note: None
This patch starts using the roachprod centralized client from the
roachprod library when properly configured via environment variables.

Epic: none
Release note: None
@golgeek golgeek force-pushed the ludo/rp_centralized_use branch from 2d08927 to 41acdfd Compare November 14, 2025 17:32
@github-actions
Copy link

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants