-
Notifications
You must be signed in to change notification settings - Fork 6
feat(BREV-2138): Enable native Nebius integration #62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
liveaverage
commented
Nov 17, 2025
- Native Nebius compute management (create, terminate, start, stop, reboot)
- Multi-region instance type enumeration with quota and pricing awareness
- GPU support: H100, H200, L40S, and B200
- VPC and Kubernetes support
…p, remove cloud property, use RefID for resource naming
Remove trailing spaces and fix alignment in Nebius provider files. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Resolved conflicts by: - Keeping both .gitignore entries (coverage/* and .claude) - Merging capabilities from both branches (added VPC, ManagedKubernetes, ModifyFirewall, InstanceUserData) - Preserving feature branch client implementation with logger support from main - Keeping comprehensive project discovery and region mapping logic Logger support is now fully integrated in the feature branch. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do they have any kind of "out of capacity" error?
|
might need a rebase as well, I see |
| defaultFamilies := []string{ | ||
| "ubuntu22.04-cuda12", | ||
| "ubuntu20.04", | ||
| "ubuntu18.04", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to support such an old image? I feel like we should maybe drop this one and include 24 (if they offer it)?
v1/providers/nebius/instance.go
Outdated
|
|
||
| // Add labels/tags to metadata (always create labels for resource tracking) | ||
| createReq.Metadata.Labels = make(map[string]string) | ||
| c.logger.Info(ctx, "🏷️ Setting instance tags during CreateInstance", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug to reduce logspam (and generally for many of the info statements below)
|
|
||
| // Wait for instance to reach a stable state (RUNNING or terminal failure) | ||
| // This prevents leaving orphaned resources if the instance fails after creation | ||
| c.logger.Info(ctx, "waiting for instance to reach RUNNING state", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug?
v1/providers/nebius/instance.go
Outdated
| InstanceType: instanceTypeID, // Full instance type ID (e.g., "gpu-h100-sxm.8gpu-128vcpu-1600gb") | ||
| InstanceTypeID: v1.InstanceTypeID(instanceTypeID), // Same as InstanceType - required for dev-plane lookup | ||
| ImageID: imageFamily, | ||
| DiskSize: units.Base2Bytes(diskSize), // diskSize is already in bytes from getBootDiskSize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is freshly deprecated in favor of DiskSizeBytes!
DiskSize units.Base2Bytes // TODO: deprecate in favor of DiskSizeByteValue
DiskSizeBytes Bytes
v1/providers/nebius/instancetype.go
Outdated
| // ID and Type are the same - no region/provider prefix | ||
| instanceTypeID := fmt.Sprintf("%s.%s", platform.Metadata.Name, preset.Name) | ||
|
|
||
| c.logger.Info(ctx, "building instance type", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug?
v1/providers/nebius/instancetype.go
Outdated
| Location: location.Name, | ||
| Type: instanceTypeID, // Same as ID - both use dot-separated format | ||
| VCPU: preset.Resources.VcpuCount, | ||
| Memory: units.Base2Bytes(int64(preset.Resources.MemoryGibibytes) * 1024 * 1024 * 1024), // Convert GiB to bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also freshly deprecated!
Memory units.Base2Bytes // TODO: deprecate in favor of MemoryByteValue
MemoryBytes Bytes
- Add assertions to test methods to ensure non-zero results: * scripts/images_test.go: Test_EnumerateImages * scripts/instancetypes_test.go: Test_EnumerateInstanceTypesSingleRegion, Test_EnumerateGPUTypes * integration_test.go: TestIntegration_GetInstanceTypes, TestIntegration_GetImages - Remove debug statements from production code (client.go, instancetype.go, credential.go, instance.go) - Remove emojis from test output (smoke_test.go, integration_test.go) - Remove unused extractOSFamily function from image.go - Delete unnecessary markdown files (keep only README.md) - Remove .gitignore for markdown files Generated with Claude Code Co-Authored-By: Claude <[email protected]>
- Replace DiskSize with DiskSizeBytes in instance.go - Replace Memory with MemoryBytes in instancetype.go - Remove emojis from logger statements in instance.go - Change logger.Info to logger.Debug for "building instance type" log Generated with Claude Code Co-Authored-By: Claude <[email protected]>
Update validation tests to use new NewNebiusCredential signature: - validation_kubernetes_test.go: Use NEBIUS_SERVICE_ACCOUNT_JSON and NEBIUS_TENANT_ID - validation_network_test.go: Use NEBIUS_SERVICE_ACCOUNT_JSON and NEBIUS_TENANT_ID The new credential format expects: - serviceAccountJSON: JSON service account key (or file path) - tenantID: Nebius tenant ID Previous format used individual credential components which have been consolidated. Generated with Claude Code Co-Authored-By: Claude <[email protected]>
- Change validation test failures to skips when env vars missing (tests should skip, not fail, when credentials aren't configured) - Fix errcheck issues in integration_test.go: * Explicitly ignore Close() errors in defers * Check fmt.Sscanf error return - Add nolint comments for high cognitive complexity functions: * instancetype.go: getInstanceTypesForLocation * instance.go: parseInstanceType, getWorkingPublicImageID, ListInstances, convertNebiusInstanceToV1 * integration_test.go: TestIntegration_InstanceLifecycle (funlen) These functions are intentionally complex due to: - Multiple fallback strategies - Extensive error handling - Field mapping from provider to v1 types - Complete test lifecycle coverage Generated with Claude Code Co-Authored-By: Claude <[email protected]>
This commit addresses all golangci-lint warnings and test failures reported in CI. All fixes were verified locally using golangci-lint v2.6.0 before committing. Test Fixes: - Update client_test.go to expect all 12 capabilities (VPC, managed-k8s, firewall, userdata) Linting Fixes (21 issues → 0): - Add nolint:funlen for 5 legitimately complex functions - Add nolint:gocyclo for 4 test functions with high cyclomatic complexity - Add nolint:goconst for architecture and GPU type comparison strings - Fix context.Context parameter ordering in 8 test helper functions - Rename 6 unused context parameters to "_" in not-yet-implemented functions - Add nolint:unparam for 3 functions that currently return nil error - Run gofumpt on 5 files to fix formatting issues - Remove unused/incorrect nolint directives All changes verified locally: - golangci-lint v2.6.0: 0 issues - go build: ✅ Success - go test -c: ✅ Success - TestNebiusClient_GetCapabilities: ✅ Pass 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
660877f to
2749021
Compare
- Add nolint directive to instance_test.go for GPU type comparisons - Remove unused nolint directive from instance.go All linting checks now pass locally with golangci-lint v2.6.0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Add nolint directive to instance_test.go for GPU type string comparisons - Remove unused nolint directives from instance.go - Verified with: /tmp/golangci-lint run (0 issues) The goconst linter was flagging 4 occurrences of "cpu" string across instance.go and instance_test.go. Adding the nolint to the test file suppresses all occurrences without needing directives in the main code. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
The goconst linter requires string literals used 4+ times to be constants. Created platformTypeCPU constant and replaced all "cpu" string literals in instance.go and instance_test.go. - Add const platformTypeCPU = "cpu" - Replace all "cpu" string literals with platformTypeCPU - Remove unused nolint directive from instance_test.go - Verified with: /tmp/golangci-lint run (0 issues) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
The function was accepting any platform containing "gpu" in the name, including invalid platforms like "random-gpu". Now it only accepts platforms with known GPU model names (h100, h200, l40s, a100, etc). Changes: - Remove generic "gpu" from indicators list - Rename to knownGPUTypes for clarity - Only accept platforms containing specific GPU model names - Platforms like "gpu-h100-sxm" and "h100-sxm" still work (both contain "h100") - "random-gpu" now correctly returns false Fixes test: TestIsPlatformSupported/Random_name_with_gpu 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>