datachain-ai/datachain-compute-cluster-nb

Supporting infrastructure for DataChain Compute Clusters on Nebius

This repository contains the supporting infrastructure (service accounts, signing keys, and role assignments) needed to grant DataChain Studio enough permissions to manage compute clusters on Nebius, and for those clusters to access the designated object storage buckets.

Setup

  1. Update variables.tf with values specific to your deployment.
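  2. Install Terraform.
  3. Run terraform init.
  4. Run terraform apply.
  5. Run terraform output and copy the output values. Outputs marked sensitive are masked as <sensitive> in the plain listing; use terraform output -json, or terraform output -raw <name> for a single string value, to reveal them.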
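For scripted handoff of the values, the outputs can be read in one pass from terraform output -json, which prints every output (including sensitive ones) as an object with a "value" field. The following is a minimal sketch, assuming terraform is on PATH and has been applied in the working directory; the function names are hypothetical and not part of this repository.

```python
import json
import subprocess


def parse_terraform_outputs(json_text: str) -> dict:
    """Map each output name to its value, given `terraform output -json` text."""
    raw = json.loads(json_text)
    # Each entry has the shape {"sensitive": bool, "type": ..., "value": ...}.
    return {name: entry["value"] for name, entry in raw.items()}


def read_terraform_outputs(workdir: str = ".") -> dict:
    """Run `terraform output -json` in workdir and return the parsed values."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_terraform_outputs(result.stdout)
```

The returned dictionary contains private keys in clear text, so avoid printing or logging it.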

Overview

1. Service Accounts

  • Compute Service Account (nebius_iam_v1_service_account.datachain_compute): Assumed by DataChain Studio (SaaS) to create, delete and manage DataChain Managed Kubernetes (MK8s) clusters.
  • Storage Service Account (nebius_iam_v1_service_account.datachain_storage): Assumed by DataChain Studio (SaaS) to read and write object storage buckets.

2. Authentication Keypairs

  • RSA keypairs (tls_private_key.*): One keypair per service account is generated locally by Terraform. The public key is registered on Nebius via nebius_iam_v1_auth_public_key; the private key is exposed as a sensitive output for DataChain Studio to consume.
  • DataChain Studio signs short-lived JWTs with the private key and exchanges them for Nebius IAM tokens (see the ServiceAccountBearer flow in the Nebius SDK).
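The shape of such a short-lived JWT can be sketched with the standard library alone. This is an illustration only, not the Nebius SDK's implementation: the RS256 algorithm, the use of the public key ID as the kid header, the iss/sub claim values, and the five-minute lifetime are all assumptions, and the RSA signature itself is delegated to a caller-supplied sign function (for example one backed by a cryptography library). All names below are hypothetical.

```python
import base64
import json
from typing import Callable


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _encode_segment(obj: dict) -> str:
    """Serialize a JSON object compactly and base64url-encode it."""
    return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def build_signing_input(service_account_id: str, public_key_id: str,
                        now: int, ttl_seconds: int = 300) -> str:
    """Return '<header>.<claims>', the text that gets signed."""
    # Assumed header and claim set; check the Nebius documentation for the real one.
    header = {"alg": "RS256", "typ": "JWT", "kid": public_key_id}
    claims = {
        "iss": service_account_id,
        "sub": service_account_id,
        "iat": now,
        "exp": now + ttl_seconds,  # short lifetime limits the value of a leaked token
    }
    return f"{_encode_segment(header)}.{_encode_segment(claims)}"


def sign_jwt(signing_input: str, sign: Callable[[bytes], bytes]) -> str:
    """Append the signature returned by `sign` over the signing input bytes."""
    return f"{signing_input}.{b64url(sign(signing_input.encode('ascii')))}"
```

The signed token would then be exchanged for an IAM token; that exchange is handled by the Nebius SDK and is not shown here.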

3. Access Permits (IAM Role Assignments)

  • admin on the project: Granted to the compute service account so it can create and delete MK8s clusters and node groups, look up subnets, and manage the related IAM resources for Karpenter inside the project.
  • viewer on the tenant: Granted to the compute service account so it can resolve tenant-scoped resources (e.g. the Karpenter group parent).
  • storage.editor on each authorised bucket: Granted to the storage service account for read/write access.
  • mysterybox.payload-viewer on each authorised secret: Granted to the storage service account so it can read the payload of Nebius Mystery Box secrets.

Security Considerations

  1. Static Credentials: Nebius does not yet support third-party OIDC workload-identity federation, so DataChain Studio authenticates with a static service-account keypair. The private keys are stored in Terraform state, so back the state with an encrypted remote backend and restrict access to it.

  2. Least Privilege Scoping: Storage permits are granted only to explicitly authorised buckets. The storage service account has no project- or tenant-wide role.

  3. Separation of Identities: Compute and storage operations use separate service accounts with separate keypairs, so a leak of one does not grant the other's permissions.

Variables

Name | Description | Example
nebius_tenant_id | Nebius tenant ID | "tenant-e00000000000000000"
nebius_parent_id | Nebius project ID under which resources are created | "project-e00000000000000000"
storage_buckets | List of object storage bucket names accessible to Studio Jobs | ["example-bucket"]
secrets | List of Mystery Box secret names accessible to Studio Jobs | ["example-secret"]

Outputs

Name | Description
datachain_compute_nebius_tenant_id | Nebius tenant ID for compute resources
datachain_compute_nebius_project_id | Nebius project ID for compute resources
datachain_compute_nebius_credentials_json | Service-account credentials JSON for the compute identity (sensitive)
datachain_compute_karpenter_service_account_id | Compute service account ID, for reuse as Karpenter's service account
datachain_storage_nebius_tenant_id | Nebius tenant ID for storage resources
datachain_storage_nebius_project_id | Nebius project ID for storage resources
datachain_storage_nebius_credentials_json | Service-account credentials JSON for the storage identity (sensitive)
datachain_storage_nebius_aws_access_key_id | HMAC access key ID for S3-compatible access to the storage buckets
datachain_storage_nebius_aws_secret_access_key | HMAC secret access key paired with the access key ID (sensitive)
datachain_storage_nebius_s3_endpoints | Map of {bucket_name => endpoint_url} for S3-compatible access

Each *_credentials_json output is a ready-to-use service-account credentials JSON (as consumed by the Nebius SDK and nebius CLI). It is marked sensitive because it contains the RSA private key.

The aws_access_key_id / aws_secret_access_key pair is an HMAC access key attached to the storage service account, consumable by any AWS SDK / boto3 / aws s3 --endpoint-url=... client. Endpoints are per-bucket because different buckets can live in different regions.
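As an illustration of the per-bucket endpoints, the sketch below builds boto3 client arguments for one bucket from a dictionary of output values keyed by the output names listed above. The helper names are hypothetical, and boto3 is a third-party package that is imported only when a client is actually created.

```python
def s3_client_kwargs(outputs: dict, bucket: str) -> dict:
    """Build boto3 client arguments for one bucket from the Terraform outputs."""
    return {
        "service_name": "s3",
        # Endpoints are looked up per bucket because buckets may live in different regions.
        "endpoint_url": outputs["datachain_storage_nebius_s3_endpoints"][bucket],
        "aws_access_key_id": outputs["datachain_storage_nebius_aws_access_key_id"],
        "aws_secret_access_key": outputs["datachain_storage_nebius_aws_secret_access_key"],
    }


def make_s3_client(outputs: dict, bucket: str):
    """Create a boto3 S3 client bound to the bucket's endpoint."""
    import boto3  # third-party; imported lazily so the helper above has no dependencies

    return boto3.client(**s3_client_kwargs(outputs, bucket))
```

Depending on the endpoint, a region_name argument may also be needed for request signing; that detail is not covered by the outputs described here.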

Guidance

Granting Access to Nebius Object Storage Buckets in DataChain Studio Jobs

Update the storage_buckets list in variables.tf with the list of bucket names DataChain Studio Jobs should have access to, and run terraform apply.

The buckets must already exist under the same nebius_parent_id (project), because Terraform looks them up by name via the nebius_storage_v1_bucket data source.

Granting Access to Nebius Secrets in DataChain Studio Jobs

You can securely inject sensitive configuration (such as tokens, passwords, or private URLs) into your compute jobs by referencing Nebius Mystery Box secrets through environment variables. This avoids hardcoding credentials and allows fine-grained secret management.

  1. Create a Secret in Nebius Mystery Box

    Store your secret value under a named key in a Mystery Box secret (see nebius_mysterybox_v1_secret / nebius_mysterybox_v1_secret_version).

  2. Grant Access to the Secret through Terraform

    Update the secrets list in variables.tf with the name of the Mystery Box secret, and run terraform apply. The secret must live under the same nebius_parent_id (project).

  3. Set an Environment Variable in the Studio Job Settings

    In DataChain Studio, configure your job with an environment variable that references the secret using the nbsecret:// syntax:

    EXAMPLE_SECRET=nbsecret://mbsec-e00xxxxxxxxxxxxxxx#EXAMPLE_SECRET
    
    • Replace mbsec-e00xxxxxxxxxxxxxxx with the full ID of your Mystery Box secret.
    • The part after the # (e.g., #EXAMPLE_SECRET) refers to the key inside the secret payload.
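The structure of the reference can be made concrete with a small parser. This is a sketch of the syntax described above, not the resolver DataChain Studio uses; the SecretRef type, the function name, and the assumption that both the secret ID and the key are required are illustrative.

```python
from typing import NamedTuple


class SecretRef(NamedTuple):
    secret_id: str  # full Mystery Box secret ID, e.g. mbsec-...
    key: str        # key inside the secret payload


def parse_nbsecret(value: str) -> SecretRef:
    """Split an nbsecret://<secret-id>#<key> reference into its two parts."""
    prefix = "nbsecret://"
    if not value.startswith(prefix):
        raise ValueError(f"not an nbsecret reference: {value!r}")
    secret_id, sep, key = value[len(prefix):].partition("#")
    # Assumption: a reference without a secret ID or without a key is invalid.
    if not secret_id or not sep or not key:
        raise ValueError(f"expected nbsecret://<secret-id>#<key>, got {value!r}")
    return SecretRef(secret_id, key)
```

For the example above, parse_nbsecret("nbsecret://mbsec-e00xxxxxxxxxxxxxxx#EXAMPLE_SECRET") yields the secret ID mbsec-e00xxxxxxxxxxxxxxx and the key EXAMPLE_SECRET.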
