This repository contains the supporting infrastructure (service accounts, signing keys, and role assignments) needed to grant DataChain Studio enough permissions to manage compute clusters on Nebius, and for those clusters to access the designated object storage buckets.
- Update `variables.tf` with values specific to your deployment.
- Install Terraform
- Run `terraform init`
- Run `terraform apply`
- Run `terraform output` and copy the output values
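To collect the output values in a script rather than by hand, the outputs can be read as JSON. The following is a minimal sketch, assuming Terraform is on `PATH` and the command runs from this repository's directory; the helper names `flatten_outputs` and `read_outputs` are illustrative and not part of this repository.

```python
import json
import subprocess


def flatten_outputs(raw_json: str) -> dict:
    """Map each output name to its value, dropping Terraform's metadata.

    `terraform output -json` wraps every output in an object with
    `sensitive`, `type`, and `value` fields; only `value` is kept here.
    """
    parsed = json.loads(raw_json)
    return {name: entry["value"] for name, entry in parsed.items()}


def read_outputs(workdir: str = ".") -> dict:
    """Run `terraform output -json` in `workdir` and return {name: value}."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        check=True,           # raise if Terraform exits with an error
        capture_output=True,
        text=True,
    )
    return flatten_outputs(result.stdout)
```

Note that the JSON form includes sensitive values in plain text, so the result of `read_outputs()` should be handled like a secret and not logged.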
- Compute Service Account (`nebius_iam_v1_service_account.datachain_compute`): Assumed by DataChain Studio (SaaS) to create, delete and manage DataChain Managed Kubernetes (MK8s) clusters.
- Storage Service Account (`nebius_iam_v1_service_account.datachain_storage`): Assumed by DataChain Studio (SaaS) to read and write object storage buckets.
- RSA keypairs (`tls_private_key.*`): One keypair per service account is generated locally by Terraform. The public key is registered on Nebius via `nebius_iam_v1_auth_public_key`; the private key is exposed as a sensitive output for DataChain Studio to consume.
- DataChain Studio signs short-lived JWTs with the private key and exchanges them for Nebius IAM tokens (see the `ServiceAccountBearer` flow in the Nebius SDK).
- `admin` on the project: Granted to the compute service account so it can create and delete MK8s clusters and node groups, look up subnets, and manage the related IAM for Karpenter inside the project.
- `viewer` on the tenant: Granted to the compute service account so it can resolve tenant-scoped resources (e.g. the Karpenter group parent).
- `storage.editor` on each authorised bucket: Granted to the storage service account for read/write access.
- `mysterybox.payload-viewer` on each authorised secret: Granted to the storage service account so it can read the payload of Nebius Mystery Box secrets.
- Static Credentials: Nebius does not yet support third-party OIDC workload-identity federation, so DataChain Studio authenticates with a static service-account keypair. The private keys live in Terraform state, so back the state with an encrypted remote backend and restrict access to it.
- Least Privilege Scoping: Storage permissions are granted only on explicitly authorised buckets. The storage service account has no project- or tenant-wide role.
- Separation of Identities: Compute and storage operations use separate service accounts with separate keypairs, so a leak of one does not grant the other's permissions.
| Name | Description | Example |
|---|---|---|
| `nebius_tenant_id` | Nebius tenant ID | `"tenant-e00000000000000000"` |
| `nebius_parent_id` | Nebius project ID under which resources are created | `"project-e00000000000000000"` |
| `storage_buckets` | List of object storage bucket names accessible to Studio Jobs | `["example-bucket"]` |
| `secrets` | List of Mystery Box secret names accessible to Studio Jobs | `["example-secret"]` |
| Name | Description |
|---|---|
| `datachain_compute_nebius_tenant_id` | Nebius tenant ID for compute resources |
| `datachain_compute_nebius_project_id` | Nebius project ID for compute resources |
| `datachain_compute_nebius_credentials_json` | Service-account credentials JSON for the compute identity (sensitive) |
| `datachain_compute_karpenter_service_account_id` | Compute service account ID, for reuse as Karpenter's service account |
| `datachain_storage_nebius_tenant_id` | Nebius tenant ID for storage resources |
| `datachain_storage_nebius_project_id` | Nebius project ID for storage resources |
| `datachain_storage_nebius_credentials_json` | Service-account credentials JSON for the storage identity (sensitive) |
| `datachain_storage_nebius_aws_access_key_id` | HMAC access key ID for S3-compatible access to the storage buckets |
| `datachain_storage_nebius_aws_secret_access_key` | HMAC secret access key paired with the access key ID (sensitive) |
| `datachain_storage_nebius_s3_endpoints` | Map of `{bucket_name => endpoint_url}` for S3-compatible access |
Each `*_credentials_json` output is a ready-to-use service-account credentials JSON (as consumed by the Nebius SDK and the `nebius` CLI). It is marked sensitive because it contains the RSA private key.

The `aws_access_key_id` / `aws_secret_access_key` pair is an HMAC access key attached to the storage service account, consumable by any S3-compatible client such as the AWS SDKs, `boto3`, or `aws s3 --endpoint-url=...`. Endpoints are per-bucket because different buckets can live in different regions.
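Because the endpoint differs per bucket, a client has to be configured for the bucket it talks to. A minimal sketch, assuming `outputs` is a dictionary of this module's output names to their values (for example, parsed from `terraform output -json`); the helper `s3_client_kwargs` is hypothetical and not part of this repository.

```python
def s3_client_kwargs(outputs: dict, bucket: str) -> dict:
    """Build keyword arguments for an S3-compatible client for one bucket."""
    endpoints = outputs["datachain_storage_nebius_s3_endpoints"]
    if bucket not in endpoints:
        # Only buckets listed in `storage_buckets` have an endpoint entry.
        raise KeyError(f"no endpoint recorded for bucket {bucket!r}")
    return {
        "endpoint_url": endpoints[bucket],
        "aws_access_key_id": outputs["datachain_storage_nebius_aws_access_key_id"],
        "aws_secret_access_key": outputs["datachain_storage_nebius_aws_secret_access_key"],
    }
```

With `boto3` installed, the result can be passed straight through, e.g. `boto3.client("s3", **s3_client_kwargs(outputs, "example-bucket"))`.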
Update the `storage_buckets` list in `variables.tf` with the list of bucket names DataChain Studio Jobs should have access to, and run `terraform apply`.

The buckets must already exist under the same `nebius_parent_id` (project): Terraform looks them up by name via the `nebius_storage_v1_bucket` data source.
You can securely inject sensitive configuration (such as tokens, passwords, or private URLs) into your compute jobs by referencing Nebius Mystery Box secrets through environment variables. This avoids hardcoding credentials and allows fine-grained secret management.
- Create a Secret in Nebius Mystery Box

  Store your secret value under a named key in a Mystery Box secret (see `nebius_mysterybox_v1_secret` / `nebius_mysterybox_v1_secret_version`).

- Grant Access to the Secret through Terraform

  Update the `secrets` list in `variables.tf` with the name of the Mystery Box secret, and run `terraform apply`. The secret must live under the same `nebius_parent_id` (project).

- Set an Environment Variable in the Studio Job Settings

  In DataChain Studio, configure your job with an environment variable that references the secret using the `nbsecret://` syntax:

  `EXAMPLE_SECRET=nbsecret://mbsec-e00xxxxxxxxxxxxxxx#EXAMPLE_SECRET`

  - Replace `mbsec-e00xxxxxxxxxxxxxxx` with the full ID of your Mystery Box secret.
  - The part after the `#` (e.g., `#EXAMPLE_SECRET`) refers to the key inside the secret payload.
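To make the two parts of an `nbsecret://` reference concrete, the sketch below splits one into the secret ID and the payload key. It only illustrates the syntax as described here; how DataChain Studio parses the value internally is not documented in this repository, and the names `SecretRef` and `parse_secret_ref` are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SecretRef(NamedTuple):
    secret_id: str  # full Mystery Box secret ID, e.g. "mbsec-..."
    key: str        # key inside the secret payload (the part after "#")


def parse_secret_ref(value: str) -> SecretRef:
    """Split 'nbsecret://<secret-id>#<key>' into its two components."""
    parts = urlsplit(value)
    if parts.scheme != "nbsecret":
        raise ValueError(f"not an nbsecret:// reference: {value!r}")
    if not parts.netloc or not parts.fragment:
        raise ValueError(f"expected nbsecret://<secret-id>#<key>, got {value!r}")
    return SecretRef(secret_id=parts.netloc, key=parts.fragment)
```

A check like this can catch a missing `#<key>` suffix before the value is pasted into the job settings.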