What to install, configure, and verify before running terraform apply
against this reference architecture.
Required for terraform apply (invoked automatically by TF provisioners):
| Tool | Minimum | Used for |
|---|---|---|
| Terraform | 1.5.7 | All reference architecture operations |
| skopeo | 1.14+ | ECR image push (bundle containers/ → private ECR); invoked by the ecr module |
| python3 | 3.9+ | (Optional) Model checkpoint streaming uploader; invoked by the model-checkpoints module (conditional on enable_model_s3_upload = true, default true for the full profile) |
| boto3 | 1.30+ | (Optional) Imported by the checkpoint uploader. Must be importable by the python3 that's first on PATH when Terraform runs. See the boto3 note below. |
Required for the documented operator workflow (run in your shell, NOT invoked by TF):
| Tool | Minimum | Used for |
|---|---|---|
| AWS CLI | v2 | Shell auth (aws sts get-caller-identity), kubectl setup (aws eks update-kubeconfig), ad-hoc verification (aws ecr describe-images, aws s3 ls, etc.), operational escape hatches (aws ecr batch-delete-image, etc.). Not invoked by any Terraform provisioner. |
| kubectl | 1.30+ | Post-apply verification, day-2 troubleshooting |
The reference architecture does not require Docker, Helm, or envsubst on the operator's box. Helm runs via the Terraform Helm provider; ECR push uses skopeo (no Docker daemon).
The Terraform providers do NOT pin an AWS profile. They use whichever profile your shell exports as AWS_PROFILE. Before running anything:
export AWS_PROFILE=<your-admin-profile>
aws sts get-caller-identity # confirm you're logged in and as whom

This reference architecture creates IAM roles, an OIDC provider, and KMS keys, so it needs admin-equivalent access on the target account. A read-only or insufficiently powerful developer profile will fail partway through.
Specifically, the operator identity needs:
- IAM: create/update/delete roles, policies, instance profiles, OIDC providers; tag IAM resources
- EKS: create cluster, access entries, addons
- EC2: VPC/subnets/NAT gateways/security groups/capacity reservation consumption
- RDS: create instances, subnet groups, parameter groups, secrets
- S3 + KMS: create buckets, keys, policies
- ECR: create repositories, push images
- Secrets Manager: managed database master password
- Cognito: create user pools (if enable_cognito = true)
Several modules spawn subprocesses via local-exec: the ECR push
helper (skopeo) and the model-checkpoints uploader (Python + boto3).
These inherit AWS_PROFILE from your shell. If you pin a profile in
the Terraform provider block but the subprocess uses a different one,
you'll get confusing permission errors only at apply time.
Rule: export AWS_PROFILE in your shell once, then run terraform.
This reference architecture's providers.tf deliberately doesn't override it.
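
The inheritance behavior the rule relies on can be seen in a small, generic demonstration. This is a sketch of how a child process picks up its parent's environment, not the provisioner code from the modules; the profile name example-admin and the function name child_sees_profile are placeholders chosen for illustration.

```python
import os
import subprocess
import sys


def child_sees_profile(profile: str) -> str:
    """Set AWS_PROFILE in this process, then report what a child process sees."""
    os.environ["AWS_PROFILE"] = profile
    # No env= argument is passed, so the child inherits this process's
    # environment. A subprocess launched by a parent tool picks up
    # AWS_PROFILE the same way.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('AWS_PROFILE', ''))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "example-admin" is a placeholder profile name.
    print(child_sees_profile("example-admin"))
```

A profile set only inside a provider block never reaches the child's environment, which is why the mismatch shows up only when the subprocess runs.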
The reference architecture is bundle-driven. Before the first apply, you need an extracted Poolside Helm bundle somewhere on disk, outside this repo:
~/poolside/helm/poolside-helm-<version>/
├── charts/
│ ├── poolside-deployment/
│ └── inference-stack/
├── containers/
│ ├── <image>__<tag>__<arch>.tar (one per container image)
│ └── ...
└── scripts/
This layout comes from extracting the bundle tarball Poolside ships.
Set containers_dir (ECR source) and bundle_root (Helm chart source)
in your terraform.tfvars to the appropriate paths. For example:
containers_dir = "/home/ops/poolside/helm/poolside-helm-<version>/containers"
bundle_root = "/home/ops/poolside/helm/poolside-helm-<version>"

The reference architecture never assumes this lives inside the git repo. Treat the bundle as a vendor artifact you stage separately.
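
Before pointing terraform.tfvars at the bundle, it can help to confirm the extracted layout matches the tree above. The following is a minimal sketch, assuming the charts/ and containers/ directories shown are the ones that matter. The function names check_bundle and split_image_tar are hypothetical, and the filename parser assumes the tag and architecture fields never contain a double underscore.

```python
from pathlib import Path
from typing import List, Tuple


def check_bundle(bundle_root: Path) -> List[str]:
    """Return layout problems; an empty list means the expected directories exist."""
    problems: List[str] = []
    for sub in ("charts/poolside-deployment", "charts/inference-stack", "containers"):
        if not (bundle_root / sub).is_dir():
            problems.append(f"missing directory: {sub}")
    containers = bundle_root / "containers"
    if containers.is_dir() and not any(containers.glob("*.tar")):
        problems.append("containers/ has no .tar image archives")
    return problems


def split_image_tar(filename: str) -> Tuple[str, str, str]:
    """Split '<image>__<tag>__<arch>.tar' into (image, tag, arch)."""
    if not filename.endswith(".tar"):
        raise ValueError(f"not a .tar archive: {filename!r}")
    stem = filename[: -len(".tar")]
    # Split from the right so only the last two '__' separators are consumed.
    parts = stem.rsplit("__", 2)
    if len(parts) != 3:
        raise ValueError(f"expected <image>__<tag>__<arch>.tar, got {filename!r}")
    return parts[0], parts[1], parts[2]


if __name__ == "__main__":
    import sys

    for problem in check_bundle(Path(sys.argv[1])):
        print(problem)
```

Run it with the bundle root as the only argument; no output means the directories and at least one image archive were found.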
If you're running the full profile and using architecture-managed model
uploads (enable_model_s3_upload = true, the default), you also need
model checkpoint tarballs on disk:
~/poolside/models/
├── laguna_xs-20250427_int4.tar
├── malibu-v2.20251021_int4.tar
├── point-v2.20250403.tar
└── ... (one per model you want to deploy)
Filename convention: <model>-<version>[_<quant>].tar. The reference architecture
splits each filename on the first hyphen to derive a model alias (e.g.
malibu-v2.20251021_int4.tar → alias malibu). See
model-checkpoints.md for the full details,
including the BYO-bucket alternative.
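
The filename rule can be sketched in a few lines of Python to preview which alias each tarball will produce. This mirrors the split-on-first-hyphen rule described above; it is not the module's actual implementation, and the function name model_alias is hypothetical.

```python
from pathlib import Path


def model_alias(tarball: str) -> str:
    """Derive the alias by splitting the tarball's filename on its first hyphen."""
    name = Path(tarball).name
    if not name.endswith(".tar") or "-" not in name:
        raise ValueError(f"expected <model>-<version>[_<quant>].tar, got {name!r}")
    alias, _version_part = name.split("-", 1)
    return alias


if __name__ == "__main__":
    for filename in (
        "laguna_xs-20250427_int4.tar",   # → laguna_xs
        "malibu-v2.20251021_int4.tar",   # → malibu
        "point-v2.20250403.tar",         # → point
    ):
        print(filename, "->", model_alias(filename))
```

Under this rule, a model name that itself contains a hyphen would be cut at that hyphen, so it is worth checking aliases before staging new tarballs.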
The reference architecture is HTTPS-only at the edge: the ALB terminates
TLS against an ACM certificate, and Cognito (when enabled) uses your
public hostname for callback URLs. Both must be ready before
terraform apply:
- A public DNS hostname you've chosen for the deployment, for example poolside.example.com. You'll set this as public_hostname in terraform.tfvars. The hostname does not have to resolve yet; you'll point it at the ALB after the apply.
- An ACM certificate covering that hostname, issued in the target region (var.region). The example roots look it up by domain name via data "aws_acm_certificate", so if it isn't issued at plan time the lookup fails.
If you don't already have an ACM certificate, request one in the AWS console (or via your usual cert-issuance path) and complete DNS validation before continuing.
After terraform apply creates the EKS cluster, configure kubectl:
eval "$(terraform output -raw kubeconfig_command)"
kubectl get nodes

The cluster is created with both a public API endpoint (gated by the
CIDRs in cluster_endpoint_public_access_cidrs) and a private
endpoint. Terraform communicates with the API via the public endpoint;
in-cluster workloads use the private one.
If your organization requires private-only EKS API access, you'll
need to run Terraform from inside the VPC (bastion, peered VPC, or
transit gateway). See the limitations note in
architecture.md.
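
Because the public endpoint only accepts traffic from the CIDRs in cluster_endpoint_public_access_cidrs, it is worth checking that your egress IP falls inside one of them before applying. A minimal sketch using Python's standard ipaddress module follows. The function name ip_allowed and the addresses in the example are placeholders from documentation ranges, not values from this repo.

```python
import ipaddress
from typing import Iterable


def ip_allowed(ip: str, cidrs: Iterable[str]) -> bool:
    """Return True if ip falls inside at least one of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    # strict=False tolerates CIDRs written with host bits set, e.g. 203.0.113.7/24.
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)


if __name__ == "__main__":
    # Placeholder values: substitute your own egress IP and tfvars CIDR list.
    print(ip_allowed("203.0.113.7", ["203.0.113.0/24"]))
```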
- aws sts get-caller-identity returns your admin profile
- terraform version ≥ 1.5.7
- skopeo --version works
- python3 -c "import boto3; print(boto3.__version__)" succeeds for the same python3 that's first on your PATH (full profile only)
- Bundle extracted; containers/<something>.tar visible
- Model checkpoint tarballs at the expected path (full profile, upload mode enabled)
- cluster_endpoint_public_access_cidrs includes the IP you'll run Terraform from (curl -s ifconfig.me to check)
- admin_principal_arns includes the role Terraform is running as
- Public hostname chosen, ACM certificate issued in var.region
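
The tool-related items in this checklist can be partly automated. The sketch below covers only two things: whether a tool is on PATH, and whether a version string meets a minimum from the tables at the top of this page. The helper names are hypothetical, and the version strings used with them should be whatever your installed tools actually print.

```python
import re
import shutil
from typing import Iterable, List, Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number (e.g. 1.5.7 or 1.14) from text."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(version_text: str, minimum: Tuple[int, ...]) -> bool:
    """Compare the parsed version against a minimum such as (1, 5, 7)."""
    found = parse_version(version_text)
    return found is not None and found >= minimum


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the tools from names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("missing:", missing_tools(["terraform", "skopeo", "aws", "kubectl", "python3"]))
```

Feed meets_minimum the text printed by terraform version or skopeo --version together with the matching minimum. The remaining checklist items (credentials, bundle, certificate) still need to be verified by hand.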