Quickstart

Stand up a Poolside deployment on AWS using one of two profiles: platform-only (no GPU) or full (platform + local GPU inference). Pick a profile, copy its example root out of this repo, fill in your variables, and apply.

For background on what gets created, see architecture.md. For the prerequisite list (tools, AWS credentials, bundle, model checkpoints), see prerequisites.md. For the staged-rollout walkthrough and verification commands, see deployment-guide.md.

Choose a profile

| Component | platform-only | full |
| --- | --- | --- |
| VPC, EKS cluster, CPU node group | yes | yes |
| RDS PostgreSQL, S3 buckets, ECR | yes | yes |
| ALB controller, External Secrets Operator | yes | yes |
| poolside-deployment Helm release | yes | yes |
| GPU node group + NVIDIA GPU Operator | no | yes |
| Models S3 bucket + checkpoint upload | no | yes |
| inference-stack Helm release | no | yes |
| Cognito user pool | ☑️ optional | ☑️ optional |

| Profile | When to use |
| --- | --- |
| platform-only | You connect Poolside to an external OpenAI-compatible model API (Anthropic, OpenAI, a Bedrock shim, internally hosted inference). Approximately $3/hr. |
| full | You want everything in one cluster and have GPU quota in your AWS account. Approximately $100/hr at the recommended GPU defaults. |
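
To put the hourly figures above in budgeting terms, the arithmetic can be sketched as below. The rates are the approximate values from the table; the 730-hour average month and the helper name `estimate_cost` are assumptions made for this illustration and are not part of the reference architecture.

```shell
# Rough cost projection from an approximate hourly rate (whole US dollars).
# estimate_cost is a hypothetical helper, not something shipped in this repo.
estimate_cost() {
  # $1 = hourly rate in USD, $2 = number of hours
  echo $(( $1 * $2 ))
}

HOURS_PER_DAY=24
HOURS_PER_MONTH=730   # assumption: average month (8760 h / 12)

# Approximate hourly rates taken from the profile table.
for profile in platform-only:3 full:100; do
  name="${profile%%:*}"
  rate="${profile##*:}"
  echo "$name: ~\$$(estimate_cost "$rate" "$HOURS_PER_DAY")/day, ~\$$(estimate_cost "$rate" "$HOURS_PER_MONTH")/month"
done
```

At these rates the full profile comes to roughly $2,400 for each day it is left running, so the difference between the two profiles adds up quickly.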

Steps

Both profiles follow the same six steps. The full profile has additional inputs for the GPU node group and model checkpoint uploads.

  1. Confirm prerequisites. AWS credentials with admin-equivalent access on the target account, the required CLI tools, an extracted Poolside Helm bundle, model checkpoint tarballs (full profile only), a public DNS hostname you've chosen, and an ACM certificate covering that hostname issued in your target region. See prerequisites.md for the full list and a sanity checklist.

  2. Copy the example root outside this repo:

    # Full profile
    cp -r path/to/aws-reference-architecture/public/examples/full ~/my-poolside-deployment
    
    # Or platform-only
    cp -r path/to/aws-reference-architecture/public/examples/platform-only ~/my-poolside-deployment
    
    cd ~/my-poolside-deployment
  3. Fill in terraform.tfvars from the supplied example. Required for both profiles:

    | Variable | Notes |
    | --- | --- |
    | deployment_name | Lowercase prefix used for every AWS resource |
    | region | AWS region |
    | cluster_endpoint_public_access_cidrs | CIDR blocks allowed to reach the EKS public API endpoint |
    | admin_principal_arns | IAM principals granted EKS cluster-admin via Access Entries |
    | public_hostname | Public DNS hostname; an ACM certificate covering it must already exist in the target region |
    | containers_dir | Path to the bundle's containers/ directory |
    | bundle_root | Path to the extracted bundle (parent of charts/ and containers/) |

    Additionally required for the full profile:

    | Variable | Notes |
    | --- | --- |
    | checkpoints_dir | Directory of *.tar model checkpoint archives |

    Optional for both profiles: set enable_cognito = true to have Terraform create a Cognito user pool instead of bringing your own OIDC provider.

  4. Export your AWS profile in the shell:

    export AWS_PROFILE=<your-admin-profile>
    aws sts get-caller-identity
  5. Init, plan, apply. The first apply provisions infrastructure, pushes container images to ECR, and (full profile) uploads model checkpoints. The Helm releases stay gated off until step 6:

    terraform init
    terraform plan
    terraform apply
  6. Flip on the Helm installs once you've verified each layer. Set install_poolside_deployment = true (and install_inference_stack = true for the full profile) in terraform.tfvars, then re-apply. See deployment-guide.md for the full staged walkthrough and verification commands.
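
Before running the plan in step 5, it can help to confirm that every required variable from step 3 is actually assigned. The following is a minimal sketch of such a pre-flight check. The variable names come from the tables in step 3; the helper name `missing_tfvars` and the simple `name =` line matching are assumptions for illustration, and Terraform's own validation remains the authority.

```shell
# Pre-flight sketch: list required variables that are not yet assigned in a
# tfvars file. missing_tfvars is a hypothetical helper, not part of this repo.
REQUIRED_COMMON="deployment_name region cluster_endpoint_public_access_cidrs admin_principal_arns public_hostname containers_dir bundle_root"
REQUIRED_FULL="checkpoints_dir"

missing_tfvars() {
  # $1 = path to the tfvars file, $2 = profile ("platform-only" or "full")
  file="$1"
  vars="$REQUIRED_COMMON"
  if [ "$2" = "full" ]; then
    vars="$vars $REQUIRED_FULL"
  fi
  for v in $vars; do
    # Match "name =" at the start of a line, allowing leading whitespace.
    grep -Eq "^[[:space:]]*${v}[[:space:]]*=" "$file" || echo "$v"
  done
}

# Print any unassigned variables for the full profile, if the file exists.
if [ -f terraform.tfvars ]; then
  missing_tfvars terraform.tfvars full
fi
```

An empty result means every required name has an assignment. The check does not inspect the values themselves.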

After install

  1. Get the ALB hostname from the ingress resource:
    kubectl get ingress -n poolside
  2. Point your public_hostname at the ALB in your DNS provider. Most providers support a CNAME from your hostname to the ALB DNS name. If you host the zone in Route 53, use an alias A record (type = A, alias target = the ALB) instead — Route 53 aliases also work at the zone apex, where CNAMEs aren't valid.
  3. Bind your identity provider by visiting https://<your-hostname> and following the on-screen prompts. With Cognito, retrieve the issuer URL, client ID, and client secret from the Terraform outputs:
    terraform output -raw cognito_user_pool_endpoint
    terraform output -raw cognito_user_pool_client_id
    terraform output -raw cognito_user_pool_client_secret
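
The record-type choice in step 2 can be sketched as a small decision helper. This only encodes the rule of thumb from the text; the function name `dns_record_type`, its return labels, and the exact string comparison of hostname and zone are assumptions for illustration. Some non-Route 53 providers offer their own alias or CNAME-flattening features at the apex, which this sketch does not model.

```shell
# Sketch of the DNS record-type decision. dns_record_type is a hypothetical
# helper; names are compared as exact strings (no trailing-dot or case handling).
dns_record_type() {
  # $1 = public hostname, $2 = DNS zone name, $3 = "route53" or any other provider
  if [ "$3" = "route53" ]; then
    echo "A-alias"        # alias A record; also valid at the zone apex
  elif [ "$1" = "$2" ]; then
    echo "unsupported"    # a plain CNAME is not valid at the zone apex
    return 1
  else
    echo "CNAME"          # CNAME from the hostname to the ALB DNS name
  fi
}

dns_record_type poolside.example.com example.com route53   # prints A-alias
dns_record_type poolside.example.com example.com other     # prints CNAME
```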

Notes

  • The minimum supported GPU instance type is p5e.48xlarge. Smaller GPUs do not have enough memory for the shipped models.
  • Python 3 with boto3 importable is required for model checkpoint uploads in the full profile. If that's a non-starter, see the BYO-bucket mode in model-checkpoints.md.
  • External OIDC values are entered in the Poolside Console during first-time setup. Cognito is optional; when enabled, Terraform outputs the values needed to bind.
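
The boto3 requirement noted above can be checked before the first apply. Below is a minimal sketch, assuming `python3` on your PATH is the interpreter the upload step will use; the helper name `have_python_module` is hypothetical.

```shell
# Sketch: check whether a Python module is importable by python3.
# have_python_module is a hypothetical helper, not part of this repo.
have_python_module() {
  # Fail early if python3 itself is not on PATH.
  command -v python3 >/dev/null 2>&1 || return 1
  python3 -c "import $1" >/dev/null 2>&1
}

if have_python_module boto3; then
  echo "boto3 is importable"
else
  echo "boto3 is not importable; install it or see the BYO-bucket mode in model-checkpoints.md" >&2
fi
```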