feat(aws): optionally provision managed ElastiCache Redis by BimaPangestu28 · Pull Request #186 · greenticai/greentic-deployer

BimaPangestu28 · 2026-05-11T00:59:00Z

Summary

Adds opt-in ElastiCache provisioning to the AWS operator module so deploys whose bundles include state-redis actually get a working Redis endpoint without the user having to stand one up out-of-band.

New top-level variable aws_provision_redis (bool, default false).
When aws_provision_redis = true and redis_url is empty, the operator module creates:
- aws_elasticache_subnet_group in the same subnets the ECS service uses.
- aws_security_group with a single ingress on TCP/6379 from the ECS task security group.
- aws_elasticache_cluster (engine redis, cache.t3.micro × 1, port 6379, default.redis7 parameter group).
A new local effective_redis_url picks the external redis_url when supplied, otherwise the managed cluster's primary endpoint. The container's REDIS_URL env var now reads from it.
aws_redis_node_type and aws_redis_engine_version are exposed as variables so the cluster can be resized or upgraded later without a code change.
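
The three resources listed above could be gated along the following lines. This is a minimal sketch, not the PR's actual diff: the module-level variable names (`provision_redis`, `name_prefix`, `vpc_id`, `subnet_ids`, `ecs_task_security_group_id`, `redis_node_type`, `redis_engine_version`), the `create_redis` local, and the resource labels are all assumptions made for illustration. The engine version default follows the "Redis 7.1 by default" wording of the commit message.

```hcl
# Hypothetical module inputs; the real module may name these differently.
variable "provision_redis" {
  type    = bool
  default = false
}

variable "redis_url" {
  type    = string
  default = ""
}

variable "name_prefix" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "ecs_task_security_group_id" {
  type = string
}

variable "redis_node_type" {
  type    = string
  default = "cache.t3.micro"
}

variable "redis_engine_version" {
  type    = string
  default = "7.1"
}

locals {
  # Create managed Redis only when opted in AND no external URL was supplied.
  create_redis = var.provision_redis && var.redis_url == ""
}

# Place the cache nodes in the same subnets the ECS service uses.
resource "aws_elasticache_subnet_group" "redis" {
  count      = local.create_redis ? 1 : 0
  name       = "${var.name_prefix}-redis"
  subnet_ids = var.subnet_ids
}

# Single ingress rule: TCP/6379, reachable only from the ECS task security group.
resource "aws_security_group" "redis" {
  count  = local.create_redis ? 1 : 0
  name   = "${var.name_prefix}-redis"
  vpc_id = var.vpc_id

  ingress {
    description     = "Redis from ECS tasks"
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [var.ecs_task_security_group_id]
  }
}

# Single-node cluster; no replication, AUTH, or transit encryption.
resource "aws_elasticache_cluster" "redis" {
  count                = local.create_redis ? 1 : 0
  cluster_id           = "${var.name_prefix}-redis"
  engine               = "redis"
  engine_version       = var.redis_engine_version
  node_type            = var.redis_node_type
  num_cache_nodes      = 1
  port                 = 6379
  parameter_group_name = "default.redis7"
  subnet_group_name    = aws_elasticache_subnet_group.redis[0].name
  security_group_ids   = [aws_security_group.redis[0].id]
}
```

With `provision_redis` left at its default of `false`, every `count` evaluates to 0 and the plan for an existing deploy is unchanged.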

Why

The bundle's state-redis pack reads its connection string from the runtime secrets and expects a reachable Redis. The AWS scaffold only shipped a modules/redis/main.tf null_resource stub, so nothing was ever provisioned, and any bundle that resolved to state-redis at runtime got secrets://dev/<tenant>/_/state-redis/redis_url pointed at a host that didn't exist. Combined with the placeholder-expansion gap in runtime_secrets.rs (see #185), this made the deep-research demo's button-click flow surface "pack execution failed: failed to render node input template" because state writes silently failed.

The companion PR in greentic-demo (greenticai/greentic-demo#166) drops state-redis from the deep-research AWS bundle and unblocks 3Point. This PR is the proper fix for any caller that wants a working state-redis on AWS.

Design notes

Resources live inside the operator module. ElastiCache needs the operator's VPC, subnets, and ECS task SG. Keeping the cluster co-located with the operator avoids a circular cross-module dependency (Redis needs the operator's VPC; the operator needs Redis's URL).
External redis_url still wins. effective_redis_url = var.redis_url != "" ? var.redis_url : local.managed_redis_url preserves backward compatibility for callers that already point at an outside-of-this-stack Redis.
Strictly opt-in. Default count = 0 means existing deploys see no change. A separate Rust-side change can later wire a GREENTIC_DEPLOY_TERRAFORM_VAR_AWS_PROVISION_REDIS env switch when we want the deployer CLI to drive this.
Stub left intact. modules/redis/main.tf (the null_resource) stays because tests/pr04_terraform_pack.rs asserts the file exists in the pack. Cleanup belongs to a separate change.
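
The precedence rule from the notes above can be tried in isolation, for example with `terraform console`. In this sketch `managed_redis_address` is a hypothetical stand-in for the managed cluster's node address (in the real module it would come from the ElastiCache resource), and the `redis://` scheme and fixed port are assumptions about how the managed URL is assembled.

```hcl
variable "redis_url" {
  type    = string
  default = ""
}

# Hypothetical stand-in for the managed cluster's node address;
# empty means no managed cluster was created.
variable "managed_redis_address" {
  type    = string
  default = ""
}

locals {
  # Assumed URL shape: redis://<address>:6379
  managed_redis_url = var.managed_redis_address != "" ? "redis://${var.managed_redis_address}:6379" : ""

  # An externally supplied URL always wins over the managed endpoint.
  effective_redis_url = var.redis_url != "" ? var.redis_url : local.managed_redis_url
}

output "effective_redis_url" {
  value = local.effective_redis_url
}
```

If both inputs are empty, `effective_redis_url` is the empty string in this sketch, which matches the pre-existing behaviour of a deploy that neither passes `redis_url` nor opts in to provisioning.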

Test plan

terraform init -backend=false && terraform validate in fixtures/packs/aws/terraform/ — clean (only pre-existing data.aws_region.current.name deprecation warnings).
cargo test --workspace — 0 failures.
cargo fmt --all --check — clean.
cargo clippy --workspace -- -D warnings — clean.
Live AWS deploy with TF_VAR_aws_provision_redis=true. Expect: terraform apply takes ~5–10 min extra for the cluster, REDIS_URL env on the ECS task points at the cluster endpoint, state-redis pack connects, deep-research button-click flow works end-to-end. Not exercised in CI.

Out of scope

Multi-AZ replication group with automatic failover (current is single-node).
AUTH token / transit encryption (current cluster is open inside the VPC).
Removing the dead modules/redis/main.tf stub.
Deployer CLI flag to flip aws_provision_redis from the command line.

When `var.aws_provision_redis = true` and no external `redis_url` is supplied, the AWS operator module now stands up a single-node ElastiCache cluster (`cache.t3.micro`, Redis 7.1 by default), gates ingress on port 6379 to the ECS service security group, and feeds the resulting endpoint into the ECS container as `REDIS_URL`. External `redis_url` (when non-empty) still takes precedence, so existing deploys that target an outside-of-this-stack Redis keep working without any change.

Why
---

The bundle's state-redis pack expects a reachable Redis endpoint, but the AWS scaffold only shipped a `modules/redis/main.tf` `null_resource` stub; nothing was ever provisioned. The companion fix in greentic-demo PR #166 drops state-redis from the deep-research demo so the demo path is unblocked, but multi-instance scaling (or any future bundle that legitimately needs a shared state-kv backend) needs a real Redis. This makes that opt-in.

Design notes
------------

- Resources live inside the operator module so VPC/subnet/security-group references stay local; no cross-module data flow with the redis stub.
- `local.effective_redis_url` prefers an externally-supplied URL over the managed one. That preserves backward compatibility for callers that pass `redis_url` from outside (e.g. an existing ElastiCache in another stack).
- Default `count = 0` keeps it strictly opt-in. The existing `modules/redis/main.tf` stub is left in place because tests in `tests/pr04_terraform_pack.rs` assert the file exists; cleanup belongs to a separate change.
- `provision_redis` is propagated from the top-level `var.aws_provision_redis` so the deployer Rust side can later wire a `GREENTIC_DEPLOY_TERRAFORM_VAR_AWS_PROVISION_REDIS` env switch without touching this module.

Test plan
---------

- `terraform init -backend=false && terraform validate` in `fixtures/packs/aws/terraform/` is clean (only pre-existing `data.aws_region.current.name` deprecation warnings).
- `cargo test --workspace` is green.
- `cargo fmt --all --check` and `cargo clippy --workspace -- -D warnings` are clean.
- A live AWS deploy with `TF_VAR_aws_provision_redis=true` will need ~5–10 min extra `terraform apply` for the cluster; not exercised in CI.

BimaPangestu28 · 2026-05-11T02:37:59Z

Heads-up before this lands

Live-tested this PR end-to-end. ElastiCache provisioning itself works: with TF_VAR_aws_provision_redis=true and an empty redis_url, terraform stands up cache.t3.micro, wires the SG, and local.effective_redis_url populates REDIS_URL on the ECS container.

But the state-redis pack doesn't read the REDIS_URL env directly; it reads secrets://dev/<tenant>/_/state-redis/redis_url from the runtime secrets store (AWS Secrets Manager via valueFrom). That secret's value is whatever the deployer promoted at apply time (post #185 + #187: the resolved env-var lookup of ${REDIS_URL}). So even after this PR provisions a Redis cluster, the pack still tries to connect to whatever the user supplied in REDIS_URL (or fails fast on the missing value if they didn't).

In other words: this PR is correct at the infra layer but the demo path doesn't benefit until one of these lands as a follow-up:

state-redis pack reads REDIS_URL env directly (or has it as a fallback) instead of going through the secrets:// URI for the connection string.
Or the deployer overwrites the runtime secret with module.operator_aws[0].managed_redis_url after terraform apply so the AWS SM value matches the auto-provisioned endpoint.
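
For discussion, here is one possible Terraform-side variant of the second option. It is not the deployer-side overwrite described above and not something this PR implements: it assumes the runtime secret already exists and that its ARN can be passed in. Both variables (`redis_url_secret_arn`, `managed_redis_url`) and the resource label are hypothetical.

```hcl
# Hypothetical inputs: ARN of the existing runtime secret that backs
# secrets://dev/<tenant>/_/state-redis/redis_url, and the managed endpoint URL.
variable "redis_url_secret_arn" {
  type = string
}

variable "managed_redis_url" {
  type    = string
  default = ""
}

# Write a new secret version only when a managed endpoint actually exists.
resource "aws_secretsmanager_secret_version" "state_redis_url" {
  count         = var.managed_redis_url != "" ? 1 : 0
  secret_id     = var.redis_url_secret_arn
  secret_string = var.managed_redis_url
}
```

Two caveats apply to this variant: the URL is stored in Terraform state, and if the deployer keeps promoting its own value into the same secret, the two writers would overwrite each other on alternating applies.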

Marking do-not-merge until that follow-up is decided — happy to keep this branch alive so it doesn't bit-rot, but it shouldn't ship in isolation.

BimaPangestu28 marked this pull request as draft May 11, 2026 02:38

BimaPangestu28 wants to merge 1 commit into main from fix/aws-provision-managed-redis
