From 96429477c1e48dca6cda3530dd53a500694cafd2 Mon Sep 17 00:00:00 2001 From: Nuru Date: Mon, 8 Jul 2024 09:06:46 -0700 Subject: [PATCH] [eks/actions-runner-controller] Multiple bug fixes and enhancements (#1075) --- .../actions-runner-controller/CHANGELOG.md | 126 +++++++++ .../eks/actions-runner-controller/README.md | 75 +++--- .../charts/actions-runner/Chart.yaml | 2 +- .../templates/horizontalrunnerautoscaler.yaml | 25 +- .../templates/runnerdeployment.yaml | 246 +++++++++++------- .../charts/actions-runner/values.yaml | 2 +- modules/eks/actions-runner-controller/main.tf | 8 +- .../eks/actions-runner-controller/outputs.tf | 23 ++ .../resources/values.yaml | 9 +- .../actions-runner-controller/variables.tf | 40 ++- 10 files changed, 410 insertions(+), 146 deletions(-) create mode 100644 modules/eks/actions-runner-controller/CHANGELOG.md diff --git a/modules/eks/actions-runner-controller/CHANGELOG.md b/modules/eks/actions-runner-controller/CHANGELOG.md new file mode 100644 index 000000000..d0ca5cd41 --- /dev/null +++ b/modules/eks/actions-runner-controller/CHANGELOG.md @@ -0,0 +1,126 @@ +## PR [#1075](https://github.com/cloudposse/terraform-aws-components/pull/1075) + +New Features: + +- Add support for + [scheduled overrides](https://github.com/actions/actions-runner-controller/blob/master/docs/automatically-scaling-runners.md#scheduled-overrides) + of Runner Autoscaler min and max replicas. +- Add option `tmpfs_enabled` to have runners use RAM-backed ephemeral storage (`tmpfs`, `emptyDir.medium: Memory`) + instead of disk-backed storage. +- Add `wait_for_docker_seconds` to allow configuration of the time to wait for the Docker daemon to be ready before + starting the runner. +- Add the ability to have the runner Pods add annotations to themselves once they start running a job. (Actually + released in release 1.454.0, but not documented until now.) + +Changes: + +- Previously, `syncPeriod`, which sets the period in which the controller reconciles the desired runners count, was set + to 120 seconds in `resources/values.yaml`. This setting has been removed, reverting to the default value of 1 minute. + You can still set this value by setting the `syncPeriod` value in the `values.yaml` file or by setting `syncPeriod` in + `var.chart_values`. +- Previously, `RUNNER_GRACEFUL_STOP_TIMEOUT` was hardcoded to 90 seconds. That has been reduced to 80 seconds to expand + the buffer between that and forceful termination from 10 seconds to 20 seconds, increasing the chances the runner will + successfully deregister itself. +- The inaccurately named `webhook_startup_timeout` has been replaced with `max_duration`. `webhook_startup_timeout` is + still supported for backward compatibility, but is deprecated. + +Bugfixes: + +- Create and deploy the webhook secret when an existing secret is not supplied +- Restore proper order of operations in creating resources (broken in release 1.454.0 (PR #1055)) +- If `docker_storage` is set and `dockerdWithinRunnerContainer` is `true` (which is hardcoded to be the case), properly + mount the docker storage volume into the runner container rather than the (non-existent) docker sidecar container. + +### Discussion + +#### Scheduled overrides + +Scheduled overrides allow you to set different min and max replica values for the runner autoscaler at different times. +This can be useful if you have predictable patterns of load on your runners. For example, you might want to scale down +to zero at night and scale up during the day. 
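As a minimal sketch (the runner key, times, and replica counts here are illustrative), a daily override that keeps a
warm pool during working hours while otherwise letting the pool scale to zero might look like this in the `runners`
map (the README in this PR shows a fuller example that also adds a weekend override):

```yaml
runners:
  infra-runner:
    min_replicas: 0 # default, applies outside the override window
    max_replicas: 20
    scheduled_overrides:
      # Keep a warm pool of runners during normal working hours
      - start_time: "2024-07-01T09:00:00-08:00"
        end_time: "2024-07-01T17:00:00-07:00"
        min_replicas: 2
        recurrence_rule:
          frequency: "Daily"
```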
This feature is implemented by adding a `scheduled_overrides` field to the +`var.runners` map. + +See the +[Actions Runner Controller documentation](https://github.com/actions/actions-runner-controller/blob/master/docs/automatically-scaling-runners.md#scheduled-overrides) +for details on how they work and how to set them up. + +#### Use RAM instead of Disk via `tmpfs_enabled` + +The standard `gp3` EBS volume used for EC2 instance's disk storage is limited (unless you pay extra) to 3000 IOPS and +125 MB/s throughput. This is fine for average workloads, but it does not scale with instance size. A `.48xlarge` +instance could host 90 Pods, but all 90 would still be sharing the same single 3000 IOPS and 125 MB/s throughput EBS +volume attached to the host. This can lead to severe performance issues, as the whole Node gets locked up waiting for +disk I/O. + +To mitigate this issue, we have added the `tmpfs_enabled` option to the `runners` map. When set to `true`, the runner +Pods will use RAM-backed ephemeral storage (`tmpfs`, `emptyDir.medium: Memory`) instead of disk-backed storage. This +means the Pod's impact on the Node's disk I/O is limited to the overhead required to launch and manage the Pod (e.g. +downloading the container image and writing logs to the disk). This can be a significant performance improvement, +allowing you to run more Pods on a single Node without running into disk I/O bottlenecks. Without this feature enabled, +you may be limited to running something like 14 Runners on an instance, regardless of instance size, due to disk I/O +limits. With this feature enabled, you may be able to run 50-100 Runners on a single instance. + +The trade-off is that the Pod's data is stored in RAM, which increases its memory usage. Be sure to increase the amount +of memory allocated to the runner Pod to account for this. This is generally not a problem, as Runners typically use a +small enough amount of disk space that it can be reasonably stored in the RAM allocated to a single CPU in an EC2 +instance, so it is the CPU that remains the limiting factor in how many Runners can be run on an instance. + +:::warning You must configure a memory request for the runner Pod + +When using `tmpfs_enabled`, you must configure a memory request for the runner Pod. If you do not, a single Pod would be +allowed to consume half the Node's memory just for its disk storage. + +::: + +#### Configure startup timeout via `wait_for_docker_seconds` + +When the runner starts and Docker-in-Docker is enabled, the runner waits for the Docker daemon to be ready before +registering marking itself ready to run jobs. This is done by polling the Docker daemon every second until it is ready. +The default timeout for this is 120 seconds. If the Docker daemon is not ready within that time, the runner will exit +with an error. You can configure this timeout by setting `wait_for_docker_seconds` in the `runners` map. + +As a general rule, the Docker daemon should be ready within a few seconds of the runner starting. However, particularly +when there are disk I/O issues (see the `tmpfs_enabled` feature above), the Docker daemon may take longer to respond. + +#### Add annotations to runner Pods once they start running a job + +You can now configure the runner Pods to add annotations to themselves once they start running a job. The idea is to +allow you to have idle pods allow themselves to be interrupted, but then mark themselves as uninterruptible once they +start running a job. 
This is done by setting the `running_pod_annotations` field in the `runners` map. For example: + +```yaml +running_pod_annotations: + # Prevent Karpenter from evicting or disrupting the worker pods while they are running jobs + # As of 0.37.0, is not 100% effective due to race conditions. + "karpenter.sh/do-not-disrupt": "true" +``` + +As noted in the comments above, this was intended to prevent Karpenter from evicting or disrupting the worker pods while +they are running jobs, while leaving Karpenter free to interrupt idle Runners. However, as of Karpenter 0.37.0, this is +not 100% effective due to race conditions: Karpenter may decide to terminate the Node the Pod is running on but not +signal the Pod before it accepts a job and starts running it. Without the availability of transactions or atomic +operations, this is a difficult problem to solve, and will probably require a more complex solution than just adding +annotations to the Pods. Nevertheless, this feature remains available for use in other contexts, as well as in the hope +that it will eventually work with Karpenter. + +#### Bugfix: Deploy webhook secret when existing secret is not supplied + +Because deploying secrets with Terraform causes the secrets to be stored unencrypted in the Terraform state file, we +give users the option of creating the configuration secret externally (e.g. via +[SOPS](https://github.com/getsops/sops)). Unfortunately, at some distant time in the past, when we enabled this option, +we broke this component insofar as the webhook secret was no longer being deployed when the user did not supply an +existing secret. This PR fixes that. + +The consequence of this bug was that, since the webhook secret was not being deployed, the webhook did not reject +unauthorized requests. This could have allowed an attacker to trigger the webhook and perform a DOS attack by killing +jobs as soon as they were accepted from the queue. A more practical and unintentional consequence was if a repo webhook +was installed alongside an org webhook, it would not keep guard against the webhook receiving the same payload twice if +one of the webhooks was missing the secret or had the wrong secret. + +#### Bugfix: Restore proper order of operations in creating resources + +In release 1.454.0 (PR [#1055](https://github.com/cloudposse/terraform-aws-components/pull/1055)), we reorganized the +RunnerDeployment template in the Helm chart to put the RunnerDeployment resource first, since it is the most important +resource, merely to improve readability. Unfortunately, the order of operations in creating resources is important, and +this change broke the deployment by deploying the RunnerDeployment before creating the resources it depends on. This PR +restores the proper order of operations. diff --git a/modules/eks/actions-runner-controller/README.md b/modules/eks/actions-runner-controller/README.md index b9adc5f1a..9d7886ef7 100644 --- a/modules/eks/actions-runner-controller/README.md +++ b/modules/eks/actions-runner-controller/README.md @@ -73,19 +73,30 @@ components: kubernetes.io/os: "linux" kubernetes.io/arch: "amd64" type: "repository" # can be either 'organization' or 'repository' - dind_enabled: false # If `true`, a Docker sidecar container will be deployed + dind_enabled: true # If `true`, a Docker daemon will be started in the runner Pod. 
# To run Docker in Docker (dind), change image to summerwind/actions-runner-dind # If not running Docker, change image to summerwind/actions-runner use a smaller image image: summerwind/actions-runner-dind # `scope` is org name for Organization runners, repo name for Repository runners scope: "org/infra" - # Tell Karpenter not to evict this pod while it is running a job. - # If we do not set this, Karpenter will feel free to terminate the runner while it is running a job, - # as part of its consolidation efforts, even when using "on demand" instances. - running_pod_annotations: - karpenter.sh/do-not-disrupt: "true" - min_replicas: 1 + min_replicas: 0 # Default, overridden by scheduled_overrides below max_replicas: 20 + # Scheduled overrides. See https://github.com/actions/actions-runner-controller/blob/master/docs/automatically-scaling-runners.md#scheduled-overrides + # Order is important. The earlier entry is prioritized higher than later entries. So you usually define + # one-time overrides at the top of your list, then yearly, monthly, weekly, and lastly daily overrides. + scheduled_overrides: + # Override the daily override on the weekends + - start_time: "2024-07-06T00:00:00-08:00" # Start of Saturday morning Pacific Standard Time + end_time: "2024-07-07T23:59:59-07:00" # End of Sunday night Pacific Daylight Time + min_replicas: 0 + recurrence_rule: + frequency: "Weekly" + # Keep a warm pool of runners during normal working hours + - start_time: "2024-07-01T09:00:00-08:00" # 9am Pacific Standard Time (8am PDT), start of workday + end_time: "2024-07-01T17:00:00-07:00" # 5pm Pacific Daylight Time (6pm PST), end of workday + min_replicas: 2 + recurrence_rule: + frequency: "Daily" scale_down_delay_seconds: 100 resources: limits: @@ -95,13 +106,12 @@ components: cpu: 100m memory: 128Mi webhook_driven_scaling_enabled: true - # The name `webhook_startup_timeout` is misleading. - # It is actually the duration after which a job will be considered completed, + # max_duration is the duration after which a job will be considered completed, # (and the runner killed) even if the webhook has not received a "job completed" event. # This is to ensure that if an event is missed, it does not leave the runner running forever. # Set it long enough to cover the longest job you expect to run and then some. # See https://github.com/actions/actions-runner-controller/blob/9afd93065fa8b1f87296f0dcdf0c2753a0548cb7/docs/automatically-scaling-runners.md?plain=1#L264-L268 - webhook_startup_timeout: "90m" + max_duration: "90m" # Pull-driven scaling is obsolete and should not be used. pull_driven_scaling_enabled: false # Labels are not case-sensitive to GitHub, but *are* case-sensitive @@ -156,7 +166,7 @@ components: # cpu: 100m # memory: 128Mi # webhook_driven_scaling_enabled: true - # webhook_startup_timeout: "90m" + # max_duration: "90m" # pull_driven_scaling_enabled: false # # Labels are not case-sensitive to GitHub, but *are* case-sensitive # # to the webhook based autoscaler, which requires exact matches @@ -353,8 +363,8 @@ is a delivery (of a "ping" event) with a green check mark. If not, verify all th The `HorizontalRunnerAutoscaler scaleUpTriggers.duration` (see [Webhook Driven Scaling documentation](https://github. com/actions/actions-runner-controller/blob/master/docs/automatically-scaling-runners.md#webhook-driven-scaling)) is -controlled by the `webhook_startup_timeout` setting for each Runner. 
The purpose of this timeout is to ensure, in case a -job cancellation or termination event gets missed, that the resulting idle runner eventually gets terminated. +controlled by the `max_duration` setting for each Runner. The purpose of this timeout is to ensure, in case a job +cancellation or termination event gets missed, that the resulting idle runner eventually gets terminated. #### How the Autoscaler Determines the Desired Runner Pool Size @@ -371,50 +381,49 @@ will scale down the pool by 2 instead of 1: once because the capacity reservatio finished. This will also cause starvation of waiting jobs, because the next in line will have its timeout timer started but will not actually start running because no runner is available. And if `minReplicas` is set to zero, the pool will scale down to zero before finishing all the jobs, leaving some waiting indefinitely. This is why it is important to set -the `webhook_startup_timeout` to a time long enough to cover the full time a job may have to wait between the time it is -queued and the time it finishes, assuming that the HRA scales up the pool by 1 and runs the job on the new runner. +the `max_duration` to a time long enough to cover the full time a job may have to wait between the time it is queued and +the time it finishes, assuming that the HRA scales up the pool by 1 and runs the job on the new runner. :::info If there are more jobs queued than there are runners allowed by `maxReplicas`, the timeout timer does not start on the capacity reservation until enough reservations ahead of it are removed for it to be considered as representing -and active job. Although there are some edge cases regarding `webhook_startup_timeout` that seem not to be covered -properly (see +and active job. Although there are some edge cases regarding `max_duration` that seem not to be covered properly (see [actions-runner-controller issue #2466](https://github.com/actions/actions-runner-controller/issues/2466)), they only merit adding a few extra minutes to the timeout. ::: -### Recommended `webhook_startup_timeout` Duration +### Recommended `max_duration` Duration -#### Consequences of Too Short of a `webhook_startup_timeout` Duration +#### Consequences of Too Short of a `max_duration` Duration -If you set `webhook_startup_timeout` to too short a duration, the Horizontal Runner Autoscaler will cancel capacity -reservations for jobs that have not yet finished, and the pool will become too small. This will be most serious if you -have set `minReplicas = 0` because in this case, jobs will be left in the queue indefinitely. With a higher value of +If you set `max_duration` to too short a duration, the Horizontal Runner Autoscaler will cancel capacity reservations +for jobs that have not yet finished, and the pool will become too small. This will be most serious if you have set +`minReplicas = 0` because in this case, jobs will be left in the queue indefinitely. With a higher value of `minReplicas`, the pool will eventually make it through all the queued jobs, but not as quickly as intended due to the incorrectly reduced capacity. -#### Consequences of Too Long of a `webhook_startup_timeout` Duration +#### Consequences of Too Long of a `max_duration` Duration If the Horizontal Runner Autoscaler misses a scale-down event (which can happen because events do not have delivery -guarantees), a runner may be left running idly for as long as the `webhook_startup_timeout` duration. The only problem -with this is the added expense of leaving the idle runner running. 
+guarantees), a runner may be left running idly for as long as the `max_duration` duration. The only problem with this is +the added expense of leaving the idle runner running. #### Recommendation -As a result, we recommend setting `webhook_startup_timeout` to a period long enough to cover: +As a result, we recommend setting `max_duration` to a period long enough to cover: - The time it takes for the HRA to scale up the pool and make a new runner available - The time it takes for the runner to pick up the job from GitHub - The time it takes for the job to start running on the new runner - The maximum time a job might take -Because the consequences of expiring a capacity reservation before the job is finished are so severe, we recommend -setting `webhook_startup_timeout` to a period at least 30 minutes longer than you expect the longest job to take. -Remember, when everything works properly, the HRA will scale down the pool as jobs finish, so there is little cost to -setting a long duration, and the cost looks even smaller by comparison to the cost of having too short a duration. +Because the consequences of expiring a capacity reservation before the job is finished can be severe, we recommend +setting `max_duration` to a period at least 30 minutes longer than you expect the longest job to take. Remember, when +everything works properly, the HRA will scale down the pool as jobs finish, so there is little cost to setting a long +duration, and the cost looks even smaller by comparison to the cost of having too short a duration. -For lightly used runner pools expecting only short jobs, you can set `webhook_startup_timeout` to `"30m"`. As a rule of -thumb, we recommend setting `maxReplicas` high enough that jobs never wait on the queue more than an hour. +For lightly used runner pools expecting only short jobs, you can set `max_duration` to `"30m"`. As a rule of thumb, we +recommend setting `maxReplicas` high enough that jobs never wait on the queue more than an hour. ### Interaction with Karpenter or other EKS autoscaling solutions @@ -559,7 +568,7 @@ documentation for further details. | [regex\_replace\_chars](#input\_regex\_replace\_chars) | Terraform regular expression (regex) string.
Characters matching the regex will be removed from the ID elements.
If not set, `"/[^a-zA-Z0-9-]/"` is used to remove all characters other than hyphens, letters and digits. | `string` | `null` | no | | [region](#input\_region) | AWS Region. | `string` | n/a | yes | | [resources](#input\_resources) | The cpu and memory of the deployment's limits and requests. |
object({
limits = object({
cpu = string
memory = string
})
requests = object({
cpu = string
memory = string
})
})
| n/a | yes | -| [runners](#input\_runners) | Map of Action Runner configurations, with the key being the name of the runner. Please note that the name must be in
kebab-case.

For example:
hcl
organization_runner = {
type = "organization" # can be either 'organization' or 'repository'
dind_enabled: true # A Docker daemon will be started in the runner Pod
image: summerwind/actions-runner-dind # If dind_enabled=false, set this to 'summerwind/actions-runner'
scope = "ACME" # org name for Organization runners, repo name for Repository runners
group = "core-automation" # Optional. Assigns the runners to a runner group, for access control.
scale_down_delay_seconds = 300
min_replicas = 1
max_replicas = 5
labels = [
"Ubuntu",
"core-automation",
]
}
|
map(object({
type = string
scope = string
group = optional(string, null)
image = optional(string, "summerwind/actions-runner-dind")
dind_enabled = optional(bool, true)
node_selector = optional(map(string), {})
pod_annotations = optional(map(string), {})

# running_pod_annotations are only applied to the pods once they start running a job
running_pod_annotations = optional(map(string), {})

# affinity is too complex to model. Whatever you assigned affinity will be copied
# to the runner Pod spec.
affinity = optional(any)

tolerations = optional(list(object({
key = string
operator = string
value = optional(string, null)
effect = string
})), [])
scale_down_delay_seconds = optional(number, 300)
min_replicas = number
max_replicas = number
busy_metrics = optional(object({
scale_up_threshold = string
scale_down_threshold = string
scale_up_adjustment = optional(string)
scale_down_adjustment = optional(string)
scale_up_factor = optional(string)
scale_down_factor = optional(string)
}))
webhook_driven_scaling_enabled = optional(bool, true)
# The name `webhook_startup_timeout` is misleading.
# It is actually the duration after which a job will be considered completed,
# (and the runner killed) even if the webhook has not received a "job completed" event.
# This is to ensure that if an event is missed, it does not leave the runner running forever.
# Set it long enough to cover the longest job you expect to run and then some.
# See https://github.com/actions/actions-runner-controller/blob/9afd93065fa8b1f87296f0dcdf0c2753a0548cb7/docs/automatically-scaling-runners.md?plain=1#L264-L268
webhook_startup_timeout = optional(string, "1h")
pull_driven_scaling_enabled = optional(bool, false)
labels = optional(list(string), [])
docker_storage = optional(string, null)
# storage is deprecated in favor of docker_storage, since it is only storage for the Docker daemon
storage = optional(string, null)
pvc_enabled = optional(bool, false)
resources = optional(object({
limits = optional(object({
cpu = optional(string, "1")
memory = optional(string, "1Gi")
ephemeral_storage = optional(string, "10Gi")
}), {})
requests = optional(object({
cpu = optional(string, "500m")
memory = optional(string, "256Mi")
ephemeral_storage = optional(string, "1Gi")
}), {})
}), {})
}))
| n/a | yes | +| [runners](#input\_runners) | Map of Action Runner configurations, with the key being the name of the runner. Please note that the name must be in
kebab-case.

For example:
hcl
organization_runner = {
type = "organization" # can be either 'organization' or 'repository'
dind_enabled = true # A Docker daemon will be started in the runner Pod
image = "summerwind/actions-runner-dind" # If dind_enabled=false, set this to 'summerwind/actions-runner'
scope = "ACME" # org name for Organization runners, repo name for Repository runners
group = "core-automation" # Optional. Assigns the runners to a runner group, for access control.
scale_down_delay_seconds = 300
min_replicas = 1
max_replicas = 5
labels = [
"Ubuntu",
"core-automation",
]
}
|
map(object({
type = string
scope = string
group = optional(string, null)
image = optional(string, "summerwind/actions-runner-dind")
dind_enabled = optional(bool, true)
node_selector = optional(map(string), {})
pod_annotations = optional(map(string), {})

# running_pod_annotations are only applied to the pods once they start running a job
running_pod_annotations = optional(map(string), {})

# affinity is too complex to model. Whatever you assign as affinity will be copied
# to the runner Pod spec.
affinity = optional(any)

tolerations = optional(list(object({
key = string
operator = string
value = optional(string, null)
effect = string
})), [])
scale_down_delay_seconds = optional(number, 300)
min_replicas = number
max_replicas = number
# Scheduled overrides. See https://github.com/actions/actions-runner-controller/blob/master/docs/automatically-scaling-runners.md#scheduled-overrides
# Order is important. The earlier entry is prioritized higher than later entries. So you usually define
# one-time overrides at the top of your list, then yearly, monthly, weekly, and lastly daily overrides.
scheduled_overrides = optional(list(object({
start_time = string # ISO 8601 format, e.g., "2021-06-01T00:00:00+09:00"
end_time = string # ISO 8601 format, e.g., "2021-06-01T00:00:00+09:00"
min_replicas = optional(number)
max_replicas = optional(number)
recurrence_rule = optional(object({
frequency = string # One of Daily, Weekly, Monthly, Yearly
until_time = optional(string) # ISO 8601 format time after which the schedule will no longer apply
}))
})), [])
busy_metrics = optional(object({
scale_up_threshold = string
scale_down_threshold = string
scale_up_adjustment = optional(string)
scale_down_adjustment = optional(string)
scale_up_factor = optional(string)
scale_down_factor = optional(string)
}))
webhook_driven_scaling_enabled = optional(bool, true)
# max_duration is the duration after which a job will be considered completed,
# even if the webhook has not received a "job completed" event.
# This is to ensure that if an event is missed, it does not leave the runner running forever.
# Set it long enough to cover the longest job you expect to run and then some.
# See https://github.com/actions/actions-runner-controller/blob/9afd93065fa8b1f87296f0dcdf0c2753a0548cb7/docs/automatically-scaling-runners.md?plain=1#L264-L268
# Defaults to 1 hour programmatically (to be able to detect if both max_duration and webhook_startup_timeout are set).
max_duration = optional(string)
# The name `webhook_startup_timeout` was misleading and has been deprecated.
# It has been renamed `max_duration`.
webhook_startup_timeout = optional(string)
# Adjust the time (in seconds) to wait for the Docker in Docker daemon to become responsive.
wait_for_docker_seconds = optional(string, "")
pull_driven_scaling_enabled = optional(bool, false)
labels = optional(list(string), [])
# If not null, `docker_storage` specifies the size (as a Kubernetes quantity string, e.g. `100Gi`) of
# an ephemeral (default storage class) Persistent Volume to allocate for the Docker daemon.
# Takes precedence over `tmpfs_enabled` for the Docker daemon storage.
docker_storage = optional(string, null)
# storage is deprecated in favor of docker_storage, since it is only storage for the Docker daemon
storage = optional(string, null)
# If `pvc_enabled` is true, a Persistent Volume Claim will be created for the runner
# and mounted at /home/runner/work/shared. This is useful for sharing data between runners.
pvc_enabled = optional(bool, false)
# If `tmpfs_enabled` is `true`, both the runner and the docker daemon will use a tmpfs volume,
# meaning that all data will be stored in RAM rather than on disk, bypassing disk I/O limitations,
# but what would have been disk usage is now additional memory usage. You must specify memory
# requests and limits when using tmpfs or else the Pod will likely crash the Node.
tmpfs_enabled = optional(bool)
resources = optional(object({
limits = optional(object({
cpu = optional(string, "1")
memory = optional(string, "1Gi")
ephemeral_storage = optional(string, "10Gi")
}), {})
requests = optional(object({
cpu = optional(string, "500m")
memory = optional(string, "256Mi")
ephemeral_storage = optional(string, "1Gi")
}), {})
}), {})
}))
| n/a | yes | | [s3\_bucket\_arns](#input\_s3\_bucket\_arns) | List of ARNs of S3 Buckets to which the runners will have read-write access to. | `list(string)` | `[]` | no | | [ssm\_docker\_config\_json\_path](#input\_ssm\_docker\_config\_json\_path) | SSM path to the Docker config JSON | `string` | `null` | no | | [ssm\_github\_secret\_path](#input\_ssm\_github\_secret\_path) | The path in SSM to the GitHub app private key file contents or GitHub PAT token. | `string` | `""` | no | diff --git a/modules/eks/actions-runner-controller/charts/actions-runner/Chart.yaml b/modules/eks/actions-runner-controller/charts/actions-runner/Chart.yaml index 1ec5333d2..95f7916b1 100644 --- a/modules/eks/actions-runner-controller/charts/actions-runner/Chart.yaml +++ b/modules/eks/actions-runner-controller/charts/actions-runner/Chart.yaml @@ -15,7 +15,7 @@ type: application # This is the chart version. This version number should be incremented each time you make changes # to the chart and its templates, including the app version. # Versions are expected to follow Semantic Versioning (https://semver.org/) -version: 0.2.0 +version: 0.3.0 # This chart only deploys Resources for actions-runner-controller, so app version does not really apply. # We use Resource API version instead. diff --git a/modules/eks/actions-runner-controller/charts/actions-runner/templates/horizontalrunnerautoscaler.yaml b/modules/eks/actions-runner-controller/charts/actions-runner/templates/horizontalrunnerautoscaler.yaml index fa5c96452..eda4813a7 100644 --- a/modules/eks/actions-runner-controller/charts/actions-runner/templates/horizontalrunnerautoscaler.yaml +++ b/modules/eks/actions-runner-controller/charts/actions-runner/templates/horizontalrunnerautoscaler.yaml @@ -10,6 +10,27 @@ spec: name: {{ .Values.release_name }} minReplicas: {{ .Values.min_replicas }} maxReplicas: {{ .Values.max_replicas }} + {{- with .Values.scheduled_overrides }} + scheduledOverrides: + {{- range . }} + - startTime: "{{ .start_time }}" + endTime: "{{ .end_time }}" + {{- with .recurrence_rule }} + recurrenceRule: + frequency: {{ .frequency }} + {{- if .until_time }} + untilTime: "{{ .until_time }}" + {{- end }} + {{- end }} + {{- with .min_replicas }} + minReplicas: {{ . }} + {{- end }} + {{- with .max_replicas }} + maxReplicas: {{ . 
}} + {{- end }} + {{- end }} + {{- end }} + {{- if .Values.pull_driven_scaling_enabled }} metrics: - type: PercentageRunnersBusy @@ -31,7 +52,7 @@ spec: - githubEvent: workflowJob: {} amount: 1 - {{- if .Values.webhook_startup_timeout }} - duration: "{{ .Values.webhook_startup_timeout }}" + {{- if .Values.max_duration }} + duration: "{{ .Values.max_duration }}" {{- end }} {{- end }} diff --git a/modules/eks/actions-runner-controller/charts/actions-runner/templates/runnerdeployment.yaml b/modules/eks/actions-runner-controller/charts/actions-runner/templates/runnerdeployment.yaml index 1321f22c8..27077abae 100644 --- a/modules/eks/actions-runner-controller/charts/actions-runner/templates/runnerdeployment.yaml +++ b/modules/eks/actions-runner-controller/charts/actions-runner/templates/runnerdeployment.yaml @@ -1,7 +1,106 @@ +{{- $release_name := .Values.release_name }} +{{- /* To avoid the situation where a value evaluates to +a string value of "false", which has a boolean value of true, +we explicitly convert to boolean based on the string value */}} +{{- $use_tmpfs := eq (printf "%v" .Values.tmpfs_enabled) "true" }} +{{- $use_pvc := eq (printf "%v" .Values.pvc_enabled) "true" }} +{{- $use_dockerconfig := eq (printf "%v" .Values.docker_config_json_enabled) "true" }} +{{- $use_dind := eq (printf "%v" .Values.dind_enabled) "true" }} +{{- /* Historically, the docker daemon was run in a sidecar. + At some point, the option became available to use dockerdWithinRunnerContainer, + and we now default to that. In fact, at this moment, the sidecar option is not configurable. + We keep the logic here in case we need to revert to the sidecar option. */}} +{{- $use_dind_in_runner := $use_dind }} +{{- if $use_pvc }} +# Persistent Volumes can be used for image caching +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: {{ $release_name }} +spec: + accessModes: + - ReadWriteMany + # StorageClassName comes from efs-controller and must be deployed first. + storageClassName: efs-sc + resources: + requests: + # EFS is not actually storage constrained, but this storage request is + # required. 100Gi is a ballpark for how much we initially request, but this + # may grow. We are responsible for docker pruning this periodically to + # save space. + storage: 100Gi +{{- end }} +{{- if $use_dockerconfig }} +--- +apiVersion: v1 +kind: Secret +metadata: + name: {{ $release_name }}-regcred +type: kubernetes.io/dockerconfigjson +data: + .dockerconfigjson: {{ .Values.docker_config_json }} +{{- end }} +{{- with .Values.running_pod_annotations }} +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ $release_name }}-runner-hooks +data: + annotate.sh: | + #!/bin/bash + + # If we had kubectl and a KUBECONFIG, we could do this: + # kubectl annotate pod $HOSTNAME 'karpenter.sh/do-not-evict="true"' --overwrite + # kubectl annotate pod $HOSTNAME 'karpenter.sh/do-not-disrupt="true"' --overwrite + + # This is the same thing, the hard way + + # Metadata about the pod + NAMESPACE=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace) + POD_NAME=$(hostname) + + # Kubernetes API URL + API_URL="https://kubernetes.default.svc" + + # Read the service account token + TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) + + # Content type + CONTENT_TYPE="application/merge-patch+json" + + PATCH_JSON=$(cat <<'EOF' + { + "metadata": { + "annotations": + {{- . 
| toJson | nindent 10 }} + } + } + EOF + ) + + # Use curl to patch the pod + curl -sSk -X PATCH \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: $CONTENT_TYPE" \ + -H "Accept: application/json" \ + -d "$PATCH_JSON" \ + "$API_URL/api/v1/namespaces/$NAMESPACE/pods/$POD_NAME" | jq .metadata.annotations \ + && AT=$(date -u +"%Y-%m-%dT%H:%M:%S.%3Nz") || code=$? + + if [ -z "$AT" ]; then + echo "Failed (curl exited with status ${code}) to annotate pod with annotations:\n '%s'\n" '{{ . | toJson }}' + exit $code + else + printf "Annotated pod at %s with annotations:\n '%s'\n" "$AT" '{{ . | toJson }}' + fi + +--- +{{ end }} apiVersion: actions.summerwind.dev/v1alpha1 kind: RunnerDeployment metadata: - name: {{ .Values.release_name }} + name: {{ $release_name }} spec: # Do not use `replicas` with HorizontalRunnerAutoscaler # See https://github.com/actions-runner-controller/actions-runner-controller/issues/206#issuecomment-748601907 @@ -13,7 +112,7 @@ spec: {{- toYaml . | nindent 8 }} {{- end }} spec: - {{- if .Values.docker_config_json_enabled }} + {{- if $use_dockerconfig }} # secrets volumeMount are always mounted readOnly so config.json has to be copied to the correct directory # https://github.com/kubernetes/kubernetes/issues/62099 # https://github.com/actions/actions-runner-controller/issues/2123#issuecomment-1527077517 @@ -38,8 +137,13 @@ spec: # It should be less than the terminationGracePeriodSeconds above so that it has time # to report its status and deregister itself from the runner pool. - name: RUNNER_GRACEFUL_STOP_TIMEOUT - value: "90" - + value: "80" + {{- with .Values.wait_for_docker_seconds }} + # If Docker is taking too long to start (which is likely due to some other performance issue), + # increase the timeout from the default of 120 seconds. + - name: WAIT_FOR_DOCKER_SECONDS + value: "{{ . }}" + {{- end }} # You could reserve nodes for runners by labeling and tainting nodes with # node-role.kubernetes.io/actions-runner # and then adding the following to this RunnerDeployment @@ -96,16 +200,16 @@ spec: # to explicitly include the "self-hosted" label in order to match the # workflow_job to it. - self-hosted - {{- range .Values.labels }} + {{- range .Values.labels }} - {{ . | quote }} - {{- end }} + {{- end }} # dockerdWithinRunnerContainer = false means access to a Docker daemon is provided by a sidecar container. 
- dockerdWithinRunnerContainer: {{ .Values.dind_enabled }} + dockerdWithinRunnerContainer: {{ $use_dind_in_runner }} image: {{ .Values.image | quote }} imagePullPolicy: IfNotPresent - {{- if .Values.docker_config_json_enabled }} + {{- if $use_dockerconfig }} imagePullSecrets: - - name: {{ .Values.release_name }}-regcred + - name: {{ $release_name }}-regcred {{- end }} serviceAccountName: {{ .Values.service_account_name }} resources: @@ -121,28 +225,48 @@ spec: {{- if index .Values.resources.requests "ephemeral_storage" }} ephemeral-storage: {{ .Values.resources.requests.ephemeral_storage }} {{- end }} - {{- if and .Values.dind_enabled .Values.docker_storage }} + {{- if and (not $use_dind_in_runner) (or .Values.docker_storage $use_tmpfs) }} + {{- /* dockerVolumeMounts are mounted into the docker sidecar, and ignored if running with dockerdWithinRunnerContainer */}} dockerVolumeMounts: - mountPath: /var/lib/docker name: docker-volume {{- end }} - {{- if or (.Values.pvc_enabled) (.Values.docker_config_json_enabled) }} + {{- if or $use_pvc $use_dockerconfig $use_tmpfs }} volumeMounts: - {{- if .Values.pvc_enabled }} + {{- if and $use_dind_in_runner (or .Values.docker_storage $use_tmpfs) }} + - mountPath: /var/lib/docker + name: docker-volume + {{- end }} + {{- if $use_pvc }} - mountPath: /home/runner/work/shared name: shared-volume {{- end }} - {{- if .Values.docker_config_json_enabled }} + {{- if $use_dockerconfig }} - mountPath: /home/.docker/ name: docker-secret - mountPath: /home/runner/.docker name: docker-config-volume {{- end }} + {{- if $use_tmpfs }} + - mountPath: /tmp + name: tmp + - mountPath: /runner/_work + name: work + {{- end }} {{- end }}{{/* End of volumeMounts */}} - {{- if or (and .Values.dind_enabled .Values.docker_storage) (.Values.pvc_enabled) (.Values.docker_config_json_enabled) (not (empty .Values.running_pod_annotations)) }} + {{- if or (and $use_dind (or .Values.docker_storage $use_tmpfs)) $use_pvc $use_dockerconfig (not (empty .Values.running_pod_annotations)) }} volumes: - {{- if and .Values.dind_enabled .Values.docker_storage }} + {{- if $use_tmpfs }} + - name: work + emptyDir: + medium: Memory + - name: tmp + emptyDir: + medium: Memory + {{- end }} + {{- if and $use_dind (or .Values.docker_storage $use_tmpfs) }} - name: docker-volume + {{- if .Values.docker_storage }} ephemeral: volumeClaimTemplate: spec: @@ -150,16 +274,20 @@ spec: resources: requests: storage: {{ .Values.docker_storage }} + {{- else }} + emptyDir: + medium: Memory + {{- end }} {{- end }} - {{- if .Values.pvc_enabled }} + {{- if $use_pvc }} - name: shared-volume persistentVolumeClaim: - claimName: {{ .Values.release_name }} + claimName: {{ $release_name }} {{- end }} - {{- if .Values.docker_config_json_enabled }} + {{- if $use_dockerconfig }} - name: docker-secret secret: - secretName: {{ .Values.release_name }}-regcred + secretName: {{ $release_name }}-regcred items: - key: .dockerconfigjson path: config.json @@ -169,85 +297,7 @@ spec: {{- with .Values.running_pod_annotations }} - name: hooks configMap: - name: runner-hooks + name: {{ $release_name }}-runner-hooks defaultMode: 0755 # Set execute permissions for all files {{- end }} {{- end }}{{/* End of volumes */}} -{{- if .Values.pvc_enabled }} ---- -# Persistent Volumes can be used for image caching -apiVersion: v1 -kind: PersistentVolumeClaim -metadata: - name: {{ .Values.release_name }} -spec: - accessModes: - - ReadWriteMany - # StorageClassName comes from efs-controller and must be deployed first. 
- storageClassName: efs-sc - resources: - requests: - # EFS is not actually storage constrained, but this storage request is - # required. 100Gi is a ballpark for how much we initially request, but this - # may grow. We are responsible for docker pruning this periodically to - # save space. - storage: 100Gi -{{- end }} -{{- if .Values.docker_config_json_enabled }} ---- -apiVersion: v1 -kind: Secret -metadata: - name: {{ .Values.release_name }}-regcred -type: kubernetes.io/dockerconfigjson -data: - .dockerconfigjson: {{ .Values.docker_config_json }} -{{- end }} -{{- with .Values.running_pod_annotations }} ---- -apiVersion: v1 -kind: ConfigMap -metadata: - name: runner-hooks -data: - annotate.sh: | - #!/bin/bash - - # If we had kubectl and a KUBECONFIG, we could do this: - # kubectl annotate pod $HOSTNAME 'karpenter.sh/do-not-evict="true"' --overwrite - # kubectl annotate pod $HOSTNAME 'karpenter.sh/do-not-disrupt="true"' --overwrite - - # This is the same thing, the hard way - - # Metadata about the pod - NAMESPACE=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace) - POD_NAME=$(hostname) - - # Kubernetes API URL - API_URL="https://kubernetes.default.svc" - - # Read the service account token - TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) - - # Content type - CONTENT_TYPE="application/merge-patch+json" - - PATCH_JSON=$(cat <