allow force rescheduling of failed alloc with no reschedule.attempts remaining #27225

@akamensky

Nomad version

Verified the same behavior in 1.9.5, 1.10.5, 1.11.1

Nomad v1.11.1
BuildDate 2025-12-09T20:10:56Z
Revision 5b76eb0535615e32faf4daee479f7155ea16ec0d

Operating system and Environment details

Fedora 40

Issue

The job definition has the following blocks:

    reschedule {
      attempts  = 0
      unlimited = false
    }

    restart {
      attempts = 0
      mode     = "fail"
    }

We do this because, in our use case, the application must not restart automatically after a failure; it should stay down until the failure has been investigated. A side effect is that submitting the same job spec to Nomad again does not start the job. The Nomad client logs the following:

2025-12-11T04:55:38.388Z [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=40c0c62d-f2dd-69b9-742a-3a7d8609bdfd task=prestart reason="Policy allows no restarts"

The app then remains in the failed state. To start it again, we first have to purge the job from Nomad and then submit it again.
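For reference, the purge-then-resubmit workaround could be scripted against the Nomad HTTP API roughly as follows. This is only a sketch: it assumes an agent at the default local address with no ACL token, that the job is already available in its JSON form (the HCL spec would need converting first), and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default local agent address, no ACL token required.
NOMAD_ADDR = "http://127.0.0.1:4646"


def build_purge_request(job_id, region=None, addr=NOMAD_ADDR):
    """Build DELETE /v1/job/<job_id>?purge=true (stop and purge the job)."""
    query = {"purge": "true"}
    if region:
        query["region"] = region  # e.g. "crypto" for the job in this report
    url = f"{addr}/v1/job/{urllib.parse.quote(job_id)}?{urllib.parse.urlencode(query)}"
    return urllib.request.Request(url, method="DELETE")


def build_register_request(job, addr=NOMAD_ADDR):
    """Build POST /v1/jobs; `job` is the job in JSON (dict) form, not HCL."""
    body = json.dumps({"Job": job}).encode("utf-8")
    return urllib.request.Request(
        f"{addr}/v1/jobs",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def purge_and_resubmit(job):
    """Hypothetical helper: purge the job, then register the same spec again."""
    requests = (
        build_purge_request(job["ID"], job.get("Region")),
        build_register_request(job),
    )
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            resp.read()  # raises urllib.error.HTTPError on a non-2xx response
```

Calling `purge_and_resubmit(job)` with the JSON form of the job performs both steps in order. The request this issue makes is for the second step alone to be enough.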

Reproduction steps

job "test-bug" {
    region = "crypto"
    type = "service"

    group "grp" {
        reschedule {
          attempts  = 0
          unlimited = false
        }

        restart {
          attempts = 0
          mode     = "fail"
        }

        task "main" {
            driver = "raw_exec"
            kill_timeout = "180s"
            kill_signal = "SIGCONT" # Just for the sake of having it defined, not used in repro
            shutdown_delay = "60s"

            config {
                command = "sleep"
                args = [
                    "infinity"
                ]
            }
        }
    }
}
  1. Run the job above, which uses the reschedule and restart blocks to prevent automated restarts
  2. Send it to the failed state by running kill -9 PID on the host where it is running
  3. Try to start it again by submitting the same job spec (via the API)

Expected Result

The job should start when the same spec is submitted, because resubmitting it is a deliberate user action.

Actual Result

The job is not started, and the client logs the message "Policy allows no restarts".
