Complete inflight requests after SIGTERM #1508

sparkacus · 2025-02-19T08:43:23Z

Unless I'm mistaken, when a SIGTERM is recieved gofr immediately exits closing all connections.

Would it be possible to gracefully handle the shutdown?

Off the top of my head the following might be useful, especially for those of us who plan to scale up/down:

Ability to set a threshold (seconds) where gofr will still accept new connections after a sigterm
Ability to set a threshold (seconds) where gofr will continue to handle inflight requests after the threshold for new connections (above)

Traefik's lifecycle procedure might be a good example of how it could be handled: https://doc.traefik.io/traefik/routing/entrypoints/#lifecycle

Umang01-hash · 2025-02-21T05:39:08Z

@sparkacus GoFr has a graceful shutdown implemented, we added it using this PR: #837

And regarding ability to set thresholds we can discuss about it.

sparkacus · 2025-02-21T07:57:58Z

@Umang01-hash You're right, apologies.

I'm using the 'air' package for hot reloading, so a CTRL+C resulted in the process being killed instantly, but that's not gofrs doing.... sorry.

I notice that the shutdown timeout is hardcoded to 30s - It might be useful to allow us to override that via a config i.e. I might want to lower or increase the value. An example could be when using SPOT instances on AWS, from my understanding they'll kill the instance regardless 30s after SIGTERM, so we might want to lower it to ensure gofr can completly shutdown gracefully, which is useful for observability purposes.

EDIT: I notice that gofr does have a REQUEST_TIMEOUT config which would likely resolve the example I provided (I'm certainly not advocating that requests should last longer than 30s :), but sometimes there are reasons)

Concerning the thresholds, it certainly would be useful, especially if gofr is targeted directly from an LB (It might not be a kubernetes service which will remove the target from the endpointslices). Ideally when a SIGTERM is recieved it would immdiately fail the healthcheck endpoint, but being able to handle new requests for a period of time gives the LB an opertunity to detect the failed endpoint and remove it as a target, but at least those new connections will be served.

vikash · 2025-02-24T16:25:34Z

health check and liveness probe should fail after SIGTERM is received, even if the server is not currently closed, it should be able to finish all in-flight requests.

SERVER_SHUTDOWN_GRACE_PERIOD or something like this can be added as well - current default of 30 second can be the default to ensure no changes in existing behaviour even after the change.

sparkacus · 2025-02-26T21:03:37Z

health check and liveness probe should fail after SIGTERM is received, even if the server is not currently closed, it should be able to finish all in-flight requests.

SERVER_SHUTDOWN_GRACE_PERIOD or something like this can be added as well - current default of 30 second can be the default to ensure no changes in existing behaviour even after the change.

Yes, having the ability to tweak the shutdown period could be very useful. It may be that 30 seconds is too long, so it will help us optimise based on various circumstances.

Concerning the healthcheck/liveness probe, you're right, at the point of SIGTERM it should remove the IP from the endpoint slice. Kubernetes might have been a bad example because it handles these situations well.

Lets use another example. The gofr service is say hosted on an instance that is sitting behind a traditional LB e.g. AWS ALB... those target groups are based on interval health checks, so if the gofr service shuts down (lets assume not due to the instance being stopped), the ALB will still forward the request to the service which is now technically offline (it's in the graceful shutdown mode) and thus the request will fail. That target will only be removed once whatever healthcheck thresholds have been reached.

If gofr can still handle incoming requests during the graceful shutdown but return a non-200 response to the healthchecks, that should negate request failures. It also gives the administrator peace of mind that they can stop the service without cuasing an outage (although technically if they knew they were going to stop the service they'd remove the target first... but still)

Umang01-hash · 2025-02-27T07:22:40Z

Hi @sparkacus,

While the PR (#1529) introducing the new SERVER_SHUTDOWN_THRESHOLD config, which indeed adds flexibility to the graceful shutdown process.

Regarding the health check behavior, our current design follows the traditional approach: once SIGTERM is received, the server stops accepting new connections (leading to errors like ECONNREFUSED) while still processing inflight requests. This method clearly signals to load balancers that the instance is going out of service, and modern load balancers (e.g., AWS ALB) are typically tuned to remove targets quickly based on these immediate health check failures.

In AWS environments that use autoscaling groups, this behavior fits well into deployment strategies like blue-green or rolling deployments. Typically, a new instance with the updated version is launched, and once it’s healthy, the old instance is automatically deregistered and terminated. This process ensures smooth transitions without needing to keep the health check endpoint reachable during shutdown.

For those who might need an alternative, the recommended approach is to adjust the load balancer or deployment configuration to fit their specific requirements.

Thanks.

Umang01-hash added the go Pull requests that update Go code label Feb 20, 2025

Umang01-hash self-assigned this Feb 25, 2025

Umang01-hash linked a pull request Feb 26, 2025 that will close this issue

En/graceful shutdown config #1529

Open

4 tasks

gofr-dev deleted a comment from H-Zop Feb 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Complete inflight requests after SIGTERM #1508

Complete inflight requests after SIGTERM #1508

sparkacus commented Feb 19, 2025

Umang01-hash commented Feb 21, 2025

sparkacus commented Feb 21, 2025 •

edited

Loading

vikash commented Feb 24, 2025

sparkacus commented Feb 26, 2025 •

edited

Loading

Umang01-hash commented Feb 27, 2025 •

edited

Loading

Complete inflight requests after SIGTERM #1508

Complete inflight requests after SIGTERM #1508

Comments

sparkacus commented Feb 19, 2025

Umang01-hash commented Feb 21, 2025

sparkacus commented Feb 21, 2025 • edited Loading

vikash commented Feb 24, 2025

sparkacus commented Feb 26, 2025 • edited Loading

Umang01-hash commented Feb 27, 2025 • edited Loading

sparkacus commented Feb 21, 2025 •

edited

Loading

sparkacus commented Feb 26, 2025 •

edited

Loading

Umang01-hash commented Feb 27, 2025 •

edited

Loading