Commit 45daf59

adwk67 and claude committed
fix: Add cleanup steps to prevent kuttl namespace deletion timeouts
KubernetesExecutor DAG task pods with a Vector sidecar do not shut down gracefully on SIGTERM: Vector runs as a background process (not PID 1) and ignores the signal, so the pods wait out the full 300s terminationGracePeriodSeconds before being force-killed. Since kuttl v0.15.0 waits for namespace deletion to complete, this pushes the test run past kuttl's timeout.

Add a cleanup step to all KubernetesExecutor tests that deletes the AirflowCluster CR and force-deletes any remaining pods before kuttl tears down the namespace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
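The commit message traces the hang to Vector not being PID 1, and the test-step comments name `exec` as the proper fix. The following is a minimal sketch, not code from operator-rs, of the property that fix relies on: `exec` replaces the running shell instead of forking, so the new program keeps the shell's PID. In a container whose entrypoint shell is PID 1, exec-ing the sidecar binary would make it PID 1 and therefore the direct recipient of the SIGTERM sent when the pod is deleted. The function names are hypothetical, and `sh` stands in for any POSIX shell.

```shell
#!/bin/sh
# Sketch: compare PIDs with and without `exec`.
# Each function prints two lines: the PID of an outer shell, then the PID
# reported by the program that shell starts.

# Without exec the outer shell forks, so the program gets a new PID.
# The trailing `true` keeps the shell from implicitly exec-ing its last
# command, an optimization some shells apply to `sh -c`.
without_exec() {
  sh -c 'echo "$$"; sh -c "echo \$\$"; true'
}

# With exec the outer shell is replaced, so the program inherits its PID.
# In a container entrypoint, this is how a sidecar would end up as PID 1.
with_exec() {
  sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

echo "without exec:"; without_exec   # two different PIDs
echo "with exec:";    with_exec      # the same PID twice
```

The concrete PIDs differ from run to run; only whether the two lines match is meaningful.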
1 parent 9fcc743 commit 45daf59

13 files changed: 250 additions & 0 deletions
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster --all -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
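The second `script` in this step follows a common graceful-then-forced shutdown pattern: wait a bounded time for normal deletion, and only then force it. As a local illustration only, assuming a POSIX shell and plain processes in place of pods, the same control flow can be sketched as follows; `stop_with_fallback` is a hypothetical helper, not something this commit or kubectl provides.

```shell
#!/bin/sh
# Local analogue (illustration only, not part of this commit) of the cleanup
# script: request a graceful stop, wait a bounded time, then force it, like
# `kubectl wait --timeout=120s` followed by
# `kubectl delete --grace-period=0 --force`.

# stop_with_fallback PID GRACE_SECONDS  (hypothetical helper)
# Returns 0 if the process exits within the grace period,
# 1 if it had to be killed with SIGKILL.
stop_with_fallback() {
  pid=$1
  grace=$2
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone
  elapsed=0
  while [ "$elapsed" -lt "$grace" ]; do
    # Note: an exited child that this shell has not yet reaped still passes kill -0.
    kill -0 "$pid" 2>/dev/null || return 0    # exited gracefully
    sleep 1
    elapsed=$((elapsed + 1))
  done
  kill -KILL "$pid" 2>/dev/null || true       # grace period used up: force it
  return 1
}

# Simulate a process that ignores SIGTERM, like the sidecar described above.
# Signals the shell ignores stay ignored in the commands it starts.
trap '' TERM
sleep 30 &
stubborn=$!
trap - TERM

stop_with_fallback "$stubborn" 2 || echo "forced"
wait "$stubborn" 2>/dev/null || true          # reap the killed child
```

Run as is, it prints `forced` after about two seconds, because the background `sleep` inherits the ignored SIGTERM and only SIGKILL stops it.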
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+{% if test_scenario['values']['executor'] == 'kubernetes' %}
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster airflow -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
+{% endif %}
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+{% if test_scenario['values']['executor'] == 'kubernetes' %}
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster airflow -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
+{% endif %}
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+{% if test_scenario['values']['executor'] == 'kubernetes' %}
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster airflow -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
+{% endif %}
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+{% if test_scenario['values']['executor'] == 'kubernetes' %}
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster airflow -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
+{% endif %}
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+{% if test_scenario['values']['executor'] == 'kubernetes' %}
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster airflow -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
+{% endif %}
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster airflow -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster airflow -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster --all -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+{% if test_scenario['values']['executor'] == 'kubernetes' %}
+---
+# Force-delete KubernetesExecutor DAG task pods before kuttl deletes the namespace.
+# Their Vector sidecar does not respond to SIGTERM (it is not PID 1), so these pods
+# sit in Terminating for the full terminationGracePeriodSeconds (300s), blocking
+# namespace deletion past kuttl's timeout.
+# The proper fix is in operator-rs (making Vector PID 1 via exec).
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+timeout: 600
+commands:
+  - script: |
+      kubectl delete airflowcluster airflow -n $NAMESPACE --wait=false 2>/dev/null || true
+  - script: |
+      if kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=120s 2>/dev/null; then
+        exit 0
+      fi
+      kubectl delete pods -l app.kubernetes.io/name=airflow -n $NAMESPACE --grace-period=0 --force 2>/dev/null || true
+      kubectl wait --for=delete pod -l app.kubernetes.io/name=airflow -n $NAMESPACE --timeout=300s
+{% endif %}
