Skip to content

feat(pam): gateway auth for kubernetes#185

Closed
saifsmailbox98 wants to merge 2 commits into
mainfrom
saif/pam-168-kubernetes-gateway-mediated-authentication-2
Closed

feat(pam): gateway auth for kubernetes#185
saifsmailbox98 wants to merge 2 commits into
mainfrom
saif/pam-168-kubernetes-gateway-mediated-authentication-2

Conversation

@saifsmailbox98
Copy link
Copy Markdown
Contributor

@saifsmailbox98 saifsmailbox98 commented Apr 14, 2026

Description 📣

Gateway-side changes for K8s PAM impersonation. When auth method is gateway-kubernetes-auth, the proxy reads its own pod token, sets Impersonate-User/Group headers, and auto-discovers the K8s API from env vars.

Existing token auth unchanged.

Type ✨

  • Bug fix
  • New feature
  • Improvement
  • Breaking change
  • Documentation

Tests 🛠️

Needs a K8s cluster with the gateway deployed as a pod. Create a PAM account with "Gateway" auth method, start a session, run kubectl commands.

# gateway pod SA needs a ClusterRole with impersonate verb on target SAs

Add impersonation support to the PAM Kubernetes proxy. When auth method
is gateway-kubernetes-auth, the gateway reads its own pod token, sets
Impersonate-User/Group headers, auto-discovers the K8s API from env
vars, and uses the pod CA cert for TLS.
@linear
Copy link
Copy Markdown

linear Bot commented Apr 14, 2026

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae15756522

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/pam/pam-proxy.go Outdated
k8sHost := util.KUBERNETES_SERVICE_HOST_ENV_NAME
k8sPort := util.KUBERNETES_SERVICE_PORT_HTTPS_ENV_NAME
if host, port := os.Getenv(k8sHost), os.Getenv(k8sPort); host != "" && port != "" {
kubernetesConfig.TargetApiServer = fmt.Sprintf("https://%s:%s", host, port)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Build Kubernetes API URL with JoinHostPort

The in-cluster URL is constructed with fmt.Sprintf("https://%s:%s", host, port), which produces an invalid authority when KUBERNETES_SERVICE_HOST is IPv6 (for example fd00::1 becomes https://fd00::1:443). In that case the proxy later dials an unbracketed IPv6 host and fails, so gateway-kubernetes-auth breaks on IPv6 clusters. Use net.JoinHostPort (or equivalent URL assembly) before setting TargetApiServer.

Useful? React with 👍 / 👎.

Comment on lines +68 to +69
default:
return fmt.Errorf("unsupported Kubernetes auth method: %s", p.config.AuthMethod)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve token auth when authMethod is empty

The new switch rejects any auth method outside the two explicit strings, but session credentials still treat authMethod as optional and can decode as empty. In that case requests now return unsupported Kubernetes auth method and the session fails, whereas previous behavior always used ServiceAccountToken. Treat an empty method as service-account-token (or normalize it earlier) to avoid breaking existing token-based sessions.

Useful? React with 👍 / 👎.

Comment on lines 197 to +205
}
proxyReq.Header = req.Header.Clone()
proxyReq.Header.Set("Authorization", fmt.Sprintf("Bearer %s", p.config.InjectServiceAccountToken))
if err := p.injectAuthHeaders(proxyReq.Header); err != nil {
l.Error().Err(err).Msg("Failed to inject auth headers")
_, err = clientConn.Write([]byte(buildHttpInternalServerError(err.Error())))
if err != nil {
return err
}
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Two related error handling issues were introduced in the injectAuthHeaders failure paths. (1) In the HTTP path (~line 201), err.Error() is passed directly into buildHttpInternalServerError(), exposing internal OS-level details like 'open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory' to the kubectl client — inconsistent with every other error path in the file which uses static generic strings. (2) In the WebSocket upgrade path (~line 296), when injectAuthHeaders fails the function returns immediately with no HTTP response written to clientConn, causing a silent TCP connection drop; the client is still speaking HTTP at this point (it sent an Upgrade request) and expects either a 101 or an error status, so an HTTP 500 should be written before returning, as is done in the HTTP path.

Extended reasoning...

Issue 1 — Raw error string leaked to kubectl client (HTTP path)

In HandleConnection, the new code at line ~201 calls buildHttpInternalServerError(err.Error()) and writes it to the client. For gateway-kubernetes-auth, the error returned by injectAuthHeaders would be 'gateway not running in K8s cluster, unable to read pod service account token: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory'. This exposes the OS-level file path and internal diagnostic message to the kubectl client. Every other error path in the same function uses a static, sanitized string: buildHttpInternalServerError("failed to read request body") and buildHttpInternalServerError("failed to create proxy request"). The full error is already captured server-side via l.Error().Err(err).Msg("Failed to inject auth headers"), so there is no debugging cost to sanitizing the client-facing message.

One verifier argued this is intentional because the client is authenticated and the K8s token path (/var/run/secrets/kubernetes.io/serviceaccount/token) is public knowledge. This is true — the impact is low. However, it still violates the existing file-wide pattern of using static generic client messages, and sending raw OS errors (which may include host-specific paths or vary across environments) is a deviation from least-privilege information disclosure. A simple fix: replace err.Error() with "failed to configure auth headers".

Concrete proof for Issue 1:

  1. Gateway is deployed outside a K8s cluster (or the SA token is missing).
  2. A PAM user runs kubectl get pods.
  3. The HTTP request enters HandleConnection, injectAuthHeaders is called.
  4. os.ReadFile(util.KUBERNETES_SERVICE_ACCOUNT_TOKEN_PATH) fails.
  5. buildHttpInternalServerError(err.Error()) writes the raw OS error to the client, including the internal file path.
  6. kubectl displays this message to the user — internal infrastructure detail exposed.

Issue 2 — Silent TCP drop on auth failure in WebSocket path

In forwardWebsocketConnection, the new code at line ~296 calls injectAuthHeaders and returns the error directly if it fails, with no write to clientConn. Before this PR, the auth step was a simple headers.Set call that could not fail, so no error response was needed. Now that injectAuthHeaders can return errors (missing token file, unsupported auth method), this path leaves the kubectl client in limbo.

At the point of the auth failure, the TCP connection to the target K8s API server has already succeeded (lines ~278-287), but no HTTP response has been sent back to clientConn. The kubectl client sent an HTTP Upgrade request and is waiting for either 101 Switching Protocols or an HTTP error. Instead it receives a connection reset with no status code — for kubectl exec, kubectl logs -f, or kubectl port-forward, this manifests as an unexplained disconnect.

One verifier noted that other failure paths in forwardWebsocketConnection (dial failures at lines ~278-287) also return errors without writing to clientConn, so this is internally consistent. However, those failures happen before any response is expected — the backend connection simply never established. The auth failure here occurs after the backend connection is open and the client is explicitly waiting for an HTTP response. The HTTP proxy path correctly handles this with buildHttpInternalServerError, and the same pattern should be applied here.

Concrete proof for Issue 2:

  1. Gateway is configured for gateway-kubernetes-auth but the SA token file is missing.
  2. A PAM user runs kubectl exec -it pod-name -- bash.
  3. The Upgrade request arrives, forwardWebsocketConnection is called.
  4. TLS connection to K8s API server succeeds (~line 280).
  5. injectAuthHeaders fails, the function returns err immediately.
  6. No HTTP response is written to clientConn.
  7. The TCP connection is closed; kubectl sees an unexplained connection drop with no error message.

Fix: In forwardWebsocketConnection, before returning err, write clientConn.Write([]byte(buildHttpInternalServerError("failed to configure auth headers"))) — mirroring the HTTP path. Also sanitize the HTTP path from err.Error() to a static string.

Comment thread packages/pam/pam-proxy.go Outdated
Comment on lines +311 to +316
if host, port := os.Getenv(k8sHost), os.Getenv(k8sPort); host != "" && port != "" {
kubernetesConfig.TargetApiServer = fmt.Sprintf("https://%s:%s", host, port)
}

// Use pod's in-cluster CA cert with strict TLS (ignore resource SSL settings)
caCert, err := os.ReadFile(util.KUBERNETES_SERVICE_ACCOUNT_CA_CERT_PATH)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 When gateway-kubernetes-auth is configured but KUBERNETES_SERVICE_HOST or KUBERNETES_SERVICE_PORT_HTTPS are absent (e.g., gateway running outside a K8s cluster), the code silently keeps credentials.Url as TargetApiServer with no warning logged — even though for this auth method, credentials.Url is a PAM-resource-configured field not intended to hold the K8s API server address. This is inconsistent with the CA cert failure path directly below it, which does log a Warn. The root cause is never surfaced: when a request arrives, the eventual error message is about the missing pod service account token, not about the missing env vars. A warning log (or early return) at the env-var check would give operators a clear, fast diagnostic instead of a confusing downstream failure.

Extended reasoning...

What the bug is and how it manifests

In pam-proxy.go lines 311-316, for gateway-kubernetes-auth, the auto-discovery of the Kubernetes API server URL reads KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT_HTTPS. If either is absent, the if condition is false and kubernetesConfig.TargetApiServer silently retains its initial value: credentials.Url. For this auth method, credentials.Url is a PAM-resource field populated by the backend for other purposes (e.g., service-account-token flows), and is likely empty or irrelevant for gateway-mediated auth.

The specific code path that triggers it

if host, port := os.Getenv(k8sHost), os.Getenv(k8sPort); host \!= "" && port \!= "" {
    kubernetesConfig.TargetApiServer = fmt.Sprintf("https://%s:%s", host, port)
}
// No else / no warning — TargetApiServer stays as credentials.Url

Contrast this with the CA cert path immediately below (lines 318-327), which does log a Warn on failure. The asymmetry is inconsistent and makes this a clear observability gap.

Why existing code doesn't prevent it

The refutation argues that the failure is not truly silent because (a) the pod service account token file at KUBERNETES_SERVICE_ACCOUNT_TOKEN_PATH will also be absent, causing injectAuthHeaders to return a descriptive error on first request, and (b) the startup info log prints the target URL. Both points are valid but insufficient: (a) the error message says "unable to read pod service account token" — it says nothing about missing KUBERNETES_SERVICE_HOST/PORT env vars, so an operator debugging a non-K8s gateway deployment faces a misleading error; (b) an empty-string URL in the startup log is easy to miss.

What the impact would be

Operators who inadvertently deploy the gateway outside of a K8s pod (or in a cluster where the standard env vars are not injected) will see a generic 500 error about the token file, chase that red herring, and not realise the actual misconfiguration is the missing env vars. A fail-fast error or even a single Warn log at configuration time would cut the debugging cycle significantly.

How to fix it

Add an else branch (or a post-if check) that either returns an error or logs a warning when both env vars are absent for gateway-kubernetes-auth:

if host, port := os.Getenv(k8sHost), os.Getenv(k8sPort); host \!= "" && port \!= "" {
    kubernetesConfig.TargetApiServer = fmt.Sprintf("https://%s:%s", host, port)
} else {
    // Preferred: return fmt.Errorf("gateway-kubernetes-auth requires KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT_HTTPS to be set; gateway must run inside a K8s pod")
    log.Warn().Msg("KUBERNETES_SERVICE_HOST or KUBERNETES_SERVICE_PORT_HTTPS not set; gateway-kubernetes-auth will likely fail — is the gateway running inside a K8s pod?")
}

Step-by-step proof

  1. Deploy the gateway binary on a plain VM (outside any K8s cluster) with auth method gateway-kubernetes-auth.
  2. KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT_HTTPS are not set.
  3. At session start, HandlePAMProxy runs. kubernetesConfig.TargetApiServer is initialised to credentials.Url (likely "" for this auth method).
  4. The if block at lines 311-316 evaluates false; no warning is logged; TargetApiServer stays "".
  5. The startup info log prints "target":"" — easy to overlook.
  6. On first kubectl command, injectAuthHeaders calls os.ReadFile(KUBERNETES_SERVICE_ACCOUNT_TOKEN_PATH), fails with "no such file or directory", and returns the error "gateway not running in K8s cluster, unable to read pod service account token: ...".
  7. The operator sees a 500 about a token file and has no indication that the real root cause is missing env vars. The misconfiguration could take significant time to diagnose.

Comment on lines +63 to +70
saUser := fmt.Sprintf("system:serviceaccount:%s:%s",
p.config.ImpersonateNamespace, p.config.ImpersonateServiceAccount)
headers.Set("Impersonate-User", saUser)
headers.Set("Impersonate-Group", "system:serviceaccounts")
headers.Add("Impersonate-Group", fmt.Sprintf("system:serviceaccounts:%s", p.config.ImpersonateNamespace))
default:
return fmt.Errorf("unsupported Kubernetes auth method: %s", p.config.AuthMethod)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 In injectAuthHeaders, the Impersonate-User header is built as system:serviceaccount:{namespace}:{sa} with no check that either field is non-empty; a misconfigured PAM account (or a backend that omits these fields due to their omitempty JSON tags) produces a malformed header like system:serviceaccount::, causing an opaque 403 from the Kubernetes API server instead of a clear error from the gateway. Add an early validation check before constructing the impersonation headers and return a descriptive error if either field is empty.

Extended reasoning...

What the bug is and how it manifests

In injectAuthHeaders (proxy.go lines 63-70), the gateway-kubernetes-auth branch constructs the Impersonate-User header as system:serviceaccount:{ImpersonateNamespace}:{ImpersonateServiceAccount} with no guard on empty strings. The same empty namespace is also embedded in the second Impersonate-Group header (system:serviceaccounts:{ImpersonateNamespace}). If either value is the empty string, both headers become malformed Kubernetes identities.

The specific code path that triggers it

PAMSessionCredentials in the API model declares both new fields with omitempty: ServiceAccountName and Namespace. If the backend omits either field from its JSON response (e.g. due to misconfiguration or a backend bug), Go's JSON decoder leaves the corresponding struct field as the zero value — the empty string. These empty values flow through credentials.go directly into KubernetesProxyConfig.ImpersonateNamespace / ImpersonateServiceAccount without any validation, reaching injectAuthHeaders unchanged.

Why existing code does not prevent it

There is no validation at any layer in the call chain — not in GetPAMSessionCredentials, not in the gateway-kubernetes-auth block in pam-proxy.go that assigns the fields, and not in injectAuthHeaders itself before the fmt.Sprintf call.

What the impact would be

When either field is empty the gateway sends headers such as Impersonate-User: system:serviceaccount:: and Impersonate-Group: system:serviceaccounts: to the Kubernetes API server. The API server rejects these as invalid impersonation identities with a 403. The operator debugging the failure sees a cryptic 403 from Kubernetes with no indication of which configuration field is missing — the gateway silently passes the malformed header rather than surfacing a clear configuration error.

Step-by-step proof

  1. Backend is misconfigured and returns PAM session credential JSON with serviceAccountName and/or namespace omitted (fields are omitempty, so this is a valid API response).
  2. GetPAMSessionCredentials maps the response into PAMCredentials with ServiceAccountName: empty string and/or Namespace: empty string.
  3. In pam-proxy.go, the gateway-kubernetes-auth block assigns kubernetesConfig.ImpersonateNamespace = credentials.Namespace (empty) and kubernetesConfig.ImpersonateServiceAccount = credentials.ServiceAccountName (empty).
  4. On the first proxied request, injectAuthHeaders is called. fmt.Sprintf("system:serviceaccount:%s:%s", "", "") produces "system:serviceaccount::".
  5. headers.Set("Impersonate-User", "system:serviceaccount::") is sent to the K8s API server.
  6. Kubernetes rejects the request with a 403. The gateway has no early-exit error for this case, so the opaque 403 propagates to the user with no gateway-level diagnosis.

How to fix it

Add an early validation guard at the top of the gateway-kubernetes-auth case in injectAuthHeaders:

if p.config.ImpersonateNamespace == "" || p.config.ImpersonateServiceAccount == "" {
return fmt.Errorf("gateway-kubernetes-auth requires non-empty namespace and service account name (got namespace=%q, serviceAccount=%q)",
p.config.ImpersonateNamespace, p.config.ImpersonateServiceAccount)
}

This causes injectAuthHeaders to return an error before any header is written, the existing error-handling path sends a 500 with a descriptive message, and the operator immediately knows which configuration field to fix.

- Treat empty authMethod as service-account-token for backwards compat
- Add empty namespace/SA name guard before constructing impersonation
  headers
- Use net.JoinHostPort for IPv6-safe URL construction
- Sanitize error messages sent to kubectl client (use static strings)
- Write HTTP 500 before returning on websocket auth failure
- Log warning when KUBERNETES_SERVICE_HOST env vars are missing
@saifsmailbox98 saifsmailbox98 deleted the saif/pam-168-kubernetes-gateway-mediated-authentication-2 branch April 14, 2026 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant