Skip to content

Use multiple shorter timeout requests to check for ssl cert#2140

Open
edwardrf wants to merge 1 commit into
mainfrom
edw/cert-verify-timeout
Open

Use multiple shorter timeout requests to check for ssl cert#2140
edwardrf wants to merge 1 commit into
mainfrom
edw/cert-verify-timeout

Conversation

@edwardrf
Copy link
Copy Markdown
Contributor

@edwardrf edwardrf commented Jun 4, 2026

To prevent cert verification timeout

Checklist

  • I have performed a self-review of my code
  • I have added appropriate tests
  • I have updated the Defang CLI docs and/or README to reflect my changes, if necessary

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved TLS certificate verification with per-attempt timeout handling to ensure each IP check completes within a defined timeframe.
    • Enhanced connection resource management and reuse during certificate verification.
    • Added per-attempt timeout cap for more reliable certificate checks.
  • Tests

    • Added comprehensive test coverage for certificate checking behavior, including timeout handling, DNS resolution failures, and fixed-IP transport configuration.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

CheckTLSCert is refactored to resolve domain IPs and probe each separately using a new checkOne helper with per-attempt timeouts enforced at both context and HTTP client levels. Response bodies are drained to enable connection reuse. The fixed-IP transport gains a 5-second ResponseHeaderTimeout. Tests verify timeout firing, error propagation, IP pinning, and transport configuration.

Changes

Per-Attempt Timeout Refactoring and Testing

Layer / File(s) Summary
Per-Attempt Timeout Configuration
src/pkg/cert/check.go
Introduces perAttemptTimeout constant (10s) and adds ResponseHeaderTimeout: 5s to the fixed-IP HTTP transport to enforce timeout boundaries on individual HTTPS probe attempts.
Core Refactoring: CheckTLSCert and checkOne
src/pkg/cert/check.go
Refactors CheckTLSCert to resolve IPs and loop over each, delegating per-IP probes to a new checkOne helper. checkOne enforces per-attempt timeouts via both context.WithTimeout and http.Client.Timeout, builds and executes the HTTPS request using the fixed-IP transport, and drains a bounded portion of the response body to enable connection reuse within the attempt deadline.
Test Infrastructure and Timeout Verification
src/pkg/cert/check_test.go
Establishes test utilities including a stalling TCP listener that holds connections without completing TLS handshake and a helper to temporarily override perAttemptTimeout for tests. Verifies that CheckTLSCert errors and completes within timeout bounds when connecting to stalled servers, and that checkOne terminates within the HTTP client timeout.
Error Handling and Regression Guard Tests
src/pkg/cert/check_test.go
Tests resolver error propagation and the no-IPs edge case to ensure robust error handling. Includes regression guards confirming getFixedIPTransport sets positive ResponseHeaderTimeout and TLSHandshakeTimeout values, and verifying that the fixed-IP transport dials only to a specific IP by checking that connection errors do not appear as DNS resolution errors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 With timeouts set at every gate,
Each IP gets its turn to wait—
No stalls shall linger, bodies drained,
Connections reused, tests well-trained!
A refactored dance, precise and fleet.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: introducing multiple shorter timeout requests (per-attempt timeouts) for SSL certificate verification instead of a single long timeout.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch edw/cert-verify-timeout

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=warning msg="The linter 'gomodguard' is deprecated (since v2.12.0) due to: new major version. Replaced by gomodguard_v2."
level=warning msg="Suggested new configuration:\nlinters:\n enable:\n - gomodguard_v2\n"
level=warning msg="[linters_context] running gomodguard failed: unable to read module file go.mod: current working directory must have a go.mod file: if you are not using go modules it is suggested to disable this linter"
level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pkg/cert/check_test.go`:
- Around line 114-123: The test TestCheckTLSCert_NoIPsReturnsNil and
CheckTLSCert behavior are wrong: an empty DNS A-record result must not be
treated as TLS-ready success. Update CheckTLSCert to add an explicit guard that
detects an empty IP list from the resolver and returns a non-nil error (e.g.,
"no IPs found" or a typed error) before attempting any probe; then update
TestCheckTLSCert_NoIPsReturnsNil to expect that error (i.e., assert non-nil)
rather than nil so waitForTLS callers won't prematurely stop. Ensure references
to the resolver response handling and the probe logic in CheckTLSCert are the
places you change.

In `@src/pkg/cert/check.go`:
- Around line 26-27: The error returns from calls to checkOne currently discard
domain/IP context (e.g. the if err := checkOne(ctx, domain, ip.String()); err !=
nil { return err } lines and similar blocks at the other locations), so change
those returns to wrap the original error with domain and IP information using
fmt.Errorf("... domain=%s ip=%s: %w", domain, ipStr, err) (or errors.Wrap
equivalent); update imports if needed and apply this pattern to all occurrences
(the checkOne calls at the shown site and the other two blocks referenced).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c665b822-1515-40e7-8cd9-b7f3db7762d4

📥 Commits

Reviewing files that changed from the base of the PR and between 1260feb and 3bdd354.

📒 Files selected for processing (2)
  • src/pkg/cert/check.go
  • src/pkg/cert/check_test.go

Comment on lines +114 to +123
func TestCheckTLSCert_NoIPsReturnsNil(t *testing.T) {
// Defensive: if the resolver returns no A records we should not loop,
// not error, and not panic. The cert command will retry on its 3s
// ticker and pick up the records once they propagate.
resolver := dns.MockResolver{Records: map[dns.DNSRequest]dns.DNSResponse{
{Type: "A", Domain: "empty.example"}: {Records: nil},
}}
if err := CheckTLSCert(context.Background(), "empty.example", resolver); err != nil {
t.Fatalf("expected nil for empty IP list, got %v", err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

NoIPs should not be treated as TLS-ready success.

Returning/expecting nil for an empty IP set can prematurely end readiness loops, because waitForTLS callers stop immediately on CheckTLSCert == nil.

Suggested update
-func TestCheckTLSCert_NoIPsReturnsNil(t *testing.T) {
+func TestCheckTLSCert_NoIPsReturnsError(t *testing.T) {
@@
-	if err := CheckTLSCert(context.Background(), "empty.example", resolver); err != nil {
-		t.Fatalf("expected nil for empty IP list, got %v", err)
-	}
+	if err := CheckTLSCert(context.Background(), "empty.example", resolver); err == nil {
+		t.Fatal("expected error for empty IP list, got nil")
+	}
}

And in src/pkg/cert/check.go, add an explicit empty-result guard before probing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/cert/check_test.go` around lines 114 - 123, The test
TestCheckTLSCert_NoIPsReturnsNil and CheckTLSCert behavior are wrong: an empty
DNS A-record result must not be treated as TLS-ready success. Update
CheckTLSCert to add an explicit guard that detects an empty IP list from the
resolver and returns a non-nil error (e.g., "no IPs found" or a typed error)
before attempting any probe; then update TestCheckTLSCert_NoIPsReturnsNil to
expect that error (i.e., assert non-nil) rather than nil so waitForTLS callers
won't prematurely stop. Ensure references to the resolver response handling and
the probe logic in CheckTLSCert are the places you change.

Comment thread src/pkg/cert/check.go
Comment on lines +26 to 27
if err := checkOne(ctx, domain, ip.String()); err != nil {
return err
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Wrap probe-path errors with domain/IP context.

These new network-path returns lose caller context (domain, ip) and make retry diagnostics much harder.

Suggested patch
 import (
 	"context"
+	"fmt"
 	"io"
 	"net"
 	"net/http"
 	"time"
@@
 func CheckTLSCert(ctx context.Context, domain string, resolver dns.Resolver) error {
 	ips, err := resolver.LookupIPAddr(ctx, domain)
 	if err != nil {
-		return err
+		return fmt.Errorf("lookup A records for %q: %w", domain, err)
 	}
 	for _, ip := range ips {
 		if err := checkOne(ctx, domain, ip.String()); err != nil {
 			return err
 		}
@@
 	req, err := http.NewRequestWithContext(attemptCtx, http.MethodGet, "https://"+domain, nil)
 	if err != nil {
-		return err
+		return fmt.Errorf("build TLS probe request for %q via %s: %w", domain, ip, err)
 	}
@@
 	resp, err := httpClient.Do(req)
 	if err != nil {
-		return err
+		return fmt.Errorf("TLS probe failed for %q via %s: %w", domain, ip, err)
 	}

As per coding guidelines: “Do not silently swallow real errors; return wrapped errors for filesystem, parsing, network, and provider failures”.

Also applies to: 37-39, 45-47

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pkg/cert/check.go` around lines 26 - 27, The error returns from calls to
checkOne currently discard domain/IP context (e.g. the if err := checkOne(ctx,
domain, ip.String()); err != nil { return err } lines and similar blocks at the
other locations), so change those returns to wrap the original error with domain
and IP information using fmt.Errorf("... domain=%s ip=%s: %w", domain, ipStr,
err) (or errors.Wrap equivalent); update imports if needed and apply this
pattern to all occurrences (the checkOne calls at the shown site and the other
two blocks referenced).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants