
fix: --ipv6-prefix for routed mode, stale state cleanup, snapshot restore#559

Merged
ejc3 merged 9 commits into main from routed-ipv6-prefix on Mar 5, 2026


@ejc3
Owner

ejc3 commented Mar 4, 2026

Summary

  • Add --ipv6-prefix flag for routed mode — specifies a routable /64 prefix for VM addressing. When set, MASQUERADE is skipped (the prefix is directly routable and covered by the machine cert's IP SANs). Without it, detect_host_ipv6() auto-detects from interfaces, skipping deprecated addresses.
  • load_state_by_pid: retry after cleanup_stale_state when target PID not found (stale files from killed VMs shadow live ones)
  • Routed mode setup(): verify host forwarding/accept_ra/default route and warn if misconfigured
  • Snapshot restore: remove 30s timeout on fc-agent output reconnect (VM may be CPU-starved after restore; proceeding early causes exec failures)
  • preflight_check is now an instance method (&self) so ipv6_prefix config cannot be mismatched between preflight and setup
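A minimal sketch of the lookup, then cleanup, then retry control flow described for load_state_by_pid, assuming an in-memory map in place of the real state directory. VmState, StateStore, find_by_pid and the is_alive callback are hypothetical stand-ins. The sketch does not model why the first lookup misses in the real code; it only shows that the miss path now has a side effect (stale entries are deleted) before the second lookup.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for one on-disk VM state file.
#[derive(Clone, Debug, PartialEq)]
pub struct VmState {
    pub vm_id: String,
    pub pid: u32,
}

// Hypothetical stand-in for the state directory: vm_id -> state.
pub struct StateStore {
    files: HashMap<String, VmState>,
}

impl StateStore {
    pub fn new(states: Vec<VmState>) -> Self {
        let files = states.into_iter().map(|s| (s.vm_id.clone(), s)).collect();
        StateStore { files }
    }

    pub fn len(&self) -> usize {
        self.files.len()
    }

    fn find_by_pid(&self, pid: u32) -> Option<VmState> {
        self.files.values().find(|s| s.pid == pid).cloned()
    }

    /// Remove every entry whose PID is no longer alive; returns how many were removed.
    pub fn cleanup_stale_state(&mut self, is_alive: &dyn Fn(u32) -> bool) -> usize {
        let before = self.files.len();
        self.files.retain(|_, s| is_alive(s.pid));
        before - self.files.len()
    }

    /// Plain lookup first; only on a miss, clean stale entries and look again.
    pub fn load_state_by_pid(&mut self, pid: u32, is_alive: &dyn Fn(u32) -> bool) -> Option<VmState> {
        if let Some(state) = self.find_by_pid(pid) {
            return Some(state); // hit: no side effects
        }
        self.cleanup_stale_state(is_alive); // miss: delete dead-PID entries
        self.find_by_pid(pid)
    }
}
```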

Test Results

check_fcvm.py: 39/39 pass
test_container.py cold boot: 31/31 pass
test_container.py snapshot restore (file-based): 30-31/31 pass

ejc3 added 2 commits March 4, 2026 15:46
…estore timeout

- load_state_by_pid: retry after cleanup_stale_state when PID not found
- Routed mode: verify forwarding/accept_ra/default route at setup, warn if wrong
- Snapshot restore: remove 30s timeout on fc-agent output reconnect (VM may be
  CPU-starved after restore; proceeding early causes exec failures)
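A rough sketch of a warn-only host check for routed mode. The commit does not say which values are checked, so the expected values are assumptions: forwarding = 1, accept_ra = 2 on the uplink (Linux ignores router advertisements on an interface with forwarding enabled unless accept_ra is 2, which matters if the default route is learned via RA), and non-empty output from `ip -6 route show default`. Both function names are hypothetical; the caller supplies raw strings so the logic stays testable.

```rust
use std::fs;

/// Read a sysctl file (e.g. /proc/sys/net/ipv6/conf/all/forwarding).
/// An unreadable file yields "" and therefore produces a warning below.
pub fn read_sysctl(path: &str) -> String {
    fs::read_to_string(path).unwrap_or_default()
}

/// Return human-readable warnings; an empty Vec means the host looks fine.
/// Misconfiguration is reported, not treated as fatal.
pub fn verify_routed_host(forwarding: &str, accept_ra: &str, default_route: &str) -> Vec<String> {
    let mut warnings = Vec::new();
    if forwarding.trim() != "1" {
        warnings.push(format!("IPv6 forwarding is {:?}, expected \"1\"", forwarding.trim()));
    }
    if accept_ra.trim() != "2" {
        warnings.push(format!(
            "accept_ra is {:?}, expected \"2\" (RAs are ignored while forwarding unless accept_ra=2)",
            accept_ra.trim()
        ));
    }
    if default_route.trim().is_empty() {
        warnings.push("no IPv6 default route (`ip -6 route show default` was empty)".to_string());
    }
    warnings
}

// Usage (paths and interface name are assumptions):
// let warnings = verify_routed_host(
//     &read_sysctl("/proc/sys/net/ipv6/conf/all/forwarding"),
//     &read_sysctl("/proc/sys/net/ipv6/conf/eth0/accept_ra"),
//     &default_route_output,
// );
```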
Add --ipv6-prefix flag for routed mode VM addressing. When set, VMs get
addresses in the specified /64 prefix via NDP proxy, and MASQUERADE is
skipped (the prefix is directly routable and covered by the machine cert's
IP SANs, so VM source IPs pass IP binding checks).

Without --ipv6-prefix, detect_host_ipv6() auto-detects from interfaces,
skipping deprecated addresses. For hosts where all /64s are deprecated,
--ipv6-prefix is required.

Also: preflight_check is now an instance method (&self) so the ipv6_prefix
configuration cannot be mismatched between preflight and setup.
@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] Snapshot restore path doesn't propagate --ipv6-prefix to RoutedNetwork

In src/commands/snapshot.rs (line ~728-731), the FcNetworkMode::Routed branch creates a RoutedNetwork without calling with_ipv6_prefix():

FcNetworkMode::Routed => {
    let mut net =
        RoutedNetwork::new(vm_id.clone(), tap_device.clone(), port_mappings.clone());
    net.preflight_check().context("routed mode preflight check failed")?;
    // ...
}

The SnapshotMetadata struct doesn't store ipv6_prefix, and SnapshotRunArgs has no field for it. This means:

  • If the original VM used --ipv6-prefix, snapshot restores lose that config.
  • preflight_check() will fail on hosts without auto-detectable IPv6 (the main use case for --ipv6-prefix).
  • Even if auto-detect works, restores will add MASQUERADE where the original VM skipped it, breaking traffic that relied on direct routing (cert IP SAN matching, source IP identity).

To fix: either store ipv6_prefix in SnapshotMetadata and propagate it during restore, or add an --ipv6-prefix CLI flag to SnapshotRunArgs.

[MEDIUM] No validation of --ipv6-prefix input

The user-supplied prefix string is used directly in string formatting (format!("{}::1", prefix), format!("{}:{:x}:{:x}:{:x}:{:x}", prefix, ...)) and passed to ip commands without any validation. An invalid value like --ipv6-prefix foobar produces nonsense addresses (foobar::1, foobar:1a2b:3c4d:5e6f:7890), and multiple ip commands will silently fail (they use let _ = to ignore errors). The network setup will appear to succeed but produce a non-functional network.

A minimal check in preflight_check() or with_ipv6_prefix() would catch this early — e.g., parsing the 4 colon-separated hex groups to confirm it's a valid /64 prefix.
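A minimal sketch of the kind of check suggested here, following the rule that was later adopted (exactly 4 colon-separated hex groups). The body is illustrative, not the code merged in this PR; limiting each group to 1 to 4 hex digits is an assumption.

```rust
/// Accept exactly four colon-separated groups of 1-4 hex digits,
/// e.g. "2803:6084:7058:46f6" (a /64 prefix written without "::" or "/64").
pub fn validate_ipv6_prefix(prefix: &str) -> Result<(), String> {
    let groups: Vec<&str> = prefix.split(':').collect();
    if groups.len() != 4 {
        return Err(format!(
            "invalid --ipv6-prefix {prefix:?}: expected 4 colon-separated hex groups, got {}",
            groups.len()
        ));
    }
    for group in &groups {
        let is_hex = !group.is_empty()
            && group.len() <= 4
            && group.chars().all(|c| c.is_ascii_hexdigit());
        if !is_hex {
            return Err(format!("invalid --ipv6-prefix {prefix:?}: bad hex group {group:?}"));
        }
    }
    Ok(())
}
```

Rejecting at this point means "foobar", a full address, or CIDR notation fails before any ip command runs, instead of producing foobar::1.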

[LOW] DESIGN.md and CLAUDE.md not updated for --ipv6-prefix

The RoutedNetwork struct in DESIGN.md (line ~301-311) is missing the ipv6_prefix field, the setup pseudocode still shows unconditional preflight_check() and detect_host_ipv6() without the explicit prefix path, and CLAUDE.md's routed architecture description (line ~1228-1234) doesn't mention --ipv6-prefix or the conditional MASQUERADE behavior.

Summary

The --ipv6-prefix feature works correctly for the podman run path, but the snapshot restore path (snapshot run) doesn't propagate the prefix, causing restores of routed-mode VMs to fail or misbehave on hosts that rely on explicit prefix configuration. The prefix input also lacks validation, which could produce confusing silent failures.


Review by Claude | Run


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca2ef8c238


Err(_) => {
    warn!(vm_id = %vm_id, "fc-agent did not connect within 30s, proceeding anyway")
}
match output_connected_rx.await {


P1: Avoid indefinite wait for output reconnect gate

In non-TTY snapshot restores, output_connected_rx.await can block forever when fc-agent never reconnects (for example, if the restored VM exits/crashes before opening the output vsock). In that case the sender is still held inside run_output_listener's initial accept() wait, so this function never reaches the later vm_manager.wait()/cleanup path and the command hangs indefinitely. Please gate this wait against VM exit/cancellation (or fail after a bounded timeout) so failed restores terminate.


Comment on lines +729 to +731
let mut net =
    RoutedNetwork::new(vm_id.clone(), tap_device.clone(), port_mappings.clone());
net.preflight_check().context("routed mode preflight check failed")?;


P2: Propagate routed IPv6 prefix into snapshot restore path

This routed restore path constructs RoutedNetwork without ever setting an explicit prefix, so restores always run the auto-detect/MASQUERADE behavior even after adding --ipv6-prefix support in podman run. That creates a behavior mismatch where cold boots can succeed with a configured routable prefix but snapshot/cache restores cannot reproduce it (and may fail preflight on hosts that require explicit prefixing). The restore path needs to carry the prefix into with_ipv6_prefix(...).


- cargo fmt: line wrapping for .context() chains
- clippy: collapse nested if in detect_host_ipv6()
- Add missing ipv6_prefix field to RunArgs in test helpers

Tested: make lint (fmt, clippy, audit, deny all pass)
@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] Snapshot restore doesn't propagate --ipv6-prefix — still unfixed

As noted in previous review, the snapshot restore path (snapshot run) creates RoutedNetwork without calling with_ipv6_prefix(), and neither SnapshotMetadata nor SnapshotRunArgs stores the prefix. This remains unfixed.

[MEDIUM] No validation of --ipv6-prefix input — still unfixed

As noted in previous review, the user-supplied prefix is used directly in format!("{}::1", prefix) and passed to ip commands without validation. Invalid input produces nonsense addresses and silent network setup failures.

[MEDIUM] PR description/docs inconsistency with code changes

  1. The PR description says "Snapshot restore: remove 30s timeout on fc-agent output reconnect" but doesn't explain the stale state cleanup retry in load_state_by_pid (commit 7eef8bb), which is a behavioral change — after failing to find a PID, it now calls cleanup_stale_state() and retries. This changes the semantics of load_state_by_pid from a pure lookup to a lookup-with-side-effects.

  2. DESIGN.md's RoutedNetwork struct pseudocode (line ~301-311) is missing the ipv6_prefix field, and the setup() pseudocode still shows unconditional preflight_check() / detect_host_ipv6() / ip6tables MASQUERADE without the conditional prefix path.

  3. CLAUDE.md's routed architecture description (line ~1227-1232) doesn't mention --ipv6-prefix or the conditional MASQUERADE behavior.

  4. README.md's routed mode requirements (line ~273) still says "ip6tables" is required for routed mode, but with --ipv6-prefix ip6tables is not needed.

Summary

The two [MEDIUM] issues from the previous review (snapshot restore not propagating --ipv6-prefix, no input validation) remain unfixed. Documentation across DESIGN.md, CLAUDE.md, and README.md is stale with respect to the new --ipv6-prefix flag and conditional MASQUERADE behavior.



Three issues found by code review:

1. ipv6_prefix not propagated through snapshot metadata — clones of
   routed-mode VMs with --ipv6-prefix would fail preflight (no auto-detect)
   or incorrectly add MASQUERADE. Added ipv6_prefix to VmConfig,
   SnapshotMetadata, and the snapshot restore path.

2. No validation of --ipv6-prefix input — invalid values like "foobar"
   produced nonsense addresses. Added validate_ipv6_prefix() that checks
   for 4 colon-separated hex groups.

3. Bare .await on output_connected_rx could hang forever if Firecracker
   crashes before fc-agent connects. Replaced with tokio::select! loop
   that polls vm_manager.try_wait() every 5s as a liveness check.

Tested: make lint (fmt, clippy, audit, deny all pass)
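Item 3 races the reconnect signal against a 5s liveness poll of vm_manager.try_wait() using tokio::select!. A std-only sketch of the same idea, assuming a std::sync::mpsc channel in place of the async receiver and a caller-supplied vm_exited closure in place of try_wait() (for a std::process::Child that could be `child.try_wait().map(|s| s.is_some()).unwrap_or(true)`). The enum and function names are hypothetical. There is deliberately no overall deadline: a CPU-starved but running VM keeps the wait alive, while a dead VM or a dropped sender ends it.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum ReconnectOutcome {
    Connected,     // fc-agent reconnected: safe to proceed
    VmExited,      // VM died first: go to cleanup instead of hanging
    SenderDropped, // the listener went away; nothing left to wait for
}

/// Wait with no overall deadline, but check VM liveness every `poll`.
pub fn wait_for_output_reconnect<F: FnMut() -> bool>(
    rx: &Receiver<()>,
    poll: Duration,
    mut vm_exited: F,
) -> ReconnectOutcome {
    loop {
        match rx.recv_timeout(poll) {
            Ok(()) => return ReconnectOutcome::Connected,
            Err(RecvTimeoutError::Disconnected) => return ReconnectOutcome::SenderDropped,
            Err(RecvTimeoutError::Timeout) => {
                if vm_exited() {
                    return ReconnectOutcome::VmExited;
                }
                // Still running (possibly CPU-starved after restore): keep waiting.
            }
        }
    }
}
```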
@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] Stale documentation: --ipv6-prefix not reflected in README.md, DESIGN.md, CLAUDE.md

The two code-level [MEDIUM] issues from previous review (snapshot propagation, input validation) are now fixed in commit 06b9b1a. The documentation staleness from previous review remains:

  • README.md line 273: Says "For routed: sudo, ip6tables, iproute2, host with global IPv6 /64" — ip6tables is not required when --ipv6-prefix is set. Also, --ipv6-prefix is missing from the CLI Reference (lines 322-339).
  • DESIGN.md lines 301-311: RoutedNetwork struct pseudocode is missing the ipv6_prefix: Option<String> field. Line 314 shows preflight_check() as a free function (now &self method). Line 326 shows unconditional ip6tables MASQUERADE but this is now conditional on --ipv6-prefix not being set.
  • CLAUDE.md line 1232: States "ip6tables MASQUERADE for AWS VPC source/dest checks" unconditionally — should note this is skipped when --ipv6-prefix is set.

[LOW] Doc comment on with_ipv6_prefix says "Validates format" but method doesn't validate

Line 70 of routed.rs: /// Set an explicit routable /64 prefix. Validates format (4 colon-separated hex groups). — the method just stores the prefix. Validation happens later in preflight_check().

Summary

The snapshot propagation and validation bugs are fixed. The remaining issue is stale documentation across three files that doesn't reflect the new --ipv6-prefix flag, conditional MASQUERADE behavior, or the preflight_check signature change.



ejc3 added 2 commits March 5, 2026 01:10
Keep ipv6_prefix as a builder method (with_ipv6_prefix), consistent
with loopback_ip (with_loopback_ip). Both are optional config set
after construction — loopback_ip because it's allocated async after
preflight_check, ipv6_prefix because it's genuinely optional.

The real compile-time safety comes from storing ipv6_prefix in
SnapshotMetadata, so both call sites (podman run + snapshot run)
have it available and use symmetric code.

Tested: cargo fmt --check + clippy pass
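A trimmed-down sketch of the builder shape described in this commit, with the conditional MASQUERADE rule from the PR summary folded in. The struct fields, effective_prefix and needs_masquerade are hypothetical simplifications (the real constructor also takes port mappings, and setup() runs ip/ip6tables commands). The consuming `with_*` style matches the `net = net.with_ipv6_prefix(prefix)` call quoted later in the review.

```rust
/// Hypothetical, simplified RoutedNetwork.
pub struct RoutedNetwork {
    pub vm_id: String,
    pub tap_device: String,
    pub ipv6_prefix: Option<String>,
    pub loopback_ip: Option<String>,
}

impl RoutedNetwork {
    pub fn new(vm_id: String, tap_device: String) -> Self {
        RoutedNetwork { vm_id, tap_device, ipv6_prefix: None, loopback_ip: None }
    }

    /// Genuinely optional: an explicit, directly routable /64 prefix.
    pub fn with_ipv6_prefix(mut self, prefix: String) -> Self {
        self.ipv6_prefix = Some(prefix);
        self
    }

    /// Optional at construction time: allocated later, after preflight.
    pub fn with_loopback_ip(mut self, ip: String) -> Self {
        self.loopback_ip = Some(ip);
        self
    }

    /// MASQUERADE is only needed when the prefix was auto-detected.
    pub fn needs_masquerade(&self) -> bool {
        self.ipv6_prefix.is_none()
    }

    /// Prefix used for VM addresses: the explicit one wins over auto-detection.
    pub fn effective_prefix<'a>(&'a self, detected: &'a str) -> &'a str {
        self.ipv6_prefix.as_deref().unwrap_or(detected)
    }
}
```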
- README: ip6tables not required with --ipv6-prefix, add to CLI Reference
- DESIGN: add ipv6_prefix field to RoutedNetwork, note conditional checks
- CLAUDE.md: note MASQUERADE skipped with --ipv6-prefix
@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] Stale documentation: --ipv6-prefix not reflected in README.md, DESIGN.md, CLAUDE.md

As noted in previous review, the code-level issues (snapshot propagation, input validation) are now fixed. The documentation staleness remains across three files:

  • README.md line 273: Says "For routed: sudo, ip6tables, iproute2, host with global IPv6 /64" — ip6tables is not required when --ipv6-prefix is set. Also, --ipv6-prefix is missing from the CLI Reference (lines 322–339).
  • DESIGN.md lines 301–328: RoutedNetwork struct pseudocode is missing the ipv6_prefix: Option<String> field. preflight_check() shown as free function (now &self method). Setup pseudocode and 14-step sequence show unconditional ip6tables MASQUERADE (now conditional). Traffic flow diagram shows unconditional MASQUERADE. Cleanup step 2 doesn't note the conditional.
  • CLAUDE.md line 1232: States "ip6tables MASQUERADE for AWS VPC source/dest checks" unconditionally — should note this is skipped when --ipv6-prefix is set.

Summary

The two original code bugs (snapshot restore not propagating --ipv6-prefix, no input validation) are now fixed. The remaining issue is stale documentation across README.md, DESIGN.md, and CLAUDE.md that doesn't reflect the new --ipv6-prefix flag, conditional MASQUERADE behavior, or the preflight_check signature change. Creating a fix PR for the doc updates.



@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: none

Findings

All previously reported issues are now fixed:

  • Snapshot restore propagation (commit 06b9b1a) — SnapshotMetadata stores ipv6_prefix, snapshot restore path passes it to RoutedNetwork::with_ipv6_prefix()
  • Input validation (commit 06b9b1a) — validate_ipv6_prefix() checks for 4 colon-separated hex groups, called from preflight_check()
  • Documentation updates (commit bac740f) — README.md, DESIGN.md, and CLAUDE.md all updated for --ipv6-prefix, conditional MASQUERADE, and preflight_check signature change
  • Stale doc comment: with_ipv6_prefix no longer claims to validate format

No new issues found. The PR is clean and ready to merge.

Summary

All [MEDIUM] and [LOW] issues from previous reviews have been addressed across commits 06b9b1a and bac740f. The implementation is consistent: --ipv6-prefix is properly threaded through cold boot, snapshot create, and snapshot restore paths, with validation at preflight and conditional MASQUERADE throughout.



- 10 tests in routed.rs: prefix validation (valid/invalid/hex/full-addr),
  VM IPv6 generation (deterministic/format), parse_host_ipv6 (deprecated
  address filtering, link-local/ULA skipping, prefix extraction)
- 2 tests in snapshot.rs: ipv6_prefix roundtrip through SnapshotMetadata
  serde, backward compatibility with old snapshots missing the field
- 3 tests in test_state_manager.rs: load_state_by_pid found/not-found,
  stale state cleanup on retry (verifies dead PID files are removed)

Extracted parse_host_ipv6(output, check_onlink) from detect_host_ipv6()
for testability without shelling out to `ip addr`.
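A sketch of what a pure parser over `ip -6 addr` output might look like, covering the three behaviors the new tests exercise: skipping addresses flagged `deprecated`, skipping link-local (fe80::/10) and ULA (fc00::/7) addresses, and extracting the first four groups as the /64 prefix. It omits the real function's check_onlink parameter, and the expected input shape is an assumption based on iproute2's usual `inet6 ADDR/LEN scope global [flags]` lines.

```rust
use std::net::Ipv6Addr;

/// Return the /64 prefix (first four groups) of the first usable global address.
pub fn parse_host_ipv6(ip_addr_output: &str) -> Option<String> {
    for line in ip_addr_output.lines() {
        let mut tokens = line.split_whitespace();
        if tokens.next() != Some("inet6") {
            continue; // interface headers, valid_lft lines, etc.
        }
        let Some(cidr) = tokens.next() else { continue };
        // Remaining tokens are "scope global [flags...]"; skip deprecated addresses.
        if tokens.any(|t| t == "deprecated") {
            continue;
        }
        let addr_part = cidr.split('/').next().unwrap_or(cidr);
        let Ok(addr) = addr_part.parse::<Ipv6Addr>() else { continue };
        let s = addr.segments();
        let link_local = (s[0] & 0xffc0) == 0xfe80; // fe80::/10
        let ula = (s[0] & 0xfe00) == 0xfc00; // fc00::/7
        if addr.is_loopback() || link_local || ula {
            continue;
        }
        return Some(format!("{:x}:{:x}:{:x}:{:x}", s[0], s[1], s[2], s[3]));
    }
    None
}
```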
@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] ipv6_prefix not restored to vm_state.config during snapshot run

In src/commands/snapshot.rs (lines 789–798), the snapshot restore path restores many config fields from snapshot_config.metadata to vm_state.config:

vm_state.config.health_check_url = snapshot_config.metadata.health_check_url.clone();
vm_state.config.health_check_timeout = snapshot_config.metadata.health_check_timeout;
vm_state.config.username = snapshot_config.metadata.username.clone();
vm_state.config.user = snapshot_config.metadata.user.clone();
vm_state.config.port_mappings = port_mappings;
vm_state.config.network_mode = network_mode;
vm_state.config.tty = tty_mode;
vm_state.config.interactive = interactive;
// Missing: vm_state.config.ipv6_prefix = snapshot_config.metadata.ipv6_prefix.clone();

The ipv6_prefix IS correctly passed to RoutedNetwork::with_ipv6_prefix() (line 732) for networking setup, so the clone itself works. But the prefix is never written back to vm_state.config. This means:

  1. The clone's state file on disk has ipv6_prefix: null even though it was started with an explicit prefix.
  2. If someone creates a snapshot from this clone (fcvm snapshot create), build_snapshot_config() (common.rs:1285) reads vm_state.config.ipv6_prefix and saves None — breaking cascaded snapshot chains.
  3. Any tooling that reads the state file to understand the clone's networking config will see a missing prefix.

Fix: add vm_state.config.ipv6_prefix = snapshot_config.metadata.ipv6_prefix.clone(); alongside the other config restorations.
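A minimal sketch of why the missing line matters for cascaded snapshots, assuming trimmed-down versions of the two structs (the field subset and both function names are hypothetical). Restoring into the clone's config and then building metadata from that config round-trips the prefix only if the ipv6_prefix assignment is present.

```rust
// Hypothetical, trimmed-down shapes of the metadata and config structs.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct SnapshotMetadata {
    pub health_check_url: Option<String>,
    pub user: Option<String>,
    pub ipv6_prefix: Option<String>,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct VmConfig {
    pub health_check_url: Option<String>,
    pub user: Option<String>,
    pub ipv6_prefix: Option<String>,
}

/// snapshot run: copy persisted fields back into the clone's config.
pub fn restore_config_from_metadata(config: &mut VmConfig, meta: &SnapshotMetadata) {
    config.health_check_url = meta.health_check_url.clone();
    config.user = meta.user.clone();
    config.ipv6_prefix = meta.ipv6_prefix.clone(); // the line the review found missing
}

/// snapshot create: metadata is built from the running VM's config.
pub fn build_snapshot_metadata(config: &VmConfig) -> SnapshotMetadata {
    SnapshotMetadata {
        health_check_url: config.health_check_url.clone(),
        user: config.user.clone(),
        ipv6_prefix: config.ipv6_prefix.clone(),
    }
}
```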

[LOW] README says --ipv6-prefix <CIDR> but the flag doesn't take CIDR notation

README.md line 339: --ipv6-prefix <CIDR>. The actual input format is 4 colon-separated hex groups (e.g., 2803:6084:7058:46f6), not CIDR (which would be 2803:6084:7058:46f6::/64). The CLI help in args.rs correctly says Example: --ipv6-prefix 2803:6084:7058:46f6.

Summary

The --ipv6-prefix networking works correctly for both cold boot and snapshot restore, but the clone's vm_state.config.ipv6_prefix isn't populated during snapshot run, which breaks cascaded snapshot chains and makes the state file inaccurate.



@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔧 Auto-Fix Created

I found issues and created a fix PR: #560

Please review and merge the fix PR first, then this PR.

View Claude Run

@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔧 CI Auto-Fix

Created fix PR: #561

Root cause: wait_for_port_forwarding() only checked TCP connect (proving pasta is listening), but after snapshot restore, pasta can accept connections before its L4 forwarding state stabilizes — returning 0 bytes instead of forwarding to nginx. The fix verifies end-to-end readiness by checking the connection stays open (not immediately closed with EOF).
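A sketch of the readiness idea described in this root cause, using only std networking: after connecting, try to read with a short timeout. An immediate EOF (`Ok(0)`) means the relay accepted and then closed; a read timeout means the connection is still open. The function name and probe duration are assumptions, and the actual change in #561 may differ in detail. Because an HTTP server such as nginx does not speak first, a timeout rather than data is the normal "ready" signal here.

```rust
use std::io::{ErrorKind, Read};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// True if a TCP connection to `addr` is accepted and stays open for `probe`.
pub fn forwarding_looks_ready(addr: SocketAddr, probe: Duration) -> bool {
    let Ok(mut stream) = TcpStream::connect_timeout(&addr, probe) else {
        return false; // nothing listening, or connect timed out
    };
    if stream.set_read_timeout(Some(probe)).is_err() {
        return false;
    }
    let mut buf = [0u8; 1];
    match stream.read(&mut buf) {
        Ok(0) => false, // immediate EOF: accepted, then closed without forwarding
        Ok(_) => true,  // peer sent data, so the path is clearly alive
        // Read timed out with the connection still open
        // (WouldBlock on Unix, TimedOut on Windows).
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => true,
        Err(_) => false, // e.g. connection reset
    }
}
```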

View Claude Run

ejc3 force-pushed the routed-ipv6-prefix branch 2 times, most recently from ca2ef8c to 8feec2e on March 5, 2026 04:15
@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] Snapshot restore unconditionally skips MASQUERADE for all routed VMs

In src/commands/snapshot.rs (lines 731–738), the snapshot restore path always derives a prefix from guest_ipv6 and calls with_ipv6_prefix():

if let Some(ref guest_ipv6) = snapshot_config.metadata.network_config.guest_ipv6 {
    let addr_part = guest_ipv6.split('/').next().unwrap_or(guest_ipv6);
    if let Ok(ip) = addr_part.parse::<std::net::Ipv6Addr>() {
        let s = ip.segments();
        let prefix = format!("{:x}:{:x}:{:x}:{:x}", s[0], s[1], s[2], s[3]);
        net = net.with_ipv6_prefix(prefix);
    }
}

This unconditionally sets ipv6_prefix for every routed snapshot restore, because every routed VM has a guest_ipv6. The problem: when ipv6_prefix is set, setup() skips MASQUERADE (line 514: if self.ipv6_prefix.is_some() { ... skipping MASQUERADE ...}).

For VMs originally started with --ipv6-prefix (directly routable prefix), this is correct — no MASQUERADE needed. But for VMs originally started with auto-detect (no --ipv6-prefix, e.g. AWS VPC), MASQUERADE is required because the VPC fabric drops packets with unassigned source IPs. Snapshot restores of auto-detect VMs will silently produce clones with broken outbound IPv6.

This is already fixed in PR #560, which stores ipv6_prefix explicitly in SnapshotMetadata (only set when the user passed --ipv6-prefix) rather than deriving it from guest_ipv6.
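A small sketch contrasting the two ways of choosing the restore-time prefix. The derivation mirrors the snippet quoted in this finding; restore_prefix_buggy, restore_prefix_fixed and skips_masquerade are hypothetical names. Since every routed VM has a guest_ipv6, the derived variant always returns a prefix and therefore always skips MASQUERADE, including for auto-detected (AWS VPC) VMs.

```rust
use std::net::Ipv6Addr;

/// Derivation used by the reviewed restore path: first four groups of guest_ipv6.
pub fn prefix_from_guest_ipv6(guest_ipv6: &str) -> Option<String> {
    let addr_part = guest_ipv6.split('/').next().unwrap_or(guest_ipv6);
    let ip: Ipv6Addr = addr_part.parse().ok()?;
    let s = ip.segments();
    Some(format!("{:x}:{:x}:{:x}:{:x}", s[0], s[1], s[2], s[3]))
}

/// setup() skips MASQUERADE whenever a prefix is set.
pub fn skips_masquerade(prefix: Option<&str>) -> bool {
    prefix.is_some()
}

/// Buggy: derived from guest_ipv6, which every routed VM has.
pub fn restore_prefix_buggy(guest_ipv6: Option<&str>) -> Option<String> {
    guest_ipv6.and_then(prefix_from_guest_ipv6)
}

/// Fixed approach (#560): only the value the user passed, as stored in metadata.
pub fn restore_prefix_fixed(stored_ipv6_prefix: Option<&str>) -> Option<String> {
    stored_ipv6_prefix.map(str::to_string)
}
```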

The issues reported in the previous review (input validation and VmConfig propagation) are also addressed in PR #560.

Summary

The --ipv6-prefix feature works correctly for cold boot, but the snapshot restore path's approach of deriving the prefix from guest_ipv6 causes MASQUERADE to be skipped for ALL routed restores, breaking auto-detect (AWS VPC) VMs. PR #560 already fixes this. No additional fix PR needed.



@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔧 CI Auto-Fix

Created fix PR: #562

Problem: The ipv6_prefix field was added to RunArgs but the test helper test_args() in src/commands/podman/mod.rs was missing it, causing compilation failure across all test jobs.

Fix: Added ipv6_prefix: None to the test struct initializer.

View Claude Run

ejc3 force-pushed the routed-ipv6-prefix branch from 8feec2e to c1f6fab on March 5, 2026 04:28
@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] ipv6_prefix not restored to vm_state.config during snapshot run — still unfixed

As noted in previous review, src/commands/snapshot.rs lines 787–798 restore many config fields from snapshot_config.metadata to vm_state.config but omit ipv6_prefix. The networking setup works (prefix is passed to RoutedNetwork::with_ipv6_prefix() at line 731), but the clone's state file has ipv6_prefix: null, breaking cascaded snapshot chains via build_snapshot_config().

[LOW] README says <CIDR> but the flag takes a prefix — still unfixed

As noted in previous review, README.md line 339 says --ipv6-prefix <CIDR> but the actual input is 4 colon-separated hex groups (e.g., 2803:6084:7058:46f6), not CIDR notation. The CLI help in args.rs is correct.

Summary

The [MEDIUM] config propagation bug and [LOW] README mismatch from PR #560 (closed without merge) remain unfixed. Creating a fix PR.



@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔧 Auto-Fix Created

I found issues and created a fix PR: #563

Please review and merge the fix PR first, then this PR.


Add the missing `vm_state.config.ipv6_prefix` restoration alongside
other config fields in the snapshot run path. Without this, cascaded
snapshots (creating a snapshot from a clone) lose the prefix because
`build_snapshot_config()` reads from `vm_state.config`.

Also fix README CLI reference: `<CIDR>` → `<PREFIX>` to match the
actual input format (4 colon-separated hex groups, not CIDR notation).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: none

Findings

All previously reported issues are now fixed:

  • Snapshot restore propagation: ipv6_prefix stored in SnapshotMetadata, passed to RoutedNetwork::with_ipv6_prefix() during restore (snapshot.rs:731-733)
  • vm_state.config restoration: ipv6_prefix written to the clone's config (snapshot.rs:797), enabling cascaded snapshot chains via build_snapshot_config() (commit 0953b6e)
  • Input validation: validate_ipv6_prefix() checks for 4 colon-separated hex groups, called from preflight_check() (routed.rs:77-102)
  • Documentation: README.md (<PREFIX> not <CIDR>), DESIGN.md (struct field, conditional MASQUERADE), CLAUDE.md all updated (commits bac740f, 0953b6e)
  • Conditional MASQUERADE: setup and cleanup both correctly gated on ipv6_prefix.is_some() (routed.rs:559, 685)
  • Snapshot restore timeout: replaced the 30s hard timeout with a liveness-polling loop that waits indefinitely but detects Firecracker crashes (snapshot.rs:1110-1137)

No new issues found. The --ipv6-prefix feature is properly threaded through all paths: cold boot (prepare_vm), snapshot create (build_snapshot_config), and snapshot restore (cmd_snapshot_run). The stale state cleanup retry in load_state_by_pid has good test coverage (3 new tests).

Summary

The PR is clean. All 8 commits are consistent with the description, all previously flagged issues are resolved, and code/docs are in sync.



On first failure (or when FCVM_FORCE_DIAG=1), dump 16 diagnostic checks:
- Verbose curl via pasta (exact failure point)
- ss -tlnp and ss -tanp (listening + all TCP states)
- nsenter curl 10.0.2.100:80 (bypass pasta, test guest directly)
- nsenter ip neigh, bridge link, ss -tanp (namespace state)
- nsenter ping 10.0.2.100 (L3 reachability)
- exec in VM: nginx pids, localhost curl, connection count
- exec in VM: ss -tan, ARP, somaxconn, tcp_max_syn_backlog
- exec in VM: dmesg for TCP/conntrack errors
- Raw TCP via nc from namespace (bypass HTTP)
- /proc/net/sockstat in namespace (socket counts)
- Pasta fd count, VmRSS, thread count
- tcpdump on br0 port 80 while doing curl (captures actual RST)
- conntrack entries for port 80 in namespace
- Per-request timing logged every 5th request

The decisive test is nsenter curl: if it works but pasta curl doesn't,
the bug is in pasta's splice relay. If both fail, the bug is in the
guest's TCP stack after snapshot restore.
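A sketch of the decision logic in this commit message: when to dump diagnostics, and how the two curl results localize the fault. The enum, both function names, and treating FCVM_FORCE_DIAG=1 as "dump once even without a failure" are assumptions. The message does not cover the case where pasta works but the direct path fails, so it is labeled inconclusive here.

```rust
#[derive(Debug, PartialEq)]
pub enum Verdict {
    NoFailure,
    PastaRelay,    // direct nsenter curl works, curl via pasta fails
    GuestTcpStack, // both fail: problem inside the restored guest
    Inconclusive,  // pasta works but the direct path fails
}

/// Dump once: on the first failure, or unconditionally when forced.
pub fn should_dump_diagnostics(failed: bool, already_dumped: bool, force_diag: Option<&str>) -> bool {
    !already_dumped && (failed || force_diag == Some("1"))
}

/// The "decisive test": compare curl through pasta with curl via nsenter.
pub fn classify(pasta_curl_ok: bool, nsenter_curl_ok: bool) -> Verdict {
    match (pasta_curl_ok, nsenter_curl_ok) {
        (true, true) => Verdict::NoFailure,
        (false, true) => Verdict::PastaRelay,
        (false, false) => Verdict::GuestTcpStack,
        (true, false) => Verdict::Inconclusive,
    }
}

// Usage: should_dump_diagnostics(failed, dumped, std::env::var("FCVM_FORCE_DIAG").ok().as_deref())
```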
@claude-claude
Contributor

claude-claude bot commented Mar 5, 2026

🔍 Claude Review

SEVERITY: none

Findings

All previously reported issues are now fixed:

  • Snapshot restore propagation: ipv6_prefix stored in SnapshotMetadata, passed to RoutedNetwork::with_ipv6_prefix() during restore (snapshot.rs:731-733)
  • vm_state.config restoration: ipv6_prefix written to the clone's config (snapshot.rs:797), enabling cascaded snapshot chains
  • Input validation: validate_ipv6_prefix() checks for 4 colon-separated hex groups, called from preflight_check() (routed.rs:77-102)
  • Documentation: README.md (<PREFIX> not <CIDR>), DESIGN.md (struct field, conditional MASQUERADE, self.preflight_check()), CLAUDE.md all updated
  • Conditional MASQUERADE: setup and cleanup both correctly gated on ipv6_prefix.is_some() (routed.rs:559, 685); proxy NDP correctly runs regardless (needed for both paths)

Latest commit 9d705f9 adds extensive diagnostics (16 checks) to the clone port forward stress test — pure diagnostic logging for debugging pasta-related failures, no functional changes.

[LOW] Duplicate comment numbering in test diagnostics: tests/test_clone_port_forward_stress.rs has two items numbered "// 15." (lines 508 and 556), so the second should be 16 and item 16 (line 618) should be 17. Cosmetic only, no functional impact.

Summary

The PR is clean and ready to merge. All 9 commits are consistent with the description, all previously flagged issues are resolved, and code/docs are in sync. The --ipv6-prefix feature is properly threaded through cold boot (prepare_vm), snapshot create (build_snapshot_config), and snapshot restore (cmd_snapshot_run).



ejc3 merged commit 72241a9 into main on Mar 5, 2026
18 checks passed