Releases: ejc3/fcvm

Nested Kernel 6.18.3 (aarch64) - a427e53a7e7c

04 Jan 08:52

Nested kernel for running fcvm inside fcvm (nested virtualization).

Kernel Details

Property | Value
Version | 6.18.3
Build SHA | a427e53a7e7c
Architecture | aarch64

Features

  • CONFIG_KVM=y - KVM hypervisor built-in for nested virtualization
  • FUSE support - For volume mounts between host and guest
  • MMFR4 override patch - Enables arm64.nv2 boot parameter for NV2 support

ARM64 Nested Virtualization (EL2)

This kernel enables recursive VM nesting on ARM64 using FEAT_NV2:

  • EL2 - ARM Exception Level 2 (hypervisor mode), required for KVM
  • VHE mode - Virtualization Host Extensions, which let the host kernel itself run at EL2 for a more efficient hypervisor
  • NV2 - Nested Virtualization v2, allows guest kernels to run their own KVM

Requirements

  • Host: ARM64 with FEAT_NV2 (AWS Graviton3+: c7g.metal, m7g.metal)
  • Host kernel: 6.18+ with kvm-arm.mode=nested boot parameter
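
The host-kernel requirement can be checked before launching an outer VM. The following is a minimal sketch, not part of fcvm: `has_boot_param` and `host_has_nested_kvm` are hypothetical helper names, and the check assumes the parameter is spelled exactly as in the notes above.

```rust
use std::fs;
use std::io;

/// Returns true if the kernel command line contains `key=value` as a whole,
/// whitespace-separated token. Partial matches such as `xkey=value` do not count.
fn has_boot_param(cmdline: &str, key: &str, value: &str) -> bool {
    cmdline.split_whitespace().any(|token| {
        token
            .split_once('=')
            .map_or(false, |(k, v)| k == key && v == value)
    })
}

/// Hypothetical helper: reads /proc/cmdline on the host and looks for
/// `kvm-arm.mode=nested`, the boot parameter named in the requirements.
fn host_has_nested_kvm() -> io::Result<bool> {
    let cmdline = fs::read_to_string("/proc/cmdline")?;
    Ok(has_boot_param(&cmdline, "kvm-arm.mode", "nested"))
}
```

A caller could run `host_has_nested_kvm()` once at startup and print a hint about the missing boot parameter instead of failing later inside the VM.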

Usage

fcvm setup --kernel-profile nested
fcvm podman run --kernel-profile nested --privileged --name outer alpine:latest
# Inside VM: fcvm podman run --name inner alpine:latest

Nested Kernel 6.18.3 (aarch64) - 792d99c8975e

04 Jan 15:44

ARM64 nested virtualization kernel with NV2 support.

Built from:

  • Kernel: 6.18.3
  • Patches: kernel/patches-arm64/*.patch
  • SHA: 792d99c8975e

Nested Kernel 6.18 (aarch64) - d452bc88a5c0

02 Jan 07:47

Nested kernel for running fcvm inside fcvm (nested virtualization).

Kernel Details

Property | Value
Version | 6.18
Build SHA | d452bc88a5c0
Architecture | aarch64

Features

  • CONFIG_KVM=y - KVM hypervisor built-in for nested virtualization
  • FUSE support - For volume mounts between host and guest
  • MMFR4 override patch - Enables arm64.nv2 boot parameter for NV2 support

ARM64 Nested Virtualization (EL2)

This kernel enables recursive VM nesting on ARM64 using FEAT_NV2:

  • EL2 - ARM Exception Level 2 (hypervisor mode), required for KVM
  • VHE mode - Virtualization Host Extensions, which let the host kernel itself run at EL2 for a more efficient hypervisor
  • NV2 - Nested Virtualization v2, allows guest kernels to run their own KVM

Requirements

  • Host: ARM64 with FEAT_NV2 (AWS Graviton3+: c7g.metal, m7g.metal)
  • Host kernel: 6.18+ with kvm-arm.mode=nested boot parameter

Usage

fcvm setup --kernel-profile nested
fcvm podman run --kernel-profile nested --privileged --name outer alpine:latest
# Inside VM: fcvm podman run --name inner alpine:latest

Inception Kernel 6.18 (aarch64) - 2e9de4a64f3b

01 Jan 19:52

Inception kernel for running fcvm inside fcvm (nested virtualization).

Kernel Details

Property | Value
Version | 6.18
Build SHA | 2e9de4a64f3b
Architecture | aarch64

Features

  • CONFIG_KVM=y - KVM hypervisor built-in for nested virtualization
  • FUSE support - For volume mounts between host and guest
  • MMFR4 override patch - Enables arm64.nv2 boot parameter for NV2 support

ARM64 Nested Virtualization (EL2)

This kernel enables recursive VM nesting on ARM64 using FEAT_NV2:

  • EL2 - ARM Exception Level 2 (hypervisor mode), required for KVM
  • VHE mode - Virtualization Host Extensions, which let the host kernel itself run at EL2 for a more efficient hypervisor
  • NV2 - Nested Virtualization v2, allows guest kernels to run their own KVM

Requirements

  • Host: ARM64 with FEAT_NV2 (AWS Graviton3+: c7g.metal, m7g.metal)
  • Host kernel: 6.18+ with kvm-arm.mode=nested boot parameter

Usage

fcvm setup --inception
fcvm podman run --kernel <path> --privileged --name outer alpine:latest
# Inside VM: fcvm podman run --name inner alpine:latest

Inception Kernel 6.18 (cdf558e1c770)

31 Dec 21:06

Inception kernel for running fcvm inside fcvm (nested virtualization).

Kernel Details

Property | Value
Version | 6.18.2
Build SHA | cdf558e1c770
Architecture | ARM64 (aarch64)
Page Size | 4K

Features

  • CONFIG_KVM=y - KVM hypervisor built-in for nested virtualization
  • FUSE support - For volume mounts between host and guest
  • MMFR4 override patch - Enables arm64.nv2 boot parameter to advertise NV2 support

ARM64 Nested Virtualization (EL2)

This kernel enables recursive VM nesting on ARM64 using FEAT_NV2:

  • EL2 (Exception Level 2) - ARM hypervisor mode, required for KVM
  • VHE mode (E2H=1) - Virtualization Host Extensions, which let the host kernel itself run at EL2 for a more efficient hypervisor
  • NV2 - Nested Virtualization v2, allows guest kernels to run their own KVM
  • Boot parameter arm64.nv2 overrides ID_AA64MMFR4_EL1 to advertise NV2 capability

Requirements

  • Host: ARM64 with FEAT_NV2 (AWS Graviton3+: c7g.metal, m7g.metal)
  • Host kernel: 6.18+ with kvm-arm.mode=nested boot parameter

Usage

# Download inception kernel
fcvm setup --inception

# Run VM with inception kernel for nested virtualization
fcvm podman run --kernel /mnt/fcvm-btrfs/kernels/vmlinux-inception-6.18-aarch64-cdf558e1c770.bin \
    --privileged \
    --name outer \
    alpine:latest

# Inside the VM, run another fcvm
fcvm podman run --name inner alpine:latest

v0.5.1 - Fix loopback IP allocation race condition

25 Nov 11:34

Bug Fixes

  • Fix race condition in loopback IP allocation: when multiple VMs started concurrently, a TOCTOU (time-of-check to time-of-use) race let them all read the same set of "used" IPs and then claim the same free IP. Allocation now takes an exclusive file lock so that choosing an IP and persisting the updated state happen as one atomic step.

Changes

  • Add allocate_loopback_ip() to StateManager that holds exclusive file lock while persisting state
  • Add with_loopback_ip() builder method to SlirpNetwork for pre-allocated IPs
  • Update podman.rs and snapshot.rs to allocate IP atomically before network setup
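
The essence of the fix is that the "find a free IP" check and the "record it as used" update happen under one lock. The sketch below illustrates that pattern only: it swaps the file lock for an in-process `Mutex`, and `LoopbackPool` with its 127.0.0.2 to 127.0.0.254 range is a hypothetical stand-in for the real `StateManager` and its state file.

```rust
use std::collections::BTreeSet;
use std::net::Ipv4Addr;
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for the persisted allocation state.
/// fcvm guards a state file with an exclusive file lock; a Mutex plays
/// that role here so the example stays self-contained.
struct LoopbackPool {
    used: Mutex<BTreeSet<Ipv4Addr>>,
}

impl LoopbackPool {
    fn new() -> Self {
        Self { used: Mutex::new(BTreeSet::new()) }
    }

    /// Picks the lowest free address in 127.0.0.2..=127.0.0.254 (assumed range)
    /// and records it before the lock is released, so no two callers can
    /// observe the same address as free.
    fn allocate(&self) -> Option<Ipv4Addr> {
        let mut used = self.used.lock().unwrap();
        let ip = (2u8..=254)
            .map(|last| Ipv4Addr::new(127, 0, 0, last))
            .find(|ip| !used.contains(ip))?;
        used.insert(ip);
        Some(ip)
    }
}
```

If the lock were taken only around the read, or only around the write, two callers could still interleave between the two steps, which is the race described above.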

Testing

Verified with 100 concurrent clones - all received unique IPs (100% success rate).

Full Changelog: v0.5.0...v0.5.1

v0.5.0 - Security and Code Quality Improvements

25 Nov 10:48

Security Fixes

  • Replace panic-prone .to_str().unwrap() with safe error handling in path operations
  • Fix CString null byte panic in namespace handling
  • Fix SystemTime panic for edge cases
  • Add VM name validation to prevent path traversal and shell injection
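
One common way to implement this kind of name validation is an allow-list. The following is a sketch under assumptions, not fcvm's actual rules: the function name `validate_vm_name`, the permitted character set, and the 64-byte limit are all hypothetical.

```rust
/// Hypothetical allow-list validator (the real rules in fcvm may differ).
/// Accepts ASCII letters, digits, '-', '_' and '.', rejects everything else.
fn validate_vm_name(name: &str) -> Result<(), String> {
    const MAX_LEN: usize = 64; // assumed limit

    if name.is_empty() || name.len() > MAX_LEN {
        return Err(format!("name must be 1 to {} bytes long", MAX_LEN));
    }
    // A leading '-' could be parsed as a CLI flag; a leading '.' hides files.
    if !name.starts_with(|c: char| c.is_ascii_alphanumeric()) {
        return Err("name must start with a letter or digit".to_string());
    }
    // Rejecting '/', spaces, ';', '$' and similar blocks both path
    // separators and shell metacharacters in one pass.
    if let Some(bad) = name
        .chars()
        .find(|&c| !(c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.')))
    {
        return Err(format!("invalid character {:?} in name", bad));
    }
    // ".." is built only from allowed characters, so it needs its own check.
    if name.contains("..") {
        return Err("name must not contain \"..\"".to_string());
    }
    Ok(())
}
```

An allow-list is generally easier to reason about than a deny-list of dangerous characters, because anything not explicitly permitted is rejected by default.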

Code Quality

  • Remove 117 lines of unused readiness module
  • Extract magic numbers to named constants
  • Fix doctest for architecture diagram

Verification

  • All 43 tests pass
  • Cargo clippy clean
  • Stress test: 3/3 VMs healthy (100%), 519ms time-to-first-response

v0.4.0 - True Rootless Networking with slirp4netns

25 Nov 10:21

Summary

This release adds true rootless networking support using slirp4netns, enabling fcvm to run containers in microVMs without requiring root privileges for network setup.

Highlights

True Rootless Networking

  • slirp4netns integration: User namespace + network namespace isolation without root
  • No CAP_NET_ADMIN required: Network setup happens entirely in userspace
  • --network flag: Choose between bridged (default, requires root) or rootless modes

Simplified IP Architecture

  • Fixed guest subnet (192.168.1.0/24): Each VM runs in its own isolated user namespace, so identical guest addresses do not conflict
  • Sequential loopback allocation: Health check IPs (127.0.0.2, .3, .4...) allocated sequentially on host
  • Removed hash-based allocation: Simpler, more predictable IP assignment

Health Checks for Rootless VMs

  • Unique loopback IPs per VM enable health checks without veth pairs
  • Port forwarding via slirp4netns API socket
  • Works for both baseline VMs and snapshot clones

Usage

# Rootless VM (no root required)
fcvm podman run --network rootless nginx:alpine

# Rootless clone from snapshot
fcvm snapshot run --pid <serve_pid> --network rootless

Changes

  • Add slirp4netns rootless networking with health check port forwarding
  • Simplify rootless networking: fixed guest subnet, sequential loopback allocation
  • Add with_guest_ip() builder method for clones to restore snapshot guest IPs
  • Update NetworkManager trait with post_start() for deferred slirp4netns startup

Testing

Stress tested with 3 concurrent clones:

  • 100% success rate (3/3 healthy)
  • ~519ms time to first response
  • ~10,000 page faults/sec served by UFFD memory server

v0.3.0: Type Safety, Security, and Performance

25 Nov 07:34

Major improvements to code quality, type safety, and security.

Highlights

  • 10 clones, 100% healthy in ~1 second - Stress tested and verified
  • reqwest .interface() - Proper network interface binding for health checks
  • Type Safety - NetworkConfig and ProcessType converted to typed structs/enums
  • Security Fixes - TOCTOU race conditions and UFFD bounds checking
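
To illustrate the type-safety point, here is a minimal sketch of replacing a free-form string with an enum. The variant names (`Vm`, `Serve`, `CloneVm`) and their string forms are assumptions for illustration; the notes do not list the actual `ProcessType` variants.

```rust
use std::str::FromStr;

/// Hypothetical variants; the real ProcessType in fcvm may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProcessType {
    Vm,
    Serve,
    CloneVm,
}

impl ProcessType {
    /// Canonical string form, e.g. for writing into a state file.
    fn as_str(self) -> &'static str {
        match self {
            ProcessType::Vm => "vm",
            ProcessType::Serve => "serve",
            ProcessType::CloneVm => "clone",
        }
    }
}

impl FromStr for ProcessType {
    type Err = String;

    /// Parsing is the only place where an invalid string can appear;
    /// everything downstream works with the enum and cannot hold a typo.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "vm" => Ok(ProcessType::Vm),
            "serve" => Ok(ProcessType::Serve),
            "clone" => Ok(ProcessType::CloneVm),
            other => Err(format!("unknown process type: {}", other)),
        }
    }
}
```

With this shape, a misspelled value such as "sevre" fails once at the parsing boundary rather than silently propagating through comparisons against string literals.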

New Features

  • --base-dir CLI argument for custom storage location
  • Schema version tracking for state files
  • UFFD and snapshot test suite (36 tests total)

Improvements

  • Replace curl subprocess with reqwest using SO_BINDTODEVICE
  • Typed NetworkConfig struct eliminates JSON field access errors
  • ProcessType enum prevents invalid process type strings
  • Network namespace cleanup on setup failure
  • Simplified health monitor log throttling
  • Fix VM ID slice panic with truncate_id() helper
  • Fix clippy borrowed_box warnings
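
The VM ID slice panic is the kind of bug that arises when code slices a string with a fixed range such as `&id[..8]` and the ID turns out to be shorter. A sketch of a panic-free helper follows; the signature shown for `truncate_id` is an assumption, since the notes only give the function name.

```rust
/// Returns at most `max_chars` characters of `id` without ever panicking.
/// (Assumed signature; only the name `truncate_id` comes from the notes.)
fn truncate_id(id: &str, max_chars: usize) -> &str {
    // char_indices yields byte offsets that are always valid slice
    // boundaries, so multi-byte characters are handled as well.
    match id.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &id[..byte_idx],
        None => id, // already short enough
    }
}
```

For example, `truncate_id("abc", 8)` returns the whole string, where `&"abc"[..8]` would panic.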

Security

  • UFFD bounds checking prevents out-of-bounds memory access
  • UFFD mapping validation catches malformed regions
  • TOCTOU race condition fixes for socket and file operations
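
The two UFFD items can be pictured as a validation step when mappings are received and a bounds check on every fault. This is a sketch under stated assumptions: the `Mapping` struct, its field names, and the 4 KiB page size are hypothetical and not taken from fcvm's code.

```rust
/// Assumed page size for the sketch.
const PAGE_SIZE: u64 = 4096;

/// Hypothetical description of one guest memory region backed by a
/// snapshot memory file.
#[derive(Debug, Clone, Copy)]
struct Mapping {
    base: u64,        // start address of the region in the VM process
    size: u64,        // region length in bytes
    file_offset: u64, // where the region starts in the memory file
}

/// Rejects malformed regions: empty, unaligned, overflowing, or extending
/// past the end of the backing file.
fn validate_mapping(m: &Mapping, file_len: u64) -> Result<(), String> {
    if m.size == 0 || m.size % PAGE_SIZE != 0 || m.base % PAGE_SIZE != 0 {
        return Err("region must be non-empty and page-aligned".to_string());
    }
    m.base
        .checked_add(m.size)
        .ok_or_else(|| "address range overflows".to_string())?;
    let end = m
        .file_offset
        .checked_add(m.size)
        .ok_or_else(|| "file range overflows".to_string())?;
    if end > file_len {
        return Err("region extends past end of memory file".to_string());
    }
    Ok(())
}

/// Maps a faulting address to the file offset of its page, or None when
/// the address lies outside the region.
fn fault_to_file_offset(m: &Mapping, fault_addr: u64) -> Option<u64> {
    let page = fault_addr & !(PAGE_SIZE - 1);
    let delta = page.checked_sub(m.base)?; // below the region
    if delta >= m.size {
        return None; // above the region
    }
    m.file_offset.checked_add(delta)
}
```

Using checked arithmetic throughout means a hostile or corrupted mapping produces an error instead of a wrapped offset that would read the wrong part of the file.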

Performance

  • All 10 clones healthy in 518-1037ms (avg 774ms)
  • UFFD serves ~6000-7000 page faults/second per VM
  • Memory sharing via kernel page cache working correctly

Commits since v0.2.0

  • Add UFFD and snapshot tests
  • Fix clippy borrowed_box warnings
  • Simplify health monitor log throttling
  • Add network namespace cleanup on setup failure
  • Add --base-dir CLI argument for custom storage location
  • Convert ProcessType from String to enum for type safety
  • Use typed NetworkConfig struct for type safety
  • Fix TOCTOU race conditions in socket and file operations
  • Add UFFD bounds checking and mapping validation
  • Fix VM ID slice panic, add cleanup logging, add schema version
  • Replace curl with reqwest using .interface() for health checks
  • Return non-zero exit code when stress test has failures
  • Fix clone networking by removing IP from namespace bridge
  • Add UFFD process CPU tracking to stress test monitor
  • Add MMDS-based ARP flush for fast clone restore
  • Remove unnecessary default route setup in network namespace
  • Fix network namespace cleanup and add hierarchical logging

v0.2.0 - Network Namespace Isolation & Stress Testing

25 Nov 03:53

Major release adding production-ready network isolation, comprehensive stress testing infrastructure, and significant performance optimizations for snapshot cloning.

Performance Results 🚀

Tested on c6g.metal (64 ARM cores, 125GB RAM) with nginx:alpine containers:

Configuration | VMs | Success Rate | Avg Time-to-Health | Clone Time
Batched (10 at a time) | 100 | 100% | 1,629ms | 28ms
Single batch | 50 | 100% | 3,960ms | 28ms
Single batch | 100 | 97% | 14,412ms | 28ms

Key Finding: UFFD is NOT the Bottleneck

With 100 VMs running in parallel, the UFFD memory server peaked at only 39.7% CPU. The actual bottleneck is overall system CPU contention (200 vCPUs competing for 64 cores), not memory page serving.

03:47:44 | Load: 46.61 | VMs: 100 | UFFD CPU: 39.7% | Mem: 12Gi/125Gi
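
The contention argument is simple arithmetic: 200 vCPUs on 64 cores is roughly a 3x oversubscription. The helper below is a hypothetical illustration; the figure of 2 vCPUs per VM is inferred from "200 vCPUs" across 100 VMs and is not stated directly.

```rust
/// Ratio of requested vCPUs to physical cores. Values above 1.0 mean
/// vCPUs must time-share cores.
fn oversubscription(vms: u32, vcpus_per_vm: u32, host_cores: u32) -> f64 {
    f64::from(vms * vcpus_per_vm) / f64::from(host_cores)
}
```

Under that assumption, `oversubscription(100, 2, 64)` is 3.125 and `oversubscription(50, 2, 64)` is 1.5625, which is consistent with the single-batch run of 50 VMs reaching health far sooner than the single-batch run of 100.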

New Features

Network Namespace Isolation

  • Each VM runs in its own network namespace with a dedicated veth pair
  • Enables multiple clones with identical guest IPs (172.30.77.190) without conflicts
  • Health checks route correctly via curl --interface <veth> binding
  • Automatic cleanup of namespaces, veth pairs, and TAP devices on VM exit

Stress Test Infrastructure (fcvm test stress)

  • Configurable parallel VM spawning with batch control
  • Real-time system monitoring (CPU load, memory, UFFD CPU usage)
  • Time-to-health metrics for each VM
  • Comprehensive summary with success rates and timing statistics

MMDS-Based ARP Flush for Fast Clone Restore

  • Host injects restore-epoch into MMDS before resuming clones
  • fc-agent watches for epoch changes and flushes ARP cache
  • Eliminates stale ARP entries that caused 5+ second delays
  • Sub-second network connectivity after snapshot restore
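
The guest-side half of this mechanism amounts to change detection on a single value. The sketch below models only that decision logic: `EpochWatcher` is a hypothetical type, the epoch is assumed to be an integer that may be absent from MMDS, and the MMDS fetch and the flush itself (for example via `ip neigh flush all`) are left to the caller.

```rust
/// Hypothetical change detector for the restore-epoch value.
struct EpochWatcher {
    last_seen: Option<u64>,
}

impl EpochWatcher {
    fn new() -> Self {
        Self { last_seen: None }
    }

    /// Feed in the value just read from MMDS (None if the key is absent).
    /// Returns true when the epoch is present and differs from the last
    /// one observed, meaning the ARP cache should be flushed.
    fn observe(&mut self, current: Option<u64>) -> bool {
        match current {
            Some(epoch) if self.last_seen != Some(epoch) => {
                self.last_seen = Some(epoch);
                true
            }
            _ => false,
        }
    }
}
```

Because the watcher's state is part of guest memory, a clone resumes with the epoch the snapshot last saw; the freshly injected value then differs and triggers exactly one flush per restore.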

PID-Based Process Management

  • Track fcvm process PIDs (not Firecracker PIDs) for reliable lifecycle management
  • fcvm snapshot serve saves its PID for clone coordination
  • Automatic cleanup: killing serve process terminates all connected clones
  • fcvm ls --pid <pid> for direct process lookup

Code Quality Improvements

Removed Dead Code

  • Unimplemented stub commands (stop, logs, inspect, top)
  • Unimplemented readiness gates (vsock, log, exec)
  • Simplified to rootless-only networking (removed privileged mode)

Added Test Coverage

  • tests/test_health_monitor.rs - Health check logic
  • tests/test_state_manager.rs - State persistence
  • Clippy warnings fixed throughout

Hierarchical Logging

  • Clean log output with target tags: [vm:] [firecracker:] [health-monitor:]
  • Firecracker timestamp stripping for cleaner output
  • ANSI color codes disabled when output is piped

Breaking Changes

  • Removed --mode privileged (rootless-only now)
  • Removed fcvm stop, fcvm logs, fcvm inspect, fcvm top (were unimplemented stubs)
  • Network config changed to 172.30.x.x/30 subnets
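
A /30 subnet holds four addresses: the network address, two usable hosts, and the broadcast address, which suits a point-to-point host/guest link. The sketch below shows one way such subnets could be carved out of 172.30.0.0/16. The index-based mapping, the `Slash30` type, and the choice of which usable address goes to the host versus the guest are assumptions, although the example guest IP 172.30.77.190 mentioned above does sit in the second usable slot of its /30.

```rust
use std::net::Ipv4Addr;

/// The four addresses of one /30 (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
struct Slash30 {
    network: Ipv4Addr,
    host: Ipv4Addr,  // first usable address (assumed host side)
    guest: Ipv4Addr, // second usable address (assumed guest side)
    broadcast: Ipv4Addr,
}

/// Returns the `index`-th /30 inside 172.30.0.0/16, or None when the
/// index does not fit (a /16 holds 16384 such subnets).
fn slash30_for_index(index: u32) -> Option<Slash30> {
    const BASE: u32 = u32::from_be_bytes([172, 30, 0, 0]);
    const SUBNETS: u32 = (1 << 16) / 4;
    if index >= SUBNETS {
        return None;
    }
    let network = BASE + index * 4;
    Some(Slash30 {
        network: Ipv4Addr::from(network),
        host: Ipv4Addr::from(network + 1),
        guest: Ipv4Addr::from(network + 2),
        broadcast: Ipv4Addr::from(network + 3),
    })
}
```

For instance, index 4975 yields the network 172.30.77.188 with guest address 172.30.77.190.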

Stats

  • 40 commits
  • 48 files changed
  • +3,825 / -1,821 lines

What's Next

  • Investigate CPU contention for 100+ parallel VMs
  • Add rate limiting for clone spawning
  • Port mapping through namespaces
  • Volume mounting support

🤖 Generated with Claude Code