Releases: ejc3/fcvm
Nested Kernel 6.18.3 (aarch64) - a427e53a7e7c
Nested kernel for running fcvm inside fcvm (nested virtualization).
Kernel Details
| Property | Value |
|---|---|
| Version | 6.18.3 |
| Build SHA | a427e53a7e7c |
| Architecture | aarch64 |
Features
- CONFIG_KVM=y - KVM hypervisor built-in for nested virtualization
- FUSE support - For volume mounts between host and guest
- MMFR4 override patch - Enables `arm64.nv2` boot parameter for NV2 support
ARM64 Nested Virtualization (EL2)
This kernel enables recursive VM nesting on ARM64 using FEAT_NV2:
- EL2 - ARM Exception Level 2 (hypervisor mode), required for KVM
- VHE mode - Virtualization Host Extensions for efficient hypervisor operation
- NV2 - Nested Virtualization v2, allows guest kernels to run their own KVM
Requirements
- Host: ARM64 with FEAT_NV2 (AWS Graviton3+: c7g.metal, m7g.metal)
- Host kernel: 6.18+ with `kvm-arm.mode=nested` boot parameter
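The host-kernel requirement above can be checked by looking at the kernel command line (on Linux, the contents of `/proc/cmdline`). The following is a minimal sketch, not part of fcvm: `boot_param` and `nested_kvm_requested` are hypothetical helper names, and the check only confirms that the parameter was passed, not that the CPU supports FEAT_NV2.

```rust
/// Return the value of the first `key=value` token in a kernel command line.
/// Tokens without an `=` (bare flags) are ignored.
pub fn boot_param<'a>(cmdline: &'a str, key: &str) -> Option<&'a str> {
    cmdline.split_whitespace().find_map(|token| {
        let (k, v) = token.split_once('=')?;
        (k == key).then_some(v)
    })
}

/// True when the command line asks KVM for nested mode.
/// The parameter name and value come from the requirement above.
pub fn nested_kvm_requested(cmdline: &str) -> bool {
    boot_param(cmdline, "kvm-arm.mode") == Some("nested")
}
```

A caller would typically read `/proc/cmdline` with `std::fs::read_to_string` and pass the result to `nested_kvm_requested`.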
Usage
fcvm setup --kernel-profile nested
fcvm podman run --kernel-profile nested --privileged --name outer alpine:latest
# Inside VM: fcvm podman run --name inner alpine:latest
Nested Kernel 6.18.3 (aarch64) - 792d99c8975e
ARM64 nested virtualization kernel with NV2 support.
Built from:
- Kernel: 6.18.3
- Patches: kernel/patches-arm64/*.patch
- SHA: 792d99c8975e
Nested Kernel 6.18 (aarch64) - d452bc88a5c0
Nested kernel for running fcvm inside fcvm (nested virtualization).
Kernel Details
| Property | Value |
|---|---|
| Version | 6.18 |
| Build SHA | d452bc88a5c0 |
| Architecture | aarch64 |
Features
- CONFIG_KVM=y - KVM hypervisor built-in for nested virtualization
- FUSE support - For volume mounts between host and guest
- MMFR4 override patch - Enables `arm64.nv2` boot parameter for NV2 support
ARM64 Nested Virtualization (EL2)
This kernel enables recursive VM nesting on ARM64 using FEAT_NV2:
- EL2 - ARM Exception Level 2 (hypervisor mode), required for KVM
- VHE mode - Virtualization Host Extensions for efficient hypervisor operation
- NV2 - Nested Virtualization v2, allows guest kernels to run their own KVM
Requirements
- Host: ARM64 with FEAT_NV2 (AWS Graviton3+: c7g.metal, m7g.metal)
- Host kernel: 6.18+ with `kvm-arm.mode=nested` boot parameter
Usage
fcvm setup --kernel-profile nested
fcvm podman run --kernel-profile nested --privileged --name outer alpine:latest
# Inside VM: fcvm podman run --name inner alpine:latest
Inception Kernel 6.18 (aarch64) - 2e9de4a64f3b
Inception kernel for running fcvm inside fcvm (nested virtualization).
Kernel Details
| Property | Value |
|---|---|
| Version | 6.18 |
| Build SHA | 2e9de4a64f3b |
| Architecture | aarch64 |
Features
- CONFIG_KVM=y - KVM hypervisor built-in for nested virtualization
- FUSE support - For volume mounts between host and guest
- MMFR4 override patch - Enables `arm64.nv2` boot parameter for NV2 support
ARM64 Nested Virtualization (EL2)
This kernel enables recursive VM nesting on ARM64 using FEAT_NV2:
- EL2 - ARM Exception Level 2 (hypervisor mode), required for KVM
- VHE mode - Virtualization Host Extensions for efficient hypervisor operation
- NV2 - Nested Virtualization v2, allows guest kernels to run their own KVM
Requirements
- Host: ARM64 with FEAT_NV2 (AWS Graviton3+: c7g.metal, m7g.metal)
- Host kernel: 6.18+ with `kvm-arm.mode=nested` boot parameter
Usage
fcvm setup --inception
fcvm podman run --kernel <path> --privileged --name outer alpine:latest
# Inside VM: fcvm podman run --name inner alpine:latest
Inception Kernel 6.18 (cdf558e1c770)
Inception kernel for running fcvm inside fcvm (nested virtualization).
Kernel Details
| Property | Value |
|---|---|
| Version | 6.18.2 |
| Build SHA | cdf558e1c770 |
| Architecture | ARM64 (aarch64) |
| Page Size | 4K |
Features
- CONFIG_KVM=y - KVM hypervisor built-in for nested virtualization
- FUSE support - For volume mounts between host and guest
- MMFR4 override patch - Enables `arm64.nv2` boot parameter to advertise NV2 support
ARM64 Nested Virtualization (EL2)
This kernel enables recursive VM nesting on ARM64 using FEAT_NV2:
- EL2 (Exception Level 2) - ARM hypervisor mode, required for KVM
- VHE mode (E2H=1) - Virtualization Host Extensions for efficient hypervisor operation
- NV2 - Nested Virtualization v2, allows guest kernels to run their own KVM
- Boot parameter `arm64.nv2` overrides ID_AA64MMFR4_EL1 to advertise NV2 capability
Requirements
- Host: ARM64 with FEAT_NV2 (AWS Graviton3+: c7g.metal, m7g.metal)
- Host kernel: 6.18+ with `kvm-arm.mode=nested` boot parameter
Usage
# Download inception kernel
fcvm setup --inception
# Run VM with inception kernel for nested virtualization
fcvm podman run --kernel /mnt/fcvm-btrfs/kernels/vmlinux-inception-6.18-aarch64-cdf558e1c770.bin \
--privileged \
--name outer \
alpine:latest
# Inside the VM, run another fcvm
fcvm podman run --name inner alpine:latest
v0.5.1 - Fix loopback IP allocation race condition
Bug Fixes
- Fix race condition in loopback IP allocation: When multiple VMs started concurrently, a TOCTOU (time-of-check to time-of-use) race let them all read the same set of "used" IPs and then allocate the same unused IP. Allocation now takes a file lock, so reading the used set, choosing an IP, and persisting the state happen atomically.
Changes
- Add `allocate_loopback_ip()` to StateManager that holds exclusive file lock while persisting state
- Add `with_loopback_ip()` builder method to SlirpNetwork for pre-allocated IPs
- Update podman.rs and snapshot.rs to allocate IP atomically before network setup
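The idea behind the fix can be sketched in-process. This is not fcvm's implementation: a `Mutex` stands in for the exclusive file lock, an in-memory set stands in for the persisted state file, and `LoopbackAllocator` and `allocate_concurrently` are hypothetical names. What it illustrates is that the "is this IP free?" check and the "mark it used" write happen under one lock, so two callers can never pick the same address.

```rust
use std::collections::BTreeSet;
use std::net::Ipv4Addr;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical allocator. The Mutex plays the role of the exclusive file
/// lock; the set plays the role of the persisted state.
pub struct LoopbackAllocator {
    used: Mutex<BTreeSet<Ipv4Addr>>,
}

impl LoopbackAllocator {
    pub fn new() -> Self {
        Self { used: Mutex::new(BTreeSet::new()) }
    }

    /// Pick the lowest free address in 127.0.0.2..=127.0.0.254 and record it
    /// before releasing the lock. Returns None when the range is exhausted.
    pub fn allocate(&self) -> Option<Ipv4Addr> {
        let mut used = self.used.lock().expect("allocator lock poisoned");
        let ip = (2u8..=254)
            .map(|last| Ipv4Addr::new(127, 0, 0, last))
            .find(|ip| !used.contains(ip))?;
        used.insert(ip); // check and write share one critical section
        Some(ip)
    }
}

/// Spawn `n` threads that each allocate once and return the number of
/// distinct addresses that were handed out.
pub fn allocate_concurrently(n: usize) -> usize {
    let alloc = Arc::new(LoopbackAllocator::new());
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let alloc = Arc::clone(&alloc);
            thread::spawn(move || alloc.allocate())
        })
        .collect();
    let unique: BTreeSet<Ipv4Addr> = handles
        .into_iter()
        .filter_map(|h| h.join().ok().flatten())
        .collect();
    unique.len()
}
```

If the lock were released between the `find` and the `insert`, two threads could both see the same address as free, which is the race described above.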
Testing
Verified with 100 concurrent clones - all received unique IPs (100% success rate).
Full Changelog: v0.5.0...v0.5.1
v0.5.0 - Security and Code Quality Improvements
Security Fixes
- Replace panic-prone `.to_str().unwrap()` with safe error handling in path operations
- Fix CString null byte panic in namespace handling
- Fix SystemTime panic for edge cases
- Add VM name validation to prevent path traversal and shell injection
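Two of the fixes above follow a common pattern: return an error instead of panicking, and reject untrusted names before they reach a path or a shell. The sketch below is illustrative only. The function names and the exact naming rule (1 to 64 ASCII letters, digits, `-` or `_`, starting with a letter or digit) are assumptions, not fcvm's actual policy.

```rust
use std::path::Path;

/// Convert a path to `&str` without panicking on non-UTF-8 input,
/// unlike `path.to_str().unwrap()`.
pub fn path_to_str(path: &Path) -> Result<&str, String> {
    path.to_str()
        .ok_or_else(|| format!("path is not valid UTF-8: {}", path.display()))
}

/// Hypothetical allow-list rule for VM names. Anything outside the allowed
/// set is rejected, which rules out `/`, `..`, spaces, and shell
/// metacharacters in one check.
pub fn validate_vm_name(name: &str) -> Result<(), String> {
    if name.is_empty() || name.len() > 64 {
        return Err(format!("VM name must be 1-64 bytes long, got {}", name.len()));
    }
    let starts_ok = name
        .chars()
        .next()
        .map_or(false, |c| c.is_ascii_alphanumeric());
    if !starts_ok {
        return Err("VM name must start with an ASCII letter or digit".to_string());
    }
    let bad = name
        .chars()
        .find(|c| !(c.is_ascii_alphanumeric() || *c == '-' || *c == '_'));
    if let Some(c) = bad {
        return Err(format!("VM name contains disallowed character {:?}", c));
    }
    Ok(())
}
```

An allow-list is used here rather than a deny-list because it does not depend on enumerating every dangerous character.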
Code Quality
- Remove 117 lines of unused readiness module
- Extract magic numbers to named constants
- Fix doctest for architecture diagram
Verification
- All 43 tests pass
- Cargo clippy clean
- Stress test: 3/3 VMs healthy (100%), 519ms time-to-first-response
v0.4.0 - True Rootless Networking with slirp4netns
Summary
This release adds true rootless networking support using slirp4netns, enabling fcvm to run containers in microVMs without requiring root privileges for network setup.
Highlights
True Rootless Networking
- slirp4netns integration: User namespace + network namespace isolation without root
- No CAP_NET_ADMIN required: Network setup happens entirely in userspace
- `--network` flag: Choose between `bridged` (default, requires root) and `rootless` modes
Simplified IP Architecture
- Fixed guest subnet (192.168.1.0/24): Each VM runs in isolated user namespace, no IP conflicts
- Sequential loopback allocation: Health check IPs (127.0.0.2, .3, .4...) allocated sequentially on host
- Removed hash-based allocation: Simpler, more predictable IP assignment
Health Checks for Rootless VMs
- Unique loopback IPs per VM enable health checks without veth pairs
- Port forwarding via slirp4netns API socket
- Works for both baseline VMs and snapshot clones
Usage
# Rootless VM (no root required)
fcvm podman run --network rootless nginx:alpine
# Rootless clone from snapshot
fcvm snapshot run --pid <serve_pid> --network rootless
Changes
- Add slirp4netns rootless networking with health check port forwarding
- Simplify rootless networking: fixed guest subnet, sequential loopback allocation
- Add `with_guest_ip()` builder method for clones to restore snapshot guest IPs
- Update NetworkManager trait with `post_start()` for deferred slirp4netns startup
Testing
Stress tested with 3 concurrent clones:
- 100% success rate (3/3 healthy)
- ~519ms time to first response
- ~10,000 page faults/sec served by UFFD memory server
v0.3.0: Type Safety, Security, and Performance
Major improvements to code quality, type safety, and security.
Highlights
- 10 clones, 100% healthy in ~1 second - Stress tested and verified
- reqwest .interface() - Proper network interface binding for health checks
- Type Safety - NetworkConfig and ProcessType converted to typed structs/enums
- Security Fixes - TOCTOU race conditions and UFFD bounds checking
New Features
- `--base-dir` CLI argument for custom storage location
- Schema version tracking for state files
- UFFD and snapshot test suite (36 tests total)
Improvements
- Replace curl subprocess with reqwest using SO_BINDTODEVICE
- Typed NetworkConfig struct eliminates JSON field access errors
- ProcessType enum prevents invalid process type strings
- Network namespace cleanup on setup failure
- Simplified health monitor log throttling
- Fix VM ID slice panic with truncate_id() helper
- Fix clippy borrowed_box warnings
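Two of the items above, the `ProcessType` enum and the `truncate_id()` helper, can be illustrated with a short sketch. It is not taken from fcvm: the variant names, the accepted strings, and the `truncate_id` signature are assumptions made for the example.

```rust
use std::str::FromStr;

/// Hypothetical variants; the release notes do not list the real ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcessType {
    Vm,
    Serve,
    CloneVm,
}

impl FromStr for ProcessType {
    type Err = String;

    /// Parsing is the single place where an invalid string can appear;
    /// after this point the type system rules out unknown process types.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "vm" => Ok(Self::Vm),
            "serve" => Ok(Self::Serve),
            "clone" => Ok(Self::CloneVm),
            other => Err(format!("unknown process type: {other}")),
        }
    }
}

/// Return at most `max_chars` characters of `id`.
/// A plain `&id[..8]` panics when `id` is shorter than 8 bytes or when
/// byte 8 falls inside a multi-byte character; this version does neither.
pub fn truncate_id(id: &str, max_chars: usize) -> &str {
    match id.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &id[..byte_idx],
        None => id,
    }
}
```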
Security
- UFFD bounds checking prevents out-of-bounds memory access
- UFFD mapping validation catches malformed regions
- TOCTOU race condition fixes for socket and file operations
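As a rough illustration of what bounds checking and mapping validation mean for a page-fault server, the sketch below validates a region description and translates a faulting address into a file offset using checked arithmetic. The `Mapping` struct, its fields, and both function names are hypothetical and only loosely modeled on how a snapshot memory file is typically described; they are not fcvm's types.

```rust
/// Hypothetical description of one guest memory region backed by a file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mapping {
    pub base: u64,        // address where the region starts
    pub size: u64,        // region length in bytes
    pub file_offset: u64, // where the region's data starts in the memory file
}

/// Reject regions that are empty, overflow the address space, or extend
/// past the end of the backing file.
pub fn validate_mapping(m: &Mapping, file_len: u64) -> Result<(), String> {
    if m.size == 0 {
        return Err("mapping is empty".to_string());
    }
    m.base
        .checked_add(m.size)
        .ok_or("address range overflows u64")?;
    let file_end = m
        .file_offset
        .checked_add(m.size)
        .ok_or("file range overflows u64")?;
    if file_end > file_len {
        return Err(format!(
            "mapping ends at file offset {file_end}, but the file has {file_len} bytes"
        ));
    }
    Ok(())
}

/// Translate a faulting address into the file offset of its page, or None
/// if the whole page does not lie inside the mapping.
pub fn fault_offset(m: &Mapping, fault_addr: u64, page_size: u64) -> Option<u64> {
    if page_size == 0 {
        return None;
    }
    let page = fault_addr - (fault_addr % page_size); // align down
    let rel = page.checked_sub(m.base)?; // None if below the region
    if rel.checked_add(page_size)? > m.size {
        return None; // page would run past the end of the region
    }
    m.file_offset.checked_add(rel)
}
```

Returning `None` or `Err` lets the caller log and skip a bad fault instead of reading outside the mapped file.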
Performance
- All 10 clones healthy in 518-1037ms (avg 774ms)
- UFFD serves ~6000-7000 page faults/second per VM
- Memory sharing via kernel page cache working correctly
Commits since v0.2.0
- Add UFFD and snapshot tests
- Fix clippy borrowed_box warnings
- Simplify health monitor log throttling
- Add network namespace cleanup on setup failure
- Add --base-dir CLI argument for custom storage location
- Convert ProcessType from String to enum for type safety
- Use typed NetworkConfig struct for type safety
- Fix TOCTOU race conditions in socket and file operations
- Add UFFD bounds checking and mapping validation
- Fix VM ID slice panic, add cleanup logging, add schema version
- Replace curl with reqwest using .interface() for health checks
- Return non-zero exit code when stress test has failures
- Fix clone networking by removing IP from namespace bridge
- Add UFFD process CPU tracking to stress test monitor
- Add MMDS-based ARP flush for fast clone restore
- Remove unnecessary default route setup in network namespace
- Fix network namespace cleanup and add hierarchical logging
v0.2.0 - Network Namespace Isolation & Stress Testing
Major release adding production-ready network isolation, comprehensive stress testing infrastructure, and significant performance optimizations for snapshot cloning.
Performance Results 🚀
Tested on c6g.metal (64 ARM cores, 125GB RAM) with nginx:alpine containers:
| Configuration | VMs | Success Rate | Avg Time-to-Health | Clone Time |
|---|---|---|---|---|
| Batched (10 at a time) | 100 | 100% | 1,629ms | 28ms |
| Single batch | 50 | 100% | 3,960ms | 28ms |
| Single batch | 100 | 97% | 14,412ms | 28ms |
Key Finding: UFFD is NOT the Bottleneck
With 100 VMs running in parallel, the UFFD memory server peaked at only 39.7% CPU. The actual bottleneck is overall system CPU contention (200 vCPUs competing for 64 cores), not memory page serving.
03:47:44 | Load: 46.61 | VMs: 100 | UFFD CPU: 39.7% | Mem: 12Gi/125Gi
New Features
Network Namespace Isolation
- Each VM runs in its own network namespace with dedicated veth pair
- Enables multiple clones with identical guest IPs (172.30.77.190) without conflicts
- Health checks route correctly via `curl --interface <veth>` binding
- Automatic cleanup of namespaces, veth pairs, and TAP devices on VM exit
Stress Test Infrastructure (fcvm test stress)
- Configurable parallel VM spawning with batch control
- Real-time system monitoring (CPU load, memory, UFFD CPU usage)
- Time-to-health metrics for each VM
- Comprehensive summary with success rates and timing statistics
MMDS-Based ARP Flush for Fast Clone Restore
- Host injects `restore-epoch` into MMDS before resuming clones
- fc-agent watches for epoch changes and flushes ARP cache
- Eliminates stale ARP entries that caused 5+ second delays
- Sub-second network connectivity after snapshot restore
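The guest-side half of this mechanism amounts to change detection on a polled value. The sketch below shows only that logic and makes several assumptions: the epoch is modeled as a `u64`, `EpochWatcher` is a hypothetical name, and the MMDS fetch and the ARP flush themselves are left to the caller.

```rust
/// Hypothetical watcher that remembers the last restore epoch it saw.
/// Its state is part of guest memory, so a clone resumed from a snapshot
/// starts with the value the baseline VM had when the snapshot was taken.
pub struct EpochWatcher {
    last_seen: Option<u64>,
}

impl EpochWatcher {
    pub fn new() -> Self {
        Self { last_seen: None }
    }

    /// Feed in the value polled from the metadata service (None if the key
    /// is absent). Returns true when the epoch changed, which is the signal
    /// to flush the ARP cache.
    pub fn observe(&mut self, epoch: Option<u64>) -> bool {
        match (self.last_seen, epoch) {
            (_, None) => false,                               // nothing published yet
            (Some(prev), Some(cur)) if prev == cur => false,  // unchanged
            (_, Some(cur)) => {
                self.last_seen = Some(cur);
                true
            }
        }
    }
}
```

A polling loop would call `observe` on each poll and run the flush only when it returns `true`, so repeated polls of an unchanged epoch cost nothing.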
PID-Based Process Management
- Track fcvm process PIDs (not Firecracker PIDs) for reliable lifecycle management
- `fcvm snapshot serve` saves its PID for clone coordination
- Automatic cleanup: killing serve process terminates all connected clones
- `fcvm ls --pid <pid>` for direct process lookup
Code Quality Improvements
Removed Dead Code
- Unimplemented stub commands (stop, logs, inspect, top)
- Unimplemented readiness gates (vsock, log, exec)
- Simplified to rootless-only networking (removed privileged mode)
Added Test Coverage
- `tests/test_health_monitor.rs` - Health check logic
- `tests/test_state_manager.rs` - State persistence
- Clippy warnings fixed throughout
Hierarchical Logging
- Clean log output with target tags: `[vm:] [firecracker:] [health-monitor:]`
- Firecracker timestamp stripping for cleaner output
- ANSI color codes disabled when output is piped
Breaking Changes
- Removed `--mode privileged` (rootless-only now)
- Removed `fcvm stop`, `fcvm logs`, `fcvm inspect`, `fcvm top` (were unimplemented stubs)
- Network config changed to 172.30.x.x/30 subnets
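For readers unfamiliar with /30 subnets: each one holds four addresses, a network address, two usable hosts, and a broadcast address, which is just enough for a point-to-point link. The sketch below computes the /30 that contains a given address. It is plain subnet arithmetic, not fcvm code, and `Subnet30` and `subnet30_of` are hypothetical names; which usable address goes to which end of the link is not specified here.

```rust
use std::net::Ipv4Addr;

/// The four addresses of a /30 subnet.
#[derive(Debug, PartialEq, Eq)]
pub struct Subnet30 {
    pub network: Ipv4Addr,
    pub first_host: Ipv4Addr,
    pub second_host: Ipv4Addr,
    pub broadcast: Ipv4Addr,
}

/// Compute the /30 subnet that contains `addr`.
pub fn subnet30_of(addr: Ipv4Addr) -> Subnet30 {
    // A /30 leaves two host bits; clearing them yields the network address.
    let network = u32::from(addr) & !0b11;
    Subnet30 {
        network: Ipv4Addr::from(network),
        first_host: Ipv4Addr::from(network + 1),
        second_host: Ipv4Addr::from(network + 2),
        broadcast: Ipv4Addr::from(network + 3),
    }
}
```

Applied to the guest IP mentioned under Network Namespace Isolation, 172.30.77.190, this yields the subnet 172.30.77.188/30 with usable addresses .189 and .190.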
Stats
- 40 commits
- 48 files changed
- +3,825 / -1,821 lines
What's Next
- Investigate CPU contention for 100+ parallel VMs
- Add rate limiting for clone spawning
- Port mapping through namespaces
- Volume mounting support
🤖 Generated with Claude Code