
Lattice Mesh Protocol (LMP)

Security Enhancements

This document collects security hardening ideas and design notes that complement the normative rules in MAIN.md.

Table of contents

  1. Enhanced Handshake Security
  2. Adaptive Timestamp Tolerance
  3. Multi-Device Nonce Collision Prevention
  4. Ratchet-State Authenticated Replay Protection
  5. Skipped Message Key DoS Mitigation
  6. Multi-Path Mesh Routing
  7. Adaptive Cover Traffic
  8. Post-Quantum Library Hardening
  Security Policy (Responsible Disclosure)
  9. Implemented Security Fixes (v2.0)
  Protocol Invariants (Enforced)

1. Enhanced Handshake Security

Problem: Introduction token theft enables MITM before handshake completion

Solution: PAKE-augmented initial contact

// Add SPAKE2+ for introduction token protection
struct IntroductionToken {
    bob_ltik_pub: PublicKey,
    bob_mtsk_pub: PublicKey,
    temp_dht_address: Address,
    expiration: Timestamp,
    pake_salt: [u8; 32],  // NEW: SPAKE2+ salt
    signature: Signature,
}

// Alice initiates with password-based authentication
impl ClientHello {
    fn new_with_pake(
        intro_token: &IntroductionToken,
        shared_secret: &str  // From QR code or out-of-band
    ) -> Self {
        // SPAKE2+ prevents MITM even if token stolen
        let pake = SPAKE2Plus::new(
            b"LMP-introduction",
            shared_secret.as_bytes(),
            &intro_token.pake_salt
        );
        
        let (pake_msg, pake_state) = pake.start();
        
        ClientHello {
            // ... existing fields ...
            pake_message: pake_msg,  // NEW
            pake_commitment: hash(pake_state),  // NEW
        }
    }
}

Benefits:

  • Even if the introduction token leaks, the attacker still needs the shared secret from the QR code
  • Backward compatible: falls back to signature-only if PAKE is not supported
  • Adds ~50 ms of latency, acceptable for a handshake

2. Adaptive Timestamp Tolerance

Problem: 60-second timestamp window too strict for high-latency mesh networks

Solution: Sliding Window with Network Condition Adaptation

struct TimestampValidator {
    base_tolerance: Duration,
    adaptive_bonus: Duration,
    recent_latencies: RingBuffer<Duration, 100>,
}

impl TimestampValidator {
    fn new() -> Self {
        Self {
            base_tolerance: Duration::from_secs(60),
            adaptive_bonus: Duration::from_secs(0),
            recent_latencies: RingBuffer::new(),
        }
    }
    
    fn validate(&mut self, msg_timestamp: i64) -> bool {
        let now = current_timestamp();
        let observed_latency = (now - msg_timestamp).abs();
        
        // Update adaptive tolerance based on network conditions
        self.recent_latencies.push(Duration::from_millis(observed_latency as u64));
        let p95_latency = self.percentile(0.95);
        
        // Allow timestamp if within base + adaptive window
        let max_tolerance = self.base_tolerance + self.adaptive_bonus;
        
        if observed_latency <= max_tolerance.as_millis() as i64 {
            true
        } else {
            // Check if network degraded recently
            if p95_latency > Duration::from_secs(30) {
                self.adaptive_bonus = p95_latency; // Increase tolerance
                log::warn!("High network latency detected, adjusting tolerance to {}s", 
                          (self.base_tolerance + self.adaptive_bonus).as_secs());
            }
            false
        }
    }
    
    fn percentile(&self, p: f64) -> Duration {
        let mut sorted = self.recent_latencies.to_vec();
        if sorted.is_empty() {
            return Duration::ZERO;
        }
        sorted.sort();
        // Clamp the index so p = 1.0 cannot run past the end of the buffer
        let idx = ((sorted.len() as f64 * p) as usize).min(sorted.len() - 1);
        sorted[idx]
    }
}

Benefits:

  • Adapts automatically to degraded network conditions
  • Maintains strict validation (60 s) under normal conditions
  • Prevents false rejections during mesh congestion

3. Multi-Device Nonce Collision Prevention

Problem: Multiple devices with same conversation_id could reuse (key, nonce) pairs

Solution: Device-Specific Nonce Derivation

// Nonce is 96 bits (12 bytes) for ChaCha20-Poly1305.
// Derive a deterministic 32-bit prefix from (conversation_id, sender_device_id), then append a 64-bit counter.
// This avoids cross-device collisions when multiple devices participate in the same conversation.
fn derive_nonce(conversation_id: &[u8; 32], sender_device_id: &[u8; 16], message_number: u64) -> [u8; 12] {
    let prefix32 = hkdf_sha3_256_extract(
        &[conversation_id.as_slice(), sender_device_id.as_slice()].concat(),
        b"LMP-nonce-prefix",
        4,
    );

    let mut nonce = [0u8; 12];
    nonce[0..4].copy_from_slice(&prefix32);
    nonce[4..12].copy_from_slice(&message_number.to_le_bytes());
    nonce
}

Benefits:

  • Mathematically prevents nonce reuse across devices
  • Same conversation, different devices → different nonces guaranteed
  • Maintains a 64-bit message-counter space per device

Trade-off: Slightly higher implementation complexity due to deterministic prefix derivation
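
The collision-avoidance property of the scheme above can be checked with a small std-only sketch. Here `DefaultHasher` stands in for the HKDF-SHA3-256 prefix derivation; it is not cryptographically suitable and only illustrates the 4-byte-prefix + 8-byte-counter layout:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for hkdf_sha3_256_extract(conv_id || device_id, "LMP-nonce-prefix", 4)
fn toy_prefix(conversation_id: &[u8; 32], device_id: &[u8; 16]) -> [u8; 4] {
    let mut h = DefaultHasher::new();
    conversation_id.hash(&mut h);
    device_id.hash(&mut h);
    b"LMP-nonce-prefix".hash(&mut h);
    let bytes = h.finish().to_le_bytes();
    [bytes[0], bytes[1], bytes[2], bytes[3]]
}

// Same layout as derive_nonce above: 32-bit device-bound prefix, 64-bit counter.
fn toy_nonce(conversation_id: &[u8; 32], device_id: &[u8; 16], message_number: u64) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[0..4].copy_from_slice(&toy_prefix(conversation_id, device_id));
    nonce[4..12].copy_from_slice(&message_number.to_le_bytes());
    nonce
}
```

Two devices in the same conversation get different prefixes, so their nonce streams never overlap even when their counters coincide.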

4. Ratchet-State Authenticated Replay Protection

Problem: Attacker could replay old messages from previous ratchet iterations during out-of-order delivery

Solution: Ratchet-Bound MAC with Forward Processing

struct MessageHeader {
    conversation_id: [u8; 32],
    sender_device_id: [u8; 16],
    message_number: u64,
    prev_chain_length: u64,
    ratchet_id: [u8; 16],  // NEW: Unique per ratchet iteration
    timestamp: i64,
    eph_x25519_pub: Option<[u8; 32]>,
    eph_kyber_pub: Option<Vec<u8>>,
    mac: [u8; 16],  // Now covers ratchet_id
}

impl RatchetState {
    fn new_iteration(&mut self) -> [u8; 16] {
        // Generate unique ratchet ID
        let mut ratchet_id = [0u8; 16];
        let input = [
            &self.root_key[..],
            &self.iteration_count.to_le_bytes(),
            &current_timestamp().to_le_bytes()
        ].concat();
        
        let id_bytes = hkdf_sha3_256_extract(
            &input,
            b"ratchet-iteration-id",
            16
        );
        ratchet_id.copy_from_slice(&id_bytes);
        
        self.current_ratchet_id = ratchet_id;
        self.iteration_count += 1;
        ratchet_id
    }
    
    fn validate_message(&self, header: &MessageHeader) -> bool {
        // Reject if ratchet_id doesn't match current or recent iterations
        if header.ratchet_id != self.current_ratchet_id &&
           !self.recent_ratchet_ids.contains(&header.ratchet_id) {
            log::warn!("Message from unknown ratchet iteration, possible replay");
            return false;
        }
        
        // Verify MAC includes ratchet_id (prevents cross-ratchet replay)
        self.verify_mac(header)
    }
}

// Forward-processing: Accept new ratchet even if old messages pending
impl MessageProcessor {
    fn handle_out_of_order_ratchet(&mut self, new_ratchet_msg: Message) {
        // Don't wait for old messages; advance ratchet immediately
        log::info!("New ratchet detected with {} skipped messages, advancing", 
                  new_ratchet_msg.header.message_number);
        
        // Mark old messages as permanently lost
        self.mark_messages_lost(
            self.current_ratchet_id,
            self.last_received_msg..new_ratchet_msg.header.message_number
        );
        
        // Process new ratchet
        self.ratchet_state.advance(new_ratchet_msg);
    }
}

Benefits:

  • Prevents cross-ratchet replay attacks
  • Attacker cannot replay M_old from ratchet_N during ratchet_N+1
  • Forward-processing prevents temporary state confusion
  • Old messages are explicitly marked lost rather than held indefinitely

5. Skipped Message Key DoS Mitigation

Problem: Attacker floods with gaps just below MAX_SKIP repeatedly, exhausting memory

Solution: Per-Sender Rate Limiting + Exponential Backoff

struct SkippedKeyManager {
    skipped_keys: HashMap<MessageId, MessageKey>,
    sender_skip_counts: HashMap<DeviceId, SkipStats>,
    global_skip_limit: usize,  // 1000
    per_sender_limit: usize,   // 100 per sender
}

struct SkipStats {
    total_skips: usize,
    last_skip_time: Instant,
    consecutive_skips: usize,
}

impl SkippedKeyManager {
    fn try_skip(&mut self, 
                sender: &DeviceId, 
                skip_count: usize) -> Result<(), SkipError> {
        
        // Instant has no Default impl, so build the initial entry explicitly
        let stats = self.sender_skip_counts
            .entry(*sender)
            .or_insert_with(|| SkipStats {
                total_skips: 0,
                last_skip_time: Instant::now(),
                consecutive_skips: 0,
            });
        
        // Check per-sender limit
        if stats.total_skips + skip_count > self.per_sender_limit {
            log::warn!("Sender {} exceeded skip limit, possible DoS", sender);
            return Err(SkipError::SenderLimitExceeded);
        }
        
        // Check global limit
        if self.skipped_keys.len() + skip_count > self.global_skip_limit {
            log::warn!("Global skip limit reached, rejecting message");
            return Err(SkipError::GlobalLimitExceeded);
        }
        
        // Exponential backoff for consecutive skips (cap the exponent so the
        // shift cannot overflow)
        let min_delay = Duration::from_millis(100 * (1u64 << stats.consecutive_skips.min(16)));
        if stats.last_skip_time.elapsed() < min_delay {
            log::warn!("Sender {} skipping too rapidly, applying backoff", sender);
            return Err(SkipError::RateLimited);
        }
        
        // Update stats
        stats.total_skips += skip_count;
        stats.last_skip_time = Instant::now();
        stats.consecutive_skips += 1;
        
        Ok(())
    }
    
    fn on_successful_decrypt(&mut self, sender: &DeviceId) {
        // Reset consecutive counter on normal message
        if let Some(stats) = self.sender_skip_counts.get_mut(sender) {
            stats.consecutive_skips = 0;
        }
    }
}

Benefits:

  • Limits attacker impact to 100 skipped keys per sender
  • Exponential backoff prevents rapid skip flooding
  • Normal out-of-order delivery is unaffected (counter resets on successful decryption)

6. Multi-Path Mesh Routing

Problem: 30% compromised relays could correlate traffic or disrupt delivery

Solution: Redundant Parallel Paths + Path Diversity

struct MeshRouter {
    active_paths: HashMap<ConversationId, Vec<Path>>,
    path_metrics: HashMap<PathId, PathMetrics>,
}

struct Path {
    id: PathId,
    hops: Vec<NodeId>,
    latency: Duration,
    reliability: f64,
}

impl MeshRouter {
    fn send_message(&mut self, msg: &Message, conversation_id: &ConversationId) {
        let paths = self.select_diverse_paths(conversation_id, 3);  // 3 parallel paths
        
        for (i, path) in paths.iter().enumerate() {
            let fragment = if i == 0 {
                msg.clone()  // Primary: full message
            } else {
                msg.create_redundancy_shard(i)  // Secondary: erasure-coded shards
            };
            
            self.route_via_path(fragment, path);
        }
    }
    
    fn select_diverse_paths(&self, 
                            conversation_id: &ConversationId, 
                            count: usize) -> Vec<Path> {
        let mut paths = Vec::new();
        let mut used_nodes = HashSet::new();
        
        for _ in 0..count {
            // Select path with minimal node overlap
            let candidate = self.find_path_avoiding(&used_nodes);
            
            for hop in &candidate.hops {
                used_nodes.insert(*hop);
            }
            
            paths.push(candidate);
        }
        
        paths
    }
    
    fn find_path_avoiding(&self, excluded_nodes: &HashSet<NodeId>) -> Path {
        // Dijkstra's algorithm with node exclusion
        // Prioritize: low latency, high reliability, disjoint from excluded
        todo!("shortest path over nodes not in excluded_nodes")
    }
}

// Erasure coding for redundancy
impl Message {
    fn create_redundancy_shard(&self, shard_id: usize) -> Message {
        // Use Reed-Solomon: 2 of 3 shards sufficient to reconstruct
        let encoder = ReedSolomon::new(2, 1).unwrap();
        let shards = encoder.encode(&self.payload);
        
        Message {
            payload: shards[shard_id].clone(),
            shard_metadata: ShardMetadata {
                shard_id,
                total_shards: 3,
                reconstruction_needed: 2,
            },
            ..self.clone()
        }
    }
}

Benefits:

  • Reduces correlation risk: attacker must compromise multiple disjoint paths
  • Improves reliability: message delivered if any 2 of 3 paths succeed
  • Increases cost for attackers: must control 30% of nodes on multiple parallel paths

Trade-offs:

  • 3x bandwidth usage (mitigated by erasure coding)
  • Slightly higher latency (wait for 2 of 3 paths)
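
The `find_path_avoiding` step above is left as a stub. A minimal stand-in, assuming unit edge costs (plain BFS instead of the weighted Dijkstra variant) and a simple adjacency-map graph, could look like this; `NodeId = u32`, the graph shape, and the rule that endpoints are exempt from exclusion are all illustrative assumptions:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type NodeId = u32;

/// BFS shortest path from `src` to `dst` whose interior hops avoid `excluded`.
/// Returns None when no such path exists.
fn find_path_avoiding(
    graph: &HashMap<NodeId, Vec<NodeId>>,
    src: NodeId,
    dst: NodeId,
    excluded: &HashSet<NodeId>,
) -> Option<Vec<NodeId>> {
    let mut prev: HashMap<NodeId, NodeId> = HashMap::new();
    let mut queue = VecDeque::from([src]);
    let mut seen: HashSet<NodeId> = HashSet::from([src]);
    while let Some(node) = queue.pop_front() {
        if node == dst {
            // Reconstruct the path by walking predecessors back to src.
            let mut path = vec![dst];
            let mut cur = dst;
            while let Some(&p) = prev.get(&cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in graph.get(&node).into_iter().flatten() {
            // Endpoints may always be used; interior hops must be disjoint.
            if !seen.contains(&next) && (next == dst || !excluded.contains(&next)) {
                seen.insert(next);
                prev.insert(next, node);
                queue.push_back(next);
            }
        }
    }
    None
}
```

Calling this repeatedly while accumulating each returned path's hops into `excluded` yields the node-disjoint path set that `select_diverse_paths` needs.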

7. Adaptive Cover Traffic

Problem: Regular patterns in cover traffic may still leak metadata during long-term observation

Solution: Behavioral Mimicking + Random Bursts

struct CoverTrafficGenerator {
    baseline_rate: f64,  // 1 cell per 5 min
    user_activity_profile: ActivityProfile,
    last_burst: Instant,
}

struct ActivityProfile {
    hourly_distribution: [f64; 24],  // Probability per hour
    burst_probability: f64,
    burst_size_range: (usize, usize),
}

impl CoverTrafficGenerator {
    fn generate_schedule(&mut self) -> Vec<Instant> {
        let mut schedule = Vec::new();
        let now = Instant::now();
        
        // Baseline Poisson process
        let mut next_event = now + self.poisson_sample(self.baseline_rate);
        
        for _ in 0..100 {  // Next 100 events
            schedule.push(next_event);
            next_event += self.poisson_sample(self.baseline_rate);
        }
        
        // Add random bursts (mimic user typing sessions)
        if rand::random::<f64>() < self.user_activity_profile.burst_probability {
            // rand 0.8 API: gen_range lives on an Rng instance (use rand::Rng)
            let mut rng = rand::thread_rng();
            let burst_time = now + Duration::from_secs(rng.gen_range(0..3600));
            let burst_size = rng.gen_range(
                self.user_activity_profile.burst_size_range.0..
                self.user_activity_profile.burst_size_range.1
            );
            
            for i in 0..burst_size {
                schedule.push(burst_time + Duration::from_millis(i as u64 * 500));
            }
        }
        
        // Sort and deduplicate
        schedule.sort();
        schedule.dedup();
        schedule
    }
    
    // Wall-clock times (e.g. chrono's DateTime) are needed here:
    // std::time::Instant is monotonic and carries no hour-of-day.
    fn adapt_to_user_behavior(&mut self, real_message_times: &[DateTime<Local>]) {
        // Learn user's messaging pattern
        let mut hourly_counts = [0usize; 24];
        
        for time in real_message_times {
            let hour = time.hour() as usize;  // chrono::Timelike::hour
            hourly_counts[hour] += 1;
        }
        
        // Normalize to probability distribution
        let total: usize = hourly_counts.iter().sum();
        self.user_activity_profile.hourly_distribution = 
            hourly_counts.map(|c| c as f64 / total as f64);
        
        log::info!("Cover traffic adapted to user's activity pattern");
    }
}

Benefits:

  • Cover traffic mimics the user's real behavior (e.g., active 9AM-5PM, quiet at night)
  • Random bursts prevent "too regular" patterns
  • Adapts over time to changing user habits
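
The schedule generator above calls `poisson_sample`, which is never defined. Inter-arrival gaps of a Poisson process are exponentially distributed, so one way to sketch it is inverse-CDF sampling; a small deterministic LCG stands in for `rand` here purely so the snippet is self-contained, and is not suitable for real cover traffic:

```rust
/// Tiny linear congruential generator; a stand-in for a real RNG,
/// NOT cryptographically secure.
struct Lcg(u64);

impl Lcg {
    /// Next uniform value in (0, 1].
    fn next_unit(&mut self) -> f64 {
        // MMIX LCG constants (Knuth).
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Map the top 53 bits into (0, 1].
        ((self.0 >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
}

/// One exponential inter-arrival gap in seconds for a Poisson process
/// with `rate` events per second: dt = -ln(U) / rate.
fn poisson_sample(rng: &mut Lcg, rate: f64) -> f64 {
    -rng.next_unit().ln() / rate
}
```

For the baseline of 1 cell per 5 minutes, `rate` would be `1.0 / 300.0`, and the mean of the sampled gaps converges to roughly 300 seconds.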

8. Post-Quantum Library Hardening

Problem: PQ libraries (Kyber, Dilithium) are new and may have side-channel vulnerabilities

Solution: Constant-Time Verification + Dual-Library Validation

// Use TWO independent PQ implementations, verify both agree
struct HardenedPQCrypto {
    primary_kyber: liboqs::Kyber768,
    secondary_kyber: pqcrypto::Kyber768,  // Different implementation
}

impl HardenedPQCrypto {
    // NOTE: Kyber encapsulation is randomized, so two independent calls
    // produce different (ciphertext, shared-secret) pairs and a naive
    // comparison would ALWAYS fail. Cross-checking encapsulation therefore
    // requires a derandomized entry point fed the SAME coins in both
    // libraries; the encapsulate_derand() name here is hypothetical.
    fn encapsulate(&self, public_key: &[u8]) -> Result<(Vec<u8>, Vec<u8>), Error> {
        let coins = generate_random_coins();
        let (ct1, ss1) = self.primary_kyber.encapsulate_derand(public_key, &coins)?;
        let (ct2, ss2) = self.secondary_kyber.encapsulate_derand(public_key, &coins)?;
        
        // Verify both produce the same result (catch implementation bugs)
        if constant_time_eq(&ss1, &ss2) && ct1 == ct2 {
            Ok((ct1, ss1))
        } else {
            log::error!("PQ library mismatch detected, possible bug or attack");
            Err(Error::PQLibraryMismatch)
        }
    }
    
    fn decapsulate(&self, 
                   ciphertext: &[u8], 
                   secret_key: &[u8]) -> Result<Vec<u8>, Error> {
        let ss1 = self.primary_kyber.decapsulate(ciphertext, secret_key)?;
        let ss2 = self.secondary_kyber.decapsulate(ciphertext, secret_key)?;
        
        if constant_time_eq(&ss1, &ss2) {
            Ok(ss1)
        } else {
            Err(Error::PQLibraryMismatch)
        }
    }
}

// Side-channel resistant comparison
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    
    diff == 0
}

Benefits:

  • Catches implementation bugs before deployment
  • Two independent codebases reduce the risk of common vulnerabilities
  • Constant-time operations prevent timing attacks

Trade-off: 2x computation for PQ operations (acceptable for handshakes, which are infrequent)


Security Policy (Responsible Disclosure)

If you believe you’ve found a vulnerability in the protocol or its implementations:

  • Do not open a public issue with exploit details.
  • Do contact the maintainers privately with:
    • affected component/version
    • reproduction steps or proof-of-concept
    • impact assessment
    • any suggested mitigation

We aim to acknowledge reports within 7 days and provide a remediation plan or fix timeline as soon as possible.


9. Implemented Security Fixes (v2.0)

The following critical and high-severity fixes have been implemented following a hostile security audit. See SECURITY_REMEDIATION.md for the complete remediation specification.

9.1 Handshake Shared Secret Fix (CRITICAL)

Issue: The InitiatorHandshake::complete() function re-encapsulated to the recipient's Kyber prekey instead of using the stored shared secret from ClientHello creation, causing Alice and Bob to derive DIFFERENT shared secrets.

Fix Implemented:

  • Added new_with_secret() method to ClientHello that returns the Kyber shared secret
  • Added kyber_ss_ab field to InitiatorHandshake to store the shared secret
  • Updated complete() to use the stored secret instead of re-encapsulating
  • Deprecated the old new() method with compile-time warning

Location: lmp-core/src/protocol/handshake.rs
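
The bug can be seen on a toy, deliberately insecure KEM: because encapsulation consumes fresh randomness, re-encapsulating inside `complete()` cannot reproduce the secret carried by the ClientHello ciphertext, while the stored secret matches what the responder decapsulates. `ToyKem` and its arithmetic are illustrative only, not Kyber:

```rust
/// Toy (insecure) KEM illustrating fix 9.1. A counter stands in for the
/// randomness source so each encapsulation uses fresh "coins".
struct ToyKem {
    counter: u64,
}

impl ToyKem {
    /// Randomized encapsulation: returns (ciphertext, shared_secret).
    fn encapsulate(&mut self, pk: u64) -> (u64, u64) {
        self.counter += 1; // fresh coins on every call
        let r = self.counter.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        (r, r ^ pk) // ct = r, ss = r ^ pk
    }

    /// Deterministic decapsulation (toy scheme: sk == pk).
    fn decapsulate(&self, ct: u64, sk: u64) -> u64 {
        ct ^ sk
    }
}
```

With the fix, the initiator keeps the secret from the ClientHello-time encapsulation (`kyber_ss_ab`) and `complete()` reuses it instead of calling encapsulate again.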

9.2 Unknown Key-Share (UKS) Attack Prevention (HIGH)

Issue: An attacker could replace the kyber_ciphertext with one encapsulated to their own key, causing identity misbinding.

Fix Implemented:

  • Added recipient_identity_hash field to ClientHello
  • Hash is computed as HKDF(recipient_ed25519 || recipient_dilithium, "recipient-identity-commit")
  • Identity hash is included in the signed portion of ClientHello
  • Responder verifies identity hash before processing

Location: lmp-core/src/protocol/handshake.rs
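
A sketch of the responder-side check, with std's `DefaultHasher` standing in for the HKDF commitment (illustrative only, not cryptographic; the function names are hypothetical):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for HKDF(recipient_ed25519 || recipient_dilithium,
// "recipient-identity-commit").
fn identity_commit(ed25519_pub: &[u8], dilithium_pub: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    ed25519_pub.hash(&mut h);
    dilithium_pub.hash(&mut h);
    b"recipient-identity-commit".hash(&mut h);
    h.finish()
}

/// Responder-side check: reject the handshake if the commitment carried in
/// the (signed) ClientHello does not match the responder's own identity keys.
fn verify_recipient_identity(claimed: u64, my_ed: &[u8], my_dil: &[u8]) -> bool {
    claimed == identity_commit(my_ed, my_dil)
}
```

Because the commitment sits inside the signed portion of ClientHello, an attacker who swaps the kyber_ciphertext cannot also retarget the commitment without breaking the signature.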

9.3 HKDF RFC 5869 Compliance Fix (MEDIUM)

Issue: HKDF used H(salt || ikm) instead of proper HMAC(salt, ikm).

Fix Implemented:

  • Updated Hkdf::extract() to use HMAC-SHA3-256(salt, ikm)
  • Updated Hkdf::expand() to use HMAC-SHA3-256(PRK, T(i-1) | info | i)
  • Added hmac crate dependency

Location: lmp-core/src/crypto/hkdf.rs
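
The shape of the fix, shown with a toy 8-byte hash in place of SHA3-256: the HMAC wrapper follows the FIPS 198 construction and extract/expand follow RFC 5869. The toy primitives are not cryptographic; they only demonstrate the keyed structure that replaced H(salt || ikm):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy 8-byte hash; stands in for SHA3-256 purely to show the structure.
fn toy_hash(data: &[u8]) -> [u8; 8] {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish().to_le_bytes()
}

// HMAC shape (FIPS 198) over the toy hash, block size 8:
// HMAC(K, m) = H((K' ^ opad) || H((K' ^ ipad) || m))
fn toy_hmac(key: &[u8], msg: &[u8]) -> [u8; 8] {
    let mut k = [0u8; 8];
    if key.len() > 8 {
        k = toy_hash(key); // long keys are hashed down first
    } else {
        k[..key.len()].copy_from_slice(key);
    }
    let ipad: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    let opad: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    let inner = toy_hash(&[ipad.as_slice(), msg].concat());
    toy_hash(&[opad.as_slice(), &inner[..]].concat())
}

/// RFC 5869 extract: PRK = HMAC(salt, ikm), keyed by the salt,
/// not the broken H(salt || ikm) construction this fix replaced.
fn hkdf_extract(salt: &[u8], ikm: &[u8]) -> [u8; 8] {
    toy_hmac(salt, ikm)
}

/// RFC 5869 expand: T(i) = HMAC(PRK, T(i-1) || info || [i]), concatenated.
fn hkdf_expand(prk: &[u8; 8], info: &[u8], len: usize) -> Vec<u8> {
    let mut okm = Vec::new();
    let mut t_prev: Vec<u8> = Vec::new();
    let mut i: u8 = 1;
    while okm.len() < len {
        let mut block = t_prev.clone();
        block.extend_from_slice(info);
        block.push(i);
        let t = toy_hmac(prk, &block);
        okm.extend_from_slice(&t);
        t_prev = t.to_vec();
        i += 1;
    }
    okm.truncate(len);
    okm
}
```

The real implementation swaps `toy_hmac` for HMAC-SHA3-256 from the hmac crate; the call structure is identical.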

9.4 Skip Rate Limiter for DoS Mitigation (MEDIUM)

Issue: Attackers could send messages with large gaps in message numbers, forcing victims to derive many skipped keys.

Fix Implemented:

  • New SkipRateLimiter module with per-sender and global limits
  • Exponential backoff for consecutive skip events
  • Normal message receipt resets consecutive skip counter
  • Configurable limits via SkipLimitConfig

Location: lmp-core/src/protocol/skip_limiter.rs

9.5 Peer Ratchet Staleness Enforcement (HIGH)

Issue: Malicious peers could refuse to advance DH ratchet, keeping compromised chain keys valid indefinitely.

Fix Implemented:

  • Added peer_last_ratchet_time and peer_ratchet_epoch tracking to RatchetState
  • New RatchetPolicy struct with configurable staleness thresholds
  • check_peer_staleness() returns RatchetStatus (Healthy/Warning/Stale/CriticallyStale)
  • Sessions can be rejected or terminated based on peer staleness

Location: lmp-core/src/protocol/ratchet.rs
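
A minimal sketch of the staleness classification described above; the field names and threshold values here are hypothetical, the real ones live in RatchetPolicy:

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum RatchetStatus {
    Healthy,
    Warning,
    Stale,
    CriticallyStale,
}

/// Hypothetical staleness thresholds, ordered warn < stale < critical.
struct RatchetPolicy {
    warn_after: Duration,
    stale_after: Duration,
    critical_after: Duration,
}

impl RatchetPolicy {
    /// Classify how long the peer has gone without advancing the DH ratchet.
    fn check_peer_staleness(&self, since_last_ratchet: Duration) -> RatchetStatus {
        if since_last_ratchet >= self.critical_after {
            RatchetStatus::CriticallyStale
        } else if since_last_ratchet >= self.stale_after {
            RatchetStatus::Stale
        } else if since_last_ratchet >= self.warn_after {
            RatchetStatus::Warning
        } else {
            RatchetStatus::Healthy
        }
    }
}
```

A session layer would warn on `Warning`, refuse new sends on `Stale`, and terminate on `CriticallyStale`, so a peer that never ratchets cannot keep a compromised chain key useful indefinitely.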

9.6 New Error Types

Added structured error types for security conditions:

  • SkipLimitExceeded - Per-sender skip limit exceeded
  • GlobalSkipLimitExceeded - Global skip limit exceeded
  • SkipRateLimited - Exponential backoff not satisfied
  • IdentityMismatch - UKS attack prevention
  • PeerRatchetStale - Peer not ratcheting
  • NonceCounterOverflow - Requires mandatory ratchet
  • SecurityDowngrade - PAKE requirement not met
  • PakeRequired - PAKE authentication required

Location: lmp-core/src/error.rs


Protocol Invariants (Enforced)

ID           Invariant                                           Enforcement
INV-HS-1     Both parties MUST derive identical shared secrets   Kyber secret stored, not re-encapsulated
INV-HS-2     Identity commitment MUST bind Kyber encapsulation   recipient_identity_hash field
INV-HKDF-1   Extract MUST use HMAC-SHA3-256, not raw hash        Proper HMAC implementation
INV-SKIP-1   Per-sender skipped keys MUST NOT exceed limit       SkipRateLimiter enforcement
INV-RATCH-1  Peer MUST ratchet within policy time limits         Staleness checking