
AGENT.md - Comlink iOS: P2P Voice Intercom System

1. Project Overview

Comlink is a high-fidelity, peer-to-peer (P2P) voice intercom application designed for loud environments such as concerts, clubs, and festivals. The app enables offline voice communication using Apple's Multipeer Connectivity framework over Bluetooth and peer-to-peer Wi-Fi, without requiring an internet connection.

Key Features:

  • Offline-First Architecture: Functions entirely in Airplane Mode (with Wi-Fi/Bluetooth enabled)
  • Background Audio Support: Continues transmitting and receiving audio when the phone is locked
  • Noise Isolation: Leverages iOS built-in voice processing to filter background noise and isolate the user's voice
  • Ultra-Low Latency: Optimized audio pipeline for real-time communication
  • OLED-Friendly UI: Dark/black theme optimized for low-light concert environments

Technical Specifications:

  • Language: Swift 6.0+
  • UI Framework: SwiftUI
  • Architecture: MVVM (Model-View-ViewModel)
  • Networking: Apple Multipeer Connectivity Framework
  • Audio: AVFoundation & AVAudioEngine
  • Target iOS: iOS 16.0+

2. Architecture Diagram (Text-Based)

┌─────────────────────────────────────────────────────────────────┐
│                          SwiftUI Views                          │
│  ┌──────────────────┐  ┌──────────────────┐  ┌───────────────┐  │
│  │ ConnectionView   │  │     TalkView     │  │  SettingsView │  │
│  └────────┬─────────┘  └────────┬─────────┘  └───────┬───────┘  │
└───────────┼─────────────────────┼────────────────────┼──────────┘
            │                     │                    │
            └─────────────────────┼────────────────────┘
                                  │
                     ┌────────────▼────────────┐
                     │    ComlinkViewModel     │
                     │   (MVVM Coordinator)    │
                     └────┬───────────────┬────┘
                          │               │
            ┌─────────────▼────┐      ┌───▼──────────────┐
            │ MultipeerManager │      │   AudioManager   │
            │ (P2P Networking) │      │ (Audio Pipeline) │
            └─────┬────────────┘      └─────────┬────────┘
                  │                             │
      ┌───────────▼────────────┐      ┌─────────▼────────────┐
      │ MCNearbyServiceBrowser │      │  AVAudioEngine       │
      │ MCNearbyServiceAdv.    │      │  - Input Node        │
      │ MCSession              │      │  - Audio Tap         │
      └───────────┬────────────┘      │  - Output Node       │
                  │                   │  - Voice Processing  │
                  │                   └─────────┬────────────┘
                  │                             │
                  └──────────────┬──────────────┘
                                 │
                     ┌───────────▼────────────┐
                     │   Data Flow Pipeline   │
                     │                        │
                     │  Mic → Tap → Buffer →  │
                     │  Network → Peer →      │
                     │  Speaker               │
                     └────────────────────────┘

Data Flow:

  1. Audio Capture: Microphone → AVAudioInputNode → Audio Tap
  2. Processing: Audio Tap → Voice Isolation → PCM Buffer
  3. Transmission: PCM Buffer → MultipeerManager → MCSession → Peer
  4. Reception: Peer → MCSession → MultipeerManager → Audio Buffer
  5. Playback: Audio Buffer → AVAudioPlayerNode → AVAudioOutputNode → Speaker

3. Step-by-Step Implementation Plan

Phase 1: Project Setup & Permissions

Goal: Configure Xcode project with required capabilities and permissions.

Tasks:

  1. Create Xcode Project

    • New iOS App with SwiftUI
    • Minimum Deployment Target: iOS 16.0
    • Enable Swift 6.0 language mode
  2. Configure Info.plist

    <key>NSMicrophoneUsageDescription</key>
    <string>Comlink needs microphone access to transmit your voice to connected peers.</string>
    
    <key>NSLocalNetworkUsageDescription</key>
    <string>Comlink uses local network to discover and connect to nearby devices.</string>
    
    <key>NSBonjourServices</key>
    <array>
        <string>_comlink._tcp</string>
        <string>_comlink._udp</string>
    </array>
    
    <key>UIBackgroundModes</key>
    <array>
        <string>audio</string>
    </array>
  3. Configure Capabilities

    • Enable "Audio, AirPlay, and Picture in Picture"
    • Enable "Background Modes" → audio (the voip mode requires PushKit/CallKit integration and is not needed here — the audio mode alone keeps the engine alive in the background)
    • No extra networking entitlement is required; Multipeer Connectivity only needs the local-network permission declared above
  4. Project Structure

    Comlink/
    ├── App/
    │   ├── ComlinkApp.swift
    │   └── AppDelegate.swift (for background audio)
    ├── Models/
    │   ├── Peer.swift
    │   └── AudioPacket.swift
    ├── ViewModels/
    │   └── ComlinkViewModel.swift
    ├── Views/
    │   ├── ConnectionView.swift
    │   ├── TalkView.swift
    │   └── Components/
    ├── Managers/
    │   ├── MultipeerManager.swift
    │   ├── AudioManager.swift
    │   └── PermissionsManager.swift
    ├── Utilities/
    │   ├── AudioCodec.swift
    │   └── Logger.swift
    └── Resources/
        └── Assets.xcassets
    

Deliverables:

  • ✅ Xcode project configured
  • ✅ Info.plist with all required permissions
  • ✅ Directory structure established

Phase 2: Multipeer Manager (Advertising & Browsing)

Goal: Implement P2P discovery and connection logic using MultipeerConnectivity.

Tasks:

  1. Create MultipeerManager Class

    • Singleton pattern conforming to ObservableObject for SwiftUI integration (the @Observable macro requires iOS 17; the @Published properties below require ObservableObject, which fits the iOS 16 target)
    • Properties:
      • peerID: MCPeerID
      • session: MCSession
      • serviceAdvertiser: MCNearbyServiceAdvertiser
      • serviceBrowser: MCNearbyServiceBrowser
      • @Published var connectedPeers: [MCPeerID]
      • @Published var availablePeers: [MCPeerID]
  2. Service Discovery Configuration

    • Service Type: "comlink" (1–15 characters; lowercase letters, numbers, and hyphens only)
    • Discovery Info: Include device name, app version
    • Security: Implement custom invitation handler (accept/decline)
  3. Session Management

    • Implement MCSessionDelegate methods:
      • session(_:peer:didChange:) → Update connection state
      • session(_:didReceive:fromPeer:) → Handle audio data
      • session(_:didReceive:withName:fromPeer:) → Handle streams (future)
  4. Connection Flow (in MultipeerConnectivity, the browser sends the invitation and the advertiser accepts or declines)

    Device A (Host)          Device B (Client)
    ─────────────────        ─────────────────
    startAdvertising()       startBrowsing()
         │                         │
         │◄────────Discovery───────┤
         │                         │
         │◄───Invitation Request───┤
         │                         │
         ├──────Accept/Decline────►│
         │                         │
    Connected ◄──────────────► Connected
    
  5. Data Transmission Methods

    • sendAudioData(_ data: Data, to peer: MCPeerID) → Use .reliable or .unreliable mode
    • Decision: Use .unreliable for lower latency, handle packet loss gracefully
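The pieces above can be sketched end-to-end in a minimal manager. This is an outline rather than the final implementation — it auto-accepts invitations and uses trivial discovery info, both of which Phase 2 says should be replaced with a user prompt:

```swift
import MultipeerConnectivity
import UIKit

/// Minimal sketch of the P2P manager described above.
final class MultipeerManager: NSObject, ObservableObject {
    static let shared = MultipeerManager()

    // Service type: 1–15 characters; lowercase letters, numbers, hyphens.
    private let serviceType = "comlink"
    private let peerID = MCPeerID(displayName: UIDevice.current.name)

    private lazy var session = MCSession(peer: peerID,
                                         securityIdentity: nil,
                                         encryptionPreference: .required)
    private lazy var advertiser = MCNearbyServiceAdvertiser(peer: peerID,
                                                            discoveryInfo: ["v": "1.0"],
                                                            serviceType: serviceType)
    private lazy var browser = MCNearbyServiceBrowser(peer: peerID,
                                                      serviceType: serviceType)

    @Published var connectedPeers: [MCPeerID] = []
    @Published var availablePeers: [MCPeerID] = []

    private override init() {
        super.init()
        session.delegate = self
        advertiser.delegate = self
        browser.delegate = self
    }

    func startAdvertising() { advertiser.startAdvertisingPeer() }
    func startBrowsing()    { browser.startBrowsingForPeers() }

    /// The browser side initiates the connection (see the flow diagram above).
    func connect(to peer: MCPeerID) {
        browser.invitePeer(peer, to: session, withContext: nil, timeout: 30)
    }

    func sendAudioData(_ data: Data) {
        guard !session.connectedPeers.isEmpty else { return }
        // .unreliable trades occasional packet loss for lower latency.
        try? session.send(data, toPeers: session.connectedPeers, with: .unreliable)
    }
}

extension MultipeerManager: MCSessionDelegate {
    func session(_ session: MCSession, peer peerID: MCPeerID, didChange state: MCSessionState) {
        DispatchQueue.main.async { self.connectedPeers = session.connectedPeers }
    }
    func session(_ session: MCSession, didReceive data: Data, fromPeer peerID: MCPeerID) {
        // Route to AudioManager for playback (Phase 4).
    }
    // Stream/resource callbacks left empty for this sketch.
    func session(_ session: MCSession, didReceive stream: InputStream, withName streamName: String, fromPeer peerID: MCPeerID) {}
    func session(_ session: MCSession, didStartReceivingResourceWithName resourceName: String, fromPeer peerID: MCPeerID, with progress: Progress) {}
    func session(_ session: MCSession, didFinishReceivingResourceWithName resourceName: String, fromPeer peerID: MCPeerID, at localURL: URL?, withError error: Error?) {}
}

extension MultipeerManager: MCNearbyServiceAdvertiserDelegate {
    func advertiser(_ advertiser: MCNearbyServiceAdvertiser,
                    didReceiveInvitationFromPeer peerID: MCPeerID,
                    withContext context: Data?,
                    invitationHandler: @escaping (Bool, MCSession?) -> Void) {
        // Auto-accept for the sketch; a real build should prompt the user.
        invitationHandler(true, session)
    }
}

extension MultipeerManager: MCNearbyServiceBrowserDelegate {
    func browser(_ browser: MCNearbyServiceBrowser, foundPeer peerID: MCPeerID, withDiscoveryInfo info: [String: String]?) {
        DispatchQueue.main.async {
            if !self.availablePeers.contains(peerID) { self.availablePeers.append(peerID) }
        }
    }
    func browser(_ browser: MCNearbyServiceBrowser, lostPeer peerID: MCPeerID) {
        DispatchQueue.main.async { self.availablePeers.removeAll { $0 == peerID } }
    }
}
```
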

Deliverables:

  • ✅ MultipeerManager class with discovery/advertising
  • ✅ Peer connection and disconnection handling
  • ✅ Data transmission infrastructure

Phase 3: Audio Engine Setup (Input → Processing → Network)

Goal: Configure AVAudioEngine to capture microphone input, process it, and send to network.

Tasks:

  1. Create AudioManager Class

    • ObservableObject class managing the AVAudioEngine lifecycle (consistent with the @Published properties below and the iOS 16 target)
    • Properties:
      • private let audioEngine: AVAudioEngine
      • private let inputNode: AVAudioInputNode
      • private let audioSession: AVAudioSession
      • @Published var isRecording: Bool
      • @Published var audioLevel: Float (for UI meter)
  2. Configure AVAudioSession

    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .voiceChat, options: [
        .defaultToSpeaker,
        .allowBluetooth,
        .allowBluetoothA2DP
    ])
    try session.setActive(true, options: .notifyOthersOnDeactivation)

    Why .voiceChat mode?

    • Enables built-in echo cancellation
    • Enables the system voice-processing chain (noise suppression, automatic gain control); users can additionally select the Voice Isolation mic mode from Control Center
    • Optimizes for low-latency duplex communication
  3. Install Audio Tap on Input Node

    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 4096, format: format) { [weak self] buffer, time in
        self?.processAudioBuffer(buffer)
    }

    Buffer Size Selection:

    • 4096 frames = ~85ms latency at 48kHz (acceptable for voice)
    • Smaller buffer = lower latency but higher CPU usage
    • Larger buffer = smoother but increased delay
  4. Process Audio Buffer

    func processAudioBuffer(_ buffer: AVAudioPCMBuffer) {
        guard let channelData = buffer.floatChannelData else { return }
    
        // Convert PCM to Data
        let audioData = Data(bytes: channelData[0],
                            count: Int(buffer.frameLength) * MemoryLayout<Float>.size)
    
        // Optional: Compress with Opus codec (future enhancement)
    
        // Send to connected peers
        multipeerManager.sendAudioData(audioData)
    }
  5. Start/Stop Audio Engine

    • startRecording() → Prepare engine, start engine, activate session
    • stopRecording() → Stop engine, remove taps, deactivate session
    • Handle interruptions (phone calls, alarms)
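The interruption handling in item 5 can be sketched as a notification observer. It assumes AudioManager exposes the `isRecording` property and `startRecording()` method from item 1:

```swift
import AVFoundation

extension AudioManager {
    /// Observe audio-session interruptions (phone calls, alarms, Siri).
    func observeInterruptions() {
        NotificationCenter.default.addObserver(
            forName: AVAudioSession.interruptionNotification,
            object: AVAudioSession.sharedInstance(),
            queue: .main
        ) { [weak self] notification in
            guard let info = notification.userInfo,
                  let rawType = info[AVAudioSessionInterruptionTypeKey] as? UInt,
                  let type = AVAudioSession.InterruptionType(rawValue: rawType) else { return }

            switch type {
            case .began:
                // The system pauses the engine; reflect that in UI state.
                self?.isRecording = false
            case .ended:
                // Resume only when the system indicates it is appropriate.
                if let rawOptions = info[AVAudioSessionInterruptionOptionKey] as? UInt,
                   AVAudioSession.InterruptionOptions(rawValue: rawOptions).contains(.shouldResume) {
                    self?.startRecording()
                }
            @unknown default:
                break
            }
        }
    }
}
```
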

Deliverables:

  • ✅ AudioManager with AVAudioEngine configuration
  • ✅ Audio tap installed on input node
  • ✅ Audio buffer processing and transmission

Phase 4: Audio Receiver (Network → Buffer → Playback)

Goal: Receive audio data from peers and play it through the speaker.

Tasks:

  1. Receive Audio Data in MultipeerManager

    func session(_ session: MCSession, didReceive data: Data, fromPeer peerID: MCPeerID) {
        // Route to AudioManager for playback
        audioManager.playReceivedAudio(data, from: peerID)
    }
  2. Create Audio Playback Pipeline

    • Use AVAudioPlayerNode for real-time playback
    • Attach player node to audio engine
    • Connect player node to output node (speaker)
  3. Convert Data to AVAudioPCMBuffer

    func playReceivedAudio(_ data: Data, from peer: MCPeerID) {
        guard let buffer = createPCMBuffer(from: data) else { return }
    
        playerNode.scheduleBuffer(buffer) {
            // Buffer finished playing
        }
    
        if !playerNode.isPlaying {
            playerNode.play()
        }
    }
  4. Handle Buffer Queue

    • Implement jitter buffer to handle network variability
    • Drop packets if queue exceeds threshold (prevent accumulating delay)
    • Smooth playback with interpolation if needed
  5. Prevent Feedback Loop

    • Ensure echo cancellation is working (.voiceChat mode)
    • Test with two physical devices (NOT simulator)
    • Consider muting local input during playback (optional)
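The `createPCMBuffer(from:)` helper referenced in item 3 might look like this. It assumes both peers agree out-of-band on a mono Float32, 48 kHz format; a real build should negotiate or transmit the format alongside the audio:

```swift
import AVFoundation

/// Rebuild an AVAudioPCMBuffer from raw Float32 samples received over the network.
/// Assumes both devices use the same mono Float32, 48 kHz format.
func createPCMBuffer(from data: Data) -> AVAudioPCMBuffer? {
    guard let format = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                     sampleRate: 48_000,
                                     channels: 1,
                                     interleaved: false) else { return nil }

    let frameCount = AVAudioFrameCount(data.count / MemoryLayout<Float>.size)
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                        frameCapacity: frameCount) else { return nil }
    buffer.frameLength = frameCount

    // Copy the raw bytes back into the buffer's first (only) channel.
    data.withUnsafeBytes { rawBuffer in
        guard let src = rawBuffer.bindMemory(to: Float.self).baseAddress,
              let dst = buffer.floatChannelData?[0] else { return }
        dst.update(from: src, count: Int(frameCount))
    }
    return buffer
}
```
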

Deliverables:

  • ✅ Audio reception and playback pipeline
  • ✅ AVAudioPlayerNode integration
  • ✅ Jitter buffer implementation
  • ✅ Echo cancellation verification

Phase 5: UI Implementation (SwiftUI Views)

Goal: Create intuitive, dark-themed UI for connection and communication.

Tasks:

  1. ConnectionView (Initial Screen)

    • Display list of discovered peers
    • Show connection status (Searching, Found, Connected)
    • Allow user to select a peer to connect
    • Display user's own device name
    • "Start Broadcasting" / "Stop Broadcasting" toggle
  2. TalkView (Active Communication Screen)

    • Large "Push to Talk" button (or toggle for continuous transmission)
    • Audio level meter (visual feedback)
    • Connected peer name/status
    • "Disconnect" button
    • Battery-saving dark background
  3. SwiftUI Integration with ViewModel

    final class ComlinkViewModel: ObservableObject {
        let multipeerManager: MultipeerManager
        let audioManager: AudioManager

        // Computed properties don't publish on their own — forward the managers'
        // objectWillChange (e.g. via Combine) or observe the managers directly in views.
        var isConnected: Bool { !multipeerManager.connectedPeers.isEmpty }
        var availablePeers: [MCPeerID] { multipeerManager.availablePeers }

        func connect(to peer: MCPeerID) { ... }
        func disconnect() { ... }
        func startTalking() { audioManager.startRecording() }
        func stopTalking() { audioManager.stopRecording() }
    }
  4. UI Design Principles

    • OLED Black: Use Color.black for backgrounds (true black = pixels off)
    • High Contrast: White/green text on black background
    • Large Touch Targets: Minimum 44x44pt for buttons
    • Haptic Feedback: Use UIImpactFeedbackGenerator for interactions
  5. Handle Background State

    • Display notification when app enters background
    • Continue audio transmission (background mode enabled)
    • Lock screen controls (MPRemoteCommandCenter - optional)
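A minimal push-to-talk control for TalkView, wired to the view-model methods from item 3. The DragGesture press detection and the styling are one possible approach, not a prescribed design:

```swift
import SwiftUI
import UIKit

struct PushToTalkButton: View {
    @ObservedObject var viewModel: ComlinkViewModel
    @State private var isPressed = false

    var body: some View {
        Circle()
            .fill(isPressed ? Color.green : Color(white: 0.15))
            .frame(width: 180, height: 180)   // well above the 44x44pt minimum
            .overlay(
                Text(isPressed ? "TALKING" : "HOLD TO TALK")
                    .font(.headline)
                    .foregroundColor(.white)
            )
            .gesture(
                // A zero-distance drag fires on touch-down, giving press/release.
                DragGesture(minimumDistance: 0)
                    .onChanged { _ in
                        guard !isPressed else { return }
                        isPressed = true
                        UIImpactFeedbackGenerator(style: .medium).impactOccurred()
                        viewModel.startTalking()
                    }
                    .onEnded { _ in
                        isPressed = false
                        viewModel.stopTalking()
                    }
            )
    }
}
```
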

Deliverables:

  • ✅ ConnectionView with peer discovery UI
  • ✅ TalkView with push-to-talk functionality
  • ✅ Dark theme optimized for OLED
  • ✅ ViewModel coordinating managers

4. Known Risks & Mitigation Strategies

Risk 1: Audio Feedback Loop

Description: When two devices are physically close, audio from the speaker can be picked up by the microphone, creating a feedback loop.

Mitigation:

  • ✅ Use .voiceChat mode for built-in echo cancellation
  • ✅ Test with headphones/earbuds (recommended use case)
  • ✅ Implement AGC (Automatic Gain Control) if needed
  • ✅ Consider adding "mute speaker during talk" option

Risk 2: Background Suspension

Description: iOS may suspend the app in background to save battery, interrupting audio transmission.

Mitigation:

  • ✅ Enable UIBackgroundModes: audio in Info.plist
  • ✅ Keep AVAudioSession active with .playAndRecord category
  • ✅ Use beginBackgroundTask for critical operations
  • ✅ Test extensively with device locked and app in background
  • ✅ Monitor AVAudioSession.interruptionNotification and resume session

Risk 3: Multipeer Connectivity Reliability

Description: MCSession can be unstable, especially with unreliable data mode. Packets may be lost or arrive out of order.

Mitigation:

  • ✅ Use .unreliable mode for low latency (accept some packet loss)
  • ✅ Implement sequence numbers in audio packets for ordering
  • ✅ Add jitter buffer to smooth playback
  • ✅ Gracefully handle missing packets (interpolate or skip)
  • ✅ Fallback to .reliable mode if latency is acceptable
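The sequence-number idea can live in the AudioPacket model from Phase 1. The 4-byte big-endian header below is an assumed wire layout for illustration, not a finalized protocol:

```swift
import Foundation

/// Audio payload with a monotonically increasing sequence number so the
/// receiver can detect loss and reordering over the .unreliable channel.
struct AudioPacket {
    let sequence: UInt32
    let payload: Data

    init(sequence: UInt32, payload: Data) {
        self.sequence = sequence
        self.payload = payload
    }

    /// 4-byte big-endian sequence header followed by the raw samples.
    func encoded() -> Data {
        var data = Data()
        withUnsafeBytes(of: sequence.bigEndian) { data.append(contentsOf: $0) }
        data.append(payload)
        return data
    }

    /// Fails on packets too short to carry the header.
    init?(decoding data: Data) {
        guard data.count >= 4 else { return nil }
        sequence = data.prefix(4).reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
        payload = data.dropFirst(4)
    }
}
```
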

Risk 4: Permission Denials

Description: Users may deny microphone or local network permissions, breaking core functionality.

Mitigation:

  • ✅ Create PermissionsManager to check and request permissions upfront
  • ✅ Show educational alert explaining why permissions are needed
  • ✅ Provide deep link to Settings if permission is denied
  • ✅ Gracefully degrade (disable features) if permissions unavailable
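A minimal PermissionsManager covering the microphone flow. `AVAudioSession.requestRecordPermission` fits the iOS 16 target (iOS 17+ offers AVAudioApplication as its replacement), and the Settings deep link uses the standard `openSettingsURLString`:

```swift
import AVFoundation
import UIKit

final class PermissionsManager {
    /// Ask for microphone access up front, after the educational alert.
    func requestMicrophone(completion: @escaping (Bool) -> Void) {
        switch AVAudioSession.sharedInstance().recordPermission {
        case .granted:
            completion(true)
        case .denied:
            completion(false)   // caller should offer the Settings deep link
        case .undetermined:
            AVAudioSession.sharedInstance().requestRecordPermission { granted in
                DispatchQueue.main.async { completion(granted) }
            }
        @unknown default:
            completion(false)
        }
    }

    /// Deep link into this app's page in the Settings app.
    func openSettings() {
        guard let url = URL(string: UIApplication.openSettingsURLString) else { return }
        UIApplication.shared.open(url)
    }
}
```
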

Risk 5: High Latency in Loud Environments

Description: Bluetooth/Wi-Fi performance may degrade in crowded environments (concerts) due to interference.

Mitigation:

  • ✅ Prefer Wi-Fi Direct over Bluetooth when available
  • ✅ Use smallest feasible buffer size (balance latency vs stability)
  • ✅ Implement adaptive bitrate (reduce quality if connection degrades)
  • ✅ Display connection quality indicator in UI
  • ✅ Consider Opus codec for better compression (future)

Risk 6: Battery Drain

Description: Continuous audio processing and transmission will drain battery quickly.

Mitigation:

  • ✅ Optimize audio pipeline (avoid unnecessary processing)
  • ✅ Use efficient data formats (compressed audio)
  • ✅ Provide "Low Power Mode" option (lower sample rate)
  • ✅ Display battery usage warning in UI
  • ✅ Allow user to close connection when not needed

Risk 7: Privacy & Security

Description: Audio data transmitted over local network could be intercepted or eavesdropped.

Mitigation:

  • ✅ Create the MCSession with encryptionPreference: .required rather than relying on the default, which permits unencrypted connections
  • ✅ Implement peer verification (confirm identity before accepting)
  • ✅ Add optional passcode/PIN for pairing
  • ✅ Display warning about secure environment usage
  • ✅ Future: Implement end-to-end encryption with custom keys

5. Development Workflow & Best Practices

Code Quality Standards:

  • Swift 6 Concurrency: Use async/await and @MainActor where appropriate
  • Error Handling: Comprehensive do-catch blocks, never force-unwrap in production
  • Logging: Use OSLog for debugging audio and network events
  • Testing: Unit tests for AudioManager and MultipeerManager logic
  • Code Review: All phases reviewed for performance and security
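The OSLog recommendation can be centralized in the Logger.swift utility from Phase 1. The subsystem string is a placeholder assumption:

```swift
import os

/// Centralized loggers so audio and network events can be filtered
/// separately in Console.app or via `log stream`.
enum Log {
    private static let subsystem = "com.example.comlink"   // assumed bundle ID

    static let audio   = Logger(subsystem: subsystem, category: "audio")
    static let network = Logger(subsystem: subsystem, category: "network")
}

// Usage:
// Log.audio.debug("Tap installed, buffer size 4096")
// Log.network.error("Peer disconnected unexpectedly")
```
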

Testing Checklist:

  • Test with two physical devices (iPhone required, simulator insufficient)
  • Test in Airplane Mode with Wi-Fi/BT enabled
  • Test with app in background and device locked
  • Test in noisy environment (play loud music)
  • Test with Bluetooth headphones connected
  • Test battery usage over 30-minute session
  • Test permission denial scenarios
  • Test connection/disconnection edge cases

Performance Targets:

  • Latency: < 200ms end-to-end (audio input → transmission → playback)
  • Packet Loss Tolerance: < 5% packet loss without noticeable degradation
  • Battery Life: > 2 hours of continuous use at 50% brightness
  • Discovery Time: < 5 seconds to find nearby peer

6. Future Enhancements (Post-MVP)

Phase 6: Advanced Features

  • Opus Codec Integration: Replace raw PCM with Opus — voice-quality Opus runs at roughly 24–32 kbps versus ~1.5 Mbps for 48 kHz Float32 PCM, well over an order of magnitude smaller
  • Multi-Peer Support: Allow 3+ people in a group chat
  • Noise Gate: Automatically mute when below threshold (save bandwidth)
  • Voice Effects: Optional filters (reverb, pitch shift) for fun
  • Message History: Brief text messages alongside voice
  • Spatial Audio: Use device orientation for 3D positioning
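The noise-gate enhancement is easy to prototype ahead of time: compute the RMS level of each captured buffer and skip transmission when it falls below a threshold. The -50 dBFS default below is an assumption to tune by ear:

```swift
import Foundation

/// Returns true when the buffer is loud enough to be worth transmitting.
/// `samples` are Float32 PCM in [-1, 1]; `thresholdDB` is in dBFS.
func passesNoiseGate(_ samples: [Float], thresholdDB: Double = -50) -> Bool {
    guard !samples.isEmpty else { return false }
    // Root-mean-square level of the buffer.
    let meanSquare = samples.reduce(Float(0)) { $0 + $1 * $1 } / Float(samples.count)
    let rms = Double(meanSquare.squareRoot())
    // Convert to dBFS, clamping to avoid log10(0).
    let db = 20 * log10(max(rms, .leastNormalMagnitude))
    return db >= thresholdDB
}
```
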

Phase 7: Optimization

  • Adaptive Bitrate: Dynamically adjust quality based on connection
  • Custom Transport Protocol: Replace MCSession with lower-level UDP if needed
  • Machine Learning Noise Reduction: Core ML model for superior filtering
  • Battery Optimization: Dynamic sample rate adjustment

7. Quick Start Commands

Clone and Setup:

git clone <repo-url>
cd comlink-ios
open Comlink.xcodeproj

Build and Run:

  1. Select physical iOS device (NOT simulator - audio features require hardware)
  2. Cmd+R to build and run
  3. Grant microphone and local network permissions
  4. Repeat on second device for testing

Testing P2P Connection:

  1. Device A: Tap "Start Broadcasting"
  2. Device B: Tap "Find Peers" → Select Device A
  3. Device A: Accept connection request
  4. Both devices: Test voice transmission

8. File Naming Conventions

  • Swift Files: PascalCase (e.g., MultipeerManager.swift)
  • Models: Singular nouns (e.g., Peer.swift, not Peers.swift)
  • Views: Descriptive + "View" suffix (e.g., ConnectionView.swift)
  • ViewModels: Same as View + "ViewModel" (e.g., ComlinkViewModel.swift)
  • Managers: Descriptive + "Manager" suffix (e.g., AudioManager.swift)

9. Commit Message Guidelines

Follow conventional commits:

  • feat: New feature (e.g., feat: implement MultipeerManager peer discovery)
  • fix: Bug fix (e.g., fix: resolve audio feedback loop)
  • refactor: Code restructuring (e.g., refactor: extract audio processing into utility)
  • docs: Documentation (e.g., docs: update AGENT.md with Phase 3 details)
  • test: Add tests (e.g., test: add unit tests for AudioManager)
  • chore: Maintenance (e.g., chore: update Xcode project settings)

10. Dependencies & Third-Party Libraries

Current: Zero Dependencies

This project uses only Apple frameworks to minimize complexity and binary size.

Considered for Future:

  • Opus-iOS: Opus codec bindings (if AVAudioEngine compression insufficient)
  • CocoaAsyncSocket: Alternative to MCSession for custom networking (if needed)
  • Realm/SwiftData: For message history persistence

Decision: Start with zero dependencies, add only if native frameworks are insufficient.


11. Security Considerations

Data Protection:

  • Audio buffers are ephemeral (not stored to disk)
  • No telemetry or analytics (fully offline)
  • User data never leaves device except during active P2P session

Network Security:

  • MCSession encrypts traffic when the session is created with encryptionPreference: .required
  • Peer identity verified via device name (user confirmation required)
  • Future: Add optional passcode pairing

Permission Sandboxing:

  • Request microphone access only when needed
  • Local network usage limited to Bonjour service type
  • No location, camera, or contacts access required

12. Success Criteria

MVP Definition:

  • ✅ Two devices can discover each other offline
  • ✅ Audio transmitted with < 200ms latency
  • ✅ Voice isolation filters background noise
  • ✅ App continues working when device is locked
  • ✅ Clean, dark UI suitable for concerts
  • ✅ Stable connection for > 10 minutes without drops

Beta Release Criteria:

  • ✅ All MVP features + tested by 10 users
  • ✅ Battery life > 2 hours
  • ✅ No critical bugs in 1-week testing period
  • ✅ App Store compliance (privacy policy, metadata)


13. Contact & Support

Project Lead: Senior iOS Engineer (AI-Assisted Development)
Issues: Use GitHub Issues for bug reports and feature requests
Documentation: This file (AGENT.md) is the source of truth

Important: Always refer to this document before making architectural decisions.


14. Changelog

Version   Date         Changes
───────   ──────────   ─────────────────────────────────────
1.0.0     2025-12-11   Initial architecture document created

Next Step for AI Agent: Proceed to Phase 1 implementation after confirming this plan with the user.