Aftertalk

Meeting memory that never leaves your phone.

Aftertalk iPhone demo: recording, structured summary, voice Q&A

▶︎ watch the 22-second walkthrough (MP4)


iOS 26+ · Swift 6 · Network: zero · License: MIT · Tests


Swift 6 · SwiftUI · SwiftData · Foundation Models · Core ML · NLContextual · Moonshine · FluidAudio · Pyannote · Kokoro TTS




What it does

| Capture | Understand | Ask |
| --- | --- | --- |
| Live Moonshine streaming ASR while you record. | Foundation Models extracts decisions, action items, topics, open questions. | Hold-to-talk Q&A on this meeting or all of them. |
| Optional Parakeet polish for word-accurate timing. | NLContextual embeddings + BM25 + RRF for hybrid retrieval. | Streaming answers with citation pills, optional Kokoro TTS. |
| Pyannote diarization for speaker-attributed chunks. | Sentence-aligned chunks indexed in SwiftData on the phone. | Soft grounding gate, full-transcript context for short meetings. |

Product tour

Screenshots: Record · Meetings · Summary · Transcript · Actions · Ask · Search · Global chat

Architecture

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#F5E7D0', 'primaryTextColor': '#1D1712', 'primaryBorderColor': '#B4532A', 'lineColor': '#8A5A44', 'secondaryColor': '#E7F1EA', 'tertiaryColor': '#E7ECF8', 'fontFamily': 'Inter, ui-sans-serif, system-ui'}}}%%
flowchart LR
    A([iPhone mic]):::capture --> B[Moonshine live ASR]:::model
    B --> C[Live transcript]:::ui
    A --> D[WAV on device]:::data
    D --> E[Parakeet polish]:::model
    D --> F[FluidAudio diarization]:::model
    E --> G[Canonical transcript]:::data
    F --> G
    G --> H[Chunks + summary]:::data
    H --> I[NLContextualEmbedding]:::model
    H --> J[Foundation Models summary]:::model
    I --> K[(SwiftData)]:::store
    J --> K
    H --> K
    K --> L[Search]:::ui
    K --> M[Meeting chat]:::ui
    K --> N[Global chat]:::ui

    classDef capture fill:#F7D9C4,stroke:#B4532A,color:#1D1712;
    classDef model fill:#E7ECF8,stroke:#4F68A8,color:#172033;
    classDef data fill:#FFF4CC,stroke:#A07708,color:#2D2200;
    classDef store fill:#E7F1EA,stroke:#2F7D55,color:#102418;
    classDef ui fill:#F0E7FF,stroke:#7450A8,color:#241338;
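
One way to picture the step where Parakeet polish and diarization merge into the canonical transcript is a per-word overlap match. The sketch below assumes word timings and speaker segments share one clock in seconds; the types `TimedWord`, `SpeakerSegment`, and `SpeakerTurn`, and the function `attribute`, are hypothetical names for illustration and are not taken from the app.

```swift
// Hypothetical types for illustration; not the app's actual models.
struct TimedWord {
    let text: String
    let start: Double  // seconds
    let end: Double
}

struct SpeakerSegment {
    let speaker: String
    let start: Double
    let end: Double
}

struct SpeakerTurn: Equatable {
    var speaker: String
    var text: String
}

/// Length of the time overlap between a word and a segment (0 if disjoint).
func overlap(_ w: TimedWord, _ s: SpeakerSegment) -> Double {
    max(0, min(w.end, s.end) - max(w.start, s.start))
}

/// Labels each word with the best-overlapping speaker, then merges
/// consecutive words from the same speaker into turns.
func attribute(words: [TimedWord], segments: [SpeakerSegment]) -> [SpeakerTurn] {
    var turns: [SpeakerTurn] = []
    for word in words {
        // Words that overlap no segment keep a placeholder label.
        var speaker = "Unknown"
        if let best = segments.max(by: { overlap(word, $0) < overlap(word, $1) }),
           overlap(word, best) > 0 {
            speaker = best.speaker
        }
        if let last = turns.last, last.speaker == speaker {
            turns[turns.count - 1].text += " " + word.text
        } else {
            turns.append(SpeakerTurn(speaker: speaker, text: word.text))
        }
    }
    return turns
}
```

The real pipeline may attribute at the sentence or chunk level instead; word-level matching is just the simplest form of the idea.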

Q&A flow

%%{init: {'theme': 'base', 'themeVariables': {'actorBkg': '#F5E7D0', 'actorBorder': '#B4532A', 'signalColor': '#4F68A8', 'activationBkgColor': '#E7F1EA', 'noteBkgColor': '#FFF4CC', 'fontFamily': 'Inter, ui-sans-serif, system-ui'}}}%%
sequenceDiagram
    autonumber
    participant U as User
    participant ASR as Question ASR
    participant Q as QAOrchestrator
    participant R as Hybrid retriever
    participant DB as SwiftData
    participant LLM as Foundation Models
    participant TTS as Local TTS

    U->>ASR: hold-to-talk question
    ASR-->>Q: local transcript
    alt Short meeting (≤ ~10k chars)
        Q->>DB: full transcript + structured summary
        Q->>R: best-effort retrieval (citations only)
    else Larger / global
        Q->>R: dense + BM25 + RRF fusion
        R->>DB: chunks, summaries, embeddings
    end
    DB-->>Q: packed context
    Q->>LLM: stream answer locally
    loop sentence stream
        LLM-->>Q: snapshot
        Q-->>U: chat bubble + citations
        Q->>TTS: speak completed sentence
    end
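
The `alt` branch in the diagram comes down to a small routing decision. The sketch below is a minimal version of it: the 10,000-character limit is read off the diagram's "≤ ~10k chars" and is approximate, and `ContextPlan` and `planContext` are hypothetical names rather than the QAOrchestrator's real API.

```swift
/// Which context-packing path to use for a question.
enum ContextPlan: Equatable {
    case fullTranscript   // whole transcript + summary; retrieval only supplies citations
    case hybridRetrieval  // dense + BM25 + RRF over stored chunks
}

/// Approximate threshold from the diagram; the app's exact value is an assumption.
let shortMeetingCharLimit = 10_000

/// `transcriptLength` is nil when the question is not tied to a single meeting.
func planContext(transcriptLength: Int?, isGlobalQuestion: Bool) -> ContextPlan {
    // Cross-meeting questions always go through retrieval.
    guard !isGlobalQuestion, let length = transcriptLength else {
        return .hybridRetrieval
    }
    return length <= shortMeetingCharLimit ? .fullTranscript : .hybridRetrieval
}
```

Sending short meetings down the full-transcript path means a weak retrieval score cannot, by itself, cause a refusal.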

Stack

| Layer | Implementation | Notes |
| --- | --- | --- |
| App shell | SwiftUI · SwiftData · AVAudioEngine | iOS 26+, Swift 6 strict concurrency |
| Live ASR | Moonshine small streaming + EnergyVADGate | Real-time live preview; Parakeet produces the canonical transcript |
| Polish ASR | FluidAudio Parakeet TDT 0.6B v2 | Word-accurate timings, ~0.5× real-time |
| Diarization | FluidAudio Pyannote 3.1 + WeSpeaker v2 | Best-effort labels, clusteringThreshold=0.5 + ghost-cluster cleanup |
| LLM | Apple Foundation Models | 4096-token context, structured @Generable summary |
| Embeddings | Apple NLContextualEmbedding (512-dim) | System asset, no shipped weights |
| Retrieval | Dense + BM25 + Reciprocal Rank Fusion | Full-transcript path for short meetings |
| Storage | SwiftData rows + app-local audio files | Cascade delete + repair tool for degraded indexes |
| TTS | FluidAudio Kokoro 82M (ANE) | AVSpeechSynthesizer fallback |
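
For readers new to Reciprocal Rank Fusion, the Retrieval row can be sketched as follows. It merges a dense ranking and a BM25 ranking using only rank positions, so the two score scales never need calibrating against each other. The constant `k = 60` is the commonly cited default and an assumption here, and `reciprocalRankFusion` is an illustrative name.

```swift
/// Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank) over every list containing d.
/// k = 60 is a commonly used constant; the app's value is not stated in this README.
func reciprocalRankFusion(_ rankings: [[String]], k: Double = 60) -> [(id: String, score: Double)] {
    var scores: [String: Double] = [:]
    for ranking in rankings {
        for (index, id) in ranking.enumerated() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            scores[id, default: 0] += 1.0 / (k + Double(index + 1))
        }
    }
    // Sort by score, breaking ties by id so the output is deterministic.
    return scores
        .map { (id: $0.key, score: $0.value) }
        .sorted { $0.score != $1.score ? $0.score > $1.score : $0.id < $1.id }
}
```

A chunk that ranks reasonably well in both lists beats one that tops a single list and is missing from the other.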

Privacy

Aftertalk is built so meeting content never leaves the phone.

| Layer | Guarantee |
| --- | --- |
| Runtime network | No production URLSession or URLRequest usage in app Swift sources. |
| Capture | Recording and Q&A run locally once model assets are present. |
| Storage | Audio, transcript, summary, chat, and embeddings are app-local. |
| Verification | Settings includes a live privacy audit and model-asset status. |
git grep -nE "URLSession|URLRequest" -- 'Aftertalk/**/*.swift'
# returns zero matches in production sources

Build

git clone https://github.com/theaayushstha1/aftertalk
cd aftertalk
xcodegen generate

# Local model bundles (gitignored, downloaded by these scripts)
./Scripts/fetch-parakeet-models.sh
./Scripts/fetch-kokoro-models.sh
./Scripts/fetch-pyannote-models.sh

# Moonshine .ort weights go under
# Aftertalk/Models/moonshine-small-streaming-en/

open Aftertalk.xcodeproj

Requirements: Xcode 26+, iOS 26+ device, Apple Developer signing.

Tests

xcodebuild test -scheme Aftertalk \
  -destination 'platform=iOS Simulator,name=iPhone 17 Pro'

45 tests across 7 suites: VAD gating, sentence boundary detection, title sanitization, diarization cluster cleanup, BM25 tokenization, RRF fusion, and the global Q&A router. The router suite covers the deterministic mention-count and overview intents plus spoken-TTS sanitation, including contraction and possessive preservation so Kokoro pronounces "don't" and "Andre's" correctly. The diarization regression test explicitly encodes the ghost-cluster cycle bug that broke speaker labels under degraded acoustic conditions.
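
To make the BM25 suite concrete, here is a minimal sketch of a tokenizer and an Okapi BM25 scorer. The tokenizer rule (keep apostrophes so contractions stay whole) and the parameters `k1 = 1.2`, `b = 0.75` are textbook defaults chosen for illustration; `tokenize` and `bm25Scores` are hypothetical names, and the app's actual tokenizer may behave differently.

```swift
import Foundation

/// Lowercases and splits on anything that is not a letter, digit, or apostrophe,
/// so contractions like "don't" stay one token. Illustrative tokenizer only.
func tokenize(_ text: String) -> [String] {
    text.lowercased()
        .split { !($0.isLetter || $0.isNumber || $0 == "'") }
        .map(String.init)
}

/// Okapi BM25 score of every document against the query.
/// k1 and b are the usual textbook defaults, not values taken from the app.
func bm25Scores(query: String, documents: [String], k1: Double = 1.2, b: Double = 0.75) -> [Double] {
    let docs = documents.map(tokenize)
    guard !docs.isEmpty else { return [] }
    let avgLength = Double(docs.reduce(0) { $0 + $1.count }) / Double(docs.count)
    let n = Double(docs.count)
    let queryTerms = Set(tokenize(query))

    return docs.map { doc -> Double in
        var score = 0.0
        for term in queryTerms {
            let tf = Double(doc.filter { $0 == term }.count)
            guard tf > 0 else { continue }
            let df = Double(docs.filter { $0.contains(term) }.count)
            // Smoothed IDF that stays non-negative even for very common terms.
            let idf = log(1 + (n - df + 0.5) / (df + 0.5))
            let norm = tf + k1 * (1 - b + b * Double(doc.count) / avgLength)
            score += idf * tf * (k1 + 1) / norm
        }
        return score
    }
}
```

Sorting document indices by these scores gives the lexical ranking that RRF then fuses with the dense ranking.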

Status

Shipping

  • Record · live transcript · structured summary · transcript detail · action items · search · per-meeting chat · global chat · Settings privacy audit.
  • Q&A avoids the old low-cosine refusal: full-transcript context for short meetings, hybrid dense+BM25+RRF for larger or cross-meeting queries.
  • Soft grounding gate refuses only when there are truly no chunks AND no summary on the device.
  • Embedding fallback + dim-mismatch filter so degraded indexes can't poison live retrieval.
  • Repair tool re-embeds chunks and creates missing summary embeddings when a working embedding service comes back online.
  • Optional model assets degrade explicitly with banners instead of silently breaking the recording path.
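
Two of the safeguards above, the dim-mismatch filter and the soft grounding gate, are easy to sketch. The code below assumes stored vectors may have been written by embedders with different dimensions; `StoredChunk`, `denseRank`, and `shouldRefuse` are hypothetical names, not the app's types.

```swift
/// Hypothetical chunk record; field names are illustrative.
struct StoredChunk {
    let id: String
    let embedding: [Float]
}

/// Cosine similarity; returns 0 when either vector has zero norm.
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    let denom = (normA * normB).squareRoot()
    return denom > 0 ? dot / denom : 0
}

/// Ranks chunks against the query, skipping any whose dimension differs
/// from the query's (for example, vectors written by a fallback embedder).
func denseRank(query: [Float], chunks: [StoredChunk], topK: Int) -> [String] {
    chunks
        .filter { $0.embedding.count == query.count }
        .map { (id: $0.id, score: cosine(query, $0.embedding)) }
        .sorted { $0.score > $1.score }
        .prefix(topK)
        .map { $0.id }
}

/// Soft grounding gate: refuse only when there is nothing at all to ground on.
func shouldRefuse(chunkCount: Int, hasSummary: Bool) -> Bool {
    chunkCount == 0 && !hasSummary
}
```

Filtering before scoring matters because comparing vectors of different lengths would silently truncate to the shorter one and return a meaningless similarity.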

Known limits

  • Far-field classrooms are microphone-limited; a phone across a room cannot match a lapel mic near the speaker. The RecordingProfile.farField plumbing exists but isn't user-toggleable yet.
  • Single-channel diarization labels are best-effort, especially on audio played through PC speakers or in rooms with heavy reverb. FluidAudio's OfflineDiarizerManager + VBx is the documented next step.
  • Pipeline parallelism. Polish and diarization run concurrently today via async let; full background diarization (chunk + summarize from polish alone) is deferred for submission stability.
  • Real-device perf capture: see perf/aftertalk-perf-20260430-20min.png for a 20-minute iPhone 17 Pro Max session (recording + Q&A). Memory peaks at ~2.3 GB and settles at ~1.7 GB; CPU averages 41% of one core; the thermal state stays at fair during recording and steps up to serious during Kokoro-heavy Q&A turns. Battery delta reads +0.0% because the device was on a charger, so battery drain is still unmeasured; a 30-min + 10-min off-battery session is still the canonical run we'd ship for a v1 review.
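
The "Pipeline parallelism" item refers to Swift's `async let`, which starts child tasks concurrently and suspends only when their results are read. Here is a minimal sketch of that shape; the stage functions are stand-ins with hypothetical names and canned return values, not the app's real polish or diarization calls.

```swift
/// Stand-in for the polish stage; name and return type are hypothetical.
func polishTranscript(audioPath: String) async -> [String] {
    ["hello", "team"]
}

/// Stand-in for the diarization stage; name and return type are hypothetical.
func diarize(audioPath: String) async -> [String] {
    ["Speaker 1"]
}

/// Starts both stages concurrently with `async let` and waits for both
/// results before returning them together.
func processRecording(audioPath: String) async -> (words: [String], speakers: [String]) {
    async let words = polishTranscript(audioPath: audioPath)
    async let speakers = diarize(audioPath: audioPath)
    return await (words: words, speakers: speakers)
}
```

With this shape, total latency is roughly that of the slower stage rather than the sum of both.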

License

Released under the MIT License. Use it commercially, fork it, ship it, study it, modify it. The only ask is that the copyright notice and permission text travel with the source. No warranty.

MIT License · Copyright (c) 2026 Aayush Shrestha

Credits

Moonshine ASR by Useful Sensors · FluidAudio by Fluid Inference · Apple Foundation Models · Apple NLContextualEmbedding · Pyannote by Hervé Bredin et al.


Built in 7 days during finals week by Aayush Shrestha. Read the architecture decisions or the day-by-day build log for the full engineering reasoning.

About

Fully offline iPhone meeting memory. Zero network calls, no Wi-Fi required. Records, transcribes, summarises, and answers questions entirely on device. Audio, transcripts, summaries, embeddings, and chat never leave the phone.
