▶︎ watch the 22-second walkthrough (MP4)
Demo · Architecture · Q&A flow · Stack · Privacy · Build · Tests · Status · Decisions · How it was built
| Capture | Understand | Ask |
|---|---|---|
| Live Moonshine streaming ASR while you record. | Foundation Models extracts decisions, action items, topics, open questions. | Hold-to-talk Q&A on this meeting or all of them. |
| Optional Parakeet polish for word-accurate timing. | NLContextual embeddings + BM25 + RRF for hybrid retrieval. | Streaming answers with citation pills, optional Kokoro TTS. |
| Pyannote diarization for speaker-attributed chunks. | Sentence-aligned chunks indexed in SwiftData on the phone. | Soft grounding gate, full-transcript context for short meetings. |
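The hybrid retrieval named in the table merges a dense (embedding) ranking with a BM25 ranking via reciprocal rank fusion. The sketch below shows one common formulation of RRF, not Aftertalk's actual code: the function name `reciprocalRankFusion`, the chunk IDs, and the constant `k = 60` are illustrative assumptions, since this README does not state the value used.

```swift
// Reciprocal rank fusion (RRF) over several ranked lists of chunk IDs.
// Assumption: k = 60 is a widely used default; Aftertalk's value is not given here.
func reciprocalRankFusion(_ rankings: [[String]], k: Double = 60) -> [(id: String, score: Double)] {
    var scores: [String: Double] = [:]
    for ranking in rankings {
        for (index, id) in ranking.enumerated() {
            // Ranks are 1-based, so the top hit of a list contributes 1 / (k + 1).
            scores[id, default: 0] += 1.0 / (k + Double(index + 1))
        }
    }
    return scores
        .map { (id: $0.key, score: $0.value) }
        // Highest fused score first; ties are broken by ID for a stable order.
        .sorted { $0.score != $1.score ? $0.score > $1.score : $0.id < $1.id }
}

// Hypothetical rankings from the dense retriever and from BM25.
let dense = ["c3", "c1", "c2"]
let bm25 = ["c1", "c4", "c3"]
let fused = reciprocalRankFusion([dense, bm25])
print(fused.map { $0.id })
```

Because RRF uses only ranks, the two retrievers never need comparable score scales; a chunk that sits near the top of both lists (`c1` here) outranks one that tops a single list.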
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#F5E7D0', 'primaryTextColor': '#1D1712', 'primaryBorderColor': '#B4532A', 'lineColor': '#8A5A44', 'secondaryColor': '#E7F1EA', 'tertiaryColor': '#E7ECF8', 'fontFamily': 'Inter, ui-sans-serif, system-ui'}}}%%
flowchart LR
A([iPhone mic]):::capture --> B[Moonshine live ASR]:::model
B --> C[Live transcript]:::ui
A --> D[WAV on device]:::data
D --> E[Parakeet polish]:::model
D --> F[FluidAudio diarization]:::model
E --> G[Canonical transcript]:::data
F --> G
G --> H[Chunks + summary]:::data
H --> I[NLContextualEmbedding]:::model
H --> J[Foundation Models summary]:::model
I --> K[(SwiftData)]:::store
J --> K
H --> K
K --> L[Search]:::ui
K --> M[Meeting chat]:::ui
K --> N[Global chat]:::ui
classDef capture fill:#F7D9C4,stroke:#B4532A,color:#1D1712;
classDef model fill:#E7ECF8,stroke:#4F68A8,color:#172033;
classDef data fill:#FFF4CC,stroke:#A07708,color:#2D2200;
classDef store fill:#E7F1EA,stroke:#2F7D55,color:#102418;
classDef ui fill:#F0E7FF,stroke:#7450A8,color:#241338;
%%{init: {'theme': 'base', 'themeVariables': {'actorBkg': '#F5E7D0', 'actorBorder': '#B4532A', 'signalColor': '#4F68A8', 'activationBkgColor': '#E7F1EA', 'noteBkgColor': '#FFF4CC', 'fontFamily': 'Inter, ui-sans-serif, system-ui'}}}%%
sequenceDiagram
autonumber
participant U as User
participant ASR as Question ASR
participant Q as QAOrchestrator
participant R as Hybrid retriever
participant DB as SwiftData
participant LLM as Foundation Models
participant TTS as Local TTS
U->>ASR: hold-to-talk question
ASR-->>Q: local transcript
alt Short meeting (≤ ~10k chars)
Q->>DB: full transcript + structured summary
Q->>R: best-effort retrieval (citations only)
else Larger / global
Q->>R: dense + BM25 + RRF fusion
R->>DB: chunks, summaries, embeddings
end
DB-->>Q: packed context
Q->>LLM: stream answer locally
loop sentence stream
LLM-->>Q: snapshot
Q-->>U: chat bubble + citations
Q->>TTS: speak completed sentence
end
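The `alt` branch in the diagram can be read as a small routing decision. The following is a minimal sketch under stated assumptions: the types `QuestionScope` and `ContextPlan` and the function `planContext` are hypothetical names, and the 10,000-character limit mirrors the approximate "≤ ~10k chars" threshold shown above rather than a confirmed constant.

```swift
// Hypothetical types; the real QAOrchestrator API is not shown in this README.
enum QuestionScope {
    case meeting(transcript: String)  // question about a single meeting
    case global                       // question across all meetings
}

enum ContextPlan: Equatable {
    case fullTranscript   // whole transcript + structured summary; retrieval supplies citations only
    case hybridRetrieval  // dense + BM25 + RRF over chunks, summaries, embeddings
}

// Assumption: the limit is measured in characters and is roughly 10k.
func planContext(scope: QuestionScope, shortMeetingLimit: Int = 10_000) -> ContextPlan {
    switch scope {
    case .global:
        return .hybridRetrieval
    case .meeting(let transcript):
        return transcript.count <= shortMeetingLimit ? .fullTranscript : .hybridRetrieval
    }
}

let shortPlan = planContext(scope: .meeting(transcript: String(repeating: "a", count: 500)))
let longPlan = planContext(scope: .meeting(transcript: String(repeating: "a", count: 10_001)))
let globalPlan = planContext(scope: .global)
print(shortPlan, longPlan, globalPlan)
```

Short meetings skip retrieval for context entirely, which avoids refusing an answer just because no chunk scored well against the question.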
Aftertalk is built so meeting content never leaves the phone.
| Layer | Guarantee |
|---|---|
| Runtime network | No production URLSession or URLRequest usage in app Swift sources. |
| Capture | Recording and Q&A run locally once model assets are present. |
| Storage | Audio, transcript, summary, chat, and embeddings are app-local. |
| Verification | Settings includes a live privacy audit and model-asset status. |
git grep -nE "URLSession|URLRequest" -- 'Aftertalk/**/*.swift'
# returns zero matches in production sources

git clone https://github.com/theaayushstha1/aftertalk
cd aftertalk
xcodegen generate
# Local model bundles (gitignored, downloaded by these scripts)
./Scripts/fetch-parakeet-models.sh
./Scripts/fetch-kokoro-models.sh
./Scripts/fetch-pyannote-models.sh
# Moonshine .ort weights go under
# Aftertalk/Models/moonshine-small-streaming-en/
open Aftertalk.xcodeproj

Requirements: Xcode 17+, iOS 26+ device, Apple Developer signing.
xcodebuild test -scheme Aftertalk \
  -destination 'platform=iOS Simulator,name=iPhone 17 Pro'

45 tests across 7 suites: VAD gating, sentence boundary detection, title sanitization, diarization cluster cleanup, BM25 tokenization, RRF fusion, and the global Q&A router (deterministic mention-count and overview intents, plus spoken-TTS sanitation that preserves contractions and possessives so Kokoro pronounces "don't" and "Andre's" correctly). The diarization regression test explicitly encodes the ghost-cluster cycle bug that broke speaker labels under degraded acoustic conditions.
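To illustrate what contraction/possessive preservation means for spoken output, here is a rough sketch of a sanitizer that strips markup characters but keeps apostrophes that sit between two letters. The function `sanitizeForSpeech` and its rules are assumptions made for illustration; the app's actual sanitation logic is not reproduced in this README.

```swift
// Hypothetical sanitizer: drops markup symbols before text-to-speech,
// but keeps an apostrophe when it sits between two letters ("don't", "Andre's").
func sanitizeForSpeech(_ text: String) -> String {
    let chars = Array(text)
    var out = ""
    for (i, ch) in chars.enumerated() {
        if ch.isLetter || ch.isNumber || ch.isWhitespace || ".,?!".contains(ch) {
            out.append(ch)
        } else if ch == "'" || ch == "\u{2019}" {
            let prevIsLetter = i > 0 && chars[i - 1].isLetter
            let nextIsLetter = i + 1 < chars.count && chars[i + 1].isLetter
            // Normalize curly apostrophes to a plain one; drop stray quote marks.
            if prevIsLetter && nextIsLetter { out.append("'") }
        }
        // Any other symbol (asterisks, backticks, brackets) is dropped.
    }
    // Collapse runs of whitespace left behind by removed symbols.
    return out.split(whereSeparator: { $0.isWhitespace }).joined(separator: " ")
}

let spoken = sanitizeForSpeech("**Don't** lose `Andre's` notes, 'ok'?")
print(spoken)
```

A naive "remove all punctuation" pass would turn "don't" into "dont", which a TTS voice may mispronounce; checking the neighbors of each apostrophe is one simple way to avoid that.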
Shipping
- Record · live transcript · structured summary · transcript detail · action items · search · per-meeting chat · global chat · Settings privacy audit.
- Q&A avoids the old low-cosine refusal: full-transcript context for short meetings, hybrid dense+BM25+RRF for larger or cross-meeting queries.
- Soft grounding gate refuses only when there are truly no chunks AND no summary on the device.
- Embedding fallback + dim-mismatch filter so degraded indexes can't poison live retrieval.
- Repair tool re-embeds chunks and creates missing summary embeddings when a working embedding service comes back online.
- Optional model assets degrade explicitly with banners instead of silently breaking the recording path.
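Two of the items above, the soft grounding gate and the dim-mismatch filter, reduce to small predicates. The sketch below is illustrative only: `StoredChunk`, `compatibleChunks`, `cosine`, and `shouldRefuse` are hypothetical names, and the real SwiftData models and thresholds may differ.

```swift
// Hypothetical stand-in for an indexed chunk; the real model lives in SwiftData.
struct StoredChunk {
    let id: String
    let embedding: [Float]
}

// Dim-mismatch filter: skip vectors written by a different (e.g. fallback)
// embedding model, since comparing vectors of different sizes is meaningless.
func compatibleChunks(_ chunks: [StoredChunk], queryDimension: Int) -> [StoredChunk] {
    chunks.filter { $0.embedding.count == queryDimension }
}

// Cosine similarity; only valid for equal-length vectors, hence the filter above.
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must share a dimension")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = (normA * normB).squareRoot()
    return denom == 0 ? 0 : dot / denom
}

// Soft grounding gate: refuse only when there is nothing at all to ground on.
func shouldRefuse(chunkCount: Int, hasSummary: Bool) -> Bool {
    chunkCount == 0 && !hasSummary
}

let index = [
    StoredChunk(id: "a", embedding: [1, 0]),
    StoredChunk(id: "b", embedding: [1, 0, 0]),  // stale vector with another dimension
]
let usable = compatibleChunks(index, queryDimension: 2)
print(usable.map { $0.id }, shouldRefuse(chunkCount: usable.count, hasSummary: false))
```

The gate deliberately ignores similarity scores: a low cosine value alone never triggers a refusal, only the absence of both chunks and a summary does.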
Known limits
- Far-field classrooms are microphone-limited; a phone across a room cannot match a lapel mic near the speaker. The `RecordingProfile.farField` plumbing exists but isn't user-toggleable yet.
- Single-channel diarization labels are best-effort, especially on PC-speaker-played audio or heavy room reverb. FluidAudio's `OfflineDiarizerManager` + VBx is the documented next step.
- Pipeline parallelism. Polish and diarization run concurrently today via `async let`; full background diarization (chunk + summarize from polish alone) is deferred for submission stability.
- Real-device perf capture: see `perf/aftertalk-perf-20260430-20min.png` for a 20-minute iPhone 17 Pro Max session (recording + Q&A). Memory peaks ~2.3 GB, settles ~1.7 GB; CPU averages 41% of one core; thermal stays in `fair` for the recording and steps to `serious` during Kokoro-heavy Q&A turns. Battery delta is +0.0% because the device was on charger; a 30-min + 10-min off-battery session is still the canonical run we'd ship for a v1 review.
Released under the MIT License. Use it commercially, fork it, ship it, study it, modify it. The only ask is that the copyright notice and permission text travel with the source. No warranty.
MIT License · Copyright (c) 2026 Aayush Shrestha
Moonshine ASR by Useful Sensors · FluidAudio by Fluid Inference · Apple Foundation Models · Apple NLContextualEmbedding · Pyannote by Hervé Bredin et al.
Built in 7 days during finals week by Aayush Shrestha. Read the architecture decisions or the day-by-day build log for the full engineering reasoning.







