ADsum is a cross-platform meeting recorder designed to capture system audio and microphone streams simultaneously, transcribe the conversation, and generate actionable notes. The repository is organised following a modular architecture so the audio engine, orchestration pipeline, transcription backends, and note generators can evolve independently.
- Dual-channel capture architecture with pluggable audio backends.
- Streaming-friendly recording pipeline that writes directly to disk.
- Storage layer backed by SQLite for recording metadata, transcripts, and notes.
- Transcription services with OpenAI integration and a lightweight dummy fallback for offline tests.
- Note synthesis service that can call OpenAI or fall back to heuristic summarisation.
- Typer-powered CLI for device discovery, recording, transcription, and note generation.
```
adsum/
  cli.py        # Command line interface entry point
  config.py     # Global configuration via environment variables
  logging.py    # Structured logging helpers
  core/
    audio/      # Audio capture abstractions and implementations
    pipeline/   # Recording orchestrator
  data/         # Pydantic models and SQLite storage helpers
  services/
    transcription/  # Transcription provider interfaces & implementations
    notes/          # Notes generation provider interfaces & implementations
  utils/        # Shared utilities (audio helpers, task helpers)
```
Install the package in editable mode:

```
pip install -e .
```

All platforms: ensure FFmpeg is installed and available on `PATH`, or set `ADSUM_FFMPEG_BINARY` to the executable path.
Listing audio devices:

```
adsum devices
```

Launching the interactive console UI (recordings are controlled from there):

```
adsum ui --mic-device 2 --system-device 5 --transcription-backend openai --notes-backend openai
```

The UI launches from the terminal and lets you start, pause, resume, and stop recordings without additional CLI commands. Each channel is written to `recordings/<session-id>/raw`, a combined track is optionally rendered, and transcription and note generation can be triggered from the interface. Results are stored in `adsum.db`.
ADsum now uses FFmpeg as the default capture engine so Bluetooth sources exposed by the operating system can be recorded reliably. When prompted for the microphone or system device, provide an FFmpeg-style input specification using the pattern `<format>:<target>?option=value&...`. Examples:
```
# PulseAudio / PipeWire loopback for a Bluetooth headset
pulse:bluez_source.AA_BB_CC_DD_EE_FF.monitor?sample_rate=48000&channels=2

# Windows DirectShow capture from a Bluetooth microphone
dshow:audio=Bluetooth Headset?sample_rate=48000&channels=1

# Windows WASAPI loopback for the current system output (captures audio even when routed to Bluetooth)
wasapi:default?loopback=1

# macOS AVFoundation input index 1
avfoundation:1?channels=1
```
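The `<format>:<target>?option=value&...` strings above split cleanly into a format, a target, and an option map. The following is only an illustrative sketch of that pattern using the standard library; `parse_device_spec` is a hypothetical helper, not ADsum's actual parser:

```python
from urllib.parse import parse_qsl


def parse_device_spec(spec: str) -> tuple[str, str, dict[str, str]]:
    """Split an FFmpeg-style device spec into (format, target, options)."""
    # Everything before the first ":" is the FFmpeg input format.
    fmt, _, rest = spec.partition(":")
    # Everything after an optional "?" is a query string of options.
    target, _, query = rest.partition("?")
    options = dict(parse_qsl(query))
    return fmt, target, options


fmt, target, opts = parse_device_spec(
    "pulse:bluez_source.AA_BB_CC_DD_EE_FF.monitor?sample_rate=48000&channels=2"
)
# fmt == "pulse"
# target == "bluez_source.AA_BB_CC_DD_EE_FF.monitor"
# opts == {"sample_rate": "48000", "channels": "2"}
```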
With WASAPI loopback you can stream the system mix while keeping Bluetooth headsets active—the mic and playback channels remain independent, mirroring tools such as Loom. When FFmpeg does not provide WASAPI support or the driver blocks the capture, ADsum falls back to an internal WASAPI loopback backend powered by the soundcard library, so the system output is still captured whenever Windows exposes it.
Additional FFmpeg flags can be added via query parameters, for instance `args=-thread_queue_size 2048` (parsed with shell-style quoting) or `opt_timeout=5` (expanded to `-timeout 5`).
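The two expansions described above can be sketched in a few lines; `expand_extra_flags` is a hypothetical helper mirroring that behaviour, not ADsum's implementation:

```python
import shlex


def expand_extra_flags(options: dict[str, str]) -> list[str]:
    """Turn query-parameter options into extra FFmpeg command-line arguments."""
    flags: list[str] = []
    for key, value in options.items():
        if key == "args":
            # Raw argument strings are split with shell-style quoting.
            flags.extend(shlex.split(value))
        elif key.startswith("opt_"):
            # opt_<name>=<value> expands to "-<name> <value>".
            flags.extend([f"-{key[4:]}", value])
    return flags


expand_extra_flags({"args": "-thread_queue_size 2048", "opt_timeout": "5"})
# → ["-thread_queue_size", "2048", "-timeout", "5"]
```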
Windows note: when a WASAPI loopback device is selected, ADsum automatically falls back to the sounddevice (PortAudio) backend whenever the installed FFmpeg build lacks WASAPI support. No extra configuration is required: just pick the speaker you want to capture (for example the Bluetooth output) and keep `loopback=1` in the device string.
Use the "Configure environment" menu entry to inspect or update any `ADSUM_` variables directly from the UI. Changes are persisted to your `.env` file for future sessions.
Both the console and window interfaces call an internal helper named `ensure_ffmpeg_available` whenever FFmpeg cannot be found. If `ADSUM_FFMPEG_DOWNLOAD_URL` is set, the helper downloads a platform-specific archive (the `{platform}` placeholder expands to `windows`, `darwin`, or `linux`) into `<ADSUM_BASE_DIR>/cache/ffmpeg/<platform>`, extracts the binary, and records its location in `ADSUM_FFMPEG_BINARY`. You can opt in to this behaviour from the prompts shown after a failed recording attempt, or configure it ahead of time via the environment menu.

Prefer to manage FFmpeg manually? Simply leave `ADSUM_FFMPEG_DOWNLOAD_URL` unset. The same prompt lets you browse for the executable and stores the selection in your `.env` file, so future sessions keep using your preferred installation.
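The placeholder expansion and cache layout described above amount to a small amount of path arithmetic. This sketch is hypothetical (`ffmpeg_cache_dir` is not an ADsum function, and the URL is a made-up example); it only illustrates how `{platform}` and `<ADSUM_BASE_DIR>/cache/ffmpeg/<platform>` compose:

```python
import sys
from pathlib import Path


def ffmpeg_cache_dir(
    base_dir: str, download_url: str, sys_platform: str = sys.platform
) -> tuple[str, Path]:
    """Expand the {platform} placeholder and compute the FFmpeg cache directory."""
    # Map Python's sys.platform values onto the placeholder names.
    platform = {"win32": "windows", "darwin": "darwin"}.get(sys_platform, "linux")
    url = download_url.replace("{platform}", platform)
    cache = Path(base_dir) / "cache" / "ffmpeg" / platform
    return url, cache


url, cache = ffmpeg_cache_dir(
    "recordings", "https://example.invalid/ffmpeg-{platform}.zip", "win32"
)
# url == "https://example.invalid/ffmpeg-windows.zip"
# cache == Path("recordings/cache/ffmpeg/windows")
```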
Environment variables customise behaviour via pydantic settings (prefix `ADSUM_`):

- `ADSUM_BASE_DIR`: root directory for recordings (default `recordings/`).
- `ADSUM_DATABASE_PATH`: SQLite database path (default `adsum.db`).
- `ADSUM_SAMPLE_RATE`: sample rate used for capture (default `16000`).
- `ADSUM_CHANNELS`: number of channels per capture stream (default `1`).
- `ADSUM_CHUNK_SECONDS`: preferred chunk duration when streaming (default `1.0`).
- `ADSUM_AUDIO_BACKEND`: audio engine to use (`ffmpeg`).
- `ADSUM_FFMPEG_BINARY`: override the FFmpeg executable path when the binary is not available on `PATH`. On Windows, ADsum also checks common installation folders such as `C:\ffmpeg\bin` and `C:\Program Files\FFmpeg\bin`. If FFmpeg still cannot be found, download a build from ffmpeg.org and either add its `bin` directory to `PATH` or point `ADSUM_FFMPEG_BINARY` directly at the `ffmpeg.exe` file. When ADsum cannot locate the executable during a recording attempt, both interactive interfaces offer to download or browse for the correct binary and persist it to your `.env` file automatically.
- `ADSUM_FFMPEG_DOWNLOAD_URL`: optional direct download link used by the automatic bootstrapper. The URL may include a `{platform}` placeholder that resolves to `windows`, `darwin`, or `linux`. When configured, ADsum caches the retrieved archive or binary under `<ADSUM_BASE_DIR>/cache/ffmpeg/<platform>` and records the resulting executable path in `ADSUM_FFMPEG_BINARY`. Leave this setting empty if you prefer to manage FFmpeg manually.
- `ADSUM_DEFAULT_MIC_DEVICE`: preferred microphone device identifier remembered between sessions.
- `ADSUM_DEFAULT_SYSTEM_DEVICE`: preferred system audio device identifier remembered between sessions.
- `ADSUM_OPENAI_TRANSCRIPTION_MODEL`: model used for OpenAI transcription.
- `ADSUM_OPENAI_NOTES_MODEL`: model used for OpenAI notes/summarisation.
- `ADSUM_OPENAI_API_KEY`: optional API key forwarded to the OpenAI client (falls back to `OPENAI_API_KEY`).
- `ADSUM_OPENAI_MAX_UPLOAD_BYTES`: maximum payload size (default ~24 MiB) before recordings are automatically split for OpenAI uploads.
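For reference, the prefix-plus-default scheme works roughly as follows. ADsum itself uses pydantic settings; this stdlib-only sketch (the `Settings` class here is hypothetical) just shows how the first few `ADSUM_` variables map onto typed values:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Settings:
    """Illustrative stand-in for the pydantic settings model."""
    base_dir: str = "recordings/"
    database_path: str = "adsum.db"
    sample_rate: int = 16000
    channels: int = 1
    chunk_seconds: float = 1.0

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Each field falls back to its default when the ADSUM_ variable is unset.
        return cls(
            base_dir=env.get("ADSUM_BASE_DIR", cls.base_dir),
            database_path=env.get("ADSUM_DATABASE_PATH", cls.database_path),
            sample_rate=int(env.get("ADSUM_SAMPLE_RATE", cls.sample_rate)),
            channels=int(env.get("ADSUM_CHANNELS", cls.channels)),
            chunk_seconds=float(env.get("ADSUM_CHUNK_SECONDS", cls.chunk_seconds)),
        )


settings = Settings.from_env({"ADSUM_SAMPLE_RATE": "48000"})
# settings.sample_rate == 48000, everything else keeps its default
```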
Recordings that exceed the upload limit are transparently chunked into sequential WAV files before contacting OpenAI, ensuring long meetings are transcribed without manual intervention.
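The splitting logic reduces to planning frame spans so that each chunk's payload stays under the byte limit. A hypothetical sketch (`plan_chunks` is not ADsum's actual chunker, and WAV header overhead is ignored for simplicity):

```python
def plan_chunks(
    total_frames: int,
    channels: int,
    bytes_per_sample: int,
    max_upload_bytes: int,
) -> list[tuple[int, int]]:
    """Plan (start_frame, end_frame) spans so each WAV chunk fits the limit."""
    frame_bytes = channels * bytes_per_sample
    # Always make progress, even if a single frame exceeds the limit.
    frames_per_chunk = max(1, max_upload_bytes // frame_bytes)
    spans: list[tuple[int, int]] = []
    start = 0
    while start < total_frames:
        end = min(start + frames_per_chunk, total_frames)
        spans.append((start, end))
        start = end
    return spans


plan_chunks(total_frames=100, channels=1, bytes_per_sample=2, max_upload_bytes=100)
# → [(0, 50), (50, 100)]  (each 50-frame span is 100 bytes of mono 16-bit audio)
```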
ADsum ships with multiple transcription providers. The CLI and desktop window default to a lightweight dummy backend that
returns placeholder text so automated tests can run offline. When you are ready to capture real speech, explicitly pick another
provider:
- CLI: pass `--transcription-backend openai` (or your preferred backend) to the `adsum record` or `adsum ui` commands.
- Window UI: open Configure environment ▸ Transcription backend and select a real provider before starting a session.
If you choose one of the OpenAI providers, make sure an API key is available. Set the standard `OPENAI_API_KEY` environment variable before launching ADsum, or configure `ADSUM_OPENAI_API_KEY` via the Environment menu so the desktop app can save it to your `.env` file.
If the dummy backend is still active when you start recording, both interfaces surface a prominent warning so you can switch to a
real service before relying on the transcripts.
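The key lookup order described above (`ADSUM_OPENAI_API_KEY` first, then the standard `OPENAI_API_KEY`) can be summarised in one hypothetical helper; this is a sketch of the fallback, not ADsum's code:

```python
import os
from typing import Mapping, Optional


def resolve_openai_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the ADsum-specific key if set, else the standard OpenAI key."""
    return env.get("ADSUM_OPENAI_API_KEY") or env.get("OPENAI_API_KEY")
```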
Run the unit test suite:

```
pytest
```

The dummy services ensure tests do not require external APIs or audio hardware.
Apache 2.0