AI-powered dog bark detection and cataloging
A complete modernization of the original woofalytics project, built for cataloging and fingerprinting barking dogs within earshot. Uses zero-shot audio classification (CLAP) to detect barks without training data, with automatic recording for documentation purposes.
- Project Goals
- Architecture Overview
- Detection Pipeline
- File Structure
- Module Documentation
- Configuration System
- API Reference
- Web UI
- Hardware Requirements
- Installation
- Docker Deployment
- Development
- Testing
- Design Decisions
- Known Issues & TODOs
- Original Project
- Versioning
This project was created with specific intentions:
- Learning - Push modern Python patterns to the limits (deliberately over-engineered)
- Dog Cataloging - Document and fingerprint all barking dogs within earshot
- Best Practices - Latest patterns, proper architecture, comprehensive documentation
- Zero-Shot Bark Detection - CLAP-powered classification without training data (~500ms inference)
- Multi-Layer Veto System - Rejects speech, percussion, and bird sounds to reduce false positives
- Direction of Arrival (DOA) - Know which direction barks come from using stereo microphones
- Evidence Recording - Automatic 30-second clips with JSON metadata sidecars
- Dog Fingerprinting - Identify and track individual dogs by bark signature
- Bark Management - Reassign barks to different dogs, untag, or delete directly from dog profiles
- Last Heard Tracking - See when each dog was last detected with accurate timestamps
- Webhook Notifications - Configurable webhooks for bark alerts with customizable payloads
- Quiet Hours - Schedule reduced sensitivity periods (e.g., nighttime) via Settings UI
- Clustering Analysis - Visual interface for analyzing untagged barks and creating dog profiles
- Modern Web UI - Real-time dashboard with WebSocket updates and persistent statistics
- Accessible by Design - Aims for WCAG AA compliance, screen reader support, respects motion preferences
- REST API - Full OpenAPI documentation at /api/docs
- Docker Support - Easy deployment with Docker Compose
- Flexible Configuration - YAML config with environment variable overrides
- AI Summaries - LLM-generated weekly/custom-range bark reports via Ollama (optional)
- Legacy MLP Support - Optional TorchScript models for faster inference
┌───────────────────────────────────────────────────────────────┐
│ FastAPI Application │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────────┐ │
│ │ REST API │ │ WebSocket │ │ Static Files │ │
│ │ /api/* │ │ /ws/bark │ │ /static/* │ │
│ │ │ │/ws/pipeline│ │ │ │
│ └─────┬──────┘ └─────┬──────┘ └────────────────────────┘ │
│ │ │ │
│ └───────┬───────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ BarkDetector │ │
│ │ - Coordinates audio capture, inference, callbacks │ │
│ │ - Runs inference loop every 500ms (CLAP) or 80ms (MLP) │ │
│ │ - Produces BarkEvent objects │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Audio │ │ VAD Gate │ │ DOA Estimator │ │
│ │ Capture │ │ (fast skip) │ │ (Bartlett/Capon/MEM) │ │
│ └───────────┘ └──────┬──────┘ └─────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ CLAP Detector │ │
│ │ - Zero-shot audio classification (laion/clap-htsat) │ │
│ │ - Multi-label veto (speech, percussion, birds) │ │
│ │ - Rolling window + high-confidence bypass │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ EvidenceStorage │ │
│ │ - Records WAV clips on bark detection │ │
│ │ - Creates JSON metadata sidecars │ │
│ │ - Maintains evidence index │ │
│ └─────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
- Audio Capture (audio/capture.py) runs in a background thread, filling a ring buffer
- BarkDetector (detection/model.py) reads ~100 frames (1 second) from the buffer every 500ms
- VAD Gate (detection/vad.py) fast-rejects silent audio before expensive CLAP inference
- CLAP Detector (detection/clap.py) runs zero-shot classification with multi-label veto:
  - Compares "dog barking" against speech, percussion, bird, and other sound labels
  - Uses rolling window (2/3 positives required) to smooth detections
  - High-confidence barks (≥80%) bypass the rolling window for instant detection
  - Detection cooldown prevents rapid-fire triggers from the same sound
- DOA Estimator (detection/doa.py) calculates direction using pyargus algorithms
- BarkEvent is created and broadcast to all registered callbacks
- EvidenceStorage (evidence/storage.py) records clips when barks are detected
- WebSocket broadcasts events to connected web clients in real-time
Note: Legacy MLP mode uses 80ms inference with TorchScript for faster but less accurate detection.
Woofalytics uses a multi-stage filtering approach to balance accuracy with performance:
Audio Input → VAD Gate → YAMNet Gate → CLAP Detector → Bark Event
↓ ↓ ↓
(skip) (skip) (detect)
- Purpose: Fast energy-based rejection of silent audio
- Method: RMS energy threshold in dB
- Skip Rate: ~60-80% of frames (environment dependent)
- Latency: <1ms
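A minimal sketch of this energy gate, assuming float samples in [-1.0, 1.0] and the default -40 dBFS threshold from config.yaml; the actual detection/vad.py implementation may differ in detail.

```python
import math

VAD_THRESHOLD_DB = -40.0  # matches vad_threshold_db in config.yaml

def rms_dbfs(samples: list[float]) -> float:
    """RMS level in dBFS for float samples in [-1.0, 1.0]."""
    if not samples:
        return -120.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-6))  # floor avoids log10(0)

def vad_passes(samples: list[float], threshold_db: float = VAD_THRESHOLD_DB) -> bool:
    """True when the frame is loud enough to justify CLAP inference."""
    return rms_dbfs(samples) >= threshold_db

quiet = [0.001] * 441  # ~-60 dBFS: skipped
loud = [0.5] * 441     # ~-6 dBFS: passes
```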
- Purpose: Skip CLAP inference for non-dog sounds
- Model: Google's YAMNet (TensorFlow, ~3.7M params)
- Classes: AudioSet class 69 (Dog) and 70 (Bark)
- Threshold: 0.05 (kept low to avoid missing barks)
- Skip Rate: 30-40% of VAD-passed frames
- Latency: ~50ms
- Purpose: Zero-shot audio classification with multi-label veto
- Model: LAION CLAP (laion/clap-htsat-unfused)
- Features:
- Compares bark labels against speech, percussion, birds
- Rolling window (2/3 positives required)
- High-confidence bypass (≥80%)
- Detection cooldown prevents rapid-fire
- Latency: ~500ms
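The rolling window and high-confidence bypass can be illustrated with a small sketch. Window size, the 2-of-3 rule, and thresholds are taken from the text above; detection/clap.py may implement the state machine differently.

```python
from collections import deque

class RollingWindow:
    """Sketch of 2-of-3 smoothing with a high-confidence bypass (thresholds assumed)."""
    def __init__(self, size: int = 3, needed: int = 2,
                 threshold: float = 0.6, bypass: float = 0.8) -> None:
        self.window: deque[bool] = deque(maxlen=size)
        self.needed = needed        # positives required within the window
        self.threshold = threshold  # per-frame bark confidence threshold
        self.bypass = bypass        # high-confidence instant-detect threshold

    def update(self, probability: float) -> bool:
        """Feed one CLAP score; return True when a bark should be reported."""
        if probability >= self.bypass:  # high-confidence bypass: fire immediately
            self.window.clear()
            return True
        self.window.append(probability >= self.threshold)
        return sum(self.window) >= self.needed

rw = RollingWindow()
```

A score of 0.95 fires immediately; two consecutive 0.6-0.8 scores are needed otherwise.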
Monitor pipeline status in real-time via the Dashboard's Detection Pipeline card.
woofalytics-v2/
├── src/woofalytics/ # Python backend
│ ├── __init__.py # Package version and exports
│ ├── __main__.py # CLI entry point (python -m woofalytics)
│ ├── app.py # FastAPI application with lifespan
│ ├── config.py # Pydantic v2 settings system
│ │
│ ├── audio/
│ │ ├── __init__.py # Module exports
│ │ ├── devices.py # Microphone discovery (PyAudio wrapper)
│ │ └── capture.py # Async audio capture with ring buffer
│ │
│ ├── detection/
│ │ ├── __init__.py # Module exports
│ │ ├── model.py # BarkDetector orchestrator + BarkEvent
│ │ ├── clap.py # CLAP zero-shot classifier (primary)
│ │ ├── yamnet.py # YAMNet pre-filter gate (TensorFlow)
│ │ ├── vad.py # Voice activity detection gate
│ │ ├── features.py # Mel filterbank feature extraction (legacy)
│ │ ├── doa.py # Direction of arrival estimation
│ │ └── resample_cache.py # Cached audio resampling
│ │
│ ├── events/
│ │ ├── __init__.py # Module exports
│ │ ├── manager.py # Notification manager orchestrator
│ │ ├── debouncer.py # Per-dog notification debouncing
│ │ ├── models.py # Event data models
│ │ └── webhook.py # IFTTT and custom webhook delivery
│ │
│ ├── evidence/
│ │ ├── __init__.py # Module exports
│ │ ├── storage.py # Evidence recording and management
│ │ └── metadata.py # JSON metadata models
│ │
│ ├── fingerprint/ # Dog identification system
│ │ ├── __init__.py
│ │ ├── storage.py # SQLite fingerprint database
│ │ ├── matcher.py # CLAP embedding matching
│ │ ├── extractor.py # Feature extraction for fingerprints
│ │ ├── acoustic_features.py # Acoustic feature computation
│ │ ├── acoustic_matcher.py # Acoustic similarity matching
│ │ ├── clustering.py # HDBSCAN bark clustering
│ │ └── models.py # Fingerprint data models
│ │
│ ├── observability/
│ │ ├── __init__.py # Module exports
│ │ └── metrics.py # Prometheus-format metrics
│ │
│ ├── prompts/
│ │ └── weekly_summary.prompty # Jinja2 prompt template for AI summaries
│ │
│ └── api/
│ ├── __init__.py # Module exports
│ ├── auth.py # API key authentication
│ ├── ratelimit.py # Rate limiting (slowapi)
│ ├── routes.py # Core REST API endpoints
│ ├── routes_export.py # CSV/JSON data export
│ ├── routes_fingerprint.py # Dog profiles and bark tagging
│ ├── routes_notification.py # Notification status
│ ├── routes_settings.py # Runtime settings management
│ ├── routes_summary.py # Daily/weekly/monthly summaries + AI
│ ├── schemas.py # Core Pydantic response models
│ ├── schemas_export.py # Export response models
│ ├── schemas_fingerprint.py # Fingerprint response models
│ ├── schemas_summary.py # Summary response models
│ └── websocket.py # WebSocket endpoints + ConnectionManager
│
├── frontend/ # SvelteKit frontend (NASA Mission Control theme)
│ ├── src/
│ │ ├── routes/ # SvelteKit pages
│ │ │ ├── +page.svelte # Dashboard with real-time monitoring
│ │ │ ├── dogs/ # Dog management page
│ │ │ ├── fingerprints/ # Fingerprints explorer
│ │ │ ├── reports/ # Bark activity reports
│ │ │ └── settings/ # Settings & maintenance
│ │ ├── lib/
│ │ │ ├── api/ # Type-safe API client (openapi-fetch)
│ │ │ ├── components/ # Reusable UI components
│ │ │ └── stores/ # Svelte stores for WebSocket state
│ │ └── app.css # Global styles (glassmorphism theme)
│ ├── build/ # Production build (gitignored)
│ ├── package.json
│ └── svelte.config.js
│
├── static/ # Evidence audio files (served at /static)
│
├── models/
│ └── traced_model.pt # TorchScript bark detection model
│
├── evidence/ # Evidence recordings (created at runtime)
│
├── tests/
│ ├── __init__.py
│ ├── conftest.py # Pytest fixtures
│ ├── test_api_routes.py # API endpoint tests
│ ├── test_api_websocket.py # WebSocket tests
│ ├── test_audio.py # Audio module tests
│ ├── test_config.py # Configuration tests
│ ├── test_detection.py # Detection module tests
│ ├── test_evidence.py # Evidence module tests
│ ├── test_export.py # Data export tests
│ ├── test_fingerprint_clustering.py # Clustering tests
│ ├── test_fingerprint_matching.py # Fingerprint matching tests
│ ├── test_quiet_hours.py # Quiet hours tests
│ ├── test_resample_cache.py # Resample cache tests
│ ├── test_summary.py # Summary endpoint tests
│ └── test_yamnet.py # YAMNet gate tests
│
├── pyproject.toml # Python packaging (PEP 517/518)
├── Dockerfile # Multi-stage Docker build
├── docker-compose.yml # Docker Compose deployment
├── config.yaml # Default configuration
├── .env.example # Environment variable template
└── README.md # This file
Pattern: Pydantic v2 with proper nesting (BaseModel for nested, BaseSettings for root only)
# Nested configs use BaseModel (NOT BaseSettings)
class AudioConfig(BaseModel):
device_name: str | None = None
sample_rate: int = 44100
channels: int = 2
# ...
# Only root uses BaseSettings
class Settings(BaseSettings):
model_config = SettingsConfigDict(
env_prefix="WOOFALYTICS__",
env_nested_delimiter="__",
)
audio: AudioConfig = Field(default_factory=AudioConfig)
# ...

Environment Variables:
- Prefix: WOOFALYTICS__
- Nested delimiter: __
- Example: WOOFALYTICS__AUDIO__SAMPLE_RATE=48000
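For illustration, this is roughly how the prefix and double-underscore delimiter map onto nested config keys. pydantic-settings performs this mapping internally, so this helper is purely explanatory.

```python
def parse_overrides(env: dict[str, str], prefix: str = "WOOFALYTICS__") -> dict:
    """Map WOOFALYTICS__SECTION__KEY=value entries onto a nested dict."""
    result: dict = {}
    for key, value in env.items():
        if not key.startswith(prefix):
            continue  # unrelated environment variable
        path = key[len(prefix):].lower().split("__")
        node = result
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return result

overrides = parse_overrides(
    {"WOOFALYTICS__AUDIO__SAMPLE_RATE": "48000", "PATH": "/usr/bin"}
)
# overrides maps onto Settings.audio.sample_rate
```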
MicrophoneInfo - Dataclass for device info
list_microphones(min_channels) - List all input devices
find_microphone(device_name, min_channels) - Auto-detect or filter by name
set_microphone_volume(percent) - ALSA amixer wrapper (Linux only)
AudioFrame - Single frame with timestamp, raw bytes, metadata
AsyncAudioCapture - Runs PyAudio in a background thread, async interface
- Ring buffer (default 30 seconds)
- get_recent_frames(count) - Get N most recent frames
- get_buffer_as_array(seconds) - Get audio as numpy array
FeatureExtractor - Converts audio to Mel filterbank features
- Resamples from source rate (44.1kHz) to model rate (16kHz)
- 80 Mel bins, 25ms frame, 10ms hop
- Uses torchaudio.compliance.kaldi.fbank for Kaldi compatibility
- Output: (1, 480) tensor (6 frames × 80 mels)
YAMNetGate - TensorFlow-based pre-filter (~3.7M params)
- Uses Google's YAMNet to detect dog/bark audio classes
- Skips expensive CLAP inference for non-dog sounds
- Falls back to CLAP-only if TensorFlow fails to load
- Caches resampled audio to avoid redundant computation across pipeline stages
DirectionEstimator - Estimates sound direction using ULA or UCA geometry
- Bartlett - Simple beamforming (default)
- Capon (MVDR) - Higher resolution
- MEM - Maximum entropy, best for close sources
angle_to_direction(angle) - Converts degrees to compass directions
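As an illustration of what angle_to_direction does, here is one plausible 8-point compass binning. The actual bin edges and orientation in detection/doa.py are not specified here, so treat this as a sketch.

```python
def angle_to_direction(angle: float) -> str:
    """Map a DOA angle in degrees to an 8-point compass label.
    (Assumes 0 degrees is North; detection/doa.py may orient differently.)"""
    points = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    index = round((angle % 360) / 45) % 8  # nearest 45-degree sector
    return points[index]
```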
CLAPConfig - Configuration for CLAP detection
- bark_labels - Positive bark sound labels
- speech_labels - Human speech for veto
- percussive_labels - Claps, knocks for veto
- bird_labels - Bird sounds for veto
- threshold, speech_veto_threshold, bird_veto_threshold
- rolling_window_size, detection_cooldown_frames
CLAPDetector - Zero-shot audio classifier using LAION CLAP
- Uses laion/clap-htsat-unfused model by default
- Caches text embeddings for efficiency
- Multi-label detection with veto system
- Rolling window smoothing with high-confidence bypass
- Detection cooldown to prevent rapid-fire triggers
VADConfig - Configuration for VAD gate
VADGate - Fast energy-based rejection of silent audio
- Skips expensive CLAP inference on silent frames
- Configurable energy threshold in dB
BarkEvent - Detection event with timestamp, probability, DOA
BarkDetector - Main orchestrator
- Supports both CLAP (default) and legacy MLP modes
- CLAP mode: 500ms inference interval with 1s audio windows
- Legacy mode: 80ms inference interval with TorchScript
- Manages callbacks for event notification
- Tracks statistics (uptime, total barks, VAD skips)
DetectionInfo - Probability, bark count, DOA values
DeviceInfo - Hostname, microphone name
EvidenceMetadata - Complete metadata for a recording
EvidenceIndex - Index of all evidence files
EvidenceStorage - Records bark clips
- Triggers on bark detection
- Records past context (15s) + future context (15s)
- Saves WAV + JSON sidecar
- Maintains searchable index
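A hypothetical sidecar payload shows the shape such metadata might take. All field names here are assumptions; the real EvidenceMetadata schema lives in evidence/metadata.py.

```python
import json

# Hypothetical sidecar contents; the actual schema may use different field names.
sidecar = {
    "timestamp": "2024-05-01T14:03:22.512000+00:00",
    "probability": 0.92,
    "direction_deg": 135.0,
    "device": {"hostname": "barkpi", "microphone": "ReSpeaker 2-Mic"},
    "clip": {"filename": "bark_20240501_140322.wav", "duration_seconds": 30},
}
# Written next to the WAV so the pair stays human-readable and machine-parseable.
text = json.dumps(sidecar, indent=2)
```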
NotificationManager - Orchestrates bark alert notifications
- Integrates quiet hours, debouncing, and webhook delivery
- Runs webhook calls in a thread pool to avoid blocking
- Per-dog rate limiting to prevent notification spam
- Configurable debounce window (default 5 minutes)
- IFTTT Maker Webhooks and custom HTTPS webhook support
- SSRF protection (blocks private IPs and internal hostnames)
- Retry with configurable timeout
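The SSRF guard idea can be sketched with the stdlib ipaddress module: reject private or loopback literal IPs and known-internal hostnames before delivering a webhook. The real check in events/webhook.py may additionally resolve DNS and restrict schemes; the blocklist below is illustrative.

```python
import ipaddress
from urllib.parse import urlparse

BLOCKED_HOSTNAMES = {"localhost", "metadata.google.internal"}  # illustrative set

def webhook_url_allowed(url: str) -> bool:
    """Reject webhook targets that point at private or internal addresses."""
    host = (urlparse(url).hostname or "").lower()
    if host in BLOCKED_HOSTNAMES:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # non-literal hostname: a fuller check would resolve DNS first
    return not (ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved)
```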
- Extracts CLAP embeddings and acoustic features from bark audio
- Computes spectral centroid, bandwidth, rolloff, and other acoustic features for bark characterization
- Weighted acoustic feature similarity for dog matching
- HDBSCAN-based clustering of untagged bark fingerprints for discovering new dogs
- Prometheus-compatible metrics endpoint (/api/metrics)
- Tracks bark counts, inference latency, VAD/YAMNet skip rates, evidence storage
- Optional API key authentication via X-API-Key header
- Configurable via server.api_key or WOOFALYTICS__SERVER__API_KEY
- Per-endpoint rate limiting using slowapi
- Configurable limits for read, write, download, and WebSocket operations
See API Reference below.
ConnectionManager - Manages active WebSocket connections
/ws/bark - Real-time bark events
/ws/pipeline - Detection pipeline state at 10Hz (VAD/YAMNet/CLAP stages, stats)
- Uses lifespan context manager for startup/shutdown
- Dependency injection via app.state
- Mounts static files, includes routers
audio:
device_name: null # null = auto-detect, or specific name e.g. "pulse"
sample_rate: 44100 # Hz
channels: 2 # Minimum 2 for DOA (use 4 for circular arrays)
chunk_size: 441 # Samples per chunk (~10ms at 44.1kHz)
volume_percent: 75 # Microphone gain (0-100)
model:
use_clap: true # Use CLAP zero-shot (recommended)
clap_model: laion/clap-htsat-unfused
clap_threshold: 0.6 # Bark confidence threshold (0.0-1.0)
clap_bird_veto_threshold: 0.15 # Bird veto threshold (lower = more aggressive)
clap_min_harmonic_ratio: 0.1 # Minimum harmonic ratio (0 to disable)
clap_device: cpu # or cuda
vad_enabled: true # Fast rejection of silent audio
vad_threshold_db: -40 # Energy threshold for VAD (dBFS)
yamnet_enabled: true # YAMNet pre-filter (skips CLAP on non-dog audio)
yamnet_threshold: 0.05 # YAMNet dog probability threshold (kept low)
# Legacy MLP settings (when use_clap: false)
path: ./models/traced_model.pt
target_sample_rate: 16000
threshold: 0.88
doa:
enabled: true
array_type: ula # 'ula' (linear) or 'uca' (circular)
element_spacing: 0.1 # Inter-element spacing in wavelengths (ULA)
radius: 0.1 # Array radius in wavelengths (UCA, ~0.093 for ReSpeaker 4-Mic)
num_elements: 2 # Number of microphone elements
angle_min: 0
angle_max: 180 # Use 360 for UCA
method: bartlett # 'bartlett', 'capon', or 'mem'
evidence:
directory: ./evidence
past_context_seconds: 15
future_context_seconds: 15
notification:
enabled: false # Enable notification system
webhook:
enabled: false
ifttt_event: woof
# ifttt_key: set via environment
debounce_seconds: 300 # Min seconds between notifications per dog
quiet_hours:
enabled: false
start: "22:00" # Quiet period start (HH:MM)
end: "06:00" # Quiet period end (HH:MM)
threshold: 0.9 # Higher threshold during quiet hours
notifications: false # Suppress notifications during quiet hours
timezone: UTC # IANA timezone (e.g. 'Australia/Sydney')
server:
host: 127.0.0.1 # Localhost only by default (use 0.0.0.0 for network access)
port: 8000
api_key: null # Set for API authentication (generate with: python -c 'import secrets; print(secrets.token_hex(16))')
rate_limit:
enabled: true
read_limit: "120/minute"
write_limit: "30/minute"
log_level: INFO # DEBUG, INFO, WARNING, ERROR
log_format: console # console or json
debug: false # Enable debug diagnostics

# Override any config value
WOOFALYTICS__LOG_LEVEL=DEBUG
WOOFALYTICS__MODEL__THRESHOLD=0.90
WOOFALYTICS__AUDIO__DEVICE_NAME=ReSpeaker
WOOFALYTICS__WEBHOOK__IFTTT_KEY=your_secret_key

The /api/summary/weekly/ai and /api/summary/ai endpoints generate natural-language bark reports using a local LLM via Ollama. This is entirely optional -- all other summary endpoints work without it.
Setup:
# Install Ollama (https://ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh
# Pull the default model
ollama pull qwen2.5:3b

Environment variables:
| Variable | Default | Description |
|---|---|---|
| OLLAMA_URL | http://localhost:11434 | Ollama API base URL |
| OLLAMA_MODEL | qwen2.5:3b | Model to use for generation |
Hardware note: The default qwen2.5:3b model requires ~2GB RAM (Q4 quantized). Since woofalytics itself already uses significant RAM for CLAP + YAMNet, you'll want at least 8GB total if running Ollama on the same machine. Summaries are generated on-demand so generation speed isn't critical -- a few seconds on a modern x86 CPU is typical. You can also point OLLAMA_URL at a remote Ollama instance to offload generation entirely.
If Ollama is not running, the AI summary endpoints return a 503 error; all other functionality is unaffected.
| Endpoint | Method | Description |
|---|---|---|
| /api/health | GET | Health check with uptime, bark count, evidence count |
| /api/status | GET | Detector status (running, uptime, last event, gate stats) |
| /api/config | GET | Current configuration (sanitized, no secrets) |
| /api/metrics | GET | Prometheus-format metrics |
| Endpoint | Method | Description |
|---|---|---|
| /api/bark | GET | Latest bark event |
| /api/bark/probability | GET | Just the probability value |
| /api/bark/recent?count=10 | GET | Recent events (1-100) |
| /api/direction | GET | Current DOA with all methods |
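A tiny client for these endpoints using only the standard library. The base URL assumes the default host/port from config.yaml; the response shape is whatever the server's Pydantic schemas define and is not assumed here.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # default host/port from config.yaml

def recent_barks_url(base: str, count: int = 10) -> str:
    """Build the /api/bark/recent URL, clamping count to the documented 1-100 range."""
    count = max(1, min(count, 100))
    return f"{base}/api/bark/recent?count={count}"

def fetch_recent_barks(base: str = BASE, count: int = 10):
    """Fetch recent bark events as parsed JSON (requires a running server)."""
    with urllib.request.urlopen(recent_barks_url(base, count)) as resp:
        return json.loads(resp.read())
```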
| Endpoint | Method | Description |
|---|---|---|
| /api/evidence?count=20 | GET | List recent evidence |
| /api/evidence/stats | GET | Storage statistics |
| /api/evidence/(unknown) | GET | Download WAV or JSON file |
| /api/evidence/date/{YYYY-MM-DD} | GET | Evidence by date |
| /api/evidence/purge | POST | Purge evidence older than N days |
| Endpoint | Method | Description |
|---|---|---|
| /api/dogs | GET | List all dog profiles |
| /api/dogs | POST | Create a new dog profile |
| /api/dogs/{id} | GET | Get dog profile |
| /api/dogs/{id} | PUT | Update dog profile |
| /api/dogs/{id} | DELETE | Delete dog profile |
| /api/dogs/{id}/barks | GET | Get barks for a specific dog |
| /api/dogs/{id}/confirm | POST | Confirm a dog profile |
| /api/dogs/{id}/unconfirm | POST | Unconfirm a dog profile |
| /api/dogs/{id}/reset-embedding | POST | Reset dog's embedding |
| /api/dogs/merge | POST | Merge two dog profiles |
| /api/fingerprints | GET | List fingerprints (with filtering) |
| /api/fingerprints/aggregates | GET | Fingerprint aggregate stats |
| /api/fingerprints/stats | GET | Fingerprint system statistics |
| /api/fingerprints/{id} | DELETE | Delete a fingerprint |
| /api/fingerprints/purge | POST | Purge fingerprints older than N days |
| /api/fingerprints/purge-without-evidence | POST | Remove orphaned fingerprints |
| /api/fingerprints/recalculate-bark-counts | POST | Recalculate bark counts |
| Endpoint | Method | Description |
|---|---|---|
| /api/barks/untagged | GET | List untagged barks |
| /api/barks/{id}/tag | POST | Tag a bark to a dog |
| /api/barks/bulk-tag | POST | Bulk tag multiple barks |
| /api/barks/{id}/correct | POST | Correct a bark's dog assignment |
| /api/barks/{id}/untag | POST | Remove a bark's tag |
| /api/barks/{id}/reject | POST | Mark a bark as false positive |
| /api/barks/{id}/unreject | POST | Un-reject a bark |
| /api/barks/{id}/confirm | POST | Confirm a bark detection |
| /api/barks/{id}/unconfirm | POST | Unconfirm a bark detection |
| /api/barks/cluster | POST | Cluster untagged barks (HDBSCAN) |
| /api/barks/cluster/{id}/create-dog | POST | Create dog from cluster |
| Endpoint | Method | Description |
|---|---|---|
| /api/summary/daily | GET | Daily bark summary |
| /api/summary/weekly | GET | Weekly bark summary |
| /api/summary/monthly | GET | Monthly bark summary |
| /api/summary/range | GET | Custom date range summary |
| /api/summary/weekly/ai | GET | AI-generated weekly summary (Ollama) |
| /api/summary/ai | GET | AI-generated range summary (Ollama) |
| /api/export/json | GET | Export bark data as JSON |
| /api/export/csv | GET | Export bark data as CSV |
| /api/export/stats | GET | Export statistics |
| Endpoint | Method | Description |
|---|---|---|
| /api/settings | GET | Get all runtime settings |
| /api/settings | PUT | Update runtime settings (persisted to config.yaml) |
| /api/notifications/status | GET | Notification system status |
| Endpoint | Description |
|---|---|
| /ws/bark | Real-time bark events (JSON) |
| /ws/pipeline | Detection pipeline state at 10Hz (VAD/YAMNet/CLAP stages) |
- Swagger UI: /api/docs
- ReDoc: /api/redoc
- OpenAPI JSON: /api/openapi.json
The frontend is a SvelteKit SPA with a NASA Mission Control-inspired theme (glassmorphism, dark UI, cyan/amber accents).
| Route | Description |
|---|---|
| / | Dashboard - Real-time bark probability, detection pipeline monitor, dog overview with last heard timestamps, persistent statistics |
| /dogs | Dog Management - View registered dogs, bark counts, last heard indicators, bark modal with reassign/untag/delete actions |
| /fingerprints | Fingerprints Explorer - Browse bark fingerprints with filtering, playback, and clustering analysis |
| /reports | Reports - Bark activity reports and trend analysis |
| /settings | Settings & Maintenance - Detection parameters, quiet hours, webhooks, fingerprint purge |
- Real-time Updates - WebSocket streams for live bark events and audio levels
- Type-safe API Client - Generated from OpenAPI schema using openapi-fetch
- Svelte 5 Runes - Modern reactive state with $state, $derived, $effect
- Responsive Design - Works on desktop and tablet
- Evidence Playback - Listen to recorded bark clips directly in the browser
- Bark Management Modal - View dog's barks with reassign, untag, and delete controls
- Last Heard Indicators - Teal audio icon showing when each dog was last detected
- Clustering UI - Visual bark clustering for pattern analysis and dog profile creation
- Persistent Dashboard Stats - Bark counts survive page refreshes via API persistence
- Toast Notifications - Non-blocking feedback replacing browser alerts
- Active Navigation - Clear indication of current page with amber highlight
- Accessibility - Targets WCAG AA text contrast, labeled form inputs, prefers-reduced-motion support
The SvelteKit frontend is built to static files and served directly by FastAPI. No separate Node.js server required in production.
- Python 3.11+ with a working PyAudio/PortAudio installation
- 2GB+ RAM (CLAP + YAMNet models need memory)
- Any microphone (1+ channels; 2+ for DOA)
- ReSpeaker 2-Mic HAT (~$12) - HAT form factor, 2 mics
- ReSpeaker 4-Mic Array (~$35) - 360° coverage, use array_type: uca
# Install seeed-voicecard driver
git clone https://github.com/respeaker/seeed-voicecard
cd seeed-voicecard
sudo ./install.sh
sudo reboot

git clone https://github.com/machug/woofalytics-v2.git
cd woofalytics-v2
cp .env.example .env
docker-compose up -d

# System dependencies (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install -y \
python3.11 python3.11-venv \
portaudio19-dev libasound2-dev \
alsa-utils nodejs npm
# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate
# Install Python package
pip install -e .
# Build frontend
cd frontend
npm install
npm run build
cd ..
# Verify audio devices
woofalytics --list-devices
# Run
woofalytics

woofalytics [OPTIONS]
Options:
-c, --config PATH Config file (default: config.yaml)
--host TEXT Override host
-p, --port INTEGER Override port
--reload Enable hot reload (dev)
--log-level LEVEL Override log level
--list-devices List audio devices and exit
--version Show version
- Multi-stage build (builder + runtime)
- Non-root user (woofalytics)
- Audio libraries pre-installed
- Health check included
- Evidence volume for persistence
services:
woofalytics:
build: .
container_name: woofalytics
ports:
- "8000:8000"
devices:
- /dev/snd:/dev/snd # Audio device access
group_add:
- audio # Audio group membership
volumes:
- ./config.yaml:/home/woofalytics/app/config.yaml:ro
- ./evidence:/home/woofalytics/app/evidence
- ./models:/home/woofalytics/app/models:ro
environment:
- TZ=Europe/London
- WOOFALYTICS__WEBHOOK__IFTTT_KEY=${IFTTT_KEY:-}
- WOOFALYTICS__LOG_LEVEL=${LOG_LEVEL:-INFO}
restart: unless-stopped
deploy:
resources:
limits:
memory: 1G
reservations:
memory: 512M
logging:
driver: json-file
options:
max-size: "10m"
max-file: "3"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s

# Build and start
docker-compose up -d --build
# View logs
docker-compose logs -f
# Stop
docker-compose down
# Rebuild after code changes
docker-compose up -d --build --force-recreate

# Install with dev dependencies
pip install -e ".[dev]"
# Install pre-commit hooks (optional)
pre-commit install
# Install frontend dependencies
cd frontend && npm install && cd ..

# With hot reload
woofalytics --reload --log-level DEBUG
# Or directly with uvicorn
uvicorn woofalytics.app:app --reload --host 0.0.0.0 --port 8000

# Start the SvelteKit dev server (auto-proxies API calls to backend)
cd frontend
npm run dev
# Frontend available at http://localhost:5173
# Backend must be running on port 8000

cd frontend
npm run build # Outputs to frontend/build/
npm run preview  # Preview production build locally

# Python linting
ruff check src/woofalytics
# Python type checking
mypy src/woofalytics
# Python format
ruff format src/woofalytics
# Frontend type checking
cd frontend && npm run check

# All tests
pytest
# With coverage
pytest --cov=woofalytics --cov-report=html
# Specific module
pytest tests/test_config.py -v
# With output
pytest -s

conftest.py - Shared fixtures (mock PyAudio, test settings, etc.)
test_api_routes.py - API endpoint tests
test_api_websocket.py - WebSocket tests
test_audio.py - Audio frame and device tests
test_config.py - Configuration validation
test_detection.py - DOA and bark event tests
test_evidence.py - Metadata and storage tests
test_export.py - Data export tests
test_fingerprint_clustering.py - Bark clustering tests
test_fingerprint_matching.py - Fingerprint matching tests
test_quiet_hours.py - Quiet hours scheduling tests
test_resample_cache.py - Resample cache tests
test_summary.py - Summary endpoint tests
test_yamnet.py - YAMNet gate tests
Tests mock PyAudio to run without audio hardware:
@pytest.fixture
def mock_pyaudio():
with patch("pyaudio.PyAudio") as mock:
# Configure mock device list
yield mock

Using BaseSettings for nested configs causes environment variable conflicts. The correct pattern:
- BaseModel for nested configs (AudioConfig, ModelConfig, etc.)
- BaseSettings only for the root Settings class
- Environment variables work with the __ delimiter: WOOFALYTICS__AUDIO__SAMPLE_RATE
PyAudio is blocking, but FastAPI is async. Solution:
- Run PyAudio in a background daemon thread
- Use thread-safe ring buffer (deque with lock)
- Async methods for control (start(), stop())
- Sync methods for buffer access (called from any context)
Each has trade-offs:
- Bartlett - Robust, works well with noise
- Capon - Better resolution, more sensitive to calibration
- MEM - Best for multiple sources, computationally heavier
CLAP (Contrastive Language-Audio Pretraining) offers key advantages:
- Zero-shot - No training data required, works immediately
- Multi-label - Can detect bark AND check for speech/birds simultaneously
- Veto system - Reduces false positives by rejecting similar sounds
- Generalizes - Works across dog breeds without fine-tuning
The downside is slower inference (~500ms vs 80ms), which is why:
- VAD gate fast-rejects silent audio before CLAP
- High-confidence bypass (≥80%) enables instant detection
- Detection cooldown prevents rapid-fire from same sound
For constrained hardware or faster inference, the legacy MLP model offers:
- 80ms inference interval (12.5 inferences/second)
- Smaller memory footprint
- Less accurate but faster
For documentation purposes, metadata must be:
- Human-readable (JSON, not binary)
- Separate from audio (can't be embedded in WAV easily)
- Include precise timestamps, probabilities, device info
- Machine-parseable for cataloging and fingerprinting
- Evidence Cleanup - Automatic old file removal (manual purge available via API)
- Audio Spectrogram - Visual display in web UI
- Home Assistant Integration - MQTT or REST
- SMS/Push Notifications - Via Pushover/Twilio
- Webhook Notifications - Configurable webhooks for bark alerts
- Multi-Dog Fingerprinting - Identify individual dogs by bark signature
- Bark Pattern Analysis - Clustering UI for analyzing bark patterns
- Quiet Hours - Scheduled reduced sensitivity periods
- Fingerprint Purge - Remove orphaned fingerprints without audio evidence
- Notification Debouncing - Per-dog rate limiting via events/debouncer.py
- Prometheus Metrics - Prometheus-format metrics at /api/metrics
- API Authentication - Optional API key authentication
- Rate Limiting - Per-endpoint rate limiting
- Runtime Settings - Update settings via UI, persisted to config.yaml
- ALSA Volume Control - Microphone volume adjustment (volume_percent) uses ALSA and is Linux-specific; detection works on any OS with PyAudio
- CPU Only - Inference is CPU-only (GPU not required)
This is a fork/rewrite of the original woofalytics project. Key changes:
| Aspect | Original | v2.5 |
|---|---|---|
| Python | 3.9+ | 3.11+ |
| Detection | Custom MLP | CLAP zero-shot (+ legacy MLP) |
| False Positives | High | Multi-layer veto system |
| Web Framework | Basic HTTP | FastAPI |
| Config | Hardcoded | Pydantic v2 |
| Microphone | Andrea only | Any USB mic |
| Real-time | Polling | WebSocket |
| Evidence | WAV only | WAV + JSON metadata |
| Deployment | Manual | Docker |
| Tests | None | pytest suite |
Version is tracked in the VERSION file at the repository root. See CHANGELOG.md for release history.
MIT License - See original project for attribution.
- Fork the repository
- Create a feature branch
- Run tests: pytest
- Run linting: ruff check src/
- Submit a pull request
# Start the server
woofalytics
# List audio devices
woofalytics --list-devices
# Run with debug logging
woofalytics --log-level DEBUG
# Docker
docker-compose up -d
# Run tests
pytest
# Check API docs
open http://localhost:8000/api/docs