machug/woofalytics-v2

🐕 Woofalytics v2.5.0

AI-powered dog bark detection and cataloging

A complete modernization of the original woofalytics project, built for cataloging and fingerprinting barking dogs within earshot. Uses zero-shot audio classification (CLAP) to detect barks without training data, with automatic recording for documentation purposes.




Project Goals

This project was created with specific intentions:

  1. Learning - Push modern Python patterns to the limits (deliberately over-engineered)
  2. Dog Cataloging - Document and fingerprint all barking dogs within earshot
  3. Best Practices - Latest patterns, proper architecture, comprehensive documentation

Key Features

  • Zero-Shot Bark Detection - CLAP-powered classification without training data (~500ms inference)
  • Multi-Layer Veto System - Rejects speech, percussion, and bird sounds to reduce false positives
  • Direction of Arrival (DOA) - Know which direction barks come from using stereo microphones
  • Evidence Recording - Automatic 30-second clips with JSON metadata sidecars
  • Dog Fingerprinting - Identify and track individual dogs by bark signature
  • Bark Management - Reassign barks to different dogs, untag, or delete directly from dog profiles
  • Last Heard Tracking - See when each dog was last detected with accurate timestamps
  • Webhook Notifications - Configurable webhooks for bark alerts with customizable payloads
  • Quiet Hours - Schedule reduced sensitivity periods (e.g., nighttime) via Settings UI
  • Clustering Analysis - Visual interface for analyzing untagged barks and creating dog profiles
  • Modern Web UI - Real-time dashboard with WebSocket updates and persistent statistics
  • Accessible by Design - Aims for WCAG AA compliance, with screen reader support and respect for reduced-motion preferences
  • REST API - Full OpenAPI documentation at /api/docs
  • Docker Support - Easy deployment with Docker Compose
  • Flexible Configuration - YAML config with environment variable overrides
  • AI Summaries - LLM-generated weekly/custom-range bark reports via Ollama (optional)
  • Legacy MLP Support - Optional TorchScript models for faster inference

Architecture Overview

┌───────────────────────────────────────────────────────────────┐
│                      FastAPI Application                      │
│  ┌────────────┐  ┌────────────┐  ┌────────────────────────┐   │
│  │  REST API  │  │  WebSocket │  │     Static Files       │   │
│  │  /api/*    │  │  /ws/bark  │  │     /static/*          │   │
│  │            │  │/ws/pipeline│  │                        │   │
│  └─────┬──────┘  └─────┬──────┘  └────────────────────────┘   │
│        │               │                                      │
│        └───────┬───────┘                                      │
│                ▼                                              │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                     BarkDetector                        │  │
│  │  - Coordinates audio capture, inference, callbacks      │  │
│  │  - Runs inference loop every 500ms (CLAP) or 80ms (MLP) │  │
│  │  - Produces BarkEvent objects                           │  │
│  └─────────────────────────────────────────────────────────┘  │
│        │                │                   │                 │
│        ▼                ▼                   ▼                 │
│  ┌───────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Audio     │  │  VAD Gate   │  │     DOA Estimator       │  │
│  │ Capture   │  │ (fast skip) │  │  (Bartlett/Capon/MEM)   │  │
│  └───────────┘  └──────┬──────┘  └─────────────────────────┘  │
│                        ▼                                      │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                    CLAP Detector                        │  │
│  │  - Zero-shot audio classification (laion/clap-htsat)    │  │
│  │  - Multi-label veto (speech, percussion, birds)         │  │
│  │  - Rolling window + high-confidence bypass              │  │ 
│  └─────────────────────────────────────────────────────────┘  │
│        │                                                      │
│        ▼                                                      │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                   EvidenceStorage                       │  │
│  │  - Records WAV clips on bark detection                  │  │
│  │  - Creates JSON metadata sidecars                       │  │
│  │  - Maintains evidence index                             │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘

Data Flow

  1. Audio Capture (audio/capture.py) runs in a background thread, filling a ring buffer
  2. BarkDetector (detection/model.py) reads ~100 frames (1 second) from buffer every 500ms
  3. VAD Gate (detection/vad.py) fast-rejects silent audio, and the YAMNet Gate (detection/yamnet.py) then skips non-dog sounds, before expensive CLAP inference
  4. CLAP Detector (detection/clap.py) runs zero-shot classification with multi-label veto:
    • Compares "dog barking" against speech, percussion, bird, and other sound labels
    • Uses rolling window (2/3 positives required) to smooth detections
    • High-confidence barks (≥80%) bypass rolling window for instant detection
    • Detection cooldown prevents rapid-fire triggers from the same sound
  5. DOA Estimator (detection/doa.py) calculates direction using pyargus algorithms
  6. BarkEvent is created and broadcast to all registered callbacks
  7. EvidenceStorage (evidence/storage.py) records clips when barks are detected
  8. WebSocket broadcasts events to connected web clients in real-time

Note: Legacy MLP mode uses 80ms inference with TorchScript for faster but less accurate detection.


Detection Pipeline

Woofalytics uses a multi-stage filtering approach to balance accuracy with performance:

Audio Input → VAD Gate → YAMNet Gate → CLAP Detector → Bark Event
                ↓            ↓              ↓
             (skip)       (skip)        (detect)

1. VAD Gate (Voice Activity Detection)

  • Purpose: Fast energy-based rejection of silent audio
  • Method: RMS energy threshold in dB
  • Skip Rate: ~60-80% of frames (environment dependent)
  • Latency: <1ms
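A minimal sketch of the RMS-in-dB check, assuming 16-bit PCM samples and the `vad_threshold_db: -40` default from config.yaml. The function names are hypothetical and this is not the project's `VADGate`:

```python
import math
from array import array


def rms_dbfs(samples: array) -> float:
    """RMS level of 16-bit PCM samples in dBFS (0 dBFS = full scale)."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    rms = math.sqrt(mean_square) / 32768.0  # normalise to [0, 1]
    return 20.0 * math.log10(rms)


def vad_should_skip(samples: array, threshold_db: float = -40.0) -> bool:
    """True if the window is quiet enough to skip all downstream inference."""
    return rms_dbfs(samples) < threshold_db
```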

2. YAMNet Gate (Pre-filter)

  • Purpose: Skip CLAP inference for non-dog sounds
  • Model: Google's YAMNet (TensorFlow, ~3.7M params)
  • Classes: AudioSet class 69 (Dog) and 70 (Bark)
  • Threshold: 0.05 (kept low to avoid missing barks)
  • Skip Rate: 30-40% of VAD-passed frames
  • Latency: ~50ms
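The gate decision can be sketched as below, assuming YAMNet's standard output of per-frame scores over its 521 AudioSet classes and that the gate takes the maximum Dog/Bark score across frames. The aggregation rule is an assumption, and `yamnet_gate_passes` is a hypothetical name:

```python
from collections.abc import Sequence

DOG_CLASS = 69   # AudioSet "Dog" in YAMNet's class map
BARK_CLASS = 70  # AudioSet "Bark"


def yamnet_gate_passes(
    frame_scores: Sequence[Sequence[float]], threshold: float = 0.05
) -> bool:
    """Return True if any frame scores Dog or Bark at or above the threshold.

    frame_scores has shape (num_frames, 521): one score per class per frame.
    """
    dog_prob = max(
        (max(frame[DOG_CLASS], frame[BARK_CLASS]) for frame in frame_scores),
        default=0.0,  # no frames means nothing to pass on to CLAP
    )
    return dog_prob >= threshold
```

Keeping the threshold low means the gate only throws away audio that YAMNet is confident contains no dog, which matches the "avoid missing barks" rationale above.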

3. CLAP Detector (Primary)

  • Purpose: Zero-shot audio classification with multi-label veto
  • Model: LAION CLAP (laion/clap-htsat-unfused)
  • Features:
    • Compares bark labels against speech, percussion, birds
    • Rolling window (2/3 positives required)
    • High-confidence bypass (≥80%)
    • Detection cooldown prevents rapid-fire
  • Latency: ~500ms
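A sketch of the post-processing described above: a 3-result rolling window needing 2 positives, an 80% bypass, and a cooldown. The class name, the cooldown length of 2 results, and clearing the window after a detection are illustrative assumptions, not the project's `CLAPDetector` internals:

```python
from collections import deque


class BarkSmoother:
    """Hypothetical CLAP post-processing: rolling window + bypass + cooldown."""

    def __init__(self, threshold: float = 0.6, window_size: int = 3,
                 min_positives: int = 2, bypass_confidence: float = 0.8,
                 cooldown_frames: int = 2) -> None:
        self.threshold = threshold
        self.min_positives = min_positives
        self.bypass_confidence = bypass_confidence
        self.cooldown_frames = cooldown_frames
        self._window: deque[bool] = deque(maxlen=window_size)
        self._cooldown = 0

    def update(self, bark_prob: float) -> bool:
        """Feed one inference result; return True if a bark event should fire."""
        self._window.append(bark_prob >= self.threshold)
        if self._cooldown > 0:  # still suppressing after the last detection
            self._cooldown -= 1
            return False
        fired = (bark_prob >= self.bypass_confidence  # high-confidence bypass
                 or sum(self._window) >= self.min_positives)  # 2 of last 3
        if fired:
            self._cooldown = self.cooldown_frames
            self._window.clear()
        return fired
```

For example, the probabilities `[0.7, 0.3, 0.65, 0.9, 0.9, 0.9]` fire on the third result (2 of 3 positives), are suppressed for two results by the cooldown, then fire again via the bypass.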

Monitor pipeline status in real time via the Dashboard's Detection Pipeline card.


File Structure

woofalytics-v2/
├── src/woofalytics/             # Python backend
│   ├── __init__.py              # Package version and exports
│   ├── __main__.py              # CLI entry point (python -m woofalytics)
│   ├── app.py                   # FastAPI application with lifespan
│   ├── config.py                # Pydantic v2 settings system
│   │
│   ├── audio/
│   │   ├── __init__.py          # Module exports
│   │   ├── devices.py           # Microphone discovery (PyAudio wrapper)
│   │   └── capture.py           # Async audio capture with ring buffer
│   │
│   ├── detection/
│   │   ├── __init__.py          # Module exports
│   │   ├── model.py             # BarkDetector orchestrator + BarkEvent
│   │   ├── clap.py              # CLAP zero-shot classifier (primary)
│   │   ├── yamnet.py            # YAMNet pre-filter gate (TensorFlow)
│   │   ├── vad.py               # Voice activity detection gate
│   │   ├── features.py          # Mel filterbank feature extraction (legacy)
│   │   ├── doa.py               # Direction of arrival estimation
│   │   └── resample_cache.py    # Cached audio resampling
│   │
│   ├── events/
│   │   ├── __init__.py          # Module exports
│   │   ├── manager.py           # Notification manager orchestrator
│   │   ├── debouncer.py         # Per-dog notification debouncing
│   │   ├── models.py            # Event data models
│   │   └── webhook.py           # IFTTT and custom webhook delivery
│   │
│   ├── evidence/
│   │   ├── __init__.py          # Module exports
│   │   ├── storage.py           # Evidence recording and management
│   │   └── metadata.py          # JSON metadata models
│   │
│   ├── fingerprint/             # Dog identification system
│   │   ├── __init__.py
│   │   ├── storage.py           # SQLite fingerprint database
│   │   ├── matcher.py           # CLAP embedding matching
│   │   ├── extractor.py         # Feature extraction for fingerprints
│   │   ├── acoustic_features.py # Acoustic feature computation
│   │   ├── acoustic_matcher.py  # Acoustic similarity matching
│   │   ├── clustering.py        # HDBSCAN bark clustering
│   │   └── models.py            # Fingerprint data models
│   │
│   ├── observability/
│   │   ├── __init__.py          # Module exports
│   │   └── metrics.py           # Prometheus-format metrics
│   │
│   ├── prompts/
│   │   └── weekly_summary.prompty  # Jinja2 prompt template for AI summaries
│   │
│   └── api/
│       ├── __init__.py          # Module exports
│       ├── auth.py              # API key authentication
│       ├── ratelimit.py         # Rate limiting (slowapi)
│       ├── routes.py            # Core REST API endpoints
│       ├── routes_export.py     # CSV/JSON data export
│       ├── routes_fingerprint.py # Dog profiles and bark tagging
│       ├── routes_notification.py # Notification status
│       ├── routes_settings.py   # Runtime settings management
│       ├── routes_summary.py    # Daily/weekly/monthly summaries + AI
│       ├── schemas.py           # Core Pydantic response models
│       ├── schemas_export.py    # Export response models
│       ├── schemas_fingerprint.py # Fingerprint response models
│       ├── schemas_summary.py   # Summary response models
│       └── websocket.py         # WebSocket endpoints + ConnectionManager
│
├── frontend/                    # SvelteKit frontend (NASA Mission Control theme)
│   ├── src/
│   │   ├── routes/              # SvelteKit pages
│   │   │   ├── +page.svelte     # Dashboard with real-time monitoring
│   │   │   ├── dogs/            # Dog management page
│   │   │   ├── fingerprints/    # Fingerprints explorer
│   │   │   ├── reports/         # Bark activity reports
│   │   │   └── settings/        # Settings & maintenance
│   │   ├── lib/
│   │   │   ├── api/             # Type-safe API client (openapi-fetch)
│   │   │   ├── components/      # Reusable UI components
│   │   │   └── stores/          # Svelte stores for WebSocket state
│   │   └── app.css              # Global styles (glassmorphism theme)
│   ├── build/                   # Production build (gitignored)
│   ├── package.json
│   └── svelte.config.js
│
├── static/                      # Evidence audio files (served at /static)
│
├── models/
│   └── traced_model.pt          # TorchScript bark detection model
│
├── evidence/                    # Evidence recordings (created at runtime)
│
├── tests/
│   ├── __init__.py
│   ├── conftest.py              # Pytest fixtures
│   ├── test_api_routes.py       # API endpoint tests
│   ├── test_api_websocket.py    # WebSocket tests
│   ├── test_audio.py            # Audio module tests
│   ├── test_config.py           # Configuration tests
│   ├── test_detection.py        # Detection module tests
│   ├── test_evidence.py         # Evidence module tests
│   ├── test_export.py           # Data export tests
│   ├── test_fingerprint_clustering.py  # Clustering tests
│   ├── test_fingerprint_matching.py    # Fingerprint matching tests
│   ├── test_quiet_hours.py      # Quiet hours tests
│   ├── test_resample_cache.py   # Resample cache tests
│   ├── test_summary.py          # Summary endpoint tests
│   └── test_yamnet.py           # YAMNet gate tests
│
├── pyproject.toml               # Python packaging (PEP 517/518)
├── Dockerfile                   # Multi-stage Docker build
├── docker-compose.yml           # Docker Compose deployment
├── config.yaml                  # Default configuration
├── .env.example                 # Environment variable template
└── README.md                    # This file

Module Documentation

config.py - Configuration System

Pattern: Pydantic v2 with proper nesting (BaseModel for nested, BaseSettings for root only)

from pydantic import BaseModel, Field
from pydantic_settings import BaseSettings, SettingsConfigDict

# Nested configs use BaseModel (NOT BaseSettings)
class AudioConfig(BaseModel):
    device_name: str | None = None
    sample_rate: int = 44100
    channels: int = 2
    # ...

# Only root uses BaseSettings
class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="WOOFALYTICS__",
        env_nested_delimiter="__",
    )
    audio: AudioConfig = Field(default_factory=AudioConfig)
    # ...

Environment Variables:

  • Prefix: WOOFALYTICS__
  • Nested delimiter: __
  • Example: WOOFALYTICS__AUDIO__SAMPLE_RATE=48000
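pydantic-settings performs this mapping itself. The stdlib sketch below only illustrates how the prefix and `__` delimiter fold flat variables into the nested config shape; values stay strings here, whereas pydantic-settings also coerces types. `env_overrides` is a hypothetical helper:

```python
from collections.abc import Mapping

PREFIX = "WOOFALYTICS__"
DELIMITER = "__"


def env_overrides(environ: Mapping[str, str]) -> dict:
    """Fold WOOFALYTICS__A__B=value variables into {'a': {'b': value}}."""
    result: dict = {}
    for key, value in environ.items():
        if not key.upper().startswith(PREFIX):
            continue  # unrelated variable, e.g. HOME
        path = key[len(PREFIX):].lower().split(DELIMITER)
        node = result
        for part in path[:-1]:  # walk/create the nested sections
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return result
```

Single underscores inside a field name (`SAMPLE_RATE`, `LOG_LEVEL`) survive because only the double underscore is treated as a nesting boundary.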

audio/devices.py - Microphone Discovery

  • MicrophoneInfo - Dataclass for device info
  • list_microphones(min_channels) - List all input devices
  • find_microphone(device_name, min_channels) - Auto-detect or filter by name
  • set_microphone_volume(percent) - ALSA amixer wrapper (Linux only)

audio/capture.py - Async Audio Capture

  • AudioFrame - Single frame with timestamp, raw bytes, metadata
  • AsyncAudioCapture - Runs PyAudio in background thread, async interface
    • Ring buffer (default 30 seconds)
    • get_recent_frames(count) - Get N most recent frames
    • get_buffer_as_array(seconds) - Get audio as numpy array

detection/features.py - Feature Extraction

  • FeatureExtractor - Converts audio to Mel filterbank features
    • Resamples from source rate (44.1kHz) to model rate (16kHz)
    • 80 Mel bins, 25ms frame, 10ms hop
    • Uses torchaudio.compliance.kaldi.fbank for Kaldi compatibility
    • Output: (1, 480) tensor (6 frames × 80 mels)
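The 480-value output follows from Kaldi's frame arithmetic. Assuming the legacy model sees an 80ms window at 16kHz (1,280 samples, matching the 80ms legacy inference interval) and `fbank`'s default `snip_edges=True`, the frame count works out as below. `kaldi_num_frames` is a hypothetical helper:

```python
def kaldi_num_frames(num_samples: int, sample_rate: int = 16000,
                     frame_ms: float = 25.0, hop_ms: float = 10.0) -> int:
    """Frame count with Kaldi's snip_edges=True behaviour (no padding)."""
    win = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)    # 160 samples at 16 kHz
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


# 80 ms at 16 kHz = 1280 samples -> 1 + (1280 - 400) // 160 = 6 frames
frames = kaldi_num_frames(1280)
features = frames * 80  # 6 frames x 80 Mel bins = 480 values
```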

detection/yamnet.py - YAMNet Pre-filter Gate

  • YAMNetGate - TensorFlow-based pre-filter (~3.7M params)
    • Uses Google's YAMNet to detect dog/bark audio classes
    • Skips expensive CLAP inference for non-dog sounds
    • Falls back to CLAP-only if TensorFlow fails to load

detection/resample_cache.py - Cached Resampling

  • Caches resampled audio to avoid redundant computation across pipeline stages

detection/doa.py - Direction of Arrival

  • DirectionEstimator - Estimates sound direction using ULA or UCA geometry
    • Bartlett - Simple beamforming (default)
    • Capon (MVDR) - Higher resolution
    • MEM - Maximum entropy, best for resolving closely spaced sources
  • angle_to_direction(angle) - Converts degrees to compass directions
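To illustrate the default Bartlett method, here is a pure-Python scan over a ULA, assuming spacing in wavelengths (as in `element_spacing`) and angles measured from the array axis so that 0-180° covers the half-plane. It is a sketch of the textbook beamformer, not pyargus's API:

```python
import cmath
import math


def steering_vector(theta_deg: float, num_elements: int,
                    spacing: float) -> list[complex]:
    """ULA steering vector; spacing in wavelengths, angle from the array axis."""
    phase = 2 * math.pi * spacing * math.cos(math.radians(theta_deg))
    return [cmath.exp(1j * phase * m) for m in range(num_elements)]


def bartlett_doa(snapshots: list[list[complex]], spacing: float = 0.1,
                 angle_min: int = 0, angle_max: int = 180) -> int:
    """Return the scan angle (degrees) that maximises Bartlett output power."""
    num_elements = len(snapshots[0])
    best_angle, best_power = angle_min, -1.0
    for angle in range(angle_min, angle_max + 1):
        a = steering_vector(angle, num_elements, spacing)
        # P(theta) = mean over snapshots of |a^H x|^2, i.e. a^H R a
        power = sum(
            abs(sum(ai.conjugate() * xi for ai, xi in zip(a, x))) ** 2
            for x in snapshots
        ) / len(snapshots)
        if power > best_power:
            best_angle, best_power = angle, power
    return best_angle
```

With only two elements the beam is very wide, so on real, noisy audio the peak is broad and the estimate is coarse; Capon and MEM trade robustness for sharper peaks.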

detection/clap.py - CLAP Zero-Shot Classifier (Primary)

  • CLAPConfig - Configuration for CLAP detection
    • bark_labels - Positive bark sound labels
    • speech_labels - Human speech for veto
    • percussive_labels - Claps, knocks for veto
    • bird_labels - Bird sounds for veto
    • threshold, speech_veto_threshold, bird_veto_threshold
    • rolling_window_size, detection_cooldown_frames
  • CLAPDetector - Zero-shot audio classifier using LAION CLAP
    • Uses laion/clap-htsat-unfused model by default
    • Caches text embeddings for efficiency
    • Multi-label detection with veto system
    • Rolling window smoothing with high-confidence bypass
    • Detection cooldown to prevent rapid-fire triggers
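One way the zero-shot scoring and veto could fit together, as a sketch: CLAP audio-text cosine similarities are scaled, softmaxed across all labels, and summed per label group. The temperature (`logit_scale`), the per-group sum, and the speech veto default of 0.3 are assumptions; 0.6 and 0.15 mirror `clap_threshold` and `clap_bird_veto_threshold` in config.yaml:

```python
import math


def softmax(logits: list[float]) -> list[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_decision(
    similarities: dict[str, float],  # label -> cosine sim with audio embedding
    bark_labels: set[str],
    speech_labels: set[str],
    bird_labels: set[str],
    threshold: float = 0.6,
    speech_veto_threshold: float = 0.3,
    bird_veto_threshold: float = 0.15,
    logit_scale: float = 100.0,
) -> tuple[bool, float]:
    """Turn audio-text similarities into (is_bark, bark_prob) with a veto."""
    labels = list(similarities)
    scaled = [logit_scale * similarities[label] for label in labels]
    probs = dict(zip(labels, softmax(scaled)))
    bark_prob = sum(probs[label] for label in bark_labels)
    speech_prob = sum(probs[label] for label in speech_labels)
    bird_prob = sum(probs[label] for label in bird_labels)
    vetoed = (speech_prob >= speech_veto_threshold
              or bird_prob >= bird_veto_threshold)
    return (bark_prob >= threshold and not vetoed), bark_prob
```

The veto matters when the bark score clears the threshold but a competing class still holds meaningful probability mass, for example a bird call that CLAP finds partly bark-like.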

detection/vad.py - Voice Activity Detection Gate

  • VADConfig - Configuration for VAD gate
  • VADGate - Fast energy-based rejection of silent audio
    • Skips expensive CLAP inference on silent frames
    • Configurable energy threshold in dB

detection/model.py - Bark Detector Orchestrator

  • BarkEvent - Detection event with timestamp, probability, DOA
  • BarkDetector - Main orchestrator
    • Supports both CLAP (default) and legacy MLP modes
    • CLAP mode: 500ms inference interval with 1s audio windows
    • Legacy mode: 80ms inference interval with TorchScript
    • Manages callbacks for event notification
    • Tracks statistics (uptime, total barks, VAD skips)

evidence/metadata.py - Metadata Models

  • DetectionInfo - Probability, bark count, DOA values
  • DeviceInfo - Hostname, microphone name
  • EvidenceMetadata - Complete metadata for a recording
  • EvidenceIndex - Index of all evidence files

evidence/storage.py - Evidence Storage

  • EvidenceStorage - Records bark clips
    • Triggers on bark detection
    • Records past context (15s) + future context (15s)
    • Saves WAV + JSON sidecar
    • Maintains searchable index
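A stdlib sketch of writing one clip and its sidecar, assuming 16-bit PCM and that the caller has already gathered 15s of past and 15s of future audio. The filename pattern and sidecar fields are hypothetical, not the project's `EvidenceMetadata` schema:

```python
import json
import wave
from datetime import datetime, timezone
from pathlib import Path


def save_evidence_clip(directory: Path, past_pcm: bytes, future_pcm: bytes,
                       probability: float, sample_rate: int = 44100,
                       channels: int = 2) -> Path:
    """Write past+future context as one 16-bit WAV plus a JSON sidecar."""
    directory.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S_%f")
    wav_path = directory / f"bark_{stamp}.wav"
    pcm = past_pcm + future_pcm
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    duration = len(pcm) / (2 * channels * sample_rate)
    sidecar = {
        "filename": wav_path.name,
        "timestamp_utc": stamp,
        "duration_seconds": duration,
        "detection": {"probability": probability},
    }
    # Sidecar shares the WAV's stem: bark_<stamp>.json next to bark_<stamp>.wav
    wav_path.with_suffix(".json").write_text(json.dumps(sidecar, indent=2))
    return wav_path
```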

events/manager.py - Notification Manager

  • NotificationManager - Orchestrates bark alert notifications
    • Integrates quiet hours, debouncing, and webhook delivery
    • Runs webhook calls in a thread pool to avoid blocking

events/debouncer.py - Notification Debouncing

  • Per-dog rate limiting to prevent notification spam
  • Configurable debounce window (default 5 minutes)
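Per-dog debouncing reduces to remembering when each dog last triggered a notification. A sketch with an injectable clock for testing; `PerDogDebouncer` is hypothetical, and the 300s default matches `debounce_seconds`:

```python
import time
from collections.abc import Callable


class PerDogDebouncer:
    """Allow at most one notification per dog within window_seconds."""

    def __init__(self, window_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self._clock = clock
        self._last_sent: dict[str, float] = {}

    def should_notify(self, dog_id: str) -> bool:
        now = self._clock()
        last = self._last_sent.get(dog_id)
        if last is not None and now - last < self.window_seconds:
            return False  # same dog notified too recently
        self._last_sent[dog_id] = now
        return True
```

Using a monotonic clock keeps the window correct even if the system wall clock jumps (e.g. NTP adjustments).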

events/webhook.py - Webhook Delivery

  • IFTTT Maker Webhooks and custom HTTPS webhook support
  • SSRF protection (blocks private IPs and internal hostnames)
  • Retry with configurable timeout
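A sketch of the kind of SSRF check described, using the stdlib `ipaddress` module's `is_global` flag to reject private, loopback, link-local, and other non-public addresses. The blocked hostname suffixes are illustrative assumptions. Resolving once at validation time does not stop DNS rebinding; pinning the resolved address for the actual request closes that gap:

```python
import ipaddress
import socket
from urllib.parse import urlparse

BLOCKED_SUFFIXES = (".local", ".internal", ".localhost")


def is_safe_webhook_url(url: str, resolve: bool = True) -> bool:
    """Reject non-HTTPS URLs and hosts that are, or resolve to, non-public IPs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        return False
    if host == "localhost" or host.endswith(BLOCKED_SUFFIXES):
        return False
    try:
        addresses = [ipaddress.ip_address(host)]  # IP literal in the URL
    except ValueError:  # not an IP literal, so treat it as a hostname
        if not resolve:
            return True
        try:
            infos = socket.getaddrinfo(host, parsed.port or 443)
        except socket.gaierror:
            return False
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    return all(addr.is_global for addr in addresses)
```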

fingerprint/extractor.py - Feature Extraction

  • Extracts CLAP embeddings and acoustic features from bark audio

fingerprint/acoustic_features.py - Acoustic Feature Computation

  • Computes spectral centroid, bandwidth, rolloff, and other acoustic features for bark characterization

fingerprint/acoustic_matcher.py - Acoustic Similarity

  • Weighted acoustic feature similarity for dog matching

fingerprint/clustering.py - Bark Clustering

  • HDBSCAN-based clustering of untagged bark fingerprints for discovering new dogs

observability/metrics.py - Prometheus Metrics

  • Prometheus-compatible metrics endpoint (/api/metrics)
  • Tracks bark counts, inference latency, VAD/YAMNet skip rates, evidence storage
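The Prometheus text exposition format is plain `# HELP`/`# TYPE` lines followed by samples. A minimal renderer for illustration; the metric names in the usage below are hypothetical, not the project's actual names:

```python
def render_metrics(counters: dict[str, tuple[str, float]],
                   gauges: dict[str, tuple[str, float]]) -> str:
    """Render metrics in the Prometheus text exposition format."""
    lines: list[str] = []
    for kind, metrics in (("counter", counters), ("gauge", gauges)):
        for name, (help_text, value) in sorted(metrics.items()):
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} {kind}")
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"  # exposition format ends with a newline


text = render_metrics(
    {"woofalytics_barks_total": ("Total barks detected", 42)},
    {"woofalytics_inference_latency_seconds": ("Last CLAP inference latency", 0.48)},
)
```

Counters conventionally end in `_total` and only ever increase; per-stage skip rates are better exposed as two counters (skipped and processed) and divided at query time than as a precomputed gauge.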

api/auth.py - Authentication

  • Optional API key authentication via X-API-Key header
  • Configurable via server.api_key or WOOFALYTICS__SERVER__API_KEY

api/ratelimit.py - Rate Limiting

  • Per-endpoint rate limiting using slowapi
  • Configurable limits for read, write, download, and WebSocket operations

api/routes.py - Core REST Endpoints

See API Reference below.

api/websocket.py - WebSocket Streaming

  • ConnectionManager - Manages active WebSocket connections
  • /ws/bark - Real-time bark events
  • /ws/pipeline - Detection pipeline state at 10Hz (VAD/YAMNet/CLAP stages, stats)
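A sketch of a broadcast-style `ConnectionManager`, assuming each connection object exposes an async `send_json`, as Starlette's `WebSocket` does (in FastAPI you would `await websocket.accept()` before `connect`). Failed sends are treated as disconnects; this is illustrative, not the project's implementation:

```python
import asyncio
from typing import Any, Protocol


class JSONSocket(Protocol):
    async def send_json(self, data: Any) -> None: ...


class ConnectionManager:
    """Track connected clients and fan out JSON messages to all of them."""

    def __init__(self) -> None:
        self._clients: list[JSONSocket] = []

    def connect(self, ws: JSONSocket) -> None:
        self._clients.append(ws)

    def disconnect(self, ws: JSONSocket) -> None:
        if ws in self._clients:
            self._clients.remove(ws)

    async def broadcast(self, message: dict[str, Any]) -> int:
        """Send to every client concurrently; drop any whose send fails."""
        clients = list(self._clients)  # snapshot: list may change while awaiting
        results = await asyncio.gather(
            *(ws.send_json(message) for ws in clients), return_exceptions=True
        )
        for ws, result in zip(clients, results):
            if isinstance(result, Exception):
                self.disconnect(ws)
        return len(self._clients)
```

Sending concurrently with `gather` means one slow client does not delay the 10Hz pipeline updates for everyone else.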

app.py - FastAPI Application

  • Uses lifespan context manager for startup/shutdown
  • Dependency injection via app.state
  • Mounts static files, includes routers

Configuration System

config.yaml

audio:
  device_name: null        # null = auto-detect, or specific name e.g. "pulse"
  sample_rate: 44100       # Hz
  channels: 2              # Minimum 2 for DOA (use 4 for circular arrays)
  chunk_size: 441          # Samples per chunk (~10ms at 44.1kHz)
  volume_percent: 75       # Microphone gain (0-100)

model:
  use_clap: true           # Use CLAP zero-shot (recommended)
  clap_model: laion/clap-htsat-unfused
  clap_threshold: 0.6      # Bark confidence threshold (0.0-1.0)
  clap_bird_veto_threshold: 0.15  # Bird veto threshold (lower = more aggressive)
  clap_min_harmonic_ratio: 0.1    # Minimum harmonic ratio (0 to disable)
  clap_device: cpu         # or cuda
  vad_enabled: true        # Fast rejection of silent audio
  vad_threshold_db: -40    # Energy threshold for VAD (dBFS)
  yamnet_enabled: true     # YAMNet pre-filter (skips CLAP on non-dog audio)
  yamnet_threshold: 0.05   # YAMNet dog probability threshold (kept low)
  # Legacy MLP settings (when use_clap: false)
  path: ./models/traced_model.pt
  target_sample_rate: 16000
  threshold: 0.88

doa:
  enabled: true
  array_type: ula          # 'ula' (linear) or 'uca' (circular)
  element_spacing: 0.1     # Inter-element spacing in wavelengths (ULA)
  radius: 0.1              # Array radius in wavelengths (UCA, ~0.093 for ReSpeaker 4-Mic)
  num_elements: 2          # Number of microphone elements
  angle_min: 0
  angle_max: 180           # Use 360 for UCA
  method: bartlett         # 'bartlett', 'capon', or 'mem'

evidence:
  directory: ./evidence
  past_context_seconds: 15
  future_context_seconds: 15

notification:
  enabled: false           # Enable notification system

webhook:
  enabled: false
  ifttt_event: woof
  # ifttt_key: set via environment
  debounce_seconds: 300    # Min seconds between notifications per dog

quiet_hours:
  enabled: false
  start: "22:00"           # Quiet period start (HH:MM)
  end: "06:00"             # Quiet period end (HH:MM)
  threshold: 0.9           # Higher threshold during quiet hours
  notifications: false     # Suppress notifications during quiet hours
  timezone: UTC            # IANA timezone (e.g. 'Australia/Sydney')

server:
  host: 127.0.0.1          # Localhost only by default (use 0.0.0.0 for network access)
  port: 8000
  api_key: null            # Set for API authentication (generate with: python -c 'import secrets; print(secrets.token_hex(16))')
  rate_limit:
    enabled: true
    read_limit: "120/minute"
    write_limit: "30/minute"

log_level: INFO            # DEBUG, INFO, WARNING, ERROR
log_format: console        # console or json
debug: false               # Enable debug diagnostics
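The `quiet_hours` window wraps past midnight (22:00 to 06:00), so a plain `start <= t < end` test is not enough. A sketch, assuming the quiet threshold simply replaces `clap_threshold` during the window (function names hypothetical):

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo


def in_quiet_hours(local_time: time, start: time, end: time) -> bool:
    """True if local_time is in [start, end), handling midnight wrap-around."""
    if start <= end:
        return start <= local_time < end
    return local_time >= start or local_time < end  # e.g. 22:00-06:00


def effective_threshold(now_utc: datetime, base: float = 0.6,
                        quiet: float = 0.9, start: str = "22:00",
                        end: str = "06:00", tz_name: str = "UTC") -> float:
    """Pick the detection threshold for the current moment."""
    local = now_utc.astimezone(ZoneInfo(tz_name)).time()  # naive local time
    if in_quiet_hours(local, time.fromisoformat(start), time.fromisoformat(end)):
        return quiet
    return base
```

Converting to the configured IANA timezone before comparing is what lets `start`/`end` mean local wall-clock times, including across DST changes.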

Environment Variables

# Override any config value
WOOFALYTICS__LOG_LEVEL=DEBUG
# Legacy MLP threshold (only used when model.use_clap is false)
WOOFALYTICS__MODEL__THRESHOLD=0.90
WOOFALYTICS__AUDIO__DEVICE_NAME=ReSpeaker
WOOFALYTICS__WEBHOOK__IFTTT_KEY=your_secret_key

AI Summaries (Ollama)

The /api/summary/weekly/ai and /api/summary/ai endpoints generate natural-language bark reports using a local LLM via Ollama. This is entirely optional -- all other summary endpoints work without it.

Setup:

# Install Ollama (https://ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the default model
ollama pull qwen2.5:3b

Environment variables:

| Variable | Default | Description |
|---|---|---|
| OLLAMA_URL | http://localhost:11434 | Ollama API base URL |
| OLLAMA_MODEL | qwen2.5:3b | Model to use for generation |

Hardware note: The default qwen2.5:3b model requires ~2GB RAM (Q4 quantized). Since woofalytics itself already uses significant RAM for CLAP + YAMNet, you'll want at least 8GB total if running Ollama on the same machine. Summaries are generated on-demand so generation speed isn't critical -- a few seconds on a modern x86 CPU is typical. You can also point OLLAMA_URL at a remote Ollama instance to offload generation entirely.

If Ollama is not running, the AI summary endpoints return a 503 error; all other functionality is unaffected.
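A stdlib sketch of calling Ollama's `/api/generate` endpoint with the two environment variables above. With `"stream": false`, Ollama returns one JSON object whose `response` field holds the generated text. The helper names are hypothetical, and the project's actual client code may differ:

```python
import json
import os
import urllib.request


def build_ollama_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    base_url = os.environ.get("OLLAMA_URL", "http://localhost:11434").rstrip("/")
    model = os.environ.get("OLLAMA_MODEL", "qwen2.5:3b")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_summary(prompt: str, timeout: float = 120.0) -> str:
    """Send the request; raises URLError if Ollama is unreachable."""
    with urllib.request.urlopen(build_ollama_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A connection error from `generate_summary` is the natural point at which an API layer would translate the failure into the 503 response described above.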


API Reference

Health & Status

| Endpoint | Method | Description |
|---|---|---|
| /api/health | GET | Health check with uptime, bark count, evidence count |
| /api/status | GET | Detector status (running, uptime, last event, gate stats) |
| /api/config | GET | Current configuration (sanitized, no secrets) |
| /api/metrics | GET | Prometheus-format metrics |

Bark Detection

| Endpoint | Method | Description |
|---|---|---|
| /api/bark | GET | Latest bark event |
| /api/bark/probability | GET | Just the probability value |
| /api/bark/recent?count=10 | GET | Recent events (1-100) |
| /api/direction | GET | Current DOA with all methods |

Evidence

| Endpoint | Method | Description |
|---|---|---|
| /api/evidence?count=20 | GET | List recent evidence |
| /api/evidence/stats | GET | Storage statistics |
| /api/evidence/{filename} | GET | Download WAV or JSON file |
| /api/evidence/date/{YYYY-MM-DD} | GET | Evidence by date |
| /api/evidence/purge | POST | Purge evidence older than N days |

Dog Profiles & Fingerprints

| Endpoint | Method | Description |
|---|---|---|
| /api/dogs | GET | List all dog profiles |
| /api/dogs | POST | Create a new dog profile |
| /api/dogs/{id} | GET | Get dog profile |
| /api/dogs/{id} | PUT | Update dog profile |
| /api/dogs/{id} | DELETE | Delete dog profile |
| /api/dogs/{id}/barks | GET | Get barks for a specific dog |
| /api/dogs/{id}/confirm | POST | Confirm a dog profile |
| /api/dogs/{id}/unconfirm | POST | Unconfirm a dog profile |
| /api/dogs/{id}/reset-embedding | POST | Reset dog's embedding |
| /api/dogs/merge | POST | Merge two dog profiles |
| /api/fingerprints | GET | List fingerprints (with filtering) |
| /api/fingerprints/aggregates | GET | Fingerprint aggregate stats |
| /api/fingerprints/stats | GET | Fingerprint system statistics |
| /api/fingerprints/{id} | DELETE | Delete a fingerprint |
| /api/fingerprints/purge | POST | Purge fingerprints older than N days |
| /api/fingerprints/purge-without-evidence | POST | Remove orphaned fingerprints |
| /api/fingerprints/recalculate-bark-counts | POST | Recalculate bark counts |

Bark Tagging

| Endpoint | Method | Description |
|---|---|---|
| /api/barks/untagged | GET | List untagged barks |
| /api/barks/{id}/tag | POST | Tag a bark to a dog |
| /api/barks/bulk-tag | POST | Bulk tag multiple barks |
| /api/barks/{id}/correct | POST | Correct a bark's dog assignment |
| /api/barks/{id}/untag | POST | Remove a bark's tag |
| /api/barks/{id}/reject | POST | Mark a bark as false positive |
| /api/barks/{id}/unreject | POST | Un-reject a bark |
| /api/barks/{id}/confirm | POST | Confirm a bark detection |
| /api/barks/{id}/unconfirm | POST | Unconfirm a bark detection |
| /api/barks/cluster | POST | Cluster untagged barks (HDBSCAN) |
| /api/barks/cluster/{id}/create-dog | POST | Create dog from cluster |

Summaries & Export

| Endpoint | Method | Description |
|---|---|---|
| /api/summary/daily | GET | Daily bark summary |
| /api/summary/weekly | GET | Weekly bark summary |
| /api/summary/monthly | GET | Monthly bark summary |
| /api/summary/range | GET | Custom date range summary |
| /api/summary/weekly/ai | GET | AI-generated weekly summary (Ollama) |
| /api/summary/ai | GET | AI-generated range summary (Ollama) |
| /api/export/json | GET | Export bark data as JSON |
| /api/export/csv | GET | Export bark data as CSV |
| /api/export/stats | GET | Export statistics |

Settings & Notifications

| Endpoint | Method | Description |
|---|---|---|
| /api/settings | GET | Get all runtime settings |
| /api/settings | PUT | Update runtime settings (persisted to config.yaml) |
| /api/notifications/status | GET | Notification system status |

WebSocket

| Endpoint | Description |
|---|---|
| /ws/bark | Real-time bark events (JSON) |
| /ws/pipeline | Detection pipeline state at 10Hz (VAD/YAMNet/CLAP stages) |

OpenAPI Documentation

  • Swagger UI: /api/docs
  • ReDoc: /api/redoc
  • OpenAPI JSON: /api/openapi.json

Web UI

The frontend is a SvelteKit SPA with a NASA Mission Control-inspired theme (glassmorphism, dark UI, cyan/amber accents).

Pages

| Route | Description |
|---|---|
| / | Dashboard - Real-time bark probability, detection pipeline monitor, dog overview with last heard timestamps, persistent statistics |
| /dogs | Dog Management - View registered dogs, bark counts, last heard indicators, bark modal with reassign/untag/delete actions |
| /fingerprints | Fingerprints Explorer - Browse bark fingerprints with filtering, playback, and clustering analysis |
| /reports | Reports - Bark activity reports and trend analysis |
| /settings | Settings & Maintenance - Detection parameters, quiet hours, webhooks, fingerprint purge |

Features

  • Real-time Updates - WebSocket streams for live bark events and audio levels
  • Type-safe API Client - Generated from OpenAPI schema using openapi-fetch
  • Svelte 5 Runes - Modern reactive state with $state, $derived, $effect
  • Responsive Design - Works on desktop and tablet
  • Evidence Playback - Listen to recorded bark clips directly in the browser
  • Bark Management Modal - View dog's barks with reassign, untag, and delete controls
  • Last Heard Indicators - Teal audio icon showing when each dog was last detected
  • Clustering UI - Visual bark clustering for pattern analysis and dog profile creation
  • Persistent Dashboard Stats - Bark counts survive page refreshes via API persistence
  • Toast Notifications - Non-blocking feedback replacing browser alerts
  • Active Navigation - Clear indication of current page with amber highlight
  • Accessibility - Targets WCAG AA text contrast, labeled form inputs, prefers-reduced-motion support

Production Serving

The SvelteKit frontend is built to static files and served directly by FastAPI, so no separate Node.js server is required in production.


Hardware Requirements

Minimum

  • Python 3.11+ with a working PyAudio/PortAudio installation
  • 2GB+ RAM (CLAP + YAMNet models need memory)
  • Any microphone (1+ channels; 2+ for DOA)

Recommended for DOA

  • ReSpeaker 2-Mic HAT (~$12) - HAT form factor, 2 mics
  • ReSpeaker 4-Mic Array (~$35) - 360° coverage, use array_type: uca

ReSpeaker HAT Setup

# Install seeed-voicecard driver
git clone https://github.com/respeaker/seeed-voicecard
cd seeed-voicecard
sudo ./install.sh
sudo reboot

Installation

Quick Start (Docker)

git clone https://github.com/machug/woofalytics-v2.git
cd woofalytics-v2
cp .env.example .env
docker-compose up -d

Manual Installation

# System dependencies (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install -y \
    python3.11 python3.11-venv \
    portaudio19-dev libasound2-dev \
    alsa-utils nodejs npm

# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate

# Install Python package
pip install -e .

# Build frontend
cd frontend
npm install
npm run build
cd ..

# Verify audio devices
woofalytics --list-devices

# Run
woofalytics

CLI Options

woofalytics [OPTIONS]

Options:
  -c, --config PATH       Config file (default: config.yaml)
  --host TEXT             Override host
  -p, --port INTEGER      Override port
  --reload                Enable hot reload (dev)
  --log-level LEVEL       Override log level
  --list-devices          List audio devices and exit
  --version               Show version

Docker Deployment

Dockerfile Features

  • Multi-stage build (builder + runtime)
  • Non-root user (woofalytics)
  • Audio libraries pre-installed
  • Health check included
  • Evidence volume for persistence

docker-compose.yml

services:
  woofalytics:
    build: .
    container_name: woofalytics
    ports:
      - "8000:8000"
    devices:
      - /dev/snd:/dev/snd    # Audio device access
    group_add:
      - audio                 # Audio group membership
    volumes:
      - ./config.yaml:/home/woofalytics/app/config.yaml:ro
      - ./evidence:/home/woofalytics/app/evidence
      - ./models:/home/woofalytics/app/models:ro
    environment:
      - TZ=Europe/London
      - WOOFALYTICS__WEBHOOK__IFTTT_KEY=${IFTTT_KEY:-}
      - WOOFALYTICS__LOG_LEVEL=${LOG_LEVEL:-INFO}
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 1G          # Below the 2GB+ minimum under Hardware Requirements; raise if the container is OOM-killed
        reservations:
          memory: 512M
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

Commands

# Build and start
docker-compose up -d --build

# View logs
docker-compose logs -f

# Stop
docker-compose down

# Rebuild after code changes
docker-compose up -d --build --force-recreate

Development

Setup

# Install with dev dependencies
pip install -e ".[dev]"

# Install pre-commit hooks (optional)
pre-commit install

# Install frontend dependencies
cd frontend && npm install && cd ..

Running (Backend)

# With hot reload
woofalytics --reload --log-level DEBUG

# Or directly with uvicorn
uvicorn woofalytics.app:app --reload --host 0.0.0.0 --port 8000

Running (Frontend Development)

# Start the SvelteKit dev server (auto-proxies API calls to backend)
cd frontend
npm run dev

# Frontend available at http://localhost:5173
# Backend must be running on port 8000

Building Frontend for Production

cd frontend
npm run build    # Outputs to frontend/build/
npm run preview  # Preview production build locally

Code Quality

# Python linting
ruff check src/woofalytics

# Python type checking
mypy src/woofalytics

# Python format
ruff format src/woofalytics

# Frontend type checking
cd frontend && npm run check

Testing

Run Tests

# All tests
pytest

# With coverage
pytest --cov=woofalytics --cov-report=html

# Specific module
pytest tests/test_config.py -v

# With output
pytest -s

Test Structure

  • conftest.py - Shared fixtures (mock PyAudio, test settings, etc.)
  • test_api_routes.py - API endpoint tests
  • test_api_websocket.py - WebSocket tests
  • test_audio.py - Audio frame and device tests
  • test_config.py - Configuration validation
  • test_detection.py - DOA and bark event tests
  • test_evidence.py - Metadata and storage tests
  • test_export.py - Data export tests
  • test_fingerprint_clustering.py - Bark clustering tests
  • test_fingerprint_matching.py - Fingerprint matching tests
  • test_quiet_hours.py - Quiet hours scheduling tests
  • test_resample_cache.py - Resample cache tests
  • test_summary.py - Summary endpoint tests
  • test_yamnet.py - YAMNet gate tests

Mocking

Tests mock PyAudio to run without audio hardware:

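import pytest
from unittest.mock import patch

@pytest.fixture
def mock_pyaudio():
    # Replace the PyAudio class so tests never open real audio hardware
    with patch("pyaudio.PyAudio") as mock:
        # Configure the mock device list (example values: one input device)
        mock.return_value.get_device_count.return_value = 1
        yield mock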

Design Decisions

Why Pydantic v2 with BaseModel for Nested Configs?

Using BaseSettings for nested config classes causes environment variable conflicts, because each nested class would read the environment on its own. The pattern used here:

  • BaseModel for nested configs (AudioConfig, ModelConfig, etc.)
  • BaseSettings only for root Settings class
  • Environment variables work with __ delimiter: WOOFALYTICS__AUDIO__SAMPLE_RATE

Why Async Audio Capture?

PyAudio's stream reads are blocking, but FastAPI runs on an async event loop, where a blocking read would stall every request. Solution:

  • Run PyAudio in a background daemon thread
  • Use thread-safe ring buffer (deque with lock)
  • Async methods for control (start(), stop())
  • Sync methods for buffer access (called from any context)
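The pattern can be sketched as follows. This is not the project's capture class: RingBufferCapture and read_frame are hypothetical names, and read_frame stands in for a blocking PyAudio stream.read() call so the sketch runs without audio hardware.

```python
import asyncio
import threading
from collections import deque
from collections.abc import Callable


class RingBufferCapture:
    """Blocking reads run in a daemon thread; control methods are async."""

    def __init__(self, read_frame: Callable[[], bytes], max_frames: int = 100) -> None:
        self._read_frame = read_frame  # hypothetical stand-in for stream.read()
        self._buffer: deque[bytes] = deque(maxlen=max_frames)  # oldest frames fall off
        self._lock = threading.Lock()
        self._running = threading.Event()
        self._thread: threading.Thread | None = None

    def _run(self) -> None:
        # Only this background thread ever blocks on audio I/O
        while self._running.is_set():
            frame = self._read_frame()
            with self._lock:
                self._buffer.append(frame)

    async def start(self) -> None:
        if self._thread is not None:
            return  # already running
        self._running.set()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    async def stop(self) -> None:
        self._running.clear()
        if self._thread is not None:
            # Join in a worker thread so a final blocking read can't stall the loop
            await asyncio.to_thread(self._thread.join, 1.0)
            self._thread = None

    def latest(self, n: int) -> list[bytes]:
        """Sync access, safe from the event loop or any other thread."""
        if n <= 0:
            return []
        with self._lock:
            return list(self._buffer)[-n:]
```

deque(maxlen=...) provides the ring-buffer behavior without manual index arithmetic, and a threading.Event makes the stop signal explicit across threads.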

Why Three DOA Algorithms?

Each has trade-offs:

  • Bartlett - Robust, works well with noise
  • Capon - Better resolution, more sensitive to calibration
  • MEM - Best for multiple sources, computationally heavier
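To make the Bartlett/Capon difference concrete, here is a narrowband NumPy sketch for a uniform linear array (two mics matches the stereo setup). The function names, mic spacing, frequency, and diagonal loading are assumptions for illustration, not the project's implementation. MEM is omitted, and a real implementation would typically evaluate these spectra per frequency bin of an STFT.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate at room temperature


def steering_vector(theta_deg: float, n_mics: int, spacing_m: float, freq_hz: float) -> np.ndarray:
    # Far-field plane wave on a uniform linear array: per-mic phase delay
    m = np.arange(n_mics)
    delay = m * spacing_m * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delay)


def spatial_spectra(snapshots: np.ndarray, spacing_m: float, freq_hz: float,
                    angles_deg: np.ndarray, loading: float = 1e-3):
    """Return (bartlett, capon) power over angles_deg for an (n_mics, n_snap) array."""
    n_mics, n_snap = snapshots.shape
    r = snapshots @ snapshots.conj().T / n_snap  # sample covariance matrix
    # Diagonal loading keeps the inverse stable when r is nearly singular
    r_loaded = r + loading * np.trace(r).real / n_mics * np.eye(n_mics)
    r_inv = np.linalg.inv(r_loaded)
    bartlett, capon = [], []
    for theta in angles_deg:
        a = steering_vector(theta, n_mics, spacing_m, freq_hz)
        bartlett.append(np.real(a.conj() @ r @ a))        # a^H R a
        capon.append(1.0 / np.real(a.conj() @ r_inv @ a))  # 1 / (a^H R^-1 a)
    return np.array(bartlett), np.array(capon)
```

For a simulated single source both spectra peak at its angle. Capon's peak is generally sharper because it suppresses power from other directions, but that same inversion is what makes it sensitive to calibration errors in the steering vector.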

Why CLAP Instead of Custom Models?

CLAP (Contrastive Language-Audio Pretraining) offers key advantages:

  • Zero-shot - No training data required, works immediately
  • Multi-label - Can detect bark AND check for speech/birds simultaneously
  • Veto system - Reduces false positives by rejecting similar sounds
  • Generalizes - Works across dog breeds without fine-tuning
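The zero-shot scoring and veto idea can be sketched in plain NumPy. Random vectors stand in for the CLAP audio and text encoder outputs; the prompts, logit_scale, and both thresholds are hypothetical, not the project's values. Softmax over cosine similarities to a set of text prompts is the usual CLAP zero-shot recipe.

```python
import numpy as np

# Hypothetical prompt set: index 0 is the target, the rest act as vetoes
LABELS = ["a dog barking", "a person speaking", "birds chirping", "a drum being hit"]
BARK_INDEX = 0
VETO_INDICES = (1, 2, 3)


def zero_shot_probs(audio_emb: np.ndarray, text_embs: np.ndarray,
                    logit_scale: float = 100.0) -> np.ndarray:
    # Cosine similarity between one audio embedding and each prompt embedding
    audio = audio_emb / np.linalg.norm(audio_emb)
    texts = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (texts @ audio)
    logits -= logits.max()  # numerical stability before exponentiating
    exp = np.exp(logits)
    return exp / exp.sum()  # softmax across all prompts in one pass


def is_bark(probs: np.ndarray, bark_threshold: float = 0.5,
            veto_threshold: float = 0.3) -> bool:
    # Veto first: a strong score for any confusable sound rejects the clip
    if any(probs[i] >= veto_threshold for i in VETO_INDICES):
        return False
    return bool(probs[BARK_INDEX] >= bark_threshold)
```

Adding a new veto class only means adding a prompt; no retraining is involved, which is the practical payoff of the zero-shot approach.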

The downside is slower inference (~500 ms per inference, versus the legacy MLP's 80 ms interval), which is why:

  • VAD gate fast-rejects silent audio before CLAP
  • High-confidence bypass (≥80%) enables instant detection
  • Detection cooldown prevents repeated detections of the same sound
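A sketch of how these three safeguards can be chained. The 0.80 bypass threshold comes from the text above; everything else (an RMS level in dBFS as the VAD signal, a two-frame confirmation rule that the bypass skips, a 2 s cooldown, and checking the cooldown before running the classifier) is an assumption about how such a gate might work, not a description of the project's code. BarkGate is a hypothetical name.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class BarkGate:
    vad_threshold_db: float = -50.0  # hypothetical: quieter frames skip CLAP
    bypass_prob: float = 0.80        # high-confidence bypass threshold
    confirm_prob: float = 0.50       # hypothetical: lower scores need confirmation
    confirm_frames: int = 2          # hypothetical: consecutive frames required
    cooldown_s: float = 2.0          # hypothetical cooldown after a detection
    _streak: int = field(default=0, init=False, repr=False)
    _last_detection: float = field(default=float("-inf"), init=False, repr=False)

    def process(self, rms_db: float, classify: Callable[[], float], now: float) -> bool:
        # 1. VAD gate: silent audio never reaches the slow classifier
        if rms_db < self.vad_threshold_db:
            self._streak = 0
            return False
        # 2. Cooldown: ignore the tail of a sound that was just reported
        if now - self._last_detection < self.cooldown_s:
            return False
        prob = classify()  # expensive call (~500 ms for CLAP)
        # 3. High-confidence bypass vs. multi-frame confirmation
        if prob >= self.bypass_prob:
            detected = True
        elif prob >= self.confirm_prob:
            self._streak += 1
            detected = self._streak >= self.confirm_frames
        else:
            self._streak = 0
            detected = False
        if detected:
            self._last_detection = now
            self._streak = 0
        return detected
```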

Why Legacy MLP Mode?

For constrained hardware or faster inference, the legacy MLP model offers:

  • 80ms inference interval (12.5 inferences/second)
  • Smaller memory footprint
  • Less accurate but faster

Why JSON Sidecars for Evidence?

For documentation purposes, metadata must be:

  • Human-readable (JSON, not binary)
  • Separate from audio (can't be embedded in WAV easily)
  • Include precise timestamps, probabilities, device info
  • Machine-parseable for cataloging and fingerprinting
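A minimal sketch of writing a WAV file and its JSON sidecar with only the standard library. The function name, the sidecar field names (audio_file, written_at, duration_s, ...), and the 16-bit PCM format are hypothetical; the project's actual sidecar schema may differ.

```python
import json
import wave
from datetime import datetime, timezone
from pathlib import Path


def write_evidence(directory: Path, stem: str, pcm: bytes, sample_rate: int,
                   channels: int, metadata: dict) -> tuple[Path, Path]:
    """Write <stem>.wav and a <stem>.json sidecar describing it."""
    directory.mkdir(parents=True, exist_ok=True)
    wav_path = directory / f"{stem}.wav"
    json_path = directory / f"{stem}.json"

    with wave.open(str(wav_path), "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # assumes 16-bit PCM samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)

    sidecar = {
        "audio_file": wav_path.name,  # relative link back to the audio
        "written_at": datetime.now(timezone.utc).isoformat(),
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": len(pcm) / (2 * channels * sample_rate),
        **metadata,  # e.g. detection probabilities, device info
    }
    json_path.write_text(json.dumps(sidecar, indent=2))
    return wav_path, json_path
```

Sharing the file stem keeps each pair trivially discoverable, and UTC ISO-8601 timestamps sort and compare correctly as plain strings.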

Known Issues & TODOs

Not Yet Implemented

  1. Evidence Cleanup - Automatic old file removal (manual purge available via API)
  2. Audio Spectrogram - Visual display in web UI

Potential Improvements

  1. Home Assistant Integration - MQTT or REST
  2. SMS/Push Notifications - Via Pushover/Twilio

Recently Implemented (v2.5.0)

  1. Webhook Notifications - Configurable webhooks for bark alerts
  2. Multi-Dog Fingerprinting - Identify individual dogs by bark signature
  3. Bark Pattern Analysis - Clustering UI for analyzing bark patterns
  4. Quiet Hours - Scheduled reduced sensitivity periods
  5. Fingerprint Purge - Remove orphaned fingerprints without audio evidence
  6. Notification Debouncing - Per-dog rate limiting via events/debouncer.py
  7. Prometheus Metrics - Prometheus-format metrics at /api/metrics
  8. API Authentication - Optional API key authentication
  9. Rate Limiting - Per-endpoint rate limiting
  10. Runtime Settings - Update settings via UI, persisted to config.yaml
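Two of these features rest on small pieces of logic that are easy to get wrong: a quiet-hours window that crosses midnight, and per-dog notification debouncing. The sketch below illustrates both under stated assumptions. It is not the contents of events/debouncer.py, the names are hypothetical, and how sensitivity is actually reduced during quiet hours is not shown.

```python
from datetime import time


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if now falls in [start, end), including windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    # e.g. 22:00-07:00: inside if after start OR before end
    return now >= start or now < end


class PerDogDebouncer:
    """Allow at most one notification per dog every min_interval_s seconds."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last_sent: dict[str, float] = {}  # dog id -> last notify time

    def should_notify(self, dog_id: str, now: float) -> bool:
        last = self._last_sent.get(dog_id)
        if last is not None and now - last < self.min_interval_s:
            return False  # this dog was reported too recently
        self._last_sent[dog_id] = now
        return True
```

Keying the debouncer by dog means one persistent barker cannot suppress alerts about a different dog.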

Known Limitations

  1. ALSA Volume Control - Microphone volume adjustment (volume_percent) uses ALSA and is Linux-specific; detection works on any OS with PyAudio
  2. CPU Only - Inference is CPU-only (GPU not required)

Original Project

This is a fork/rewrite of the original woofalytics project. Key changes:

| Aspect | Original | v2.5 |
|---|---|---|
| Python | 3.9+ | 3.11+ |
| Detection | Custom MLP | CLAP zero-shot (+ legacy MLP) |
| False Positives | High | Multi-layer veto system |
| Web Framework | Basic HTTP | FastAPI |
| Config | Hardcoded | Pydantic v2 |
| Microphone | Andrea only | Any USB mic |
| Real-time | Polling | WebSocket |
| Evidence | WAV only | WAV + JSON metadata |
| Deployment | Manual | Docker |
| Tests | None | pytest suite |

Versioning

Version is tracked in the VERSION file at the repository root. See CHANGELOG.md for release history.


License

MIT License - See original project for attribution.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Run tests: pytest
  4. Run linting: ruff check src/
  5. Submit a pull request

Quick Reference

# Start the server
woofalytics

# List audio devices
woofalytics --list-devices

# Run with debug logging
woofalytics --log-level DEBUG

# Docker
docker-compose up -d

# Run tests
pytest

# Open the API docs in a browser (macOS; on Linux use xdg-open)
open http://localhost:8000/api/docs

About

AI Powered Woof Analytics!
