This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Soulcaster is a feedback triage and automated fix generation system. It ingests bug reports from Reddit, Sentry, GitHub issues, Splunk, DataDog, and PostHog, clusters similar feedback using embeddings, and triggers a coding agent to generate fixes and open PRs.
Flow: Feedback Sources → Clustered Issues → Dashboard Triage → Coding Agent → GitHub PR
Two main components sharing Upstash Redis:
- Backend (`/backend`) - FastAPI service for ingestion, clustering, and agent orchestration
- Dashboard (`/dashboard`) - Next.js 15 (App Router) web UI for triage and management
- Backend: FastAPI, Pydantic, redis-py, Upstash REST, Gemini embeddings, E2B sandboxes
- Dashboard: Next.js, TypeScript, Tailwind, NextAuth (GitHub OAuth), Prisma (PostgreSQL)
- Storage: Upstash Redis + Upstash Vector for embeddings
- LLM: Gemini (`gemini-embedding-001` for embeddings, `gemini-2.5-flash` for summaries)
Quick Start:

```bash
just                # Show all available commands
just install        # Install all dependencies
just dev-backend    # Run backend (localhost:8000)
just dev-dashboard  # Run dashboard (localhost:3000)
```

Backend:

```bash
just dev-backend      # Run with uv
just test-backend     # Run all tests
just install-backend  # Install dependencies

# Run specific test file
cd backend && uv run pytest tests/test_store.py -v

# Run single test by name
cd backend && uv run pytest -v -k "test_add_feedback_item"

# Manual commands
cd backend && uv sync
cd backend && uv run uvicorn main:app --reload --port 8000
```

Dashboard:

```bash
just dev-dashboard      # Run dev server
just test-dashboard     # Run all tests
just install-dashboard  # Install dependencies

# Manual commands
cd dashboard && npm install
cd dashboard && npx prisma migrate dev  # Setup/migrate database
cd dashboard && npx prisma generate     # Regenerate Prisma client
cd dashboard && npm run dev
cd dashboard && npm run build
cd dashboard && npm run lint
cd dashboard && npm run type-check
```

IMPORTANT: Always run tests and checks before committing or pushing:
```bash
cd backend && uv run pytest --tb=line -q  # Run all tests
python3 -m py_compile <file>.py           # Check syntax
cd dashboard && npm run type-check        # TypeScript type checking
cd dashboard && npm run build             # Production build test
cd dashboard && npm run lint              # ESLint
```

Never commit or push without verifying:
- ✅ Tests pass (or document known failures)
- ✅ No syntax/import errors
- ✅ TypeScript type check passes
- ✅ Production build succeeds
Backend:
- `backend/main.py` - FastAPI routes (all `/ingest/*`, `/clusters`, `/feedback`, `/jobs`, `/cluster-jobs`)
- `backend/store.py` - Redis/in-memory storage abstraction
- `backend/models.py` - Pydantic models (`FeedbackItem`, `IssueCluster`, `AgentJob`, `ClusterJob`)
- `backend/clustering_runner.py` - Async clustering job runner with Redis locks
- `backend/clustering.py` - Embedding generation and similarity calculations
- `backend/vector_store.py` - Upstash Vector wrapper for ANN search
- `backend/limits.py` - Free tier quota limits (1500 issues, 20 jobs per user)
Dashboard:
- `dashboard/lib/auth.ts` - NextAuth configuration
- `dashboard/lib/github.ts` - GitHub API client
- `dashboard/app/api/clusters/*/route.ts` - Cluster management endpoints
- `dashboard/prisma/schema.prisma` - Database schema (auth, projects)
Redis keys:

```
feedback:{id}                - Hash: feedback item data
feedback:created:{proj}      - Sorted set: feedback IDs by timestamp for project
feedback:unclustered:{proj}  - Set: IDs pending clustering
cluster:{id}                 - Hash: cluster data
cluster:items:{id}           - Set: feedback IDs in cluster
clusters:all:{proj}          - Set: all cluster IDs for project
job:{id}                     - Hash: agent job data
cluster_job:{id}             - Hash: clustering job data
```
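To illustrate how these keys relate, here is a minimal write path for a single item using plain `redis-py`; the hash field names are assumptions, and the real read/write logic lives behind the `backend/store.py` abstraction (which talks to Upstash over REST in production):

```python
import time

import redis

r = redis.Redis(decode_responses=True)

def add_feedback(item_id: str, project: str, title: str, body: str) -> None:
    # One hash per item holds the feedback data (field names assumed).
    r.hset(f"feedback:{item_id}", mapping={"title": title, "body": body, "project": project})
    # Per-project sorted set orders feedback IDs by creation timestamp.
    r.zadd(f"feedback:created:{project}", {item_id: time.time()})
    # The item stays in this set until a clustering job picks it up.
    r.sadd(f"feedback:unclustered:{project}", item_id)
```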
Uses in-memory batch clustering to avoid Upstash Vector eventual-consistency issues (see the sketch after the steps below):
- Generate embeddings via the Gemini API for `title + body`
- Query Upstash Vector for existing similar items (read-only)
- In-memory clustering: compare batch items against each other + existing DB items
- If similarity ≥ 0.72 AND existing cluster → join that cluster
- If similar batch items → group into new cluster
- Batch upsert all items to Vector DB at once (single write)
- Persist clusters to Redis
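A rough sketch of the in-memory comparison step, assuming cosine similarity over the Gemini embeddings; the function names, tuple shapes, and single-pass grouping here are illustrative rather than the actual `backend/clustering.py` API:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.72

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_batch(batch, existing):
    """
    batch:    list of (item_id, embedding) pending clustering
    existing: list of (cluster_id, embedding) already in the vector DB
    Returns (assignments to existing clusters, newly formed clusters).
    """
    assignments: dict[str, str] = {}
    new_clusters: list[dict] = []
    for item_id, emb in batch:
        # Prefer joining the closest existing cluster, if close enough.
        best = max(existing, key=lambda e: cosine(emb, e[1]), default=None)
        if best is not None and cosine(emb, best[1]) >= SIMILARITY_THRESHOLD:
            assignments[item_id] = best[0]
            continue
        # Otherwise compare against clusters formed earlier in this batch.
        for cluster in new_clusters:
            if cosine(emb, cluster["centroid"]) >= SIMILARITY_THRESHOLD:
                cluster["items"].append(item_id)
                break
        else:
            new_clusters.append({"centroid": emb, "items": [item_id]})
    return assignments, new_clusters
```

Because the batch is only upserted to the Vector DB after grouping, no step reads back data it just wrote, which is what sidesteps the eventual-consistency window.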
Free tier limits:
- 1500 max feedback items per user (across all projects)
- 20 successful coding jobs per user
- Enforced in `backend/limits.py`, checked at ingestion and job creation (see the sketch after this list)
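A minimal sketch of the quota check, assuming the store can count a user's items; `count_feedback_for_user` and `QuotaExceeded` are hypothetical stand-ins for whatever `backend/limits.py` actually defines:

```python
MAX_FEEDBACK_ITEMS = 1500  # per user, across all projects
MAX_SUCCESSFUL_JOBS = 20   # per user

class QuotaExceeded(Exception):
    """Raised when a free tier limit would be exceeded."""

def check_feedback_quota(user_id: str, store) -> None:
    # Hypothetical store call: total feedback items across all of the user's projects.
    if store.count_feedback_for_user(user_id) >= MAX_FEEDBACK_ITEMS:
        raise QuotaExceeded(f"user {user_id} is at the {MAX_FEEDBACK_ITEMS}-item limit")
```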
Use a single `.env` file in the project root:

```bash
cp .env.example .env

# Required:
ENVIRONMENT=development
UPSTASH_REDIS_REST_URL=
UPSTASH_REDIS_REST_TOKEN=
UPSTASH_VECTOR_REST_URL=
UPSTASH_VECTOR_REST_TOKEN=
GEMINI_API_KEY=
GITHUB_ID=          # GitHub OAuth client ID
GITHUB_SECRET=      # GitHub OAuth client secret
E2B_API_KEY=
KILOCODE_TEMPLATE_NAME=kilo-sandbox-v-0-1-dev
BLOB_READ_WRITE_TOKEN=
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=    # Generate: openssl rand -base64 32
DATABASE_URL=postgresql://...
BACKEND_URL=http://localhost:8000
```

Backend (:8000):
- `POST /ingest/reddit` - Reddit posts
- `POST /ingest/sentry` - Sentry webhooks
- `POST /ingest/splunk/webhook` - Splunk alerts
- `POST /ingest/datadog/webhook` - DataDog alerts
- `POST /ingest/posthog/webhook` - PostHog events
- `POST /ingest/manual` - Manual feedback
- `POST /ingest/github/sync` - Sync GitHub issues for project
- `GET /feedback` - List feedback (`?project_id=`, `?source=`, `?limit=`)
- `GET /clusters`, `GET /clusters/{id}` - List/detail clusters
- `POST /clusters/{id}/start_fix` - Trigger fix generation
- `POST /cluster-jobs` - Trigger backend clustering job
- `GET /cluster-jobs/{id}` - Get clustering job status
- `POST /jobs`, `GET /jobs/{id}`, `PATCH /jobs/{id}` - Agent job management
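As an example, manual feedback can be posted straight to the backend; the JSON fields below are assumptions inferred from the models above, not a documented schema:

```python
import httpx

resp = httpx.post(
    "http://localhost:8000/ingest/manual",
    json={
        "project_id": "my-project",  # assumed field names
        "title": "Checkout button unresponsive",
        "body": "Clicking 'Pay now' does nothing on Safari 17.",
    },
)
resp.raise_for_status()
print(resp.json())
```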
Dashboard (:3000/api):
- `POST /api/clusters/cleanup` - Merge duplicate clusters
- `GET /api/clusters/jobs` - List clustering jobs
- `POST /api/ingest/github/sync` - Trigger GitHub sync
- `GET /api/feedback` - Proxy to backend
- Users sign in via GitHub OAuth (required)
- Access token stored in encrypted NextAuth session
- PRs created from user's account (not a bot)
- Scopes: `repo`, `read:user`
```bash
just dev-reset        # Reset dev data (with confirmation)
just dev-reset-force  # Reset without confirmation
```

Only works when `ENVIRONMENT=development`, to prevent accidental prod data loss.
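The guard amounts to an environment check before any destructive operation; a minimal sketch of the idea (not the actual implementation):

```python
import os

def assert_dev_environment() -> None:
    # Refuse destructive resets unless explicitly running in development.
    if os.environ.get("ENVIRONMENT") != "development":
        raise RuntimeError("dev-reset is only allowed when ENVIRONMENT=development")
```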