A comprehensive email parsing and consumer intelligence system that analyzes Gmail and Outlook emails using multiple LLM providers to build detailed IAB Taxonomy consumer profiles. The system provides advanced analytics including demographic classification, interest profiling, purchase intent prediction, and household analysis.
- Overview
- Architecture
- Prerequisites
- Installation
- Configuration
- Starting the Application
- How It Works
- Usage Examples
- Development
- Troubleshooting
The OwnYou Consumer Application is a privacy-first email analysis system that:
- Downloads emails from Gmail and Outlook via OAuth2
- Processes emails using multiple LLM providers (OpenAI, Claude, Google Gemini, Ollama)
- Classifies users according to IAB Audience Taxonomy 1.1
- Builds consumer profiles with demographics, interests, purchase intent, and household data
- Provides analytics dashboard for visualizing consumer insights
- Maintains privacy through local processing and encrypted storage
- Multi-Provider Email Support: Gmail and Outlook integration with OAuth2
- Multi-LLM Processing: OpenAI GPT-5, Claude Sonnet-4, Google Gemini, Ollama (local)
- Batch Processing: Intelligent batching for 20-30x faster processing
- IAB Taxonomy Mapping: 1,600+ categories across demographics, interests, and purchase intent
- Visual Dashboard: React/Next.js frontend with real-time analytics
- LangGraph Workflow: Agentic workflow with evidence validation
- Privacy-First: No cloud storage, local SQLite persistence
┌─────────────────────────────────────────────────────────────┐
│ Frontend (Next.js) │
│ - Dashboard UI (React + Tailwind CSS) │
│ - Classification Viewer │
│ - Analytics & Visualizations (Recharts) │
│ - Real-time Updates │
└───────────────────────────┬─────────────────────────────────┘
│ HTTP/REST API
┌───────────────────────────▼─────────────────────────────────┐
│ Backend (Flask API) │
│ - Authentication & Session Management │
│ - Profile & Analytics Endpoints │
│ - Evidence Retrieval │
│ - Model Selection & Analysis Triggers │
└───────────────────────────┬─────────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────────┐
│ Email Processing Pipeline │
│ │
│ Step 1: Email Download (OAuth2) │
│ ├─ Gmail Provider │
│ └─ Outlook Provider │
│ │
│ Step 2: Email Summarization (EMAIL_MODEL) │
│ └─ Fast LLM processing to extract key information │
│ │
│ Step 3: IAB Classification (TAXONOMY_MODEL) │
│ ├─ LangGraph Agentic Workflow │
│ ├─ Batch Optimizer (10-20 emails per batch) │
│ ├─ Category-Specific Agents │
│ ├─ Evidence Judge (LLM-as-Judge validation) │
│ └─ Memory Manager (LangMem + SQLite) │
└───────────────────────────┬─────────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────────┐
│ Data Persistence │
│ - SQLite Database (LangMem storage) │
│ - User Profiles (JSON exports) │
│ - Email Summaries (CSV) │
│ - Classification History │
└─────────────────────────────────────────────────────────────┘
Three-Stage Independent Pipeline:
- Email Download → Raw emails CSV
- Email Summarization → Summaries CSV (with EMAIL_MODEL)
- IAB Classification → User profile JSON (with TAXONOMY_MODEL)
Each stage can be run independently, allowing for:
- Resilience (re-run failed steps)
- Iteration (test different models)
- Cost savings (skip expensive LLM calls)
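The stage independence can be sketched in plain Python: each stage consumes only the previous stage's artifact, so any stage can be re-run in isolation. Function names and data shapes below are illustrative, not the project's actual API.

```python
# Hypothetical sketch of the three independent pipeline stages.
# Each stage's output is persisted in the real system (CSV/JSON),
# which is what makes re-running a single failed stage possible.

def download_emails(provider: str, max_emails: int) -> list[dict]:
    # Real stage: OAuth2 + provider API -> emails_raw_<timestamp>.csv
    return [{"subject": f"Email {i}", "body": "..."} for i in range(max_emails)]

def summarize_emails(raw: list[dict]) -> list[dict]:
    # Real stage: EMAIL_MODEL call -> emails_summarized_<timestamp>.csv
    return [{**e, "summary": e["subject"].lower()} for e in raw]

def classify_emails(summaries: list[dict]) -> dict:
    # Real stage: TAXONOMY_MODEL + LangGraph -> profile_<user>_<timestamp>.json
    return {"classifications": len(summaries)}

raw = download_emails("gmail", 3)
profile = classify_emails(summarize_emails(raw))
print(profile)  # → {'classifications': 3}
```

Because each stage only needs its input artifact, you could, for example, re-run `classify_emails` with a different model against an existing summaries file without paying for download or summarization again.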
The IAB Classification stage uses intelligent batching:
- Dynamically calculates batch size based on model context window
- Processes 10-20 emails per LLM call
- 20-30x faster than single-email processing
- Evidence validation for each classification
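The batch-size calculation could look something like the following sketch. The 4-characters-per-token heuristic, the reserved-token budget, and the clamp bounds are illustrative assumptions, not the project's actual optimizer:

```python
def calculate_batch_size(emails, context_window_tokens,
                         reserved_tokens=4000, min_size=1, max_size=20):
    """Estimate how many emails fit in one LLM call.

    Illustrative sketch: assumes ~4 characters per token and reserves
    room for the prompt template and the model's response.
    """
    budget = context_window_tokens - reserved_tokens
    # Average tokens per email, estimated from character counts
    avg_tokens = max(1, sum(len(e) for e in emails) // (4 * max(1, len(emails))))
    return max(min_size, min(max_size, budget // avg_tokens))

emails = ["subject: sale\nbody: 20% off laptops"] * 100
size = calculate_batch_size(emails, context_window_tokens=128_000)
batches = [emails[i:i + size] for i in range(0, len(emails), size)]
print(size, len(batches))  # → 20 5
```

With a large context window the batch size is capped (here at 20), so 100 emails become 5 LLM calls instead of 100, which is where the 20-30x speedup comes from.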
- Python: 3.8 or higher
- Node.js: 18.x or higher
- npm: 9.x or higher
- Operating System: macOS, Linux, or Windows
- LLM Provider (choose at least one):
- OpenAI API key (recommended)
- Anthropic API key (Claude)
- Google AI API key (Gemini)
- Local Ollama (no key required)
- Email Providers (choose at least one):
- Gmail: Google Cloud project with Gmail API enabled
- Outlook: Microsoft Azure app registration
cd /path/to/your/workspace
git clone <repository-url>
cd ownyou_consumer_application

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Optional: Install development dependencies
pip install -e ".[dev]"

# Verify installation
python -m src.email_parser.main --version

cd dashboard/frontend
# Install dependencies
npm install
# Verify installation
npm run build

Create a .env file in the project root:
cp .env.example .env  # If example exists, or create manually

Minimal .env Configuration:
# =============================================================================
# LLM Provider Configuration
# =============================================================================
# Primary provider: openai, claude, google, or ollama
LLM_PROVIDER=openai
# OpenAI Configuration (Recommended)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=1.0
# Claude (Anthropic) Configuration (Optional)
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-sonnet-4-20250514
# Google Gemini Configuration (Optional)
GOOGLE_API_KEY=your_google_api_key_here
# Ollama Configuration (Local, Optional)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=deepseek-r1:70b
# =============================================================================
# Stage-Specific Model Configuration
# =============================================================================
# Format: provider:model
EMAIL_MODEL=openai:gpt-4o-mini # Fast model for email summarization
TAXONOMY_MODEL=openai:gpt-4o # Accurate model for classification
# =============================================================================
# Memory Backend Configuration
# =============================================================================
MEMORY_BACKEND=sqlite
MEMORY_DATABASE_PATH=data/email_parser_memory.db
# =============================================================================
# LangGraph Studio Configuration (Optional)
# =============================================================================
# LangSmith Project Deep-Linking
# These enable direct links to your specific LangSmith project from the dashboard
# Find these values in your LangSmith project URL:
# https://smith.langchain.com/o/{ORG_ID}/projects/p/{PROJECT_ID}
LANGSMITH_ORG_ID=your_organization_id_here
LANGSMITH_PROJECT_ID=your_project_id_here
# =============================================================================
# Email Provider Configuration
# =============================================================================
# Gmail Configuration
GMAIL_CREDENTIALS_FILE=credentials.json
GMAIL_TOKEN_FILE=token.json
# Microsoft Graph (Outlook) Configuration
MICROSOFT_CLIENT_ID=your_client_id_here
MICROSOFT_CLIENT_SECRET=your_client_secret_here
MICROSOFT_TENANT_ID=common
MICROSOFT_TOKEN_FILE=ms_token.json
# =============================================================================
# Processing Configuration
# =============================================================================
MAX_EMAILS=500
BATCH_SIZE=50
LOG_LEVEL=INFO

# Interactive setup wizard
python -m src.email_parser.main setup gmail
# Manual setup:
# 1. Create Google Cloud project
# 2. Enable Gmail API
# 3. Create OAuth 2.0 credentials
# 4. Download credentials as credentials.json
# 5. Place in project root

# Interactive setup wizard
python -m src.email_parser.main setup outlook
# Manual setup:
# 1. Register app in Azure Portal
# 2. Add Microsoft Graph Mail.Read permission
# 3. Copy Client ID and Client Secret to .env

# Check setup status
python -m src.email_parser.main setup status
# Test database connection
python -m src.email_parser.main --test-db

Use the provided start script, which handles cleanup and startup:
# From project root
./start_app.sh

# Kill backend (Flask)
lsof -ti:5001 | xargs kill -9 2>/dev/null || true
# Kill frontend (Next.js)
lsof -ti:3000 | xargs kill -9 2>/dev/null || true
# Verify ports are free
lsof -i:5001
lsof -i:3000

# Option A: Using Python module
cd /path/to/ownyou_consumer_application
python3 dashboard/backend/run.py
# Option B: Using Flask directly
cd dashboard/backend
python3 -m flask run --host=0.0.0.0 --port=5001
# Backend will start on: http://localhost:5001

Expected output:
* Serving Flask app 'app'
* Debug mode: on
INFO:werkzeug:WARNING: This is a development server.
* Running on http://0.0.0.0:5001
IMPORTANT: Before starting the frontend, verify the API URL configuration to avoid CORS issues.
# Navigate to frontend directory
cd /path/to/ownyou_consumer_application/dashboard/frontend
# Check .env.local file exists
cat .env.local

The .env.local file MUST have an empty NEXT_PUBLIC_API_URL to use the Next.js API proxy:
# Backend API URL
# IMPORTANT: Leave empty to use Next.js proxy (avoids CORS issues)
# This routes requests through /api which handles session cookies properly
NEXT_PUBLIC_API_URL=

DO NOT set it to http://localhost:5001: this bypasses the proxy and causes CORS errors.
# Start development server (already in dashboard/frontend)
npm run dev
# Frontend will start on: http://localhost:3000

Expected output:
▲ Next.js 14.2.0
- Local: http://localhost:3000
- Network: http://192.168.1.x:3000
✓ Ready in 2.3s
Open your browser and navigate to:
http://localhost:3000
Press Ctrl+C in each terminal window running backend/frontend.
# Kill all processes on ports
lsof -ti:5001 | xargs kill -9
lsof -ti:3000 | xargs kill -9
# Or kill by process name
pkill -f "flask run"
pkill -f "next dev"

Note: For development, use the Quick Start method above. Production mode is for deployment only.
Prerequisites:
# Ensure you're in the virtual environment
source venv/bin/activate # or .venv_dashboard/bin/activate
# Install production dependencies
pip install -r requirements.txt  # Includes gunicorn

Step 1: Build Frontend
cd dashboard/frontend
npm run build
# Verify build succeeded (should create .next directory)
ls -la .next/

Step 2: Start Backend (Terminal 1)
# From project root
cd /path/to/ownyou_consumer_application
# Activate virtual environment
source venv/bin/activate # or .venv_dashboard/bin/activate
# Start with gunicorn using wsgi.py entry point
gunicorn -w 4 -b 0.0.0.0:5001 wsgi:app
# Backend will run on http://localhost:5001
# Press Ctrl+C to stop

Step 3: Start Frontend (Terminal 2)
cd /path/to/ownyou_consumer_application/dashboard/frontend
# Start production frontend
npm start
# Frontend will run on http://localhost:3000
# Press Ctrl+C to stop

Production Notes:
- The wsgi.py file in the project root is the production entry point
- For background processes, use a process manager:
  - PM2 (Node.js): pm2 start npm --name "frontend" -- start
  - systemd (Linux): create service files for both backend and frontend
  - supervisor: alternative process manager
- Set FLASK_ENV=production in .env for production mode
- Use nginx or Apache as a reverse proxy in front of gunicorn
- Set up proper logging and monitoring
Alternative: Background Processes
# Start backend in background
nohup gunicorn -w 4 -b 0.0.0.0:5001 wsgi:app > backend.log 2>&1 &
# Start frontend in background
cd dashboard/frontend
nohup npm start > ../../frontend.log 2>&1 &
# View logs
tail -f backend.log
tail -f frontend.log
# Stop processes
pkill -f gunicorn
pkill -f "next start"

- User Authentication: OAuth2 flow for Gmail/Outlook
- Email Download: Fetch emails via provider APIs
- Email Summarization: Extract key information with LLM
- IAB Classification: Multi-agent workflow classifies emails
- Profile Building: Aggregate classifications into user profile
- Dashboard Display: Visualize insights in real-time
# Command
python -m src.email_parser.main --provider gmail --max-emails 100
# What happens:
# - OAuth2 authentication
# - API calls to Gmail/Outlook
# - Download emails (subject, body, metadata)
# - Save to CSV: data/emails_raw_<timestamp>.csv

# Triggered automatically or manually
python -m src.email_parser.main --summarize emails_raw.csv
# What happens:
# - Load raw emails
# - Call EMAIL_MODEL (fast, cheap model)
# - Extract: sender, category, key topics, intent
# - Save to CSV: data/emails_summarized_<timestamp>.csv

# Start classification
python -m src.email_parser.main --classify emails_summarized.csv
# What happens:
# 1. Load summarized emails
# 2. Retrieve existing user profile from LangMem
# 3. Batch optimizer groups emails (10-20 per batch)
# 4. For each batch:
# a. Demographics agent analyzes (age, gender, education)
# b. Household agent analyzes (size, income, location)
# c. Interests agent analyzes (hobbies, preferences)
# d. Purchase intent agent analyzes (shopping behavior)
# 5. Evidence judge validates each classification
# 6. Update LangMem semantic memory
# 7. Save profile JSON: data/profile_<user>_<timestamp>.json

Batch Processing Example:
Input: 100 emails
Context Window: 128,000 tokens
Batch Size: 15 emails
Process:
├─ Batch 1 (emails 1-15) → 42 classifications
├─ Batch 2 (emails 16-30) → 38 classifications
├─ Batch 3 (emails 31-45) → 51 classifications
└─ ... (7 batches total)
Result: Profile with 287 validated classifications
Time: ~6 minutes (vs 3 hours single-email)
{
"schema_version": "2.0",
"user_id": "nick",
"generated_at": "2025-01-28T12:34:56Z",
"demographics": {
"age": {
"primary": {
"taxonomy_id": 12,
"value": "35-44",
"confidence": 0.92,
"evidence_count": 15
}
},
"gender": {
"primary": {
"taxonomy_id": 59,
"value": "Male",
"confidence": 0.88,
"evidence_count": 23
}
}
},
"interests": [
{
"taxonomy_id": 342,
"category": "Technology & Computing",
"subcategory": "Software Development",
"confidence": 0.95,
"evidence_count": 47,
"evidence": [
"GitHub notifications about pull requests",
"Stack Overflow digest emails"
]
}
],
"purchase_intent": [
{
"taxonomy_id": 1234,
"category": "Consumer Electronics",
"subcategory": "Laptops",
"confidence": 0.78,
"evidence_count": 5,
"purchase_intent_flag": true
}
]
}

The classification uses a sophisticated LangGraph workflow:
┌──────────────┐
│ Load Emails │
└──────┬───────┘
│
┌──────▼────────┐
│ Retrieve │
│ Profile │
└──────┬────────┘
│
┌───────────────┼───────────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│Demo │ │House │ │Interest │
│Agent │ │Agent │ │Agent │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└───────────────┼───────────────┘
│
┌──────▼────────┐
│ Evidence │
│ Judge │
└──────┬────────┘
│
┌──────▼────────┐
│ Reconcile │
│ Results │
└──────┬────────┘
│
┌──────▼────────┐
│ Update │
│ Memory │
└───────────────┘
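The fan-out/judge shape above can be approximated in plain Python. This is a hedged sketch: the real implementation runs these as LangGraph nodes backed by LLM calls, while here the agents are stubs and the judge is a simple threshold filter with made-up cutoffs.

```python
# Plain-Python approximation of the workflow graph: category agents
# fan out over a batch, then an evidence judge validates candidates
# before they are reconciled into the profile.

def demographics_agent(batch):
    return [{"category": "age 35-44", "confidence": 0.9, "evidence_count": 3}]

def household_agent(batch):
    return [{"category": "household of 2", "confidence": 0.4, "evidence_count": 1}]

def interests_agent(batch):
    return [{"category": "Software Development", "confidence": 0.95, "evidence_count": 7}]

def evidence_judge(candidates, min_confidence=0.5, min_evidence=2):
    # LLM-as-Judge in the real system; a threshold filter here
    return [c for c in candidates
            if c["confidence"] >= min_confidence
            and c["evidence_count"] >= min_evidence]

def classify_batch(batch):
    candidates = []
    for agent in (demographics_agent, household_agent, interests_agent):
        candidates.extend(agent(batch))   # fan-out to category agents
    return evidence_judge(candidates)     # validate before reconciling

profile = classify_batch(["email 1", "email 2"])
print([c["category"] for c in profile])  # → ['age 35-44', 'Software Development']
```

Note how the weakly evidenced household candidate (confidence 0.4, one piece of evidence) is dropped by the judge; only validated classifications reach the Reconcile/Update Memory steps.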
Visual Debugging with LangGraph Studio:
# Start LangGraph Studio
langgraph dev
# Open in browser
http://127.0.0.1:2024
# Features:
# - Visual workflow graph
# - State inspection at each node
# - Time-travel debugging
# - Replay past executions

# Download and analyze 50 emails from Gmail
python -m src.email_parser.main --pull 50 --model openai

This single command:
- Downloads 50 emails
- Summarizes them
- Classifies into IAB taxonomy
- Saves profile to data/
# Analyze emails from both Gmail and Outlook
python -m src.email_parser.main --provider gmail outlook --max-emails 100

# Step 1: Download only
python -m src.email_parser.main --provider gmail --max-emails 200 --download-only
# Step 2: Summarize
python -m src.email_parser.main --summarize data/emails_raw_20250128.csv
# Step 3: Classify (use different model)
python -m src.email_parser.main --classify data/emails_summarized_20250128.csv --model claude

- Start backend and frontend (see Starting the Application)
- Navigate to http://localhost:3000
- Click "New Analysis"
- Select provider (Gmail/Outlook)
- Choose models:
- Email Model: Fast/cheap (gpt-4o-mini)
- Taxonomy Model: Accurate (gpt-4o or claude-sonnet-4)
- Set email count (50-500)
- Click "Start Analysis"
- Monitor progress in real-time
- View results in Classifications tab
# OpenAI (Recommended - fastest, cost-effective)
python -m src.email_parser.main --pull 100 --model openai
# Claude (Best quality, more expensive)
python -m src.email_parser.main --pull 100 --model claude
# Google Gemini (Good balance)
python -m src.email_parser.main --pull 100 --model google
# Ollama (Local, free, slower)
python -m src.email_parser.main --pull 100 --model ollama

# Use cheap model for summarization, premium for classification
EMAIL_MODEL=openai:gpt-4o-mini \
TAXONOMY_MODEL=claude:claude-sonnet-4 \
python -m src.email_parser.main --pull 100

ownyou_consumer_application/
├── src/
│ └── email_parser/
│ ├── main.py # CLI entry point
│ ├── providers/ # Email providers
│ │ ├── gmail_provider.py
│ │ └── outlook_provider.py
│ ├── llm_clients/ # LLM integrations
│ │ ├── openai_client.py
│ │ ├── claude_client.py
│ │ └── google_client.py
│ ├── workflow/ # LangGraph workflow
│ │ ├── graph.py # Workflow definition
│ │ ├── nodes/ # Workflow nodes
│ │ │ ├── analyzers.py # Agent nodes
│ │ │ ├── reconcile.py # Reconciliation
│ │ │ └── update_memory.py # Memory updates
│ │ ├── batch_optimizer.py # Batching logic
│ │ └── state.py # Workflow state
│ ├── memory/ # LangMem integration
│ │ └── manager.py
│ ├── analysis/ # Legacy analyzers
│ ├── models/ # Pydantic models
│ └── utils/ # Utilities
├── dashboard/
│ ├── backend/ # Flask API
│ │ ├── app.py # Flask app
│ │ ├── api/ # API endpoints
│ │ │ ├── analyze.py # Analysis triggers
│ │ │ ├── profile.py # Profile retrieval
│ │ │ └── evidence.py # Evidence endpoints
│ │ └── db/
│ │ └── queries.py # Database queries
│ └── frontend/ # Next.js app
│ ├── app/ # App router
│ ├── components/ # React components
│ └── lib/ # Utilities
├── data/ # Data directory
│ ├── email_parser_memory.db # SQLite database
│ └── profile_*.json # Profile exports
├── logs/ # Log files
├── tests/ # Test suite
├── .env # Configuration
├── requirements.txt # Python deps
└── README.md # This file
# Run all tests
pytest
# Run specific test suite
pytest tests/unit/
pytest tests/integration/
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test
pytest tests/unit/test_batch_optimizer.py::test_calculate_batch_size

# Format code
black src/ tests/
# Lint
flake8 src/
# Type checking
mypy src/
# Sort imports
isort src/ tests/

Enable Debug Logging:
# In .env
LOG_LEVEL=DEBUG
# Or via command line
python -m src.email_parser.main --pull 50 --debug

LangGraph Studio Debugging:
LangGraph Studio provides visual workflow debugging and real-time state inspection.
Option 1: Auto-Start via Dashboard (Recommended)
The dashboard can automatically start Studio when you enable visualization:
- Navigate to http://localhost:3000/analyze
- Check "Enable LangGraph Studio visualization" checkbox
- Studio server automatically starts on port 2024
- During analysis, click "View workflow in LangGraph Studio →"
- Studio UI opens with direct link to your project
Option 2: Manual Start via CLI
# Start Studio manually
langgraph dev
# Set debug mode (optional)
export LANGGRAPH_STUDIO_DEBUG=true
# Run workflow
python -m src.email_parser.main --pull 10
# View in Studio at http://127.0.0.1:2024

Features:
- Visual workflow graph with node inspection
- Time-travel debugging (replay past executions)
- State inspection at each workflow step
- Real-time execution monitoring
- Evidence trail visualization
# Open SQLite database
sqlite3 data/email_parser_memory.db
# View tables
.tables
# Query memories
SELECT * FROM memories WHERE namespace LIKE '%nick%' LIMIT 10;
# Count classifications
SELECT COUNT(*) FROM memories WHERE key LIKE 'semantic_%';

Problem: Browser console shows CORS errors like:
Access to fetch at 'http://localhost:5001/api/...' from origin 'http://localhost:3000'
has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present
Root Cause: The frontend is making requests directly to Flask instead of using the Next.js API proxy.
Solution:
- Check the frontend environment configuration:

  cd dashboard/frontend
  cat .env.local

- Ensure NEXT_PUBLIC_API_URL is empty:

  # Correct configuration (empty = use Next.js proxy)
  NEXT_PUBLIC_API_URL=

  # WRONG - causes CORS errors
  # NEXT_PUBLIC_API_URL=http://localhost:5001

- Restart the frontend to pick up changes:

  # Kill frontend
  lsof -ti:3000 | xargs kill -9

  # Restart
  cd dashboard/frontend
  npm run dev
- Verify the fix:
  - Open browser DevTools (F12)
  - Go to the Network tab
  - Refresh the page
  - API requests should go to /api/... (not http://localhost:5001/api/...)
Why This Works:
- An empty NEXT_PUBLIC_API_URL makes requests use relative paths (/api/...)
- The Next.js API proxy (app/api/[...path]/route.ts) forwards requests to Flask
- The proxy handles CORS headers and session cookies automatically
- No cross-origin requests means no CORS issues
Prevention: Never set NEXT_PUBLIC_API_URL=http://localhost:5001 in development.
Problem: Address already in use error
Solution:
# Kill processes on ports
lsof -ti:5001 | xargs kill -9 # Backend
lsof -ti:3000 | xargs kill -9 # Frontend
# Verify ports are free
lsof -i:5001
lsof -i:3000

Problem: Gmail/Outlook authentication fails
Solution:
# Re-run setup wizard
python -m src.email_parser.main setup gmail
# Delete cached tokens
rm token.json # Gmail
rm ms_token.json # Outlook
# Check credentials file exists
ls -la credentials.json

Problem: database is locked
Solution:
# Stop all processes
pkill -f "python.*email_parser"
# Check for locks
fuser data/email_parser_memory.db
# Remove lock file if exists
rm data/email_parser_memory.db-shm
rm data/email_parser_memory.db-wal

Problem: Rate limit exceeded or Invalid API key
Solution:
# Check API key in .env
cat .env | grep API_KEY
# Test API connection
python -c "
from openai import OpenAI
client = OpenAI()
print(client.models.list())
"
# Use different provider
python -m src.email_parser.main --pull 50 --model claude

Problem: npm run build fails
Solution:
# Clear cache
rm -rf dashboard/frontend/.next
rm -rf dashboard/frontend/node_modules
# Reinstall
cd dashboard/frontend
npm install
# Rebuild
npm run build

Problem: Email download returns 0 emails
Solution:
# Check authentication
python -m src.email_parser.main setup status
# Test provider connection
python -m src.email_parser.main --provider gmail --max-emails 1 --debug
# Verify email access
# - Gmail: Check Gmail API is enabled in Google Cloud Console
# - Outlook: Check Mail.Read permission in Azure Portal

Problem: High memory usage or slow processing
Solution:
# Reduce batch size
export BATCH_SIZE=25
# Use faster model for EMAIL_MODEL
export EMAIL_MODEL=openai:gpt-4o-mini
# Process fewer emails
python -m src.email_parser.main --pull 50 --model openai
# Clear old data
rm data/emails_*.csv
rm data/profile_*.json

- CLAUDE.md: Development guidelines and architecture details
- docs/: Comprehensive documentation
  - requirements/: Feature specifications
  - reference/: Technical references
  - STUDIO_QUICKSTART.md: LangGraph Studio guide
- tests/: Test suite with examples
- _archive/: Historical documentation
For issues, questions, or contributions:
- Check Troubleshooting section
- Review existing issues in repository
- Create new issue with:
- Error message
- Steps to reproduce
- Environment details (OS, Python version)
- Relevant logs from the logs/ directory
[Specify your license here]
- IAB Tech Lab for Audience Taxonomy 1.1
- LangChain/LangGraph for workflow orchestration
- OpenAI, Anthropic, Google for LLM APIs
- Email provider APIs (Gmail, Microsoft Graph)