
ThreadFlow 🧵

Visual AI pipeline for Reddit analysis: drag and drop nodes to build investigation workflows that analyze social sentiment with Gemini AI.


🎯 What It Does

ThreadFlow lets you visually investigate Reddit discussions. You can:

  • Drag and drop nodes to build analysis pipelines (n8n-style)
  • Filter data by score, keywords, or custom criteria
  • Run AI-powered analysis: sentiment, bot detection, evidence extraction, and summarization
  • Explore 3D visualizations: a Canada map, a political party breakdown, and bar/pie charts

⚡ Quick Setup (5 minutes)

Prerequisites

1️⃣ Clone & Set Up the Backend

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies (includes DuckDB)
pip install -r requirements.txt

# Create .env file with your Gemini API key
echo "GEMINI_API_KEY=your_gemini_api_key_here" > .env
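The backend expects the key in backend/.env. This README does not show how main.py or ai_analyzer.py actually loads it (a library such as python-dotenv is a common choice). As a rough, stdlib-only sketch that assumes plain KEY=VALUE lines, loading and checking the key could look like this. The helpers load_env_file and require_gemini_key are hypothetical and are not part of ThreadFlow.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Read KEY=VALUE lines from a .env file and export them to os.environ.

    Blank lines, '#' comments, and lines without '=' are skipped. Values
    already present in the environment are left untouched.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # No .env file: rely on the real environment.
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")  # Drop optional quotes.
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def require_gemini_key() -> str:
    """Return the Gemini key or fail with an actionable message."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("GEMINI_API_KEY not set: create backend/.env with your key")
    return key
```

Calling load_env_file(".env") from inside backend/ and then require_gemini_key() fails early, with the same wording as the "GEMINI_API_KEY not set" entry under Troubleshooting.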

2️⃣ Build the Database (DuckDB)

⚠️ IMPORTANT: The app uses DuckDB to query Reddit data. You must build the database first:

# Still in backend folder, with venv activated
python ingest.py

This creates reddit_data.duckdb from the CSV files in the archive/ folder and takes about 30 seconds.

Verify it worked:

python -c "import duckdb; db = duckdb.connect('reddit_data.duckdb'); print(f'Comments: {db.execute(\"SELECT COUNT(*) FROM comments\").fetchone()[0]}')"

3️⃣ Start the Backend Server

# Still in backend folder
uvicorn main:app --reload --port 8000

The backend runs at http://localhost:8000. To check that it is up: curl http://localhost:8000/health
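If you script the startup, for example to open the frontend only once the API is ready, you can poll the same /health endpoint. This stdlib-only sketch assumes the endpoint returns HTTP 200 when the server is healthy. wait_for_backend is a hypothetical helper, not part of ThreadFlow.

```python
import time
import urllib.request


def wait_for_backend(url: str = "http://localhost:8000/health",
                     attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll url until it answers HTTP 200; return False if it never does."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, "Connection refused", and timeouts all
            # subclass OSError: treat them as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A False result after all attempts usually corresponds to the "Connection refused" case under Troubleshooting.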

4️⃣ Set Up & Start the Frontend

# Open a new terminal
cd frontend

# Install dependencies
npm install  # or: pnpm install

# Start dev server
npm run dev

The frontend runs at http://localhost:3000.


🎮 How to Use

  1. Open http://localhost:3000
  2. Drag nodes from the sidebar onto the canvas
  3. Connect nodes by dragging from output (right) to input (left)
  4. Configure nodes - enter search queries, set filters, etc.
  5. Click "Run Pipeline" to execute
  6. View results in the nodes or click "View AI Visualization" for 3D charts
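Conceptually, "Run Pipeline" has to execute each node only after every node wired into its input has finished. ThreadFlow's own executor lives in frontend/src/lib and may work differently. The following Python sketch is for illustration only: it models each node as a function of its upstream outputs and orders the nodes with the standard library's graphlib.TopologicalSorter, which also rejects cyclic wiring. run_pipeline and the node model are hypothetical.

```python
from __future__ import annotations

from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical node model: a node is a function of its upstream outputs.
NodeFn = Callable[[list], Any]


def run_pipeline(nodes: dict[str, NodeFn],
                 edges: list[tuple[str, str]]) -> dict[str, Any]:
    """Run every node after all nodes connected to its input.

    Each edge (src, dst) mirrors a wire dragged from src's output handle
    to dst's input handle.
    """
    upstream: dict[str, list[str]] = {name: [] for name in nodes}
    for src, dst in edges:
        upstream[dst].append(src)
    results: dict[str, Any] = {}
    # TopologicalSorter takes node -> predecessors; static_order() raises
    # graphlib.CycleError if the wiring contains a loop.
    for name in TopologicalSorter(upstream).static_order():
        results[name] = nodes[name]([results[u] for u in upstream[name]])
    return results
```

For example, wiring a source into a score filter and then into a counter produces a result for every node, keyed by node name. This mirrors how each node in the canvas can display its own output.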

Node Types

| Category | Nodes | Description |
|---|---|---|
| Source | Reddit Source, Thread Loader | Load data from Reddit |
| Filter | Score Filter, Keyword Sieve | Filter comments |
| AI | Sentiment, Evidence, Bot Hunter, Summarizer | Gemini-powered analysis |
| Viz | Data Table, Canada Map, Political, Bar/Pie Charts | Visualize results |
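The two Filter nodes narrow the comment list before it reaches the AI nodes. This minimal sketch shows what they might do. It assumes each comment is a dict with score and body keys, which are hypothetical field names because the CSV columns are not documented here, and it uses a case-insensitive substring match for the Keyword Sieve. It is not ThreadFlow's implementation.

```python
from __future__ import annotations

from typing import Iterable


def score_filter(comments: Iterable[dict], min_score: int) -> list[dict]:
    """Score Filter: keep comments with score >= min_score."""
    return [c for c in comments if c.get("score", 0) >= min_score]


def keyword_sieve(comments: Iterable[dict], keywords: Iterable[str],
                  match_all: bool = False) -> list[dict]:
    """Keyword Sieve: keep comments whose body mentions any (or all) keywords.

    Matching is a case-insensitive substring test; an empty keyword list
    passes every comment through unchanged.
    """
    needles = [k.strip().lower() for k in keywords if k.strip()]
    if not needles:
        return list(comments)
    test = all if match_all else any
    return [c for c in comments
            if test(n in str(c.get("body", "")).lower() for n in needles)]
```

Whether an empty keyword list should pass everything or nothing is a design choice. Passing everything keeps an unconfigured node from silently emptying the pipeline.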

📁 Project Structure

├── backend/
│   ├── main.py          # FastAPI server
│   ├── ai_analyzer.py   # Gemini AI integration
│   ├── ingest.py        # DuckDB database builder
│   ├── requirements.txt
│   └── .env             # GEMINI_API_KEY goes here
├── frontend/
│   ├── src/
│   │   ├── app/         # Next.js pages
│   │   ├── components/  # React components (nodes, canvas, viz)
│   │   └── lib/         # API client, pipeline logic
│   └── package.json
└── archive/             # Source CSV data
    ├── canada_subreddit_comments.csv
    └── canada_subreddit_threads.csv

🔧 Troubleshooting

| Issue | Solution |
|---|---|
| reddit_data.duckdb not found | Run python ingest.py in the backend folder |
| GEMINI_API_KEY not set | Create a .env file in backend/ containing GEMINI_API_KEY=your key |
| Connection refused | Make sure the backend is running on port 8000 |
| Rate limit errors | Gemini enforces a limit of 15 requests per minute; the app handles this automatically |
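The README does not say how the app copes with Gemini's 15 requests/min limit. One common approach is a sliding-window limiter that blocks before each request until fewer than 15 calls fall inside the last 60 seconds. The version below is a hypothetical sketch, not ThreadFlow's code. The clock and sleep parameters can be swapped out so the behavior can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Sliding-window limiter: at most max_calls starts per `period` seconds."""

    def __init__(self, max_calls: int = 15, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # Start times of recent calls.

    def acquire(self) -> None:
        """Block until another call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have slid out of the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest call in the window expires.
            self._sleep(self.period - (now - self._calls[0]))
```

Calling limiter.acquire() immediately before each Gemini request keeps the request rate at or below the limit. A real client may still need to retry when the API answers with HTTP 429.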

🛠 Tech Stack

  • Frontend: Next.js 15, React Flow, Three.js, TailwindCSS
  • Backend: FastAPI, DuckDB, Google Gemini AI
  • Data: Reddit r/Canada subreddit (comments + threads)

📜 License

MIT - Built for AI Collective Hackathon 2026