Esashiero/reddit2

🤖 Reddit AI Curator

Reddit AI Curator is an advanced, AI-powered information retrieval system designed to find high-quality, relevant Reddit discussions. It combines professional Boolean search logic with Large Language Model (LLM) analysis to sift through thousands of posts and deliver the most impactful results.


🚀 Key Features

🏆 Multi-Query Tournament

Instead of running a single search, it generates multiple query variations (Broad, Specific, Narrative, Jargon) and runs a "tournament" on a small sample of results to see which variation performs best before committing to a full search.
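
The tournament idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `run_tournament`, `fetch_sample`, and `score_post` are hypothetical names, and ranking variants by the mean score of their sample is an assumed criterion.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple


def run_tournament(
    variants: Dict[str, str],
    fetch_sample: Callable[[str], List[str]],
    score_post: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Rank query variants by the mean score of a small sample of results.

    All names are hypothetical; the real project may rank differently.
    """
    ranking = []
    for name, query in variants.items():
        sample = fetch_sample(query)
        # A variant that returns nothing gets the lowest possible score.
        avg = mean(score_post(post) for post in sample) if sample else 0.0
        ranking.append((name, avg))
    # Best variant first; the runners-up stay available as fallbacks.
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

The first entry of the returned list would drive the full search, while the remaining entries could serve as the runner-up queries.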

🌊 Smart Search Cascade

If it doesn't find enough high-quality posts, it automatically triggers a tiered fallback system:

  1. Sort Variation: Retries with relevance, top, hot, and comments sorts.
  2. Variant Fallback: Uses the runner-up queries from the tournament.
  3. Expansion: Increases fetch limits to 500 posts per sub and expands the time filter.
  4. Adaptive Scoring: Relaxes the quality threshold from 80 to 70, then to 60, if the quota is still unmet.
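
A minimal sketch of how such a cascade could be wired together is shown here. The thresholds (80, 70, 60) come from the list above; the `cascade` function, the tier callables, and the choice to relax the threshold only after every tier has run are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Thresholds follow the list above; everything else is assumed.
THRESHOLDS = (80, 70, 60)


def cascade(
    tiers: Iterable[Callable[[], Dict[str, float]]],
    target_posts: int,
) -> List[str]:
    """Run fallback tiers until enough posts clear the strict threshold,
    then relax the threshold over everything collected so far."""
    scored: Dict[str, float] = {}  # post id -> score, deduplicated across tiers

    def passing(threshold: float) -> List[str]:
        hits = [pid for pid, score in scored.items() if score >= threshold]
        return sorted(hits, key=lambda pid: scored[pid], reverse=True)

    for run_tier in tiers:
        scored.update(run_tier())
        if len(passing(THRESHOLDS[0])) >= target_posts:
            return passing(THRESHOLDS[0])[:target_posts]

    # Adaptive scoring: reached only if every tier ran and the quota is unmet.
    for threshold in THRESHOLDS[1:]:
        if len(passing(threshold)) >= target_posts:
            return passing(threshold)[:target_posts]
    # Still short of the quota: return whatever clears the lowest threshold.
    return passing(THRESHOLDS[-1])
```

Each tier is a zero-argument callable (sort variation, variant fallback, expansion) returning post IDs mapped to scores, so later tiers are never invoked once the quota is met.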

🧠 Intent-Based Search (New!)

An advanced search mode that understands user intent through interactive clarification:

  • Semantic Decomposition: Breaks requests into core requirements, bonus criteria, and preferences
  • Interactive Clarification: AI asks targeted questions to resolve ambiguities
  • Multi-Stage Scoring: 5-stage scoring algorithm (Disqualifiers → Core → Base → Bonus → Preference)
  • Parallel Execution: Runs alongside standard keyword search
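
The five stages can be illustrated with a small sketch. It assumes plain substring matching in place of the LLM judgment the tool presumably uses; the `Intent` class, the `score_post` function, and the weights (base 50, +10 per bonus, +5 per preference) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Intent:
    """Hypothetical container for a decomposed request."""
    disqualifiers: List[str] = field(default_factory=list)
    core: List[str] = field(default_factory=list)
    bonus: List[str] = field(default_factory=list)
    preferences: List[str] = field(default_factory=list)


def score_post(text: str, intent: Intent, base_score: float = 50.0) -> float:
    """Apply the five stages in order; weights are illustrative only."""
    lowered = text.lower()

    def has(term: str) -> bool:
        return term.lower() in lowered

    # Stage 1 - Disqualifiers: any hit rejects the post outright.
    if any(has(term) for term in intent.disqualifiers):
        return 0.0
    # Stage 2 - Core: every core requirement must be present.
    if not all(has(term) for term in intent.core):
        return 0.0
    # Stage 3 - Base: starting score for a post that meets the core.
    score = base_score
    # Stage 4 - Bonus: +10 per matched bonus criterion (assumed weight).
    score += 10.0 * sum(has(term) for term in intent.bonus)
    # Stage 5 - Preference: +5 per matched preference (assumed weight).
    score += 5.0 * sum(has(term) for term in intent.preferences)
    return min(score, 100.0)
```

Ordering the stages this way means cheap rejections happen first, and bonus or preference matches can never rescue a post that misses a core requirement.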

📚 Continual Learning System

  • Tag Extraction: Automatically extracts semantic tags from high-scoring results to learn the "vocabulary" of successful matches.
  • Favorites: Save your favorite posts to train the AI. It will use your favorites to prioritize themes and keywords in future searches.
  • Auto-Blacklist: Automatically blacklists posts scoring >85 to ensure every new search provides fresh content.
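
The learning loop described above might look like the following sketch. The blacklist rule (score above 85) comes from the feature list; the function names, the in-memory data structures, and the cut-off of 80 for "high-scoring" are assumptions, and the tags are taken as already extracted.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

BLACKLIST_ABOVE = 85  # from the feature list: posts scoring >85 are blacklisted
LEARN_FROM = 80       # assumed cut-off for "high-scoring"; not stated in the README


def learn_from_results(
    results: Iterable[Tuple[str, float, List[str]]],
    tag_counts: Counter,
    blacklist: Set[str],
) -> None:
    """Update tag vocabulary and blacklist from (post_id, score, tags) tuples."""
    for post_id, score, tags in results:
        if score >= LEARN_FROM:
            # Count tags case-insensitively to build the learned vocabulary.
            tag_counts.update(tag.lower() for tag in tags)
        if score > BLACKLIST_ABOVE:
            blacklist.add(post_id)


def fresh_only(post_ids: Iterable[str], blacklist: Set[str]) -> List[str]:
    """Drop posts that an earlier search already surfaced."""
    return [pid for pid in post_ids if pid not in blacklist]
```

Calling `tag_counts.most_common(n)` would then give the most frequent tags, which could be fed back into query generation.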

🛠️ Setup & Installation

  1. Prerequisite: Python 3.12+
  2. Install Dependencies:
    pip install praw mistralai google-generativeai flask python-dotenv PyJWT
  3. Configure Environment: Create a .env file with your credentials:
    REDDIT_CLIENT_ID=your_id
    REDDIT_CLIENT_SECRET=your_secret
    REDDIT_USER_AGENT=your_agent
    MISTRAL_API_KEY=your_key  # or GOOGLE_API_KEY
    JWT_SECRET_KEY=your_jwt_secret
    JWT_ALGORITHM=HS256
    JWT_EXPIRATION_HOURS=24
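
As a sketch of how these variables could be validated at startup, the snippet below reads them from a mapping (by default `os.environ`, which python-dotenv's `load_dotenv()` would populate from the .env file). The `load_settings` function is hypothetical, and treating `HS256` and `24` as fallback defaults is an assumption based on the example values above.

```python
import os
from typing import Any, Dict, Mapping

REQUIRED = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT", "JWT_SECRET_KEY")
LLM_KEYS = ("MISTRAL_API_KEY", "GOOGLE_API_KEY")  # one of these must be set


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Validate the variables listed above and return them as a dict."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if not any(env.get(key) for key in LLM_KEYS):
        missing.append(" or ".join(LLM_KEYS))
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    settings: Dict[str, Any] = {key: env[key] for key in REQUIRED}
    settings.update({key: env[key] for key in LLM_KEYS if env.get(key)})
    # Falling back to the values shown in the example .env is an assumption.
    settings["JWT_ALGORITHM"] = env.get("JWT_ALGORITHM", "HS256")
    settings["JWT_EXPIRATION_HOURS"] = int(env.get("JWT_EXPIRATION_HOURS", "24"))
    return settings
```

Failing fast with a list of every missing variable tends to be friendlier than discovering them one at a time when the first request arrives.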

🔐 JWT Authentication

The V2 API requires JWT authentication for all protected endpoints. Generate a token using the auth endpoints:

# Get access token (POST /api/v2/auth/token)
curl -X POST http://localhost:5000/api/v2/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "your_password"}'

# Use the token in requests (/api/v2/search is a POST endpoint; request body omitted here)
curl -X POST http://localhost:5000/api/v2/search \
  -H "Authorization: Bearer <your_jwt_token>"
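
To show what an HS256 token of this kind contains, here is a standard-library sketch of signing and verifying one. The project lists PyJWT as a dependency and presumably uses it instead; the `sign_token` and `verify_token` helpers are hypothetical, and which claims the API actually issues (`sub` and `exp` are used here) is an assumption.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(username: str, secret: str, hours: int = 24,
               now: Optional[float] = None) -> str:
    """Build header.payload.signature with an expiry `hours` from now."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "exp": issued + hours * 3600}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_signature(signing_input, secret))


def verify_token(token: str, secret: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Return the payload, or raise ValueError if the token is invalid."""
    signing_input, signature = token.rsplit(".", 1)
    header_b64, payload_b64 = signing_input.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(_b64url_decode(signature), _signature(signing_input, secret)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

The secret and lifetime correspond to `JWT_SECRET_KEY` and `JWT_EXPIRATION_HOURS` in the .env file.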

🖥️ Usage

CLI Commands

1. AI-Powered Curation (Best Results): Generates queries, runs a tournament, and performs the search cascade automatically.

python app.py curate --description "First person stories of car accidents in heavy rain" --target_posts 10

2. Direct Boolean Search

python app.py search --keywords "car accident AND (rain OR storm)" --criteria "High detail stories only"

3. Discover Subreddits: Finds the subreddits most likely to contain the content you are looking for.

python app.py discover --keywords "adventure travel and hiking"

Optional Flags:

  • --exhaustive: Try all sorts and variants for maximum recall.
  • --no-fallback: Disable the automatic cascade.
  • --json: Output raw data only.
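
For orientation, a command-line interface with this shape could be declared with `argparse` as sketched below. The command and flag names come from the examples above; attaching the optional flags to every subcommand, the default of 10 for `--target_posts`, and the `build_parser` function itself are assumptions rather than a description of app.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Shared optional flags; attaching them to every subcommand is an assumption.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--exhaustive", action="store_true",
                        help="try all sorts and variants for maximum recall")
    common.add_argument("--no-fallback", action="store_true",
                        help="disable the automatic cascade")
    common.add_argument("--json", action="store_true", help="output raw data only")

    parser = argparse.ArgumentParser(prog="app.py")
    sub = parser.add_subparsers(dest="command")

    curate = sub.add_parser("curate", parents=[common])
    curate.add_argument("--description", required=True)
    curate.add_argument("--target_posts", type=int, default=10)  # default is assumed

    search = sub.add_parser("search", parents=[common])
    search.add_argument("--keywords", required=True)
    search.add_argument("--criteria", default="")

    discover = sub.add_parser("discover", parents=[common])
    discover.add_argument("--keywords", required=True)
    return parser
```

With no subcommand, `args.command` is `None`, which leaves room for `python app.py` on its own to start the web interface.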

🌐 Web Interface

Launch the interactive dashboard to run searches, view live results, manage subreddits, and browse your favorites.

python app.py
# Open http://localhost:5000 in your browser

🏗️ Architecture & DI Container

Reddit AI Curator uses a Dependency Injection (DI) Container for service management:

Dependency Injection Container

The DI container (app/core/container.py) manages all service dependencies:

from app.core.container import container

# Get services from container
llm_provider = container.llm_provider
search_engine = container.search_engine
reddit_engine = container.reddit_engine
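
The attribute-style access shown above can be supported by a small lazy-singleton container. The sketch below is one way to build such a thing and is not the contents of app/core/container.py; the `Container` class and its `register` method are hypothetical.

```python
from typing import Any, Callable, Dict


class Container:
    """Minimal lazy-singleton container; a sketch, not the project's class."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[["Container"], Any]] = {}
        self._instances: Dict[str, Any] = {}

    def register(self, name: str, factory: Callable[["Container"], Any]) -> None:
        self._factories[name] = factory
        # Drop cached instances so dependents are rebuilt with the new service.
        self._instances.clear()

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails, i.e. for service names.
        if name.startswith("_") or name not in self._factories:
            raise AttributeError(name)
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]


container = Container()
container.register("llm_provider", lambda c: {"kind": "mock"})
# The search engine receives its dependency from the container, not a global.
container.register("search_engine", lambda c: {"llm": c.llm_provider})
```

Because factories receive the container, a service is built on first access and its dependencies are resolved at that moment, which is what makes swapping in a mock provider straightforward.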

Service Registration

Services are registered in app/core/service_registration.py:

| Service | Interface | Description |
|---|---|---|
| llm_provider | LLMProvider | LLM interface (Mistral, Gemini, or Mock) |
| reddit_engine | RedditSearchEngine | Reddit API client |
| search_engine | SearchEngine | Main search orchestration |

V2 API Endpoints

All protected V2 API endpoints require JWT authentication; the health check does not:

| Endpoint | Method | Description |
|---|---|---|
| /api/v2/auth/token | POST | Get JWT access token |
| /api/v2/search | POST | Execute standard AI-powered search |
| /api/v2/search/intent/analyze | POST | Analyze intent & get clarification questions |
| /api/v2/search/intent/clarify | POST | Submit clarification answers |
| /api/v2/search/intent/execute | POST | Execute search with finalized intent |
| /api/v2/search/intent/quick | POST | One-shot intent search (no clarification) |
| /api/v2/llm/generate-queries | POST | Generate query variants |
| /api/v2/llm/score | POST | Score a post with LLM |
| /api/v2/health | GET | Health check (no auth required) |

Testing with MockLLMProvider

Use MockLLMProvider for testing without external API calls:

from app.core.container import container

# Replace with mock for testing
container.register_mock_llm_provider()

# Tests run without real LLM API calls
results = container.search_engine.search(...)
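
To give a sense of what a mock provider makes possible, here is a self-contained sketch of a deterministic stand-in and a function tested against it. The method names `generate_queries` and `score_post` are guesses based on the V2 endpoint names, and `select_top` is a hypothetical function under test; the project's own MockLLMProvider may look different.

```python
from typing import Dict, List, Optional, Tuple


class MockLLMProvider:
    """Deterministic stand-in for an LLM; method names are assumptions."""

    def __init__(self, scores: Optional[Dict[str, float]] = None,
                 default: float = 50.0) -> None:
        self.scores = scores or {}
        self.default = default
        self.calls: List[str] = []  # lets tests assert on what was asked

    def generate_queries(self, description: str) -> Dict[str, str]:
        self.calls.append("generate_queries")
        labels = ("Broad", "Specific", "Narrative", "Jargon")
        return {label: f"{description} [{label.lower()}]" for label in labels}

    def score_post(self, post_id: str, text: str) -> float:
        self.calls.append("score_post")
        return self.scores.get(post_id, self.default)


def select_top(posts: List[Tuple[str, str]], llm: MockLLMProvider,
               threshold: float = 80.0) -> List[str]:
    """Hypothetical function under test: keep posts the LLM scores highly."""
    return [pid for pid, text in posts if llm.score_post(pid, text) >= threshold]


def test_select_top_uses_llm_scores() -> None:
    llm = MockLLMProvider(scores={"p1": 92.0, "p2": 40.0})
    posts = [("p1", "detailed story"), ("p2", "one-liner"), ("p3", "unknown")]
    assert select_top(posts, llm) == ["p1"]
    assert llm.calls.count("score_post") == 3
```

Fixed scores keep the test repeatable, and the recorded `calls` list makes it possible to check that no extra LLM requests were issued.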

📂 Project Structure

  • app.py: Main application entry point (CLI & Web).
  • app/core/: Core architecture (DI container, service registration).
  • app/services/: LLM providers and search services.
  • app/routes_v2.py: V2 API endpoints (JWT authenticated).
  • tag_learning.py: The "brain" that manages favorites and tag-based refinement.
  • report_generator.py: Generates standalone HTML reports.
  • config/: Centralized folder for all JSON data (favorites, learning DB, queries, blacklist).
  • static/ & templates/: Responsive frontend assets.
  • results/: Output folder for JSON and HTML findings.
  • tests/integration/: Zero-API integration tests (MockLLMProvider).

License: MIT
