Reddit AI Curator is an advanced, AI-powered information retrieval system designed to find high-quality, relevant Reddit discussions. It combines professional Boolean search logic with Large Language Model (LLM) analysis to sift through thousands of posts and deliver the most impactful results.
The curator doesn't just run one search. It generates multiple query variations (Broad, Specific, Narrative, Jargon) and runs a "tournament" on a small sample of results to see which variant performs best before committing to a full search.
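The tournament idea can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `run_tournament`, the `search` and `score` callables, and the default `sample_size` are all hypothetical names and values chosen for the example.

```python
from typing import Callable, Dict, List

# Hypothetical shapes: a post is a plain dict, a scorer returns 0-100.
Post = dict
SearchFn = Callable[[str, int], List[Post]]
ScoreFn = Callable[[Post], float]


def run_tournament(
    variants: Dict[str, str],
    search: SearchFn,
    score: ScoreFn,
    sample_size: int = 10,
) -> List[str]:
    """Rank query variants by the mean score of a small sample of results."""
    ranking = []
    for name, query in variants.items():
        sample = search(query, sample_size)
        # A variant that returns nothing gets a mean of 0.
        mean = sum(score(p) for p in sample) / len(sample) if sample else 0.0
        ranking.append((mean, name))
    # Highest mean first; ties are broken alphabetically by variant name.
    ranking.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in ranking]
```

The first name in the returned list would drive the full search, and the remaining names are the runner-ups available to a later fallback step.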
If it doesn't find enough high-quality posts, it automatically triggers a tiered fallback system:
- Sort Variation: Retries with `relevance`, `top`, `hot`, and `comments` sorts.
- Variant Fallback: Uses the runner-up queries from the tournament.
- Expansion: Increases fetch limits to 500 posts per sub and expands the time filter.
- Adaptive Scoring: Relaxes the quality threshold step by step (from 80 to 70, then 60) if the quota is still unmet.
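A rough sketch of how such a cascade could be wired together is shown here. It assumes each tier (sort variation, variant fallback, expansion) is a zero-argument callable returning scored posts; the `cascade` function, the post shape, and the de-duplication by `id` are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence

Post = Dict[str, Any]                 # hypothetical shape: {"id": str, "score": float}
Tier = Callable[[], Iterable[Post]]   # one fallback step, e.g. "retry with top sort"


def cascade(
    tiers: List[Tier],
    target: int,
    thresholds: Sequence[float] = (80, 70, 60),
) -> List[Post]:
    """Run fallback tiers in order until `target` posts clear the threshold."""
    seen: Dict[str, Post] = {}
    for fetch in tiers:
        for post in fetch():
            seen[post["id"]] = post   # de-duplicate posts across tiers
        strict_hits = sum(p["score"] >= thresholds[0] for p in seen.values())
        if strict_hits >= target:
            break                     # quota met; later tiers are skipped
    # Adaptive scoring: relax the threshold only while the quota is unmet.
    kept: List[Post] = []
    for threshold in thresholds:
        kept = [p for p in seen.values() if p["score"] >= threshold]
        if len(kept) >= target:
            break
    kept.sort(key=lambda p: p["score"], reverse=True)
    return kept[:target]
```

If even the lowest threshold cannot fill the quota, this sketch returns whatever cleared it rather than raising an error.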
Intent search is an advanced search mode that understands user intent through interactive clarification:
- Semantic Decomposition: Breaks requests into core requirements, bonus criteria, and preferences
- Interactive Clarification: AI asks targeted questions to resolve ambiguities
- Multi-Stage Scoring: 5-stage scoring algorithm (Disqualifiers → Core → Base → Bonus → Preference)
- Parallel Execution: Runs alongside standard keyword search
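The five scoring stages can be pictured with a small sketch like the one below. The stage order follows the list above; everything else (the `Intent` dataclass, predicate-style checks, and the weights 50/10/5) is a hypothetical stand-in, since the real criteria are produced by the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Check = Callable[[str], bool]   # hypothetical: a predicate over the post text


@dataclass
class Intent:
    disqualifiers: List[Check] = field(default_factory=list)
    core: List[Check] = field(default_factory=list)
    bonus: List[Check] = field(default_factory=list)
    preferences: List[Check] = field(default_factory=list)


def score_post(text: str, intent: Intent, base: float = 50.0) -> float:
    """Apply the five stages in order; all weights are illustrative."""
    # Stage 1 (Disqualifiers): any match rejects the post outright.
    if any(check(text) for check in intent.disqualifiers):
        return 0.0
    # Stage 2 (Core): every core requirement must hold.
    if not all(check(text) for check in intent.core):
        return 0.0
    # Stage 3 (Base): surviving posts start from a base score.
    score = base
    # Stage 4 (Bonus): each satisfied bonus criterion adds a fixed amount.
    score += 10.0 * sum(check(text) for check in intent.bonus)
    # Stage 5 (Preference): preferences nudge the score slightly.
    score += 5.0 * sum(check(text) for check in intent.preferences)
    return min(score, 100.0)
```

Running the cheap reject stages first means later stages only see posts that already satisfy the hard requirements.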
- Tag Extraction: Automatically extracts semantic tags from high-scoring results to learn the "vocabulary" of successful matches.
- Favorites: Save your favorite posts to train the AI. It will use your favorites to prioritize themes and keywords in future searches.
- Auto-Blacklist: Automatically blacklists posts scoring >85 to ensure every new search provides fresh content.
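As an illustration of the auto-blacklist rule, the sketch below filters out previously blacklisted posts and then blacklists any new post scoring above 85. The function name, the in-memory `set`, and the post shape are assumptions; the project itself keeps its blacklist as JSON data under `config/`.

```python
from typing import Any, Dict, Iterable, List, Set

Post = Dict[str, Any]   # hypothetical shape: {"id": str, "score": float}


def filter_and_blacklist(
    posts: Iterable[Post],
    blacklist: Set[str],
    cutoff: float = 85.0,
) -> List[Post]:
    """Drop blacklisted posts, then blacklist fresh ones scoring above `cutoff`."""
    fresh = [p for p in posts if p["id"] not in blacklist]
    # Mutates `blacklist` in place so the next search skips these posts.
    blacklist.update(p["id"] for p in fresh if p["score"] > cutoff)
    return fresh
```

Calling it twice with the same posts shows the effect: the top-scoring posts appear in the first result and are absent from the second.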
- Python 3.12+
- Install Dependencies:

  ```bash
  pip install praw mistralai google-generativeai flask python-dotenv PyJWT
  ```

- Configure Environment: Create a `.env` file with your credentials:

  ```
  REDDIT_CLIENT_ID=your_id
  REDDIT_CLIENT_SECRET=your_secret
  REDDIT_USER_AGENT=your_agent
  MISTRAL_API_KEY=your_key # or GOOGLE_API_KEY
  JWT_SECRET_KEY=your_jwt_secret
  JWT_ALGORITHM=HS256
  JWT_EXPIRATION_HOURS=24
  ```
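Before launching, it can help to check that the credentials are actually set. The sketch below is one possible check using only the standard library. Which variables are strictly mandatory is an assumption here (the Reddit credentials, the JWT secret, and at least one LLM key), and `missing_settings` is a hypothetical helper, not part of the project.

```python
import os
from typing import List, Mapping

# Assumed to be mandatory; adjust to match the application's real requirements.
REQUIRED = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
    "JWT_SECRET_KEY",
)
# At least one LLM provider key is expected (Mistral or Google).
LLM_KEYS = ("MISTRAL_API_KEY", "GOOGLE_API_KEY")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of settings that are absent or empty."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if not any(env.get(key) for key in LLM_KEYS):
        missing.append(" or ".join(LLM_KEYS))
    return missing
```

An empty list means every assumed setting is present in the environment the function was given.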
The V2 API requires JWT authentication for all protected endpoints. Generate a token using the auth endpoints:
```bash
# Get an access token (POST /api/v2/auth/token)
curl -X POST http://localhost:5000/api/v2/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "your_password"}'

# Use the token in requests (the search endpoint is a POST endpoint)
curl -X POST http://localhost:5000/api/v2/search \
  -H "Authorization: Bearer <your_jwt_token>"
```

1. AI-Powered Curation (Best Results): Generates queries, runs a tournament, and performs the search cascade automatically.
```bash
python app.py curate --description "First person stories of car accidents in heavy rain" --target_posts 10
```

2. Direct Boolean Search

```bash
python app.py search --keywords "car accident AND (rain OR storm)" --criteria "High detail stories only"
```

3. Discover Subreddits: Finds subreddits that are most likely to contain the content you are looking for.

```bash
python app.py discover --keywords "adventure travel and hiking"
```

Optional Flags:

- `--exhaustive`: Try all sorts and variants for maximum recall.
- `--no-fallback`: Disable the automatic cascade.
- `--json`: Output raw data only.
Launch the interactive dashboard to run searches, view live results, manage subreddits, and browse your favorites.
```bash
python app.py
# Open http://localhost:5000 in your browser
```

Reddit AI Curator uses a Dependency Injection (DI) Container for service management.
The DI container (`app/core/container.py`) manages all service dependencies:
```python
from app.core.container import container

# Get services from container
llm_provider = container.llm_provider
search_engine = container.search_engine
reddit_engine = container.reddit_engine
```

Services are registered in `app/core/service_registration.py`:
| Service | Interface | Description |
|---|---|---|
| `llm_provider` | `LLMProvider` | LLM interface (Mistral, Gemini, or Mock) |
| `reddit_engine` | `RedditSearchEngine` | Reddit API client |
| `search_engine` | `SearchEngine` | Main search orchestration |
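For readers unfamiliar with the pattern, here is a minimal sketch of a container that exposes services as attributes. It is not the code in `app/core/container.py`; the `Container` class, its `register` method, and the lazy caching behaviour are assumptions chosen to illustrate the idea.

```python
from typing import Any, Callable, Dict


class Container:
    """Tiny DI container: factories are registered by name and built lazily."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[["Container"], Any]] = {}
        self._instances: Dict[str, Any] = {}

    def register(self, name: str, factory: Callable[["Container"], Any]) -> None:
        self._factories[name] = factory
        # Clear cached instances so dependants pick up the new registration.
        self._instances.clear()

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup fails, i.e. for service names.
        if name.startswith("_") or name not in self._factories:
            raise AttributeError(name)
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]


# Example wiring with placeholder services (plain values stand in for real classes).
container = Container()
container.register("llm_provider", lambda c: "real-llm")
container.register("search_engine", lambda c: {"llm": c.llm_provider})
```

Because each factory receives the container, `search_engine` can pull in `llm_provider` without importing a concrete class, which is what makes swapping in a mock provider straightforward.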
All protected V2 API endpoints require JWT authentication; the health check does not:
| Endpoint | Method | Description |
|---|---|---|
| `/api/v2/auth/token` | POST | Get JWT access token |
| `/api/v2/search` | POST | Execute standard AI-powered search |
| `/api/v2/search/intent/analyze` | POST | Analyze intent & get clarification questions |
| `/api/v2/search/intent/clarify` | POST | Submit clarification answers |
| `/api/v2/search/intent/execute` | POST | Execute search with finalized intent |
| `/api/v2/search/intent/quick` | POST | One-shot intent search (no clarification) |
| `/api/v2/llm/generate-queries` | POST | Generate query variants |
| `/api/v2/llm/score` | POST | Score a post with LLM |
| `/api/v2/health` | GET | Health check (no auth required) |
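The same token flow can be driven from Python with the standard library. The sketch below only builds the requests: the helper names are hypothetical, and the JSON payload for the search endpoint and the name of the token field in the response are not documented here, so both are left to the caller.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"   # default address used elsewhere in this README


def token_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST request for /api/v2/auth/token."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/v2/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def authed_request(path: str, token: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request; `payload` is caller-supplied."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Each request can then be sent with `urllib.request.urlopen(req)` against a running server, and the response body decoded with `json.load`.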
Use `MockLLMProvider` for testing without external API calls:

```python
from app.core.container import container

# Replace with mock for testing
container.register_mock_llm_provider()

# Tests run without real LLM API calls
results = container.search_engine.search(...)
```

- `app.py`: Main application entry point (CLI & Web).
- `app/core/`: Core architecture (DI container, service registration)
- `app/services/`: LLM providers and search services
- `app/routes_v2.py`: V2 API endpoints (JWT authenticated)
- `tag_learning.py`: The "brain" that manages favorites and tag-based refinement.
- `report_generator.py`: Generates the beautiful, standalone HTML reports.
- `config/`: Centralized folder for all JSON data (favorites, learning DB, queries, blacklist).
- `static/` & `templates/`: Responsive frontend assets.
- `results/`: Output folder for JSON and HTML findings.
- `tests/integration/`: Zero-API integration tests (MockLLMProvider)
License: MIT