Open
Labels
ambitious: Large scope, multiple components
Description
Summary
Group related papers that appear across different disciplines based on topic similarity, rather than just source-level deduplication.
Context
The app already deduplicates papers within a discipline using URL matching (PaperDeduplicator). But conceptually related papers across disciplines (e.g., a neuroscience paper and a CS/AI paper about the same neural network technique) are not connected. Clustering would surface these cross-discipline connections.
Possible approaches
- Keyword/TF-IDF clustering — Extract keywords from titles and abstracts, cluster by similarity. Simple, no external API needed.
- Embedding-based clustering — Use an embedding model (Gemini, OpenAI, or local) to compute paper embeddings, then cluster with k-means or DBSCAN.
- LLM-assisted grouping — Send all paper titles to the AI provider and ask it to identify groups. Simplest to implement but uses API quota.
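A minimal sketch of the similarity step behind the keyword/TF-IDF approach, using only the standard library. The stop-word list, the smoothed IDF formula, and the function names (`tokenize`, `tfidf_vectors`, `cosine`) are illustrative assumptions, not part of the app; the input is assumed to be one string per paper (title plus abstract).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real one would be longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stop words and very short tokens."""
    return [
        t for t in re.findall(r"[a-z]+", text.lower())
        if t not in STOP_WORDS and len(t) > 2
    ]


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF so that terms present in every document keep a small weight.
        vec = {
            t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
            for t, c in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


docs = [
    "Sparse attention in transformer networks",
    "Transformer attention models of cortical networks",
    "Soil erosion in alpine valleys",
]
vecs = tfidf_vectors(docs)
```

Pairs whose cosine score exceeds some threshold could then be grouped; the threshold would need tuning against real digests.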
What to build
- A PaperClusterer service that takes the full digest and returns cluster assignments
- A UI section showing "Related across disciplines" with grouped papers
- Configurable: opt-in via a checkbox or setting (clustering adds latency)
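One possible shape for the service, sketched in Python for illustration. Everything here is hypothetical: the `Paper` fields, the digest layout (`{discipline: [Paper, ...]}`), the `enabled` and `threshold` parameters, and the Jaccard keyword overlap used as a stand-in similarity. Only cross-discipline pairs are linked, and the setting is off by default to match the opt-in requirement.

```python
import re
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Paper:
    # Hypothetical fields; the app's real paper model may differ.
    title: str
    discipline: str
    url: str


def _keywords(title):
    """Crude keyword set: lowercase alphabetic words longer than 3 letters."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if len(w) > 3}


def _jaccard(a, b):
    ka, kb = _keywords(a), _keywords(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


class PaperClusterer:
    def __init__(self, enabled=False, threshold=0.3):
        self.enabled = enabled      # opt-in: clustering adds latency
        self.threshold = threshold  # assumed cut-off, needs tuning

    def cluster(self, digest):
        """Return clusters (lists of Paper) that span at least two disciplines."""
        if not self.enabled:
            return []
        papers = [p for group in digest.values() for p in group]
        parent = list(range(len(papers)))

        def find(i):
            # Union-find lookup with path halving.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i, j in combinations(range(len(papers)), 2):
            if papers[i].discipline == papers[j].discipline:
                continue  # same-discipline pairs are out of scope here
            if _jaccard(papers[i].title, papers[j].title) >= self.threshold:
                parent[find(i)] = find(j)

        groups = {}
        for i, paper in enumerate(papers):
            groups.setdefault(find(i), []).append(paper)
        return [g for g in groups.values() if len({p.discipline for p in g}) >= 2]


digest = {
    "neuroscience": [
        Paper("Sparse attention mechanisms in cortical networks", "neuroscience", "u1"),
    ],
    "cs": [
        Paper("Sparse attention mechanisms for transformer networks", "cs", "u2"),
        Paper("Compiler optimization for embedded devices", "cs", "u3"),
    ],
}
clusters = PaperClusterer(enabled=True).cluster(digest)
```

Union-find yields disjoint clusters, which keeps the sketch short; it does not by itself let a paper appear in more than one group.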
Considerations
- This is a significant feature. Consider starting with the first approach (keyword/TF-IDF clustering) as a proof of concept
- The digest is generated per-session, so clustering runs once after generation completes
- Papers may belong to multiple clusters
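If papers may belong to multiple clusters, hard assignment (as in plain k-means) is not enough. One simple option, sketched below under assumptions, is to score each paper against every cluster and keep all clusters above a threshold. The `scores` matrix, the `threshold` value, and the function name `assign_multi` are hypothetical; how the scores are produced (keywords, embeddings, or an LLM) is left open.

```python
def assign_multi(scores, threshold=0.4):
    """Map each cluster index to the papers whose score meets the threshold.

    scores[i][c] is the similarity of paper i to cluster c.
    A paper can land in several clusters, or in none.
    """
    assignments = {}
    for paper_idx, row in enumerate(scores):
        for cluster_idx, score in enumerate(row):
            if score >= threshold:
                assignments.setdefault(cluster_idx, []).append(paper_idx)
    return assignments


# Paper 1 scores above the threshold for both clusters.
scores = [
    [0.9, 0.1],
    [0.6, 0.5],
    [0.2, 0.8],
]
result = assign_multi(scores)
```

Because the digest is generated once per session, a function like this would only need to run a single time after generation completes.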