Open
Labels
enhancement: New feature or request
Description
Problem or Motivation
When a user uploads a file (e.g., a research paper) and enables web search, the search query is constructed directly from the requirement field. If the requirement is something like "给我讲讲这篇论文" ("Tell me about this paper"), the search query sent to Tavily is literally that phrase — which returns no useful results.
The root cause is in lib/server/classroom-generation.ts:244:
const searchResult = await searchWithTavily({ query: requirement, apiKey: tavilyKey });
The search query has no awareness of the uploaded file's content, making web search effectively useless when the requirement is a reference to the document rather than a topic description.
Proposed Solution
Add a query enhancement step before calling the search API when pdfContent is present:
- Extract key metadata from the PDF text (title, authors, abstract, keywords) — either via a lightweight LLM call or heuristic extraction from the first ~500 characters.
- Combine the extracted context with the original requirement to form a meaningful search query.
- Fall back to the raw requirement if no PDF content is available (current behavior).
For example:
- Input: requirement = "给我讲讲这篇论文", PDF title = "Attention Is All You Need"
- Enhanced query:
"Attention Is All You Need" transformer architecture Vaswani 2017
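A minimal sketch of the heuristic variant of these steps follows. The helper names `extractPdfHints` and `buildSearchQuery` are hypothetical and do not exist in the codebase, and the title heuristic (first reasonably sized line in the first ~500 characters) is an assumption that will misfire on PDFs that open with a journal header or page number. It only extracts a title, so it would not produce the author and year shown in the example above.

```typescript
// Hypothetical helpers for illustration; none of these names exist in the repo.
const MAX_QUERY_LENGTH = 400; // Tavily query limit cited in this issue

interface PdfHints {
  title?: string;
}

/** Heuristic: treat the first plausibly title-sized line of the PDF head as the title. */
function extractPdfHints(pdfContent: string, headChars = 500): PdfHints {
  const head = pdfContent.slice(0, headChars);
  const title = head
    .split(/\r?\n/)
    .map((line) => line.trim())
    .find((line) => line.length >= 5 && line.length <= 200);
  return { title };
}

/** Combine the extracted title with the requirement; fall back to the raw requirement. */
function buildSearchQuery(requirement: string, pdfContent?: string): string {
  // No PDF content: keep the current behavior.
  if (!pdfContent || pdfContent.trim() === '') return requirement;

  const { title } = extractPdfHints(pdfContent);
  if (!title) return requirement;

  // Quote the title so the search engine treats it as a phrase.
  const quotedTitle = `"${title.replace(/"/g, '')}"`;
  const query = `${quotedTitle} ${requirement}`;

  // Truncate by code point so a multi-byte character is never split.
  return Array.from(query).slice(0, MAX_QUERY_LENGTH).join('');
}
```

At the call site this would presumably replace `query: requirement` with `query: buildSearchQuery(requirement, pdfContent)`, leaving the rest of the `searchWithTavily` call unchanged.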
Alternatives Considered
- Use the full PDF text as search query: Not viable — Tavily has a 400-character query limit, and raw text would be noisy.
- Skip web search when PDF is uploaded: Loses the benefit of supplementing the paper with up-to-date context (e.g., citation count, follow-up work, related resources).
- Let the LLM decide the query in the prompt: Adds latency with an extra LLM round-trip, but could be more accurate than heuristic extraction.
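For comparison, the LLM-based alternative could look roughly like the sketch below. The `QueryWriter` type is a hypothetical stand-in for whatever LLM client the project uses, and the prompt wording and 2000-character excerpt size are assumptions. Any failure or empty answer falls back to the raw requirement, so the extra round-trip cannot make search worse than the current behavior.

```typescript
// Hypothetical: QueryWriter stands in for the project's real LLM client.
type QueryWriter = (prompt: string) => Promise<string>;

const TAVILY_MAX_QUERY_CHARS = 400; // limit cited in this issue

async function llmSearchQuery(
  requirement: string,
  pdfContent: string,
  writeQuery: QueryWriter,
): Promise<string> {
  // Only send the head of the document; title and abstract usually live there.
  const excerpt = pdfContent.slice(0, 2000);
  const prompt = [
    'Write one web search query (under 400 characters) for the document below.',
    'Return only the query text.',
    `User request: ${requirement}`,
    `Document excerpt:\n${excerpt}`,
  ].join('\n');

  try {
    const raw = await writeQuery(prompt);
    // Collapse whitespace and newlines the model may have added.
    const query = raw.trim().replace(/\s+/g, ' ');
    if (query.length === 0) return requirement;
    return Array.from(query).slice(0, TAVILY_MAX_QUERY_CHARS).join('');
  } catch (error) {
    // LLM failure: keep the current behavior.
    return requirement;
  }
}
```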
Area
Classroom generation
Additional Context
Relevant code locations:
- Search query construction: lib/server/classroom-generation.ts:244
- Tavily search implementation: lib/web-search/tavily.ts (400-char truncation at line 13)
- Prompt template where both contexts are injected: lib/generation/prompts/templates/requirements-to-outlines/user.md:21-35
- The prompt already instructs the LLM to "reference specific findings and sources" from search results (user.md:78), but this is moot if the search results are irrelevant.