[Feature]: Context-aware web search query when files are uploaded #246

@cosarah

Problem or Motivation

When a user uploads a file (e.g., a research paper) and enables web search, the search query is built directly from the requirement field. If the requirement is something like "给我讲讲这篇论文" ("Tell me about this paper"), the query sent to Tavily is literally that phrase, which returns no useful results.

The root cause is in lib/server/classroom-generation.ts:244:

const searchResult = await searchWithTavily({ query: requirement, apiKey: tavilyKey });

The search query has no awareness of the uploaded file's content, making web search effectively useless when the requirement is a reference to the document rather than a topic description.

Proposed Solution

Add a query enhancement step before calling the search API when pdfContent is present:

  1. Extract key metadata from the PDF text (title, authors, abstract, keywords) — either via a lightweight LLM call or heuristic extraction from the first ~500 characters.
  2. Combine the extracted context with the original requirement to form a meaningful search query.
  3. Fall back to the raw requirement if no PDF content is available (current behavior).
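
A minimal sketch of the heuristic variant of these steps, not the project's implementation. `extractTitle` and `buildSearchQuery` are hypothetical names, the "first reasonably long line is the title" rule and its length thresholds are assumptions, and the 400-character cap mirrors the Tavily limit cited in this issue.

```typescript
// Hypothetical helpers; names and thresholds are illustrative, not from the repo.
const HEAD_CHARS = 500; // how much of the PDF text to inspect
const MAX_QUERY_CHARS = 400; // Tavily query limit cited in this issue

/** Guess a title: the first reasonably long line near the top of the PDF text. */
function extractTitle(pdfContent: string): string | undefined {
  const lines = pdfContent
    .slice(0, HEAD_CHARS)
    .split(/\r?\n/)
    .map((line) => line.trim());
  // Skip blank lines and very short fragments such as page numbers.
  return lines.find((line) => line.length >= 10 && line.length <= 200);
}

/** Build the search query, falling back to the raw requirement. */
function buildSearchQuery(requirement: string, pdfContent?: string): string {
  const title = pdfContent ? extractTitle(pdfContent) : undefined;
  if (!title) return requirement; // current behavior when nothing usable is found
  // Quote the title so the search engine treats it as a phrase.
  return `"${title}" ${requirement}`.slice(0, MAX_QUERY_CHARS);
}
```

At the call site, the idea would be to pass `buildSearchQuery(requirement, pdfContent)` as `query` instead of `requirement`. For a PDF whose first line is "Attention Is All You Need", this sketch yields `"Attention Is All You Need"` followed by the original requirement. Pulling in authors or year, as in the example below, would need further extraction that is not shown here.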

For example:

  • Input: requirement = "给我讲讲这篇论文", PDF title = "Attention Is All You Need"
  • Enhanced query: "Attention Is All You Need" transformer architecture Vaswani 2017

Alternatives Considered

  • Use the full PDF text as search query: Not viable — Tavily has a 400-character query limit, and raw text would be noisy.
  • Skip web search when PDF is uploaded: Loses the benefit of supplementing the paper with up-to-date context (e.g., citation count, follow-up work, related resources).
  • Let the LLM decide the query in the prompt: Adds latency with an extra LLM round-trip, but could be more accurate than heuristic extraction.
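
For comparison, the LLM-based alternative could look roughly like the sketch below. The `generate` parameter is a hypothetical stand-in for whatever model client the project uses, so no real LLM API is assumed. The prompt wording and the 2000-character excerpt size are illustrative choices as well.

```typescript
// Hypothetical: any function that sends a prompt to a model and returns its text.
type GenerateText = (prompt: string) => Promise<string>;

const MAX_QUERY_LENGTH = 400; // Tavily query limit cited in this issue
const EXCERPT_CHARS = 2000; // assumed excerpt size, not taken from the repo

async function buildSearchQueryWithLLM(
  requirement: string,
  pdfContent: string,
  generate: GenerateText,
): Promise<string> {
  const prompt = [
    `Write one web search query (max ${MAX_QUERY_LENGTH} characters) for the request below.`,
    "Use the document excerpt to name the specific paper or topic.",
    "Return only the query.",
    `Request: ${requirement}`,
    `Document excerpt: ${pdfContent.slice(0, EXCERPT_CHARS)}`,
  ].join("\n");

  try {
    const query = (await generate(prompt)).trim();
    // An empty answer is treated like a failure: keep the current behavior.
    return query.length > 0 ? query.slice(0, MAX_QUERY_LENGTH) : requirement;
  } catch {
    // If the extra round-trip fails, fall back to the raw requirement.
    return requirement;
  }
}
```

Because the model call is injected, this path can be unit-tested with a stub, and the fallback keeps web search working when the extra round-trip fails.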

Area

Classroom generation

Additional Context

Relevant code locations:

  • Search query construction: lib/server/classroom-generation.ts:244
  • Tavily search implementation: lib/web-search/tavily.ts (400-char truncation at line 13)
  • Prompt template where both contexts are injected: lib/generation/prompts/templates/requirements-to-outlines/user.md:21-35
  • The prompt already instructs the LLM to "reference specific findings and sources" from search results (user.md:78), but this is moot if the search results are irrelevant.

Labels: enhancement (New feature or request)