refactor: clean up debug logs, deduplicate rate-limit handling and output logic #6

Open
quotentiroler wants to merge 13 commits into ssdeanx:main from quotentiroler:main

Conversation

@quotentiroler

@quotentiroler quotentiroler commented Mar 6, 2026

Summary

Code quality cleanup and deduplication across the codebase.

Changes

  • Remove 43 console.error('[DBG]') debug statements across providers.ts and deep-research.ts
  • Centralize rate-limit detection — extract isRateLimitError() and rateLimitSummary() to new src/utils/errors.ts, replacing 6+ copy-pasted inline checks
  • Centralize output file logic — extract resolveOutputDir(), buildOutputPaths(), writeResearchOutput() to new src/utils/output.ts, deduplicating code between mcp-server.ts and run.ts
  • Simplify callGeminiProConfigurable() — remove redundant try/catch wrapper (errors already propagate from generateContentInternal)
  • Decouple truncateLearnings() from module-level output object
  • Remove unreachable comment in processGeminiResponse()
  • Clean .gitignore — deduplicate entries (5x dist/), fix typos, organize into sections
  • Improve JSON extraction — replace regex-based extraction with brace-balanced parser in src/utils/json.ts
  • Simplify SERP query prompt template in src/prompt.ts
  • Redirect progress output to stderr for MCP stdio JSON-RPC compliance
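
A brace-balanced extractor of the kind the JSON bullet describes could look roughly like the sketch below. It is an illustration only, not the code in src/utils/json.ts: the names extractFirstJson and findBalancedEnd are hypothetical, and the fence-stripping rule is an assumption.

```typescript
// Hypothetical sketch; not the PR's actual src/utils/json.ts.

/** Returns the index of the bracket that closes the one at `start`, or -1. */
function findBalancedEnd(s: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < s.length; i++) {
    const ch = s[i];
    if (inString) {
      // Brackets inside string literals must not affect the depth counter.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

/** Finds and parses the first balanced JSON object or array in mixed text. */
function extractFirstJson(text: string): unknown {
  // Assumption: fence markers (three backticks, optionally followed by "json") are simply dropped.
  const cleaned = text.replace(/`{3}(?:json)?/gi, "");
  for (let start = 0; start < cleaned.length; start++) {
    const open = cleaned[start];
    if (open !== "{" && open !== "[") continue;
    const end = findBalancedEnd(cleaned, start);
    if (end === -1) continue;
    try {
      return JSON.parse(cleaned.slice(start, end + 1));
    } catch {
      // Balanced but not valid JSON; keep scanning from the next character.
    }
  }
  return undefined;
}
```

Unlike a greedy regex, the scanner tracks string and escape state, so a "}" inside a quoted value does not end the match early.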

Files Changed (11)

| File | Change |
| --- | --- |
| src/utils/errors.ts | New: centralized rate-limit error helpers |
| src/utils/output.ts | New: shared output path/write utilities |
| src/ai/providers.ts | Cleaned debug logs, simplified error handling |
| src/deep-research.ts | Removed 31 DBG logs, deduplicated rate-limit checks |
| src/mcp-server.ts | Uses shared output utility |
| src/run.ts | Uses shared output utility |
| src/output-manager.ts | Stderr redirect for MCP compliance |
| src/prompt.ts | Simplified prompt templates |
| src/utils/json.ts | Improved JSON extraction |
| .gitignore | Cleaned and organized |
| package-lock.json | Lockfile sync |

Testing

  • TypeScript compiles cleanly (tsc --noEmit passes with exit code 0)

Summary by Sourcery

Refactor deep research, provider, and MCP server flows to improve JSON handling, centralize rate-limit and output management, and surface structured errors and outputs to callers.

New Features:

  • Add shared output utilities for resolving output directories, building filenames, and writing report and learnings files.
  • Surface structured error lists from deep research runs, including rate-limit issues, to both library consumers and MCP clients.

Bug Fixes:

  • Fix SERP query generation to handle both array and object-wrapped JSON formats and to avoid losing valid queries on partial failures.
  • Ensure Firecrawl-derived URLs and learnings are merged with Gemini grounding results instead of being overwritten, and that visited URLs/learnings propagate correctly through recursive research.

Enhancements:

  • Improve SERP query and Gemini prompt templates to produce strictly valid JSON responses and richer, non-duplicative learnings.
  • Add robust JSON extraction and Gemini response processing that can recover from mixed text/JSON and multiple response shapes.
  • Introduce centralized rate-limit detection helpers and integrate them into research and provider error handling with clearer logging.
  • Cap and sanitize learnings passed into outline, report, summary, and title generation to stay within prompt size limits.
  • Adjust Gemini provider configuration to handle JSON mode vs Google Search grounding compatibility and reduce candidate count defaults.
  • Redirect progress and debug logging to stderr to avoid corrupting MCP stdio JSON-RPC output.
  • Standardize output directory resolution across CLI and MCP server, and generate timestamped, query-based filenames for reports and learnings sidecars.
  • Expose research errors and output metadata (paths, sizes, truncation) through MCP responses for better caller visibility.
  • Clean up debug logging noise and unreachable comments across deep-research and related modules.

Chores:

  • Clean and deduplicate .gitignore entries and update the lockfile to reflect dependency state.
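
The SERP fix listed under Bug Fixes (accept both a bare array and an object-wrapped array, and keep valid queries when some entries are malformed) can be sketched as follows. The field names queries, query, and researchGoal and the function name normalizeSerpQueries are assumptions made for illustration; the PR text does not spell out the schema.

```typescript
// Hypothetical sketch; field and function names are assumptions, not the PR's schema.
interface SerpQuery {
  query: string;
  researchGoal: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Accepts `[...]` or `{ queries: [...] }` and keeps every well-formed entry. */
function normalizeSerpQueries(raw: unknown): SerpQuery[] {
  const wrapped = isRecord(raw) ? raw["queries"] : undefined;
  const candidates: unknown[] = Array.isArray(raw)
    ? raw
    : Array.isArray(wrapped)
      ? wrapped
      : [];

  const valid: SerpQuery[] = [];
  for (const item of candidates) {
    if (!isRecord(item)) continue; // skip malformed entries instead of failing the batch
    const query = item["query"];
    const goal = item["researchGoal"];
    if (typeof query !== "string" || query.trim() === "") continue;
    valid.push({
      query: query.trim(),
      researchGoal: typeof goal === "string" ? goal : "",
    });
  }
  return valid;
}
```

Filtering per entry, rather than validating the whole payload at once, is what prevents one bad element from discarding the rest.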

…tput logic

- Remove 43 console.error('[DBG]') debug statements
- Extract isRateLimitError() and rateLimitSummary() to src/utils/errors.ts
- Extract shared output path/write logic to src/utils/output.ts
- Simplify callGeminiProConfigurable() by removing redundant try/catch
- Clean up truncateLearnings() module-level coupling
- Remove unreachable comment in processGeminiResponse()
- Clean .gitignore: deduplicate entries, fix typos, organize sections
- Improve JSON extraction with brace-balanced parser (src/utils/json.ts)
- Simplify SERP query prompt template (src/prompt.ts)
- Redirect progress output to stderr for MCP stdio compliance
@sourcery-ai

sourcery-ai Bot commented Mar 6, 2026

Reviewer's Guide

Refactors deep research, provider, and MCP CLI/server code to centralize error/output handling, improve JSON extraction and prompt design, surface rate-limit errors, and ensure MCP-compatible logging and file output, while cleaning up debug noise and gitignore entries.

Sequence diagram for MCP deepResearch_run tool with centralized output and error handling

sequenceDiagram
  actor User
  participant Client as MCP_client
  participant MCP as MCPServerModule
  participant DR as DeepResearchModule
  participant Prov as ProvidersModule
  participant UE as UtilsErrors
  participant UO as UtilsOutput
  participant FS as FileSystem

  User->>Client: Invoke tool deepResearch_run
  Client->>MCP: JSON-RPC request

  MCP->>MCP: Build cacheKey
  alt cached result
    MCP-->>Client: Cached MCPResearchResult
  else cache miss
    MCP->>DR: deepResearch(query, depth, breadth, existingLearnings, onProgress)
    DR->>Prov: callGeminiProConfigurable for SERP and research calls
    Prov->>Prov: generateContentInternal
    Prov->>UE: isRateLimitError(apiErr)? (on failure)
    alt API rate limited
      UE-->>Prov: true
      Prov->>UE: rateLimitSummary(apiErr)
      Prov->>Prov: log rate-limit details
      Prov-->>DR: throw apiErr
      DR->>UE: isRateLimitError(err)
      DR->>UE: rateLimitSummary(err)
      DR-->>MCP: ResearchResult with errors and no learnings
    else API ok
      Prov-->>DR: Gemini responses
      DR->>DR: processGeminiResponse, generateOutline, writeReportFromOutline, generateSummary, generateTitle
      DR-->>MCP: ResearchResult (learnings, visitedUrls, errors?)
    end

    alt no learnings and errors present
      MCP->>MCP: Build error-only MCPResearchResult
      MCP-->>Client: Error summary + metadata.errors
    else learnings present
      MCP->>DR: writeFinalReport(prompt, learnings, visitedUrls)
      DR-->>MCP: report markdown

      MCP->>UO: resolveOutputDir(outputDir, defaultPath)
      UO-->>MCP: resolvedDir
      MCP->>UO: writeResearchOutput(resolvedDir, query, report, learnings, visitedUrls, depth, breadth)
      UO->>FS: mkdir(outputDir), writeFile(report, learnings.json)
      FS-->>UO: ok
      UO-->>MCP: { reportPath, learningsPath }

      MCP->>MCP: Build inlineText (truncate if >4000 chars)
      MCP-->>Client: MCPResearchResult with content, metadata.stats, metadata.outputPath, metadata.errors
    end
  end

  Client-->>User: Shows report snippet and output file path
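
The last two branches of the diagram (error-only result versus inline report truncated past 4000 characters) might be assembled along these lines. The interfaces mirror the MCPResearchResult and MCPStats names from the guide, but the exact field layout, the function name buildMcpResult, and the message wording are assumptions.

```typescript
// Hypothetical sketch modeled on the names in the diagram; not the PR's code.
interface McpStats {
  totalLearnings: number;
  totalSources: number;
  reportLength: number;
  truncated: boolean;
}

interface McpResult {
  content: { type: "text"; text: string }[];
  metadata: { outputPath?: string; stats?: McpStats; errors?: string[] };
}

const INLINE_LIMIT = 4000; // threshold quoted in the diagram

function buildMcpResult(
  report: string,
  learnings: string[],
  visitedUrls: string[],
  errors: string[],
  outputPath: string,
): McpResult {
  // Branch 1: nothing was learned and errors exist, so return an error summary only.
  if (learnings.length === 0 && errors.length > 0) {
    return {
      content: [{ type: "text", text: `Research failed:\n${errors.join("\n")}` }],
      metadata: { errors },
    };
  }

  // Branch 2: inline at most INLINE_LIMIT characters and point to the full file.
  const truncated = report.length > INLINE_LIMIT;
  const inlineText = truncated
    ? `${report.slice(0, INLINE_LIMIT)}\n\n[Truncated. Full report: ${outputPath}]`
    : report;

  return {
    content: [{ type: "text", text: inlineText }],
    metadata: {
      outputPath,
      stats: {
        totalLearnings: learnings.length,
        totalSources: visitedUrls.length,
        reportLength: report.length,
        truncated,
      },
      ...(errors.length > 0 ? { errors } : {}),
    },
  };
}
```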

Class diagram for deep research, provider, and new utils structure

classDiagram
  class DeepResearchModule {
    +generateSerpQueries(query string, numQueries number, learnings string[], researchGoal string, initialQuery string, depth number, breadth number) Promise~SerpQuery[]~
    +truncateLearnings(learnings string[], maxChars number) string[]
    +generateOutline(prompt string, learnings string[]) Promise~string~
    +writeReportFromOutline(outline string, learnings string[]) Promise~string~
    +generateSummary(learnings string[]) Promise~string~
    +generateTitle(prompt string, learnings string[]) Promise~string~
    +deepResearch(query string, depth number, breadth number, existingLearnings string[], onProgress ResearchProgressCallback) Promise~ResearchResult~
    +processGeminiResponse(geminiResponseText string) Promise~ProcessedGeminiResponse~
  }

  class ResearchResult {
    +content string
    +sources string[]
    +methodology string
    +limitations string
    +citations string[]
    +learnings string[]
    +visitedUrls string[]
    +firecrawlResults SearchResponse
    +analysis string
    +errors string[]
  }

  class ProcessResult {
    +analysis string
    +content string
    +methodology string
    +limitations string
    +citations string[]
    +learnings string[]
    +visitedUrls string[]
    +firecrawlResults SearchResponse
    +errors string[]
  }

  class UtilsErrors {
    +isRateLimitError(err unknown) boolean
    +rateLimitSummary(err unknown, fallback string) string
  }

  class UtilsOutput {
    +buildOutputPaths(baseDir string, query string) OutputPaths
    +resolveOutputDir(explicit string, fallback string) string
    +writeResearchOutput(outputDir string, query string, report string, learnings string[], visitedUrls string[], depth number, breadth number) Promise~OutputPaths~
  }

  class OutputPaths {
    +reportPath string
    +learningsPath string
    +timestamp string
  }

  class OutputManager {
    +log(message string, data any) void
    +updateProgress(progress ResearchProgress) void
    +flushLogs() void
    +saveResearchReport(content string) void
    +static logCacheEviction(value unknown) void
  }

  class ProvidersModule {
    +generateContentInternal(prompt ContentArg, extra GenExtra) Promise~GenerateWrapped~
    +callGeminiProConfigurable(prompt string, opts GenExtra) Promise~string~
  }

  class GenerateWrapped {
    +response GenerateWrappedResponse
  }

  class GenerateWrappedResponse {
    +text() Promise~string~
  }

  class PromptModule {
    +serpQueryPromptTemplate string
    +learningPromptTemplate string
    +generateGeminiPrompt(query string, researchGoal string, learnings string[]) string
  }

  class MCPServerModule {
    +deepResearch_run(query string, depth number, breadth number, existingLearnings string[], goal string, outputDir string) Promise~MCPResearchResult~
  }

  class MCPResearchResult {
    +content MCPContentPart[]
    +metadata MCPMetadata
  }

  class MCPMetadata {
    +learnings string[]
    +visitedUrls string[]
    +outputPath string
    +stats MCPStats
    +errors string[]
  }

  class MCPStats {
    +totalLearnings number
    +totalSources number
    +reportLength number
    +truncated boolean
  }

  class RunCliModule {
    +runInteractiveResearch() Promise~void~
  }

  class JsonUtils {
    +extractJsonFromText(text string) any
    +safeParseJSON(text string, fallback any) any
    +isValidJSON(text string) boolean
  }

  UtilsErrors <.. DeepResearchModule : uses
  UtilsErrors <.. ProvidersModule : uses

  UtilsOutput <.. MCPServerModule : uses
  UtilsOutput <.. RunCliModule : uses

  DeepResearchModule --> ResearchResult
  DeepResearchModule --> ProcessResult
  DeepResearchModule --> OutputManager
  DeepResearchModule --> JsonUtils
  DeepResearchModule --> PromptModule
  DeepResearchModule --> ProvidersModule

  ProvidersModule --> GenerateWrapped
  GenerateWrapped --> GenerateWrappedResponse

  MCPServerModule --> MCPResearchResult
  MCPResearchResult --> MCPMetadata
  MCPMetadata --> MCPStats

  RunCliModule --> DeepResearchModule
  RunCliModule --> UtilsOutput

  PromptModule <.. DeepResearchModule : uses
  JsonUtils <.. DeepResearchModule : uses
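
To make the UtilsOutput and OutputPaths entries above concrete, here is a minimal sketch of how timestamped, query-based paths could be derived. The filename pattern, the slug rules, the "/" separator, and the extra envValue parameter are all assumptions; the diagram only gives the signatures buildOutputPaths(baseDir, query) and resolveOutputDir(explicit, fallback).

```typescript
// Hypothetical sketch; filename pattern and slug rules are assumptions.
interface OutputPaths {
  reportPath: string;
  learningsPath: string;
  timestamp: string;
}

/** Turns a free-text query into a short, filesystem-friendly slug. */
function slugify(query: string, maxLength = 40): string {
  const slug = query
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .slice(0, maxLength)
    .replace(/^-+|-+$/g, "");
  return slug || "research";
}

function buildOutputPaths(baseDir: string, query: string, now: Date = new Date()): OutputPaths {
  // ISO timestamps contain ":" and ".", which are awkward in filenames.
  const timestamp = now.toISOString().replace(/[:.]/g, "-");
  const stem = `${baseDir.replace(/\/+$/, "")}/${timestamp}-${slugify(query)}`;
  return {
    reportPath: `${stem}.md`,
    learningsPath: `${stem}.learnings.json`,
    timestamp,
  };
}

/** Precedence assumed here: explicit argument, then environment value, then fallback. */
function resolveOutputDir(explicit: string | undefined, fallback: string, envValue?: string): string {
  return explicit?.trim() || envValue?.trim() || fallback;
}
```

Passing the clock in as a parameter keeps the function deterministic under test; a real implementation would more likely join paths with node:path.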

File-Level Changes

Change Details Files
Centralize rate-limit classification/logging and propagate rate-limit errors instead of swallowing them.
  • Introduce isRateLimitError() and rateLimitSummary() helpers for Gemini API errors.
  • Use shared helpers in deep-research SERP generation, per-query processing, and provider calls to log structured rate-limit details and rethrow.
  • Ensure deepResearch and MCP tool responses surface rate-limit issues to callers via errors arrays.
src/utils/errors.ts
src/deep-research.ts
src/ai/providers.ts
src/mcp-server.ts
Centralize output directory resolution and report/learnings file writing.
  • Add resolveOutputDir(), buildOutputPaths(), and writeResearchOutput() utilities to handle timestamped filenames and JSON sidecars.
  • Refactor CLI run path to use shared output utilities instead of hard-coded output.md in CWD.
  • Refactor MCP server tool to write report and learnings via shared utils, attaching output path and truncation metadata.
src/utils/output.ts
src/run.ts
src/mcp-server.ts
Improve robustness of JSON extraction/parsing from LLM responses.
  • Replace regex-based JSON extraction with a brace-balanced parser that handles fenced code blocks, nested structures, and raw arrays/objects.
  • Update deep-research Gemini response handlers and SERP query parsing to fall back to extractJsonFromText when direct JSON.parse or safeParseJSON fail.
src/utils/json.ts
src/deep-research.ts
Refine deep research pipeline prompts, learning truncation, and error plumbing.
  • Add truncateLearnings() to cap prompt size and reuse it across outline, report, summary, and title generation with context-appropriate limits.
  • Redesign generateGeminiPrompt and serpQueryPromptTemplate to be explicit JSON-only schemas with clearer instructions and richer context.
  • Adjust deepResearch to handle SERP generation failures explicitly, merge Firecrawl and Gemini learnings/URLs, and return collected errors in the ResearchResult.
  • Avoid final report truncation inside deep-research and remove direct file writing, leaving output handling to callers.
src/deep-research.ts
src/prompt.ts
Clean up provider behavior, debug logging, and MCP compatibility of output.
  • Change default Gemini candidate count from 2 to 1 and auto-strip googleSearch tool when JSON mode is used with Gemini < 3.x.
  • Wrap generateContentInternal in try/catch to log rate-limit vs generic API failures before rethrowing, and normalize provider cache logging.
  • Remove numerous DBG console.error logs, standardize OutputManager logging to stderr for MCP stdio safety, and adjust cache eviction logging to stderr.
src/ai/providers.ts
src/deep-research.ts
src/output-manager.ts
Improve MCP tool API and environment configuration behavior.
  • Rename deepResearch MCP tool from deepResearch.run to deepResearch_run for modern MCP conventions.
  • Extend MCPResearchResult metadata with outputPath, stats (length/truncation), and errors, and short-circuit with a human-readable error report when no learnings are produced due to errors.
  • Change dotenv config load to use override: true and clean up .gitignore entries for dist and other artifacts.
src/mcp-server.ts
.gitignore
package-lock.json
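
The learning-truncation step described above takes a list of learnings and a character budget (the class diagram gives truncateLearnings(learnings, maxChars)). One plausible reading is sketched below; the keep-in-order, stop-at-first-overflow policy is an assumption, and the PR may well trim differently.

```typescript
// Hypothetical sketch of a character-budgeted truncation; the policy is an assumption.
function truncateLearnings(learnings: string[], maxChars: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const learning of learnings) {
    const text = learning.trim();
    if (text === "") continue; // drop empty entries so they do not waste the budget
    if (used + text.length > maxChars) break; // stop before exceeding the prompt budget
    kept.push(text);
    used += text.length;
  }
  return kept;
}
```

Because the function takes its inputs as arguments and returns a new array, it no longer needs the module-level output object that the PR decouples it from.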

@coderabbitai

coderabbitai Bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds centralized rate‑limit detection and API‑key validation; strengthens JSON extraction and JSON‑only prompts; surfaces errors in research result types; introduces timestamped output helpers and writing; routes logs and eviction messages to stderr; and normalizes .gitignore entries.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Git ignore normalization: /.gitignore | Reorganized ignore patterns to trailing‑slash form, consolidated build/output under .dist/, moved env files to a dedicated section, added docs-cursorrules/ and a migration doc, and removed numerous .roo/rules/*.md entries. |
| AI provider & API key validation: src/ai/providers.ts | Added ApiKeyValidation export and validateApiKey(); added rate‑limit detection/logging, JSON‑grounding compatibility checks (strip tools when needed), reduced default candidate count, and wrapped Gemini calls with detailed error handling. |
| Deep research flow & types: src/deep-research.ts | Extended ResearchResult/ProcessResult and progress types with errors; accumulate and propagate errors (including rate‑limit), handle mixed Gemini JSON/text outputs, truncate learnings for prompts, and add robust JSON parsing fallbacks. |
| MCP server & tool surface: src/mcp-server.ts | Invoke validateApiKey at startup (logged), rename tool to deepResearch_run, accept outputDir input, write outputs via helpers, return outputPath, include metadata.errors, and add report length/truncated metadata. |
| Output helpers & integration: src/utils/output.ts, src/run.ts | New utilities: buildOutputPaths, resolveOutputDir, writeResearchOutput; run.ts now uses them to persist timestamped markdown and JSON sidecars and log resulting paths. |
| JSON extraction utility: src/utils/json.ts | Replace regex‑only extraction with strip‑fences + direct JSON.parse then bracket‑aware nested extraction (handles quotes/escapes), with clearer warnings on failure. |
| Rate‑limit helpers: src/utils/errors.ts | Add RATE_LIMIT_PATTERNS, isRateLimitError(err) and rateLimitSummary(err, fallback) for centralized detection and readable summaries. |
| Prompts & assembly: src/prompt.ts | Convert SERP and learning prompts to JSON‑only narrative templates, add error handling instructions, and assemble prompts including trimmed existing learnings. |
| Logging & output routing: src/logger.ts, src/output-manager.ts | Route pino output to stderr (destination:2 / pino‑pretty); emit flush/cache eviction messages to stderr (console.error) and remove stdout terminal control usage. |
| Misc wiring & logging tweaks: src/utils/*, src/run.ts | Add cache hit/miss guards and logging, minor logging style adjustments, and replace direct file writes with writeResearchOutput. |
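
The logging rows above rest on one rule: under MCP stdio transport, stdout carries only JSON-RPC messages, so every human-readable line has to go to stderr. The sketch below shows that separation with injectable writers; createChannels, sendRpc, and logProgress are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical sketch of stdout/stderr separation for an stdio JSON-RPC server.
type LineWriter = (line: string) => void;

function createChannels(
  writeStdout: LineWriter = (line) => console.log(line),
  writeStderr: LineWriter = (line) => console.error(line),
) {
  return {
    /** Protocol traffic: one JSON document per line on stdout, nothing else. */
    sendRpc(message: object): void {
      writeStdout(JSON.stringify(message));
    },
    /** Progress and diagnostics: stderr only, so clients never try to parse them. */
    logProgress(text: string): void {
      writeStderr(`[progress] ${text}`);
    },
  };
}
```

With this split, a client that parses every stdout line as JSON never encounters a progress bar or a cache-eviction message.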

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant MCP as MCPServer
    participant Validator as ApiKeyValidator
    participant Research as DeepResearch
    participant Gemini as GeminiAPI
    participant Writer as OutputWriter
    participant FS as Filesystem

    Client->>MCP: deepResearch_run(query, outputDir?)
    MCP->>Validator: validateApiKey()
    Validator-->>MCP: tierInfo / error
    MCP->>Research: runResearch(query, depth, breadth)
    Research->>Gemini: generate SERP queries / generate content
    alt rate-limit or error
        Gemini-->>Research: rate-limit / error
        Research-->>MCP: result with errors[]
    else success
        Gemini-->>Research: JSON/text responses
        Research->>Writer: report, learnings, visitedUrls
        Writer->>FS: write markdown + learnings.json
        FS-->>Writer: reportPath, learningsPath
        Writer-->>MCP: output paths
    end
    MCP-->>Client: MCPResearchResult (metadata: outputPath, errors, truncated)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through logs and parsed each nest,
Tuned prompts to JSON and handled each test,
I sniffed rate‑limit breezes and left a small note,
Stamped reports with timestamps and wrote every quote,
A tiny rabbit, proud of this code quest ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 65.22%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main changes: cleanup of debug logs and deduplication of rate-limit handling and output logic, which are the core objectives of this refactoring PR. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing specific changes across 11 files including new utilities, refactored logic, and cleanup efforts that match the actual modifications. |

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the codebase's maintainability and robustness by centralizing common functionalities and improving error handling. It streamlines the process of managing API rate limits, standardizes file output operations, and makes JSON extraction more reliable. These changes contribute to a cleaner, more efficient, and more compliant application architecture, particularly in how it interacts with external APIs and handles its own logging.

Highlights

  • Centralized Error Handling: Introduced new utility functions for detecting and summarizing API rate-limit errors, which are now used consistently across providers.ts and deep-research.ts to provide clearer error messages and prevent redundant checks.
  • Consolidated Output Management: Extracted output directory resolution and file writing logic into a new src/utils/output.ts module, deduplicating code previously found in mcp-server.ts and run.ts and allowing for configurable output paths.
  • Improved JSON Extraction: Replaced regex-based JSON extraction with a more robust brace-balancing parser in src/utils/json.ts, enhancing reliability when extracting JSON from mixed text, including responses with markdown code fences.
  • Refactored Prompt Templates: Simplified the SERP query prompt template and updated the generateGeminiPrompt to be more structured, providing clearer instructions and schema for Gemini API calls.
  • Cleaned Debug Logs and .gitignore: Removed numerous console.error('[DBG]') statements and significantly cleaned up the .gitignore file by deduplicating entries, fixing typos, and organizing it into logical sections.
  • MCP Stdio Compliance: Redirected progress and debug output from stdout to stderr in src/output-manager.ts to ensure compliance with the MCP JSON-RPC protocol, which reserves stdout for JSON responses.
Changelog
  • .gitignore
    • Cleaned and reorganized entries into logical sections
    • Removed duplicate dist/ entries and fixed typos
    • Added new entries for output/ and utility scripts
  • package-lock.json
    • Updated project version from 1.0.0 to 0.3.0
  • src/ai/providers.ts
    • Imported isRateLimitError and rateLimitSummary for centralized error handling
    • Adjusted default CANDIDATE_COUNT for Gemini API calls from 2 to 1
    • Implemented logic to auto-detect and handle JSON+grounding compatibility based on Gemini model version
    • Added try/catch block with centralized rate-limit error logging around generateContentInternal API call
    • Removed redundant await keyword from wrapped.response.text() call
  • src/deep-research.ts
    • Imported isRateLimitError and rateLimitSummary for centralized error handling
    • Added errors property to ResearchResult and ProcessResult interfaces to surface collected errors
    • Updated cache logging messages to use single quotes for consistency
    • Modified generateSerpQueries to use json: true option for callGeminiProConfigurable and added rate-limit error handling
    • Enhanced SERP query parsing to handle both array and object formats for raw JSON
    • Moved truncateLearnings function to be a standalone helper, decoupling it from the output object
    • Removed tools: [{ googleSearch: {} }] from callGeminiProConfigurable when json: true is used, aligning with Gemini API compatibility
    • Improved JSON parsing fallbacks in generateOutline, writeReportFromOutline, generateSummary, and generateTitle to use extractJsonFromText
    • Added comprehensive rate-limit error handling and reporting within the deepResearch function, including returning collected errors
    • Adjusted URL and learning collection logic to merge results correctly
    • Removed trimPrompt and output.saveResearchReport calls, delegating final report saving to callers
    • Updated processGeminiResponse to robustly extract JSON from mixed text, handling various response formats
  • src/mcp-server.ts
    • Imported resolveOutputDir and writeResearchOutput from new output utilities
    • Updated config call to use override: true for environment variables
    • Added outputPath, reportLength, truncated, and errors properties to MCPResearchResult metadata
    • Renamed tool registration from deepResearch.run to deepResearch_run
    • Added outputDir parameter to the deepResearch_run tool schema
    • Implemented logic to surface research errors (e.g., rate limits) in the MCP response
    • Integrated resolveOutputDir and writeResearchOutput for standardized file output
    • Added truncation logic for the inline report in the MCP response, with a pointer to the full file
  • src/output-manager.ts
    • Redirected log output from process.stdout to process.stderr for MCP stdio compliance
    • Removed TERMINAL_CONTROLS.savePos and restorePos calls
    • Changed OutputManager.logCacheEviction to use console.error instead of console.log
  • src/prompt.ts
    • Simplified serpQueryPromptTemplate to be more direct and schema-focused, removing verbose descriptions
    • Updated generateGeminiPrompt with a more detailed and structured prompt, including instructions for Google Search and JSON output schema
  • src/run.ts
    • Imported resolveOutputDir and writeResearchOutput from new output utilities
    • Replaced direct fs.writeFile with writeResearchOutput for standardized report and learnings saving
  • src/utils/errors.ts
    • Added new file src/utils/errors.ts
    • Implemented isRateLimitError function to detect API rate-limit errors
    • Implemented rateLimitSummary function to extract human-readable details from rate-limit error messages
  • src/utils/json.ts
    • Refactored extractJsonFromText to use a brace-balancing algorithm for more robust JSON extraction
    • Added logic to strip markdown code fences before attempting JSON parsing
    • Improved error handling and logging within the JSON extraction process
  • src/utils/output.ts
    • Added new file src/utils/output.ts
    • Implemented buildOutputPaths to generate timestamped, slug-suffixed filenames for reports and learnings
    • Implemented resolveOutputDir to determine the output directory based on explicit parameter, environment variable, or default
    • Implemented writeResearchOutput to handle writing both the markdown report and a JSON sidecar for learnings and metadata
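
For orientation, a centralized rate-limit check like the isRateLimitError entry in the changelog above could be as small as the sketch below. The specific patterns are assumptions chosen for illustration (the PR's RATE_LIMIT_PATTERNS list is not shown in this conversation), as is the errorMessage helper.

```typescript
// Hypothetical sketch; the pattern list is an assumption, not the PR's RATE_LIMIT_PATTERNS.
const RATE_LIMIT_PATTERNS: RegExp[] = [
  /\b429\b/,
  /rate.?limit/i,
  /quota/i,
  /RESOURCE_EXHAUSTED/i,
];

/** Best-effort conversion of an unknown thrown value into searchable text. */
function errorMessage(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (typeof err === "string") return err;
  try {
    return JSON.stringify(err) || String(err);
  } catch {
    return String(err); // e.g. circular structures
  }
}

/** One shared predicate replaces the copy-pasted inline checks. */
function isRateLimitError(err: unknown): boolean {
  const message = errorMessage(err);
  return RATE_LIMIT_PATTERNS.some((pattern) => pattern.test(message));
}
```

None of the patterns use the g flag, so calling test() repeatedly stays stateless.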

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've left some high level feedback:

  • In rateLimitSummary, the model capture regex model: ([\w-]+) will truncate model IDs that contain dots (e.g. gemini-2.5-flash → gemini-2); consider broadening the character class (e.g. [\w.-]+) so logs/reporting include the full model name.
  • In writeResearchOutput, the JSON sidecar sets goal: opts.query, but the calling code already distinguishes between query and an optional research goal; consider threading the actual goal through the function signature so it can be persisted correctly instead of duplicating the query.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `rateLimitSummary`, the `model` capture regex `model: ([\w-]+)` will truncate model IDs that contain dots (e.g. `gemini-2.5-flash` → `gemini-2`); consider broadening the character class (e.g. `[\w.-]+`) so logs/reporting include the full model name.
- In `writeResearchOutput`, the JSON sidecar sets `goal: opts.query`, but the calling code already distinguishes between `query` and an optional research `goal`; consider threading the actual goal through the function signature so it can be persisted correctly instead of duplicating the query.
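
The truncation described in the first comment can be reproduced in isolation. The following is a minimal sketch, assuming an error message that contains a `model: <id>` fragment; the sample message and the `captureModel` helper are hypothetical and are not taken from `src/utils/errors.ts`.

```typescript
// Hypothetical error text; the real Gemini error format may differ.
const sampleMessage = "429 Too Many Requests, model: gemini-2.5-flash, limit: 20";

// Pattern quoted in the review: a dot is not part of the character class.
const narrow = /model: ([\w-]+)/;
// Broadened pattern suggested in the review.
const broad = /model: ([\w.-]+)/;

// Returns the first capture group, or undefined when the pattern does not match.
function captureModel(pattern: RegExp, message: string): string | undefined {
  return pattern.exec(message)?.[1];
}

console.log(captureModel(narrow, sampleMessage)); // gemini-2
console.log(captureModel(broad, sampleMessage)); // gemini-2.5-flash
```

The narrow class stops at the first dot, so any versioned model ID loses everything after the major version.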

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request provides an excellent and comprehensive refactoring, significantly improving code quality, robustness, and maintainability through centralized rate-limit error handling, output file logic, and more resilient JSON parsing from LLM responses. However, it introduces a critical security vulnerability by allowing arbitrary file writes via the new outputDir parameter in the MCP server. This parameter lacks validation and sandboxing, creating a path traversal risk that could be exploited via indirect prompt injection to write malicious files. This must be addressed by restricting file writes to a safe, pre-defined directory. For further improvement, consider replacing magic numbers with named constants and ensuring consistent logging practices. Also, the breaking change in the MCP tool name should be communicated to consumers.

Comment thread src/mcp-server.ts
breadth: z.number().min(1).max(5).optional().describe("How broad to make each research level (1-5)"),
existingLearnings: z.array(z.string()).optional().describe("Optional learnings to build upon"),
goal: z.string().optional().describe("Optional goal/brief to steer synthesis"),
outputDir: z.string().optional().describe("Absolute path to save output report and learnings. Falls back to OUTPUT_DIR env var, then ./output/"),

security-high high

The outputDir parameter allows the caller (which could be an LLM influenced by untrusted web content) to specify an absolute path for writing research results. This introduces an arbitrary file write vulnerability. An attacker could potentially use indirect prompt injection to trick the LLM into writing malicious files to sensitive locations on the user's system, such as configuration files or startup directories, depending on the permissions of the process running the MCP server.

Comment thread src/utils/output.ts
Comment on lines +23 to +24
export function resolveOutputDir(explicit?: string, fallback?: string): string {
if (explicit) return resolve(explicit);

security-high high

The resolveOutputDir function resolves the explicit path parameter using path.resolve() without any validation or sandboxing. This allows the application to write files to any directory the process has access to. To mitigate this, you should ensure that the resolved path is contained within a designated safe base directory.
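
One way to apply the containment check described above is sketched here, assuming Node's built-in `path` module. The helper name `resolveInsideBase` and its signature are hypothetical; the PR's `resolveOutputDir` has a different shape, and this is not presented as its intended fix.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve `requested` inside `baseDir`, or throw.
function resolveInsideBase(baseDir: string, requested?: string): string {
  const base = path.resolve(baseDir);
  // Relative inputs resolve against the base; absolute inputs are taken as-is and then checked.
  const target = requested ? path.resolve(base, requested) : base;
  const rel = path.relative(base, target);
  // A path escapes the base if the relative path climbs upward or is absolute
  // (for example, a different drive on Windows).
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Output directory must stay inside ${base}`);
  }
  return target;
}

console.log(resolveInsideBase("output", "reports/2026"));
```

This check is purely lexical. It does not follow symbolic links, so a stricter version would also compare real paths (for example via `fs.realpathSync`) before writing.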

Comment thread src/deep-research.ts
// Insert helper functions before writeFinalReport

// Limit learnings to fit within a reasonable prompt window (~30K chars target)
function truncateLearnings(learnings: string[], maxChars = 30000): string[] {

medium

The maxChars default value of 30000 is a magic number. For better maintainability and readability, consider defining this as a named constant at the top of the file (e.g., const MAX_LEARNINGS_CHARS_FOR_PROMPT = 30000;) and then using this constant as the default value.

Comment thread src/deep-research.ts
const titlePrompt = `${systemPrompt()}\n\nReturn JSON with a single 'title' for a research report based on the prompt and learnings:\nPrompt: ${prompt}\nLearnings:\n${learnings.join("\\n")}`;
const json = await callGeminiProConfigurable(titlePrompt, { json: true, schema, tools: [{ googleSearch: {} }] });
// Title only needs a few learnings for context
const truncated = truncateLearnings(learnings, 5000);

medium

The value 5000 used for truncating learnings for the title generation is a magic number. It should be defined as a named constant to improve readability and maintainability, for example: const MAX_LEARNINGS_CHARS_FOR_TITLE = 5000;.

Comment thread src/mcp-server.ts
// Define the deep research tool (modern API)
server.registerTool(
- "deepResearch.run",
+ "deepResearch_run",

medium

Renaming the tool from deepResearch.run to deepResearch_run is a good move for robustness, as dots in tool names can sometimes cause issues. However, this is a breaking change for any clients using this tool. It would be beneficial to explicitly mention this breaking change in the pull request description or a changelog to ensure consumers of the tool are aware and can update their integrations accordingly.

Comment thread src/mcp-server.ts
// If research returned 0 learnings due to errors, surface that clearly
if (result.learnings.length === 0 && researchErrors.length > 0) {
const errorSummary = researchErrors.join('\n- ');
const errorText = `Research failed — no learnings were collected.\n\nErrors encountered:\n- ${errorSummary}\n\nSuggestions:\n- Check your API key and billing at https://ai.dev/rate-limit\n- Free tier is limited to 20 requests/day/model\n- Try reducing depth/breadth, or wait and retry`;

medium

The URL https://ai.dev/rate-limit is hardcoded in the error message. It's a good practice to avoid hardcoding external URLs in the source code. Consider moving this URL to a configuration file or an environment variable. This makes it easier to update in the future if the URL changes, without requiring a code modification and redeployment. You can define the URL constant outside this block for better readability.

Suggested change
- const errorText = `Research failed — no learnings were collected.\n\nErrors encountered:\n- ${errorSummary}\n\nSuggestions:\n- Check your API key and billing at https://ai.dev/rate-limit\n- Free tier is limited to 20 requests/day/model\n- Try reducing depth/breadth, or wait and retry`;
+ const errorText = `Research failed — no learnings were collected.\n\nErrors encountered:\n- ${errorSummary}\n\nSuggestions:\n- Check your API key and billing at ${process.env.RATE_LIMIT_INFO_URL || 'https://ai.dev/rate-limit'}\n- Free tier is limited to 20 requests/day/model\n- Try reducing depth/breadth, or wait and retry`;

Comment thread src/utils/json.ts
Comment on lines +45 to +49
console.warn("No valid JSON found in text.");
return null;
} catch (error) {
console.error("Error during JSON extraction:", error);
return null;

medium

This function uses console.warn and console.error for logging. This is inconsistent with the rest of the application, which appears to use a centralized logger (e.g., pino via OutputManager). To maintain consistent logging and enable centralized control over log levels and outputs, consider passing a logger instance to this utility function or importing the project's logger if it doesn't create circular dependencies.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/prompt.ts (1)

190-200: ⚠️ Potential issue | 🔴 Critical

Module will crash at load time due to prompt version mismatch.

validatePromptConsistency() is invoked at line 201 during module initialization and requires all templates to match the "Schema Version: X.X.X" pattern. However:

  • systemPrompt contains "Research Protocol: v2.3.1" (not "Schema Version")
  • serpQueryPromptTemplate contains no schema version at all
  • learningPromptTemplate contains "Schema Version: 2.1.0"

This produces versions = [undefined, undefined, "2.1.0"], and the check new Set(versions).size > 1 evaluates to true, throwing an error: Prompt version mismatch: undefined vs undefined vs 2.1.0.

Either add matching "Schema Version: X.X.X" headers to all three templates, or update the validation logic to handle partial matches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/prompt.ts` around lines 190 - 200, validatePromptConsistency currently
treats missing "Schema Version: X.X.X" matches as undefined and throws on mixed
undefined/defined values; update validatePromptConsistency to extract versions
from systemPrompt, serpQueryPromptTemplate, and learningPromptTemplate, then
filter out undefined/null before comparing (e.g., const defined =
versions.filter(Boolean)); if defined is empty do nothing, otherwise ensure all
defined values are identical and throw a clearer error listing which templates
are missing versions (reference the validatePromptConsistency function and the
systemPrompt, serpQueryPromptTemplate, learningPromptTemplate identifiers).
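
The tolerant validation described in this prompt could look roughly like the sketch below. The template strings are hypothetical stand-ins that only mirror the version markers listed in the finding, and the function takes a name-to-text map as an assumption; the real `validatePromptConsistency` in `src/prompt.ts` may be structured differently.

```typescript
// Hypothetical stand-ins for the real templates in src/prompt.ts.
const templates: Record<string, string> = {
  systemPrompt: "Research Protocol: v2.3.1",
  serpQueryPromptTemplate: "Generate search queries.",
  learningPromptTemplate: "Schema Version: 2.1.0",
};

const VERSION_PATTERN = /Schema Version: (\d+\.\d+\.\d+)/;

function validatePromptConsistency(named: Record<string, string>): void {
  const versions = Object.entries(named).map(([name, text]) => ({
    name,
    version: VERSION_PATTERN.exec(text)?.[1],
  }));
  // Ignore templates that declare no version instead of comparing against undefined.
  const defined = versions.filter((v) => v.version !== undefined);
  if (defined.length === 0) return;
  const distinct = new Set(defined.map((v) => v.version));
  if (distinct.size > 1) {
    const found = defined.map((v) => `${v.name}=${v.version}`).join(", ");
    const missing = versions.filter((v) => v.version === undefined).map((v) => v.name);
    throw new Error(
      `Prompt version mismatch: ${found}; templates without a version: ${missing.join(", ") || "none"}`,
    );
  }
}

// Does not throw: only one template declares a version.
validatePromptConsistency(templates);
```

With this shape, a mismatch is only reported when two templates declare different versions, which avoids the load-time crash described in the finding.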
src/deep-research.ts (2)

637-647: ⚠️ Potential issue | 🟠 Major

Propagate nested branch errors here.

A recursive deepResearch() call can return errors, but this success path drops them. Any rate-limit/failure from lower depths disappears before collectedErrors runs.

♻️ Minimal fix
           return {
             analysis: deeper.analysis,
             content: deeper.content,
             sources: deeper.sources,
             methodology: deeper.methodology,
             limitations: deeper.limitations,
             citations: deeper.citations.map(c => c.reference),
             learnings: [...newLearnings, ...deeper.learnings],
             visitedUrls: [...newUrls, ...deeper.visitedUrls],
             firecrawlResults: deeper.firecrawlResults,
+            errors: deeper.errors,
           };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/deep-research.ts` around lines 637 - 647, The recursive deepResearch()
success return currently omits any errors from the nested call (the `deeper`
object), which prevents lower-depth rate-limit/failure errors from reaching
`collectedErrors`; update the returned object in deepResearch to include an
`errors` field that merges the current invocation's errors with `deeper.errors`
(e.g. combine whatever local/new errors array you use with `deeper.errors`) so
that `collectedErrors` sees propagated errors from nested branches; look for the
return that builds from `deeper` (uses symbols like deeper, newLearnings,
newUrls, firecrawlResults) and add/merge the `errors` property there.

223-239: ⚠️ Potential issue | 🟠 Major

Preserve Gemini's per-query researchGoal.

If the model returns { query, researchGoal } items, Line 236 overwrites every one with the parent researchGoal. That collapses the branch-specific intent that generateGeminiPrompt() uses later.

♻️ Keep the item-level goal when it exists
       .map((rawQuery: unknown) => {
-        const queryValue = ((): unknown => {
+        const { queryValue, researchGoalValue } = (() => {
           if (typeof rawQuery === 'object' && rawQuery !== null && 'query' in rawQuery) {
-            const q = (rawQuery as { query?: unknown }).query;
-            return typeof q === 'string' ? q : q != null ? String(q) : '';
+            const item = rawQuery as { query?: unknown; researchGoal?: unknown };
+            return {
+              queryValue: typeof item.query === 'string' ? item.query : item.query != null ? String(item.query) : '',
+              researchGoalValue: typeof item.researchGoal === 'string' ? item.researchGoal : researchGoal,
+            };
           }
-          return rawQuery;
+          return { queryValue: rawQuery, researchGoalValue: researchGoal };
         })();
         const parsed = SerpQuerySchema.safeParse({
           query: typeof queryValue === 'string' ? queryValue : String(queryValue ?? ''),
-          researchGoal,
+          researchGoal: researchGoalValue,
         });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/deep-research.ts` around lines 223 - 239, The mapping currently always
injects the outer `researchGoal` into each SerpQuery, overwriting any item-level
goal; update the map to detect and preserve a per-item goal when present: inside
the map for `queriesArray` that builds `serpQueries`, derive an
`itemResearchGoal` from `rawQuery` (e.g., if rawQuery is an object and has
`researchGoal`, use that value converted to string, otherwise fallback to the
outer `researchGoal`) and pass that `itemResearchGoal` into
`SerpQuerySchema.safeParse` instead of the parent `researchGoal`; keep using
`queryValue` as computed and validate as before so `generateGeminiPrompt()`
receives the item-specific intent when available.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Around line 6-8: Remove the incorrect `.dist/` entry from .gitignore (only
`dist/` is correct per tsconfig.json and package.json) and, if the intent is to
stop tracking docs-cursorrules/, first run `git rm --cached -r
docs-cursorrules/` to untrack the already-indexed file
docs-cursorrules/code_snippets_deep_research.cursorrules before committing the
.gitignore change so the new ignore will take effect.

In `@src/deep-research.ts`:
- Around line 203-210: The catch block in generateSerpQueries is swallowing
Gemini transport/parse failures by logging and setting jsonString = '{}' which
makes deepResearch think there were no queries; instead, after logging (and
preserving the existing rate-limit throw when isRateLimitError(err)), rethrow
the error so callers like deepResearch can surface the failure (remove or avoid
the jsonString = '{}' fallback); apply the same change to the similar handling
around lines 242-243 so all non-rate-limit Gemini errors are logged and rethrown
rather than normalized to empty results.
- Around line 355-363: The truncateLearnings function currently stops scanning
when it encounters the first learning that would push totalChars over maxChars;
update truncateLearnings to skip individual oversized entries instead of
breaking so later smaller learnings can still be included: iterate over
learnings, if a single learning.length > maxChars skip it (or continue) and if
totalChars + learning.length > maxChars also continue (don’t break), otherwise
push the learning and increment totalChars; keep the function signature and
return type the same so callers of truncateLearnings behave identically.
- Around line 888-900: The fallback currently casts any extracted JSON from
extractJsonFromText into GeminiResponse (used in responseData), which can lead
to non-string payloads causing later sanitization to throw; instead validate and
normalize the extracted value against the expected GeminiResponse schema before
assigning: run the extracted value through the same normalization/validation
used by safeParseJSON (or a small schema check) to ensure response.items is an
array and each item has the expected fields with correct types (coerce or
stringify fields like learnings/summary to strings, discard or skip malformed
entries), and only then set responseData = { items: validatedItems } (keep using
safeParseJSON, extractJsonFromText, GeminiResponse, and responseData names to
locate where to add validation).

In `@src/mcp-server.ts`:
- Around line 74-79: The schema defines optional inputs goal and flags but the
async handler for MCPResearchResult doesn't destructure or use them; update the
handler signature to include goal and flags (e.g., add goal and flags to the
parameter destructuring in the async function) and either use them (pass goal to
synthesis/steering logic and use flags.grounding or flags.urlContext where
appropriate) or remove goal and flags from the zod input schema if they are
truly unused; ensure references include the schema key names goal and flags and
the async handler that currently destructures { query, depth, breadth,
existingLearnings = [], outputDir }.

In `@src/utils/errors.ts`:
- Around line 9-12: The isRateLimitError function uses a case-sensitive includes
check which can miss differently-cased API messages; update isRateLimitError to
perform case-insensitive matching against RATE_LIMIT_PATTERNS by normalizing
both the message and pattern (e.g., use const msg = (err instanceof Error ?
err.message : String(err ?? '')).toLowerCase() and compare against
RATE_LIMIT_PATTERNS.map(p => p.toLowerCase()).some(...)), or if
RATE_LIMIT_PATTERNS contains RegExp entries, use p.test(msg) and ensure those
regexes include the /i flag; refer to isRateLimitError and RATE_LIMIT_PATTERNS
when making the change.

In `@src/utils/output.ts`:
- Around line 10-14: Update the slug generation to strip both leading and
trailing dashes and provide a fallback when the result is empty: change the
chain that builds the slug (the slug variable derived from query) to use a regex
that trims dashes at both ends (for example replace(/^[-]+|[-]+$/g, '')) and
after toLowerCase check if slug === '' and, if so, assign a safe default like
'untitled' (or a sanitized/truncated version of query) so edge-case queries with
only special characters or leading punctuation never produce an empty or
dash-prefixed slug.
- Around line 48-53: The learnings JSON object currently contains both query and
goal set to the same value (opts.query); remove the redundant field by deleting
"goal: opts.query" from the object that builds the learnings payload (the block
with query, depth, breadth, goal, learnings) so only query: opts.query remains,
and update any downstream consumers if they relied on a separate goal property;
reference the fields query, goal, opts.query, and the learnings JSON object when
making the change.
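
For the slug item above, a minimal sketch of the trimmed-with-fallback behaviour follows. The review does not show the original chain in `src/utils/output.ts`, so the `slugify` name, the character class, and the 60-character cap are all assumptions made for illustration.

```typescript
// Hypothetical helper; the 60-character cap is an assumption, not taken from the PR.
function slugify(query: string, maxLength = 60): string {
  const slug = query
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one dash
    .slice(0, maxLength)
    .replace(/^-+|-+$/g, ""); // trim dashes at both ends, after truncation
  // Fall back to a safe default when nothing usable is left.
  return slug === "" ? "untitled" : slug;
}

console.log(slugify("  Hello, World!  ")); // hello-world
console.log(slugify("???")); // untitled
```

Trimming after truncation matters: cutting the string at `maxLength` can itself leave a trailing dash.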

---

Outside diff comments:
In `@src/deep-research.ts`:
- Around line 637-647: The recursive deepResearch() success return currently
omits any errors from the nested call (the `deeper` object), which prevents
lower-depth rate-limit/failure errors from reaching `collectedErrors`; update
the returned object in deepResearch to include an `errors` field that merges the
current invocation's errors with `deeper.errors` (e.g. combine whatever
local/new errors array you use with `deeper.errors`) so that `collectedErrors`
sees propagated errors from nested branches; look for the return that builds
from `deeper` (uses symbols like deeper, newLearnings, newUrls,
firecrawlResults) and add/merge the `errors` property there.
- Around line 223-239: The mapping currently always injects the outer
`researchGoal` into each SerpQuery, overwriting any item-level goal; update the
map to detect and preserve a per-item goal when present: inside the map for
`queriesArray` that builds `serpQueries`, derive an `itemResearchGoal` from
`rawQuery` (e.g., if rawQuery is an object and has `researchGoal`, use that
value converted to string, otherwise fallback to the outer `researchGoal`) and
pass that `itemResearchGoal` into `SerpQuerySchema.safeParse` instead of the
parent `researchGoal`; keep using `queryValue` as computed and validate as
before so `generateGeminiPrompt()` receives the item-specific intent when
available.

In `@src/prompt.ts`:
- Around line 190-200: validatePromptConsistency currently treats missing
"Schema Version: X.X.X" matches as undefined and throws on mixed
undefined/defined values; update validatePromptConsistency to extract versions
from systemPrompt, serpQueryPromptTemplate, and learningPromptTemplate, then
filter out undefined/null before comparing (e.g., const defined =
versions.filter(Boolean)); if defined is empty do nothing, otherwise ensure all
defined values are identical and throw a clearer error listing which templates
are missing versions (reference the validatePromptConsistency function and the
systemPrompt, serpQueryPromptTemplate, learningPromptTemplate identifiers).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b3ff6433-97c1-4977-ba85-870dfe6566ed

📥 Commits

Reviewing files that changed from the base of the PR and between 3b9d717 and 9191533.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (10)
  • .gitignore
  • src/ai/providers.ts
  • src/deep-research.ts
  • src/mcp-server.ts
  • src/output-manager.ts
  • src/prompt.ts
  • src/run.ts
  • src/utils/errors.ts
  • src/utils/json.ts
  • src/utils/output.ts

Comment thread .gitignore
Comment on lines +6 to +8
# Build outputs
dist/
dist/*
dist/**/*
dist/**/**/*
docs-cursorrrules
.dist/

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that no .dist directory is created during build and that dist/ is the actual output

# Check if .dist directory exists or is referenced anywhere
fd -HI '^\.dist$' --type d

# Check tsconfig.json for outDir configuration
rg -n '"outDir"' tsconfig.json

# Check if any build scripts or tools reference .dist (with leading dot)
rg -n '\.dist' package.json tsconfig.json --glob '!.gitignore'

Repository: ssdeanx/deep-research-mcp-server

Length of output: 102


🏁 Script executed:

# Search entire codebase for .dist references (not already checked)
rg '\.dist' --type-not json --type-not lock --type-not yaml

# Check if docs-cursorrules/ exists and its git status
fd -HI '^docs-cursorrules$' --type d
git status docs-cursorrules/ 2>/dev/null || echo "Not found or not tracked"

# Also check what's currently in git index for docs-cursorrules
git ls-files | grep -E '^docs-cursorrules' | head -20

Repository: ssdeanx/deep-research-mcp-server

Length of output: 216


Remove .dist/ and verify intent for docs-cursorrules/ entry.

The addition of .dist/ (line 8) is incorrect. According to tsconfig.json, the TypeScript compiler outputs to ./dist, and package.json references dist/mcp-server.js as the main entry point. A comprehensive search of the codebase confirms that .dist (with a leading dot) is never created or referenced anywhere. The patterns dist/ and .dist/ match different directories—only dist/ correctly ignores the actual build output. Remove line 8.

Additionally, docs-cursorrules/ (line 47) is already tracked in git (docs-cursorrules/code_snippets_deep_research.cursorrules is in the index). Adding it to .gitignore will not remove it from tracking. If the intent is to stop tracking this directory, use git rm --cached -r docs-cursorrules/ first before committing this .gitignore change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 6 - 8, Remove the incorrect `.dist/` entry from
.gitignore (only `dist/` is correct per tsconfig.json and package.json) and, if
the intent is to stop tracking docs-cursorrules/, first run `git rm --cached -r
docs-cursorrules/` to untrack the already-indexed file
docs-cursorrules/code_snippets_deep_research.cursorrules before committing the
.gitignore change so the new ignore will take effect.

Comment thread src/deep-research.ts
Comment on lines 203 to 210
} catch (err) {
- output.log("Gemini error:", { error: err instanceof Error ? err.message : 'Unknown error' });
- output.log('Error in generateSerpQueries:', { error: err instanceof Error ? err.message : 'Unknown error' });
if (isRateLimitError(err)) {
output.log('Rate limit exceeded during SERP query generation', { detail: rateLimitSummary(err) });
throw err;
}
const errMsg = err instanceof Error ? err.message : String(err);
output.log('Gemini error:', { error: errMsg });
jsonString = '{}';

⚠️ Potential issue | 🟠 Major

Don't turn SERP-generation failures into a successful empty run.

These branches normalize Gemini transport/parse failures to []. deepResearch() only emits errors when generateSerpQueries() throws, so auth/timeouts/bad JSON currently look like a clean "no queries" result.

♻️ Surface the failure to the caller
     } catch (err) {
       if (isRateLimitError(err)) {
         output.log('Rate limit exceeded during SERP query generation', { detail: rateLimitSummary(err) });
         throw err;
       }
       const errMsg = err instanceof Error ? err.message : String(err);
       output.log('Gemini error:', { error: errMsg });
-      jsonString = '{}';
+      throw new Error(`SERP query generation failed: ${errMsg}`);
     }
@@
     } else {
-      output.log('Failed to generate or parse SERP queries from Gemini response, using fallback to empty array.');
-      serpQueries = [];
+      throw new Error('Failed to parse SERP queries from Gemini response.');
     }

Also applies to: 242-243

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/deep-research.ts` around lines 203 - 210, The catch block in
generateSerpQueries is swallowing Gemini transport/parse failures by logging and
setting jsonString = '{}' which makes deepResearch think there were no queries;
instead, after logging (and preserving the existing rate-limit throw when
isRateLimitError(err)), rethrow the error so callers like deepResearch can
surface the failure (remove or avoid the jsonString = '{}' fallback); apply the
same change to the similar handling around lines 242-243 so all non-rate-limit
Gemini errors are logged and rethrown rather than normalized to empty results.

Comment thread src/deep-research.ts
Comment on lines +355 to +363
function truncateLearnings(learnings: string[], maxChars = 30000): string[] {
const result: string[] = [];
let totalChars = 0;
for (const learning of learnings) {
if (totalChars + learning.length > maxChars) break;
result.push(learning);
totalChars += learning.length;
}
return result;

⚠️ Potential issue | 🟠 Major

Don't abort on the first oversized learning.

Line 359 breaks the entire scan. If the first learning alone exceeds maxChars, outline/report/summary/title generation runs with an empty context even when later learnings would fit.

♻️ Skip oversized entries instead of stopping the scan
   for (const learning of learnings) {
-    if (totalChars + learning.length > maxChars) break;
+    if (totalChars + learning.length > maxChars) continue;
     result.push(learning);
     totalChars += learning.length;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/deep-research.ts` around lines 355 - 363, The truncateLearnings function
currently stops scanning when it encounters the first learning that would push
totalChars over maxChars; update truncateLearnings to skip individual oversized
entries instead of breaking so later smaller learnings can still be included:
iterate over learnings, if a single learning.length > maxChars skip it (or
continue) and if totalChars + learning.length > maxChars also continue (don’t
break), otherwise push the learning and increment totalChars; keep the function
signature and return type the same so callers of truncateLearnings behave
identically.
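
The effect of switching from `break` to `continue` is easy to see on a small input. This is a self-contained sketch of the function with the suggested change applied; the constant name follows the earlier Gemini suggestion in this review, and the sample data is made up.

```typescript
// Constant name taken from the earlier review suggestion; the value matches the current default.
const MAX_LEARNINGS_CHARS_FOR_PROMPT = 30000;

function truncateLearnings(
  learnings: string[],
  maxChars = MAX_LEARNINGS_CHARS_FOR_PROMPT,
): string[] {
  const result: string[] = [];
  let totalChars = 0;
  for (const learning of learnings) {
    // Skip entries that do not fit instead of ending the scan.
    if (totalChars + learning.length > maxChars) continue;
    result.push(learning);
    totalChars += learning.length;
  }
  return result;
}

// The first entry alone exceeds the budget; the two later entries still fit.
const sample = ["x".repeat(50), "short", "also short"];
console.log(truncateLearnings(sample, 20)); // keeps "short" and "also short"
```

One trade-off of this approach is that the result is no longer a strict prefix of the input: a later learning can be included while an earlier, larger one is dropped.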

Comment thread src/deep-research.ts
Comment on lines +888 to +900
let responseData = safeParseJSON<GeminiResponse>(geminiResponseText, { items: [] });
if (!responseData.items || responseData.items.length === 0) {
const extracted = extractJsonFromText(geminiResponseText);
if (extracted) {
// Handle both { items: [...] } and raw array [...] formats
if (Array.isArray(extracted)) {
responseData = { items: extracted };
} else if (extracted.items && Array.isArray(extracted.items)) {
responseData = extracted as GeminiResponse;
} else {
// Single object response — wrap it
responseData = { items: [extracted] };
}

⚠️ Potential issue | 🟠 Major

Validate extracted JSON before casting it to GeminiResponse.

This fallback trusts arbitrary extracted JSON as GeminiResponse. If Gemini emits items: [{ learnings: [1] }] or any other non-string payload, the later string sanitization throws and the whole query is discarded.

♻️ Normalize with a schema before consuming the payload
 async function processGeminiResponse(geminiResponseText: string): Promise<ProcessedGeminiResponse> {
+  const GeminiResponseSchema = z.object({
+    items: z.array(z.object({
+      learning: z.string().optional(),
+      learnings: z.array(z.string()).optional(),
+      url: z.string().optional(),
+    })),
+  });
   // Try direct parse first, then fall back to JSON extraction from mixed text
   let responseData = safeParseJSON<GeminiResponse>(geminiResponseText, { items: [] });
   if (!responseData.items || responseData.items.length === 0) {
     const extracted = extractJsonFromText(geminiResponseText);
     if (extracted) {
-      // Handle both { items: [...] } and raw array [...] formats
-      if (Array.isArray(extracted)) {
-        responseData = { items: extracted };
-      } else if (extracted.items && Array.isArray(extracted.items)) {
-        responseData = extracted as GeminiResponse;
-      } else {
-        // Single object response — wrap it
-        responseData = { items: [extracted] };
-      }
+      const normalized =
+        Array.isArray(extracted)
+          ? { items: extracted }
+          : (typeof extracted === 'object' &&
+             extracted !== null &&
+             'items' in extracted &&
+             Array.isArray((extracted as { items?: unknown }).items))
+            ? extracted
+            : { items: [extracted] };
+      const parsed = GeminiResponseSchema.safeParse(normalized);
+      responseData = parsed.success ? parsed.data : { items: [] };
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/deep-research.ts` around lines 888 - 900, The fallback currently casts
any extracted JSON from extractJsonFromText into GeminiResponse (used in
responseData), which can lead to non-string payloads causing later sanitization
to throw; instead validate and normalize the extracted value against the
expected GeminiResponse schema before assigning: run the extracted value through
the same normalization/validation used by safeParseJSON (or a small schema
check) to ensure response.items is an array and each item has the expected
fields with correct types (coerce or stringify fields like learnings/summary to
strings, discard or skip malformed entries), and only then set responseData = {
items: validatedItems } (keep using safeParseJSON, extractJsonFromText,
GeminiResponse, and responseData names to locate where to add validation).
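
A minimal sketch of the per-item normalization described above, written without zod so it stands alone. The field names (`learning`, `learnings`, `url`) come from the schema in the suggestion; `NormalizedItem`, `isRecord`, and `normalizeExtracted` are hypothetical names, and unlike the `safeParse` variant this version drops malformed entries rather than the whole payload.

```typescript
// Hypothetical helper types and names for illustration; not part of the PR.
interface NormalizedItem {
  learning?: string;
  learnings?: string[];
  url?: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function normalizeExtracted(extracted: unknown): { items: NormalizedItem[] } {
  // Accept { items: [...] }, a raw array, or a single object.
  let rawItems: unknown[];
  if (Array.isArray(extracted)) {
    rawItems = extracted;
  } else if (isRecord(extracted)) {
    const maybeItems = extracted['items'];
    rawItems = Array.isArray(maybeItems) ? maybeItems : [extracted];
  } else {
    rawItems = [extracted];
  }

  const items: NormalizedItem[] = [];
  for (const raw of rawItems) {
    if (!isRecord(raw)) continue; // skip primitives, null, and nested arrays
    const item: NormalizedItem = {};
    const learning = raw['learning'];
    const learnings = raw['learnings'];
    const url = raw['url'];
    if (typeof learning === 'string') item.learning = learning;
    if (Array.isArray(learnings)) {
      // Keep only string entries so later string sanitization cannot throw.
      item.learnings = learnings.filter((l): l is string => typeof l === 'string');
    }
    if (typeof url === 'string') item.url = url;
    items.push(item);
  }
  return { items };
}
```

With this shape, a payload such as `items: [{ learnings: [1] }]` yields an item with an empty `learnings` array instead of an exception downstream.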

Comment thread src/mcp-server.ts Outdated
Comment on lines +74 to +79
goal: z.string().optional().describe("Optional goal/brief to steer synthesis"),
outputDir: z.string().optional().describe("Absolute path to save output report and learnings. Falls back to OUTPUT_DIR env var, then ./output/"),
flags: z.object({ grounding: z.boolean().optional(), urlContext: z.boolean().optional() }).optional(),
}
},
-async ({ query, depth, breadth, existingLearnings = [] }): Promise<MCPResearchResult> => {
+async ({ query, depth, breadth, existingLearnings = [], outputDir }): Promise<MCPResearchResult> => {

⚠️ Potential issue | 🟡 Minor

Unused input parameters: goal and flags.

The input schema defines goal (line 74) and flags (line 76), but the handler (line 79) doesn't destructure or use them. Either remove from the schema or implement their functionality.

🔧 Option A: Remove unused schema fields
       existingLearnings: z.array(z.string()).optional().describe("Optional learnings to build upon"),
-      goal: z.string().optional().describe("Optional goal/brief to steer synthesis"),
       outputDir: z.string().optional().describe("Absolute path to save output report and learnings. Falls back to OUTPUT_DIR env var, then ./output/"),
-      flags: z.object({ grounding: z.boolean().optional(), urlContext: z.boolean().optional() }).optional(),
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-goal: z.string().optional().describe("Optional goal/brief to steer synthesis"),
-outputDir: z.string().optional().describe("Absolute path to save output report and learnings. Falls back to OUTPUT_DIR env var, then ./output/"),
-flags: z.object({ grounding: z.boolean().optional(), urlContext: z.boolean().optional() }).optional(),
-}
-},
-async ({ query, depth, breadth, existingLearnings = [] }): Promise<MCPResearchResult> => {
-async ({ query, depth, breadth, existingLearnings = [], outputDir }): Promise<MCPResearchResult> => {
+outputDir: z.string().optional().describe("Absolute path to save output report and learnings. Falls back to OUTPUT_DIR env var, then ./output/"),
+}
+},
+async ({ query, depth, breadth, existingLearnings = [], outputDir }): Promise<MCPResearchResult> => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mcp-server.ts` around lines 74 - 79, The schema defines optional inputs
goal and flags but the async handler for MCPResearchResult doesn't destructure
or use them; update the handler signature to include goal and flags (e.g., add
goal and flags to the parameter destructuring in the async function) and either
use them (pass goal to synthesis/steering logic and use flags.grounding or
flags.urlContext where appropriate) or remove goal and flags from the zod input
schema if they are truly unused; ensure references include the schema key names
goal and flags and the async handler that currently destructures { query, depth,
breadth, existingLearnings = [], outputDir }.

Comment thread src/utils/errors.ts
Comment on lines +9 to +12
export function isRateLimitError(err: unknown): boolean {
const msg = err instanceof Error ? err.message : String(err ?? '');
return RATE_LIMIT_PATTERNS.some(p => msg.includes(p));
}

🧹 Nitpick | 🔵 Trivial

Consider case-insensitive matching for broader error coverage.

The includes() check is case-sensitive, but error messages from APIs can vary in casing (e.g., "Quota exceeded" vs "quota exceeded"). The regex in rateLimitSummary already uses the /i flag.

♻️ Optional: Case-insensitive pattern matching
 export function isRateLimitError(err: unknown): boolean {
   const msg = err instanceof Error ? err.message : String(err ?? '');
-  return RATE_LIMIT_PATTERNS.some(p => msg.includes(p));
+  const msgLower = msg.toLowerCase();
+  return RATE_LIMIT_PATTERNS.some(p => msgLower.includes(p.toLowerCase()));
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/errors.ts` around lines 9 - 12, The isRateLimitError function uses
a case-sensitive includes check which can miss differently-cased API messages;
update isRateLimitError to perform case-insensitive matching against
RATE_LIMIT_PATTERNS by normalizing both the message and pattern (e.g., use const
msg = (err instanceof Error ? err.message : String(err ?? '')).toLowerCase() and
compare against RATE_LIMIT_PATTERNS.map(p => p.toLowerCase()).some(...)), or if
RATE_LIMIT_PATTERNS contains RegExp entries, use p.test(msg) and ensure those
regexes include the /i flag; refer to isRateLimitError and RATE_LIMIT_PATTERNS
when making the change.

Comment thread src/utils/output.ts
Comment on lines +10 to +14
const slug = query
.slice(0, 60)
.replace(/[^a-zA-Z0-9]+/g, '-')
.replace(/-+$/, '')
.toLowerCase();

⚠️ Potential issue | 🟡 Minor

Slug may start with dashes or be empty for edge-case queries.

The regex only removes trailing dashes (line 13: /-+$/), so queries starting with special characters produce slugs with leading dashes. Queries containing only special characters result in an empty slug.

🛡️ Proposed fix to handle edge cases
   const slug = query
     .slice(0, 60)
     .replace(/[^a-zA-Z0-9]+/g, '-')
+    .replace(/^-+/, '')
     .replace(/-+$/, '')
-    .toLowerCase();
+    .toLowerCase() || 'research';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/output.ts` around lines 10 - 14, Update the slug generation to
strip both leading and trailing dashes and provide a fallback when the result is
empty: change the chain that builds the slug (the slug variable derived from
query) to use a regex that trims dashes at both ends (for example
replace(/^[-]+|[-]+$/g, '')) and after toLowerCase check if slug === '' and, if
so, assign a safe default like 'untitled' (or a sanitized/truncated version of
query) so edge-case queries with only special characters or leading punctuation
never produce an empty or dash-prefixed slug.
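
A small worked sketch of the slug fix, assuming the same 60-character cut and the `'research'` fallback from the proposed diff. The function name `slugifyQuery` is hypothetical; the PR builds the slug inline.

```typescript
// Hypothetical wrapper around the slug chain discussed above.
function slugifyQuery(query: string, fallback = 'research'): string {
  const slug = query
    .slice(0, 60)
    .replace(/[^a-zA-Z0-9]+/g, '-') // collapse runs of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, '')        // trim dashes at both ends
    .toLowerCase();
  // An all-punctuation query collapses to '', so fall back to a safe default.
  return slug || fallback;
}

console.log(slugifyQuery('  Hello, World!')); // hello-world
console.log(slugifyQuery('?What is RAG'));    // what-is-rag
console.log(slugifyQuery('!!!'));             // research
```

Truncating before trimming matters: if the 60-character cut lands on a separator, the trailing-dash trim still removes it.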

Comment thread src/utils/output.ts
- Add validateApiKey() in providers.ts that calls models.get() to verify key
- Probe quota tier with minimal generateContent call, detect free-tier from error
- Log tier warning on startup (free/paid/unknown) with actionable guidance
- Server still starts even if validation fails (graceful degradation)
Logger was writing to stdout, which is reserved for MCP JSON-RPC.
All log output (including API key validation) was being swallowed.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
src/mcp-server.ts (1)

70-80: ⚠️ Potential issue | 🟡 Minor

goal and flags are still dead API surface.

Lines 75-77 advertise steering inputs, but line 80 does not destructure them and nothing downstream uses them. Either wire them into the research/report flow or remove them from the schema so the tool contract matches actual behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mcp-server.ts` around lines 70 - 80, The inputSchema declares goal and
flags but the async handler signature async ({ query, depth, breadth,
existingLearnings = [], outputDir }): Promise<MCPResearchResult> does not
destructure or use them, leaving dead API surface; either remove goal and flags
from inputSchema, or update the handler to destructure goal and flags (e.g.,
include goal and flags in the parameter object) and propagate those values into
the research/report flow (pass them into the downstream functions that build the
research pipeline or report generation so they affect behavior), ensuring no
unused parameters remain and the declared contract matches actual usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai/providers.ts`:
- Around line 554-580: The probe currently marks any successful
ai.models.generateContent call as 'unknown', making 'paid' unreachable; change
the success branch in the generate probe to set tier = 'paid' and tierDetail to
a message like "Key is functional and can generate content — likely a
paid/quota-enabled key; verify quotas in AI Studio/GCP Console.", while keeping
the existing catch logic (using isRateLimitError and rateLimitSummary) for
free/limited/error cases so free-tier rate-limited keys are still detected and
other errors remain 'unknown'; update the variables tier and tierDetail in the
success path of the ai.models.generateContent probe accordingly.

In `@src/mcp-server.ts`:
- Around line 80-82: The cache key generation for the research worker misses
outputDir so cached MCPResearchResult metadata.outputPath from a prior run can
be returned to a caller requesting a different outputDir; update the cacheKey
creation (the hashKey call used to produce cacheKey) to include outputDir (or,
alternatively, bypass caching when outputDir is set) and ensure the same change
is applied to the other equivalent worker code path referenced around the second
hashKey usage (lines handling existingLearnings -> MCPResearchResult
metadata.outputPath) so cached results never leak file paths to a different
output directory.
- Around line 9-19: The module imports (including validateApiKey from
./ai/providers.js) are being evaluated before environment variables are loaded,
causing providers.js to read process.env.GEMINI_API_KEY at module scope and
fail; move the dotenv config() call (the call currently using resolve(__dirname,
'../.env.local') and config) to run before any imports that depend on env vars
(i.e., before the import of validateApiKey / ./ai/providers.js) so that
providers.js sees the loaded .env.local values at module initialization.

---

Duplicate comments:
In `@src/mcp-server.ts`:
- Around line 70-80: The inputSchema declares goal and flags but the async
handler signature async ({ query, depth, breadth, existingLearnings = [],
outputDir }): Promise<MCPResearchResult> does not destructure or use them,
leaving dead API surface; either remove goal and flags from inputSchema, or
update the handler to destructure goal and flags (e.g., include goal and flags
in the parameter object) and propagate those values into the research/report
flow (pass them into the downstream functions that build the research pipeline
or report generation so they affect behavior), ensuring no unused parameters
remain and the declared contract matches actual usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fa7076aa-8c54-4d61-88c3-1c956d4d7996

📥 Commits

Reviewing files that changed from the base of the PR and between 9191533 and 21bc90e.

📒 Files selected for processing (2)
  • src/ai/providers.ts
  • src/mcp-server.ts

Comment thread src/ai/providers.ts Outdated
Comment on lines +554 to +580
// Step 2: Probe quota tier with a minimal generate call
let tier: 'free' | 'paid' | 'unknown' = 'unknown';
let tierDetail: string | undefined;
try {
await ai.models.generateContent({
model,
contents: 'ping',
config: { maxOutputTokens: 1 },
});
// If we get here, the key works for generation. We can't definitively
// determine tier from a success, but at least it's functional.
tier = 'unknown';
tierDetail = 'Key is functional. Tier cannot be determined from a successful call — check AI Studio or GCP Console for quota details.';
} catch (err: unknown) {
const msg = err instanceof Error ? err.message : String(err);
if (msg.includes('free_tier') || msg.includes('FreeTier')) {
tier = 'free';
tierDetail = 'API key is on FREE TIER. Rate limits: ~20 req/day/model. Link billing in GCP Console to upgrade.';
} else if (isRateLimitError(err)) {
// Rate limited but not explicitly free-tier — could be either tier
tier = 'unknown';
tierDetail = `Rate limited on probe: ${rateLimitSummary(err)}. This may indicate free-tier or heavy usage.`;
} else {
// Some other error on generate — key itself validated via models.get above
tier = 'unknown';
tierDetail = `Generation probe returned: ${msg.slice(0, 200)}`;
}

⚠️ Potential issue | 🟠 Major

The quota probe never identifies a working free-tier or paid key.

Lines 563-566 classify every successful generateContent probe as unknown, so paid is unreachable and a normal free-tier key also stays unknown until it is already rate-limited. That means the startup warning misses the exact case this API is trying to surface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai/providers.ts` around lines 554 - 580, The probe currently marks any
successful ai.models.generateContent call as 'unknown', making 'paid'
unreachable; change the success branch in the generate probe to set tier =
'paid' and tierDetail to a message like "Key is functional and can generate
content — likely a paid/quota-enabled key; verify quotas in AI Studio/GCP
Console.", while keeping the existing catch logic (using isRateLimitError and
rateLimitSummary) for free/limited/error cases so free-tier rate-limited keys
are still detected and other errors remain 'unknown'; update the variables tier
and tierDetail in the success path of the ai.models.generateContent probe
accordingly.

Comment thread src/mcp-server.ts Outdated
The SDK swallows response headers, so tier detection via generateContent
always returned 'unknown'. Now makes a raw fetch to the REST API and
inspects x-ratelimit-limit-requests headers to classify free (<= 50)
vs paid tier. Falls back to error body parsing for exhausted quotas.
…efully

Google's Gemini API does not return x-ratelimit-* headers on responses.
Updated probe logic to:
- Future-proof header detection (if Google adds them later)
- Detect free tier from error body keywords (free_tier/FreeTier)
- Detect expired/invalid keys from REST probe even when SDK works
- Parse usageMetadata from successful responses
- Provide clear guidance to check AI Studio for tier info

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/mcp-server.ts (2)

80-82: ⚠️ Potential issue | 🟠 Major

Include outputDir in the cache key to avoid path mismatches.

The cache key excludes outputDir, so a second request with identical query parameters but a different outputDir will return the cached metadata.outputPath pointing to the first caller's directory — and no files will be written to the requested location.

🔧 Proposed fix
  async ({ query, depth, breadth, existingLearnings = [], outputDir }): Promise<MCPResearchResult> => {
+    // Resolve output dir early so it can be included in cache key
+    const resolvedOutputDir = resolveOutputDir(outputDir, resolve(__dirname, '../output'));
+
    // 1. Create cache key
-    const cacheKey = hashKey({ query, depth, breadth, existingLearnings });
+    const cacheKey = hashKey({ query, depth, breadth, existingLearnings, outputDir: resolvedOutputDir });

Then remove the duplicate resolveOutputDir call at line 138.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mcp-server.ts` around lines 80 - 82, The cache key created by hashKey({
query, depth, breadth, existingLearnings }) omits outputDir, causing cached
metadata.outputPath to point to the wrong directory; update the cacheKey to
include outputDir (i.e., hashKey({ query, depth, breadth, existingLearnings,
outputDir })) and then remove the duplicated resolveOutputDir call (the second
resolveOutputDir invocation) so output path resolution only happens once; ensure
references to metadata.outputPath are consistent with the resolved outputDir.
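
A minimal sketch of a cache key that covers the output directory. The PR's `hashKey` implementation is not shown in this thread, so `researchCacheKey` and its SHA-256 over a fixed-order array are assumptions made for illustration.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical parameter shape; mirrors the fields named in the review comment.
interface ResearchCacheParams {
  query: string;
  depth: number;
  breadth: number;
  existingLearnings: string[];
  outputDir: string; // already resolved to an absolute path by the caller
}

function researchCacheKey(p: ResearchCacheParams): string {
  // A fixed-order array keeps the serialization independent of object key order.
  const payload = JSON.stringify([p.query, p.depth, p.breadth, p.existingLearnings, p.outputDir]);
  return createHash('sha256').update(payload).digest('hex');
}
```

Two requests that differ only in `outputDir` then map to different entries, so a cached `metadata.outputPath` is never returned for a directory the caller did not ask for.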

75-80: ⚠️ Potential issue | 🟡 Minor

Unused input parameters: goal and flags.

The schema defines goal (line 75) and flags (line 77), but the handler on line 80 does not destructure or use them. Either remove them from the schema or implement their functionality.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mcp-server.ts` around lines 75 - 80, The handler signature for the async
function returning Promise<MCPResearchResult> declares and uses { query, depth,
breadth, existingLearnings = [], outputDir } but the zod schema also defines
goal and flags which are unused; either add goal and flags to the handler
destructuring (async ({ query, depth, breadth, existingLearnings = [],
outputDir, goal, flags }): Promise<MCPResearchResult> => ...) and propagate them
into whatever synthesis/research functions or result construction that needs
steering or runtime flags, or remove goal and flags from the schema entirely if
they are not required; make sure any downstream calls that rely on goal/flags
are updated to accept those parameters when you add them.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/mcp-server.ts`:
- Around line 9-19: The import of validateApiKey from providers.js runs before
dotenv is loaded because ESM imports are hoisted; fix by loading env first and
then dynamically importing the providers module (e.g., call
config({...})/resolveOutputDir/whatever env setup, then await
import('./ai/providers.js') to obtain validateApiKey) so GEMINI_API_KEY is
present when providers initializes; alternatively, remove module-scope throws in
providers.ts and switch to lazy initialization (e.g., export a function that
reads/validates the key on first use) and update usages to call that function
instead.

---

Duplicate comments:
In `@src/mcp-server.ts`:
- Around line 80-82: The cache key created by hashKey({ query, depth, breadth,
existingLearnings }) omits outputDir, causing cached metadata.outputPath to
point to the wrong directory; update the cacheKey to include outputDir (i.e.,
hashKey({ query, depth, breadth, existingLearnings, outputDir })) and then
remove the duplicated resolveOutputDir call (the second resolveOutputDir
invocation) so output path resolution only happens once; ensure references to
metadata.outputPath are consistent with the resolved outputDir.
- Around line 75-80: The handler signature for the async function returning
Promise<MCPResearchResult> declares and uses { query, depth, breadth,
existingLearnings = [], outputDir } but the zod schema also defines goal and
flags which are unused; either add goal and flags to the handler destructuring
(async ({ query, depth, breadth, existingLearnings = [], outputDir, goal, flags
}): Promise<MCPResearchResult> => ...) and propagate them into whatever
synthesis/research functions or result construction that needs steering or
runtime flags, or remove goal and flags from the schema entirely if they are not
required; make sure any downstream calls that rely on goal/flags are updated to
accept those parameters when you add them.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 940d329c-6c71-462c-b538-7d44ab7371a8

📥 Commits

Reviewing files that changed from the base of the PR and between e6e9c0f and 956770e.

📒 Files selected for processing (2)
  • src/ai/providers.ts
  • src/mcp-server.ts

Comment thread src/mcp-server.ts Outdated
Comment on lines +9 to +19
import { validateApiKey } from './ai/providers.js';
import { LRUCache } from 'lru-cache';
import { logger } from './logger.js';
import { resolveOutputDir, writeResearchOutput } from './utils/output.js';


// Get the directory name of the current module
const __dirname = fileURLToPath(new URL('.', import.meta.url));

// Load environment variables from .env.local
-config({ path: resolve(__dirname, '../.env.local') });
+config({ path: resolve(__dirname, '../.env.local'), override: true });

⚠️ Potential issue | 🔴 Critical

ESM hoisting defeats static config() placement — use dynamic import.

The past review correctly identified that providers.ts throws at module scope if GEMINI_API_KEY is missing. However, simply moving config() before the import statement won't work in ESM — all import declarations are hoisted and evaluated before any module code runs.

To ensure .env.local is loaded before providers.ts reads GEMINI_API_KEY, use a dynamic import:

🔧 Proposed fix using dynamic import
 import { config } from 'dotenv';
 import { resolve } from 'node:path';
 import { fileURLToPath } from 'node:url';
+
+const __dirname = fileURLToPath(new URL('.', import.meta.url));
+
+// Load environment variables BEFORE importing modules that depend on them
+config({ path: resolve(__dirname, '../.env.local'), override: true });
+
 import { createHash } from 'node:crypto';
 import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
 import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
 import { z } from "zod";
 import { research, writeFinalReport, type ResearchProgress, type ResearchOptions } from "./deep-research.js";
-import { validateApiKey } from './ai/providers.js';
+const { validateApiKey } = await import('./ai/providers.js');
 import { LRUCache } from 'lru-cache';
 import { logger } from './logger.js';
 import { resolveOutputDir, writeResearchOutput } from './utils/output.js';
-
-
-// Get the directory name of the current module
-const __dirname = fileURLToPath(new URL('.', import.meta.url));
-
-// Load environment variables from .env.local
-config({ path: resolve(__dirname, '../.env.local'), override: true });

Alternatively, refactor providers.ts to lazy-initialize the API key check instead of throwing at module scope.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mcp-server.ts` around lines 9 - 19, The import of validateApiKey from
providers.js runs before dotenv is loaded because ESM imports are hoisted; fix
by loading env first and then dynamically importing the providers module (e.g.,
call config({...})/resolveOutputDir/whatever env setup, then await
import('./ai/providers.js') to obtain validateApiKey) so GEMINI_API_KEY is
present when providers initializes; alternatively, remove module-scope throws in
providers.ts and switch to lazy initialization (e.g., export a function that
reads/validates the key on first use) and update usages to call that function
instead.
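
The lazy-initialization alternative can be sketched as below. This is an assumption about how `providers.ts` could be restructured, not its current code: `getGeminiApiKey` is a hypothetical name, and the environment is passed in explicitly so the check runs on first use rather than at import time.

```typescript
// Hypothetical lazy accessor; replaces a module-scope throw with a first-use check.
let cachedKey: string | undefined;

function getGeminiApiKey(env: Record<string, string | undefined>): string {
  if (cachedKey !== undefined) return cachedKey;
  const key = env['GEMINI_API_KEY']?.trim();
  if (!key) {
    // Thrown on first use, after the entry point has had a chance to load .env.local.
    throw new Error('GEMINI_API_KEY is not set');
  }
  cachedKey = key; // cache only successful lookups
  return key;
}

// In the server entry point, after dotenv's config() has run:
//   const apiKey = getGeminiApiKey(process.env);
```

Because nothing is read at module scope, the order of static imports relative to `config()` no longer matters.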

…ver .env.local

When the MCP host (VS Code) passes GEMINI_API_KEY via mcp.json env,
it should take precedence over .env.local which may have stale keys.
Changed from override:true to override:false so .env.local is only
a fallback when no env var is already set.
- Fix feedback crash on empty conductResearch analysis
- Fix impossible validateAcademicOutput check in feedback path
- Fix MCP cache key missing goal/outputDir fields
- Fix URL dedupe comparing query strings instead of URLs
- Fix double progress counting in recursive research
- Skip no-op Firecrawl processSerpResult on empty data
- Fix output-manager timer never resetting after flush
- Fix divide-by-zero in progress bars (terminal-utils, output-manager)
- Align feedbackPromptTemplate schema with Zod/JSON schema (strings not objects)
- Relax node engine requirement to >=20
… learnings dump

- Remove 'Key Learnings' section that duplicated report body content
- Remove boilerplate 'Methodology' and 'Limitations' sections
- Instruct LLM to produce inline numbered citations [1], [2] etc.
- Pass source URLs to report writer for proper citation linking
- Generate formatted bibliography from cited sources
- Update system prompt to guide academic paper structure
- Update outline generation for thematic academic sections
- Add strict writing style constraints to report body prompt
- Add paragraph-only instruction to summary generation
- Update system prompt with mandatory style rules
- Prefer commas, semicolons, parentheses over dashes
- Add md-to-pdf dependency for automated PDF conversion
- Prepend YAML frontmatter (A4, Georgia serif header/footer) to reports
- Auto-copy report-style.css to output directory
- Generate PDF alongside markdown after each research run
- Add pdfPath to MCP response metadata
- Move report-style.css to tracked assets/ directory
- PDF generation is best-effort (non-blocking on failure)
Add SourcedLearning interface (text + sourceUrl) to preserve the
link between extracted learnings and their source URLs throughout
the research pipeline.

- processGeminiResponse pairs each learning with its source URL
- ProcessResult and ResearchResult carry sourcedLearnings arrays
- writeReportFromOutline annotates findings with [source: N] markers
  and builds a deduplicated numbered source list for the LLM
- All entry points (MCP server, HTTP server, CLI) pass sourcedLearnings
  through to writeFinalReport
- Falls back to flat learnings + URL list when no sourcedLearnings exist
Comment thread src/mcp-http-server.ts
breadth: z.number().min(1).max(5).optional().describe("How broad to make each research level (1-5)"),
existingLearnings: z.array(z.string()).optional().describe("Optional learnings to build upon"),
goal: z.string().optional().describe("Optional goal/brief to steer synthesis"),
outputDir: z.string().optional().describe("Absolute path to save output report and learnings. Falls back to OUTPUT_DIR env var, then ./output/"),

CRITICAL: Path traversal vulnerability - outputDir allows arbitrary absolute paths without validation

The outputDir parameter accepts any string and passes it directly to resolve(), enabling path traversal attacks. An attacker could specify paths like '../../../etc/passwd' to write files outside intended directories. Add path validation to ensure the resolved path stays within allowed directories.
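
One way to add that validation, sketched under the assumption that output should be confined to a single base directory. `resolveWithinBase` is a hypothetical helper, and confining output this way would narrow the current tool contract, which documents `outputDir` as an arbitrary absolute path.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical helper: resolve `requested` against `baseDir` and reject escapes.
function resolveWithinBase(baseDir: string, requested: string | undefined): string {
  const base = resolve(baseDir);
  const target = resolve(base, requested ?? '.');
  const rel = relative(base, target);
  // rel is '..' or starts with '../' when target is above base;
  // on Windows it can be absolute when target is on another drive.
  const escapes = rel === '..' || rel.startsWith('..' + sep) || isAbsolute(rel);
  if (escapes) {
    throw new Error(`outputDir escapes the allowed base directory: ${requested}`);
  }
  return target;
}
```

Checking the relative path rather than a string prefix avoids accepting sibling directories such as `/srv/out-evil` when the base is `/srv/out`.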

Comment thread src/mcp-http-server.ts

const httpServer = createServer(async (req, res) => {
// CORS
res.setHeader('Access-Control-Allow-Origin', process.env.CORS_ORIGIN || '*');

WARNING: Overly permissive CORS configuration

Setting Access-Control-Allow-Origin to '*' allows requests from any origin, which could be a security risk if the server is exposed beyond localhost. Consider restricting to specific allowed origins or using environment variable validation.
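
A minimal sketch of an allowlist check, assuming `CORS_ORIGIN` is reinterpreted as a comma-separated list of exact origins. `resolveCorsOrigin` is a hypothetical helper; the server currently passes the variable straight through to the header.

```typescript
// Hypothetical helper: return the origin to echo back, or undefined to send no CORS header.
function resolveCorsOrigin(
  requestOrigin: string | undefined,
  allowlist: string | undefined,
): string | undefined {
  const allowed = (allowlist ?? '')
    .split(',')
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
  // Exact match only; an unset or empty allowlist permits nothing.
  if (requestOrigin !== undefined && allowed.includes(requestOrigin)) {
    return requestOrigin;
  }
  return undefined;
}

// Inside the request handler, roughly:
//   const origin = resolveCorsOrigin(req.headers.origin, process.env.CORS_ORIGIN);
//   if (origin) {
//     res.setHeader('Access-Control-Allow-Origin', origin);
//     res.setHeader('Vary', 'Origin');
//   }
```

Defaulting to no header when the variable is unset is stricter than the current `'*'` fallback, so local setups would need to set `CORS_ORIGIN` explicitly.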

@kilo-code-bot

kilo-code-bot Bot commented May 13, 2026

Code Review Summary

Status: Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
|---|---|
| CRITICAL | 1 |
| WARNING | 1 |
| SUGGESTION | 0 |

Issue Details

CRITICAL

| File | Line | Issue |
|---|---|---|
| src/mcp-http-server.ts | 66 | Path traversal vulnerability - outputDir allows arbitrary absolute paths without validation |

WARNING

| File | Line | Issue |
|---|---|---|
| src/mcp-http-server.ts | 239 | Overly permissive CORS configuration |

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

| File | Line | Issue |
|---|---|---|
| src/mcp-server.ts | 79 | Similar path traversal vulnerability in stdio MCP server |

Files Reviewed (14 files)
  • src/deep-research.ts - Major changes to report generation and Gemini response processing
  • src/feedback.ts - Updates to progress handling and validation
  • src/logger.ts - Modified to use stderr for logs
  • src/mcp-http-server.ts - New HTTP transport server with security concerns
  • src/mcp-server.ts - Enhanced with language support and file output
  • src/output-manager.ts - Changes to use stderr for terminal output
  • src/prompt.ts - Restructured prompts and system instructions
  • src/run.ts - Updated to handle sourced learnings
  • src/terminal-utils.ts - Added safe division in progress calculations
  • src/types.ts - Added SourcedLearning interface
  • src/utils/errors.ts - New error classification utilities
  • src/utils/json.ts - Improved JSON extraction logic
  • src/utils/output.ts - New file output and PDF generation utilities


Reviewed by grok-code-fast-1:optimized:free · 433,028 tokens
