adomi-io/mcp-global-search

Global Search: Docs → Meilisearch → MCP

A self-hosted, Meilisearch-powered contextual search service for AI agents. It lets agents search in plain language across the documentation, files, and examples you provide.

An end‑to‑end, container‑friendly pipeline that:

  • Downloads documentation and files from multiple sources into a local output/ folder
  • Indexes those files in Meilisearch for fast, flexible search
  • Exposes a minimal Model Context Protocol (MCP) server so AI agents can reliably query “user‑loaded” documents and fetch exact source files for grounding

Highlights

  • 📥 Unified downloader: Git and HTTP sources merged into a single tree (output/)
  • 🔎 Meilisearch indexing with smart content handling (frontmatter Markdown, YAML/JSON/CSV as structured data)
  • 🧭 Safe, explicit scope: indexes are derived from your top‑level folders; optional allow‑list restricts searches and file fetches
  • 🔌 MCP server over HTTP or stdio: list indexes, search, and fetch exact files for answer grounding
  • 🐳 Batteries included: one docker compose up -d --build runs the whole stack
  • 🛠️ Extensible by design: adjust loader rules, file filters, and environment without rebuilding images in most cases

Getting started

Warning

This project is designed to run via Docker. Install Docker Desktop if you’re on Windows or macOS.

https://www.docker.com/products/docker-desktop/

Create a .env

At minimum you must set the Meilisearch master key so dependent services can authenticate.

echo "MEILISEARCH_MASTER_KEY=$(openssl rand -hex 32)" >> .env

Optional variables you can add now or later:

# Restrict which Meilisearch indexes the MCP server will expose (space/comma/newline separated)
MEILISEARCH_ALLOWED_INDEXES="docs guides examples"

# Run containers as your host user (helps with file ownership on ./output)
UID=1000
GID=1000
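Since MEILISEARCH_ALLOWED_INDEXES accepts space-, comma-, or newline-separated values, parsing it amounts to splitting on any mix of those delimiters. A minimal sketch (the function name is hypothetical; the real parsing lives in the MCP server):

```python
import re

def parse_allowed_indexes(raw: str) -> list[str]:
    # Split on any run of whitespace (spaces, newlines) or commas,
    # dropping empty tokens so trailing separators are harmless.
    return [tok for tok in re.split(r"[\s,]+", raw.strip()) if tok]
```

So "docs guides,examples" and "docs\nguides\nexamples" both yield the same three index names.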

Define your sources

Edit data-sources.yml to describe what to download. All sources live under a single top-level config: key. A minimal example:

config:
  sources:
    - type: git
      repo: https://github.com/example/docs.git
      subpath: docs
      ref: main
      destination: docs

    - type: http
      url: https://example.com/guide.md
      filename: guide.md
      destination: examples

Per-source filtering can be applied using include/exclude on individual source entries. For example, to exclude a lockfile from a Git source:

config:
  sources:
    - type: git
      repo: https://github.com/nitrojs/nitro.git
      subpath: docs
      destination: nitro
      exclude:
        - "pnpm-lock.yaml"

See the Downloader README for the full schema and filtering rules: src/downloader_web/README.md.

Configuration schema

config:
  sources: []
  loaders: []
  destinations: {}
  collections: {}

Full configuration example

config:
  destinations:
    docs:
      description: |
        Docs for the main project
    guides:
      description: |
        Guides and tutorials

  collections:
    project:
      description: |
        Core project documentation
      destinations:
        - docs
    learning:
      description: |
        Guides and tutorials
      destinations:
        - guides

  loaders:
    - path: guides
      type: frontmatter

  sources:
    - type: git
      repo: https://github.com/example/docs.git
      subpath: docs
      ref: main
      destination: docs
      include:
        - "**/*.md"

    - type: git
      repo: https://github.com/example/guides.git
      subpath: content
      destination: guides
      exclude:
        - "**/pnpm-lock.yaml"

    - type: http
      url: https://example.com/guide.md
      filename: getting-started.md
      destination: guides
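The loaders entry above marks the guides folder as frontmatter Markdown, meaning the indexer treats the leading YAML block as structured fields rather than plain text. A minimal sketch of what splitting frontmatter from a document involves (hypothetical helper; the real loader lives in file_loader):

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    # If the document opens with a "---" fence, everything up to the
    # closing "---" line is frontmatter; the rest is the Markdown body.
    if text.startswith("---\n"):
        head, sep, body = text[4:].partition("\n---\n")
        if sep:
            return head, body
    # No frontmatter: return an empty header and the full text as body.
    return "", text
```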

Start the stack

docker compose up -d --build

On first run, the stack downloads your sources into ./output, indexes them in Meilisearch, and then exposes them via the MCP server.

Tip

If you edit data-sources.yml, you can refresh downloads without restarting:

curl -X POST http://localhost:8080/refresh

Services overview

  • downloader_web: fetches files from the internet (HTTP, Git, etc.) into ./output
  • file_loader: indexes files from ./output into Meilisearch
  • mcp_server: exposes your indexes to AI tooling via MCP

Compose file: docker-compose.yml ties everything together.

Typical data flow

  • downloader_web populates ./output from your configured sources
  • file_loader performs an initial full index into Meilisearch, then watches for changes
  • mcp_server lists/searches those indexes and can fetch the exact file content under ./output
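Once the stack is healthy, you can sanity-check indexing by querying Meilisearch's standard search endpoint directly. The sketch below only builds the request; the host, port, and index name are assumptions (7700 is Meilisearch's default port, but this stack's actual mapping is set in docker-compose.yml):

```python
import json
import urllib.request

def search_request(host: str, master_key: str, index: str, query: str) -> urllib.request.Request:
    # Build a request for Meilisearch's POST /indexes/{uid}/search endpoint,
    # authenticated with the master key from your .env.
    return urllib.request.Request(
        url=f"{host}/indexes/{index}/search",
        data=json.dumps({"q": query}).encode(),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen returns a JSON body whose hits array should contain documents from ./output once file_loader has run.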

Updating

  • Refresh downloads after changing data-sources.yml:
curl -X POST http://localhost:8080/refresh
  • Restart services if you change environment variables:
docker compose restart

Troubleshooting

  • Meilisearch not healthy: docker compose logs -f meilisearch
  • Downloader not ready (/health 503): check docker compose logs -f downloader_web
  • Files not indexed: verify file extensions/size limits in file_loader README and that files live under a top‑level folder in ./output
  • MCP search shows no indexes: confirm MEILISEARCH_ALLOWED_INDEXES (if set) and that file_loader created indexes in Meilisearch
  • File fetch denied from MCP: path traversal is blocked and, if an allow‑list is set, only first‑segment matches are allowed (e.g., docs/...)

About

A self-hosted global search that lets your agents run semantic, contextual, and hybrid searches in plain language over the documentation, files, and examples you provide, with a folder for syncing files into your agents' memory.
