Conversation

@rdimitrov
Member

Motivation and Context

This PR adds a query parameter for filtering servers by distribution type (remote, npm, nuget, etc). It lets registry clients that support only NuGet, for example, query for just those servers, which reduces load on the registry server and spares the client from scanning all results.

Note that this is not a breaking change.

How Has This Been Tested?

Locally + unit tests

Breaking Changes

No

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Fixes: #781

@rdimitrov rdimitrov requested a review from domdomegg November 19, 2025 10:59
description: |
Filter by distribution type. Allows discovering servers based on how they are made available to users.
Example values include: 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb'
Member

Would it be worth changing remote to sse and streamable-http? I know that at Anthropic we maybe have plans to deprecate sse so being able to filter by the actual remote type might be useful - and it also aligns with filtering by the package type.

Member Author

I think that's reasonable 👍 Would it be a problem if we continue to have remote though as an option that aggregates all remote servers?

Member

maybe a slight preference for making this composable in some way, e.g. you can specify the type multiple times or you can comma-separate types or something? then we don't need remote, you can just specify streamable-http,sse
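
A minimal Go sketch of the composable form suggested here, assuming the parameter stays named `type`. The function name `parseTypes` is hypothetical and is not taken from the PR. It accepts both repeated parameters and comma-separated values, so a caller can write `streamable-http,sse` instead of relying on an aggregate `remote` value.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseTypes collects distribution types from a raw query string.
// It accepts both repeated parameters (?type=a&type=b) and
// comma-separated values (?type=a,b), dropping blanks and duplicates.
func parseTypes(rawQuery string) ([]string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var types []string
	for _, raw := range values["type"] {
		for _, part := range strings.Split(raw, ",") {
			t := strings.TrimSpace(part)
			if t == "" || seen[t] {
				continue
			}
			seen[t] = true
			types = append(types, t)
		}
	}
	return types, nil
}

func main() {
	types, err := parseTypes("type=streamable-http,sse&type=npm")
	if err != nil {
		panic(err)
	}
	fmt.Println(types)
}
```

Validation against the known set of types is left out of this sketch; it would run on each returned value.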

@domdomegg
Member

Meta: I might want one extra reviewer to check this as well - adding an endpoint is something that is hard to take back later so better to double check. Maybe we can discuss later today in maintainers meeting?

@rdimitrov
Member Author

Meta: I might want one extra reviewer to check this as well - adding an endpoint is something that is hard to take back later so better to double check. Maybe we can discuss later today in maintainers meeting?

Just to chime in that this adds a property to an existing endpoint and not a new one 👍

Member

@tadasant tadasant left a comment

In concept, I am supportive of adding "filter" functionality to the registry spec. It's clear that for ETL processes pulling down entries from the Registry or a sub-registry, they will often just want to pull down a subset based on downstream support of features.

However, I'm not a fan of adding a type query parameter here. If we do this for type, there's an explosion of other filters we would create a precedent for adding. For example uses_template_variables, has_git_repository, uses_semantic_versioning, etc etc. It wouldn't be obvious to me why we would support type and not those others.

To land this, I think we need to make the feature more generic. Rather than encoding every possible "filter" into the spec, the caller should be able to express a filter that the server then applies on its response.

One option is to take a page from WoT Discovery and use JSONPath. The queries it would create are ugly, but it'd be a nice standardized way to allow consumers to filter whatever fields they care about. My main concern with this approach would be whether the server side implementation is very nonperformant, and whether we would need to lock down what is allowed. A solution for that: the generic schema could support JSONPath, but the public registry API could just support very specific predefined JSONPath queries (this type one being one of them).

I think I'd be good with a JSONPath solution, though also am curious for more/alternative ideas that would fulfill the "flexible for consumers" kind of requirement here.

@claude do you have any ideas? Propose a few alternatives to JSONPath, and share your opinion on whether JSONPath is a good solution here for a generic filtering capability.

@claude
Contributor

claude bot commented Nov 19, 2025

Claude finished @rdimitrov's task —— View job


Generic Filtering Approaches for Registry API

I've analyzed the PR and filtering requirements. Here are several alternatives to JSONPath, along with my evaluation:

Proposed Alternatives

1. OData-style Query Parameters (Recommended)

Uses query string syntax inspired by OData but simplified for this use case.

Examples:

?filter=packages/any(p: p/registryType eq 'npm')
?filter=version eq '1.0.0' and remotes ne null
?filter=repository/source eq 'github'

Pros:

  • Industry standard (Microsoft, SAP, etc.)
  • Human-readable and URL-friendly
  • Easily validated and rate-limited
  • Can be incrementally adopted (start with simple equality, add operators later)
  • Good tooling support in many languages

Cons:

  • Custom parser needed (though libraries exist)
  • Slightly verbose syntax
  • Need to define supported operators carefully

2. GraphQL-style Field Selection

Allow clients to specify what they want using GraphQL-like syntax in query params.

Examples:

?where={"packages":{"registryType":"npm"}}
?where={"repository":{"source":"github"},"version":"1.0.0"}

Pros:

  • Modern, widely understood syntax
  • JSON-based so familiar to developers
  • Easy to parse (just JSON)
  • Can express complex nested conditions

Cons:

  • Can become large in URLs
  • Requires URL encoding which hurts readability
  • May encourage overly complex queries

3. RSQL/FIQL (RESTful Service Query Language)

A query language designed specifically for REST APIs.

Examples:

?filter=packages.registryType==npm
?filter=version==1.0.0;repository.source==github
?filter=packages.registryType=in=(npm,pypi)

Pros:

  • Designed for REST/URLs specifically
  • Clean, compact syntax
  • Well-defined operators (==, !=, =lt=, =gt=, =in=, etc.)
  • Libraries available in multiple languages

Cons:

  • Less well-known than OData or GraphQL
  • Operators like =in= may look unusual to some developers
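
To make the "incrementally adopted" point concrete, here is a rough Go sketch of a deliberately tiny RSQL subset: only `field==value` clauses joined by `;` (AND). The allowlist contents and the names `condition`, `allowedFields`, and `parseFilter` are assumptions for illustration, not part of the registry codebase or of any RSQL library.

```go
package main

import (
	"fmt"
	"strings"
)

// condition is one equality test, e.g. packages.registryType==npm.
type condition struct {
	Field string
	Value string
}

// allowedFields is a hypothetical allowlist; any other field is rejected.
var allowedFields = map[string]bool{
	"packages.registryType": true,
	"repository.source":     true,
	"version":               true,
}

// parseFilter handles only the `field==value` form joined by `;` (AND).
// Other RSQL operators (=in=, !=, `,` for OR) are deliberately unsupported.
func parseFilter(expr string) ([]condition, error) {
	var conds []condition
	for _, clause := range strings.Split(expr, ";") {
		field, value, ok := strings.Cut(clause, "==")
		if !ok || field == "" || value == "" {
			return nil, fmt.Errorf("malformed clause %q", clause)
		}
		if !allowedFields[field] {
			return nil, fmt.Errorf("field %q is not filterable", field)
		}
		conds = append(conds, condition{Field: field, Value: value})
	}
	return conds, nil
}

func main() {
	conds, err := parseFilter("version==1.0.0;repository.source==github")
	if err != nil {
		panic(err)
	}
	fmt.Println(conds)
}
```

Rejecting everything outside the allowlist keeps the server in control of which fields can be queried, which is the property the pros list relies on.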

4. Predefined Named Filters

Create a set of common filter presets that can be extended over time.

Examples:

?filter=nuget-only
?filter=npm-with-github
?filter=has-remotes

Pros:

  • Dead simple for common cases
  • Easy to implement and maintain
  • Can be optimized per filter
  • Great developer experience for common patterns
  • Backwards compatible when adding new filters

Cons:

  • Not flexible for arbitrary queries
  • Need to add new filters over time
  • Doesn't solve the "explosion of filters" problem, just organizes it
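
As an illustration of how little machinery named filters need, here is a hedged Go sketch. The `server` struct is a simplified stand-in, not the registry's real model, and the meaning given to each preset name (for example, that `nuget-only` means "has a NuGet package") is an assumption.

```go
package main

import "fmt"

// server is a trimmed-down, hypothetical stand-in for a registry entry.
type server struct {
	Name         string
	PackageTypes []string // e.g. "npm", "nuget"
	RepoSource   string   // e.g. "github"
	HasRemotes   bool
}

// hasPackage reports whether the server ships a package of the given type.
func hasPackage(s server, kind string) bool {
	for _, p := range s.PackageTypes {
		if p == kind {
			return true
		}
	}
	return false
}

// namedFilters maps each preset name to the predicate it stands for.
var namedFilters = map[string]func(server) bool{
	"nuget-only":      func(s server) bool { return hasPackage(s, "nuget") },
	"npm-with-github": func(s server) bool { return hasPackage(s, "npm") && s.RepoSource == "github" },
	"has-remotes":     func(s server) bool { return s.HasRemotes },
}

// applyFilter returns the servers matching the named preset.
// Unknown names are an error rather than a silent no-op.
func applyFilter(name string, servers []server) ([]server, error) {
	match, ok := namedFilters[name]
	if !ok {
		return nil, fmt.Errorf("unknown filter %q", name)
	}
	var out []server
	for _, s := range servers {
		if match(s) {
			out = append(out, s)
		}
	}
	return out, nil
}

func main() {
	servers := []server{
		{Name: "a", PackageTypes: []string{"npm"}, RepoSource: "github"},
		{Name: "b", PackageTypes: []string{"nuget"}},
		{Name: "c", HasRemotes: true},
	}
	got, err := applyFilter("npm-with-github", servers)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got), got[0].Name)
}
```

Adding a preset is one new map entry, which is why this option is additive; the map also grows with every new combination, which is the last con listed above.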

5. Simplified JSONPath (Restricted Subset)

Use JSONPath but limit to simple expressions only.

Examples:

?filter=$.packages[?(@.registryType=='npm')]
?filter=$[?@.repository.source=='github']

Pros:

  • Standard specification (RFC 9535)
  • Powerful when needed
  • Good library support

Cons:

  • Complex syntax, poor readability in URLs
  • Overkill for most filtering needs
  • Harder to validate and secure
  • Performance concerns with complex expressions
  • The examples from WoT are indeed ugly in practice

My Recommendation

I recommend a hybrid approach:

  1. Short term (for this PR): Use simple query parameters for the most common filters:

    • ?type=npm (current PR approach)
    • ?type=npm,pypi (comma-separated for OR)
    • Keep it simple and solve the immediate NuGet client need
  2. Medium term (next few months): Implement RSQL/FIQL or simplified OData for generic filtering:

    • Both are designed for REST APIs
    • URL-friendly
    • Can be incrementally adopted
    • Easy to whitelist/blacklist certain fields or operators
    • RSQL slight edge for simplicity; OData for familiarity
  3. Include an escape hatch: For complex ETL needs, consider:

    • Allowing bulk export/streaming endpoints
    • Supporting filter presets for complex queries
    • Providing client SDKs that handle complexity

Evaluation of JSONPath

Is JSONPath a good solution here?

No, I don't think so, for these reasons:

  1. URL readability: JSONPath expressions become unwieldy in URLs. Compare:

    • JSONPath: ?filter=$.packages[?(@.registryType=='npm')]
    • RSQL: ?filter=packages.registryType==npm
    • Simple: ?type=npm
  2. Overkill: JSONPath's power (selecting array elements, recursive descent, complex predicates) isn't needed for filtering tabular results. The registry is returning a flat list of servers, not navigating complex document structures.

  3. Security: JSONPath can be expensive to evaluate and hard to limit. RSQL/OData make it easier to whitelist operators and fields.

  4. Developer experience: Most developers won't be familiar with JSONPath syntax, whereas simple equality/comparison operators are universal.

  5. Performance: JSONPath expressions may encourage inefficient queries that scan entire datasets rather than using indexes.


Implementation Path Forward

For this specific PR, I suggest:

  1. Keep the simple type parameter but enhance it to accept multiple values:

    ?type=npm           # Single type
    ?type=npm,pypi      # Multiple types (OR logic)
    
  2. Document this as v0.1 of filtering with a note that more sophisticated filtering will come in future versions

  3. Create a follow-up issue to design the generic filtering approach using RSQL or OData

  4. In the meantime, clients like NuGet can use ?type=nuget to solve their immediate needs

This approach:

  • ✅ Solves the immediate problem (NuGet filtering)
  • ✅ Doesn't create bad precedent (multiple values via comma-separation is extensible)
  • ✅ Respects the API freeze (additive change only)
  • ✅ Buys time to design generic filtering properly
  • ✅ Aligns with discussion in review comments about making it composable

Analysis of Current PR Implementation

The current PR adds a simple type query parameter that filters by distribution type. The implementation:

  • ✅ Is non-breaking (optional parameter)
  • ✅ Validates input against known types
  • ✅ Works at the database level for efficiency
  • ✅ Solves the immediate NuGet client need
  • ⚠️ Could be enhanced to support multiple types (comma-separated)
  • ⚠️ Needs decision on whether to keep as-is or evolve to generic filtering

@tadasant
Member

After poking around some more, I can't find any good precedent for introducing something like what I'm suggesting into a major piece of ecosystem infra. In fact it looks like NuGet used to use OData and migrated away in their v3, and no other reasonable precedent exists.

I actually wonder if 4. Predefined Named Filters from Claude's response might be the best way to go. No performance concerns, still flexible and every sub-registry can have their own "predefined" sets that they care about. It's actually not that different from your original proposal conceptually, where you are introducing a new notion of type that could mean remote, could mean local npm, etc. We just tweak by:

  • Stepping up the abstraction layer (filter instead of type as the key)
  • Only defining the key and not the values in the sub-registry spec (we don't need to make decisions on whether 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb' etc are the right values - any sub-registry can come up with any filters they please; maybe the official registry can offer some popular ones that folks might standardize around, but we could tweak them without breaking API changes over time)

What do you think @rdimitrov? Would that still fulfill your needs with the benefit of avoiding the explosion of modeling for us?

@claude what's your deeper take on this path for the MCP Registry?

@claude
Contributor

claude bot commented Nov 20, 2025

Claude finished @tadasant's task —— View job


Deep Analysis: Predefined Named Filters for MCP Registry

After reviewing the PR implementation and ecosystem architecture, I have strong opinions on the predefined filters approach. TL;DR: I recommend it highly, with some refinements.


Why Predefined Named Filters Work Well Here

The MCP Registry has unique characteristics that make predefined filters particularly suitable:

1. Subregistry ETL Is Core to the Architecture

From the ecosystem vision docs, the registry is explicitly designed for subregistries to ETL from it:

"Additionally, we expect clients pull from subregistries. These subregistries add value to the registry ecosystem by providing curation, or extending it with additional metadata. The Official MCP registry expects a lot of API requests from ETL jobs from these subregistries."

Key insight: Subregistries will be pulling the entire dataset or large subsets regularly. They're not doing one-off queries that need complex filtering. The pattern is:

  • ETL job runs periodically (hourly, daily)
  • Pulls entire dataset or broad category (e.g., "all npm packages")
  • Processes locally with their own custom logic
  • Adds curation, ratings, enhanced metadata

This means the registry doesn't need to support arbitrary complex queries. It needs to support efficient bulk operations with a few common filters.

2. Finite, Stable Set of Distribution Types

Looking at the current implementation, there are exactly 6 distribution types:

  • remote (SSE, streamable-http)
  • npm
  • pypi
  • oci
  • nuget
  • mcpb

This is not an unbounded problem space. Package ecosystems are stable and well-defined. We're not likely to add 50 more distribution types. Even if we do add a few over time (e.g., cargo, gem, composer), this remains manageable.

3. API Freeze Makes Extensibility Critical

You're in an API freeze (v0.1). The spec says:

"For the next month or more, the API will remain stable with no breaking changes, allowing integrators to confidently implement support."

Predefined filters are additive and non-breaking. You can:

  • Start with: ?filter=nuget-only
  • Add later: ?filter=npm-with-github, ?filter=has-remote-transport, ?filter=uses-semver
  • Never break existing clients

Generic filter languages (JSONPath, RSQL, OData) would be hard to evolve without breaking changes because:

  • Parser behavior changes are breaking
  • Adding/removing operator support is breaking
  • Field name changes break queries

Recommended Implementation: Hybrid Approach

I recommend a two-tier system that combines the benefits of both approaches:

Tier 1: Simple Field Filters (What This PR Does)

Keep simple query params for common, stable fields:

?type=npm              # Single type
?type=npm,pypi         # Multiple types (OR semantics)
?version=latest        # Already exists
?search=filesystem     # Already exists

Why this tier?

  • These fields are fundamental to the data model
  • They're stable (won't change)
  • They map directly to database queries (efficient)
  • They're composable with each other

Enhancement for this PR: Support comma-separated values for type:

// Handle type parameter with comma-separated support
if input.Type != "" {
    types := strings.Split(input.Type, ",")
    for i, t := range types {
        // Store the trimmed value back so the filter never sees stray whitespace
        types[i] = strings.TrimSpace(t)
        if !model.IsValidDistributionType(types[i]) {
            return nil, huma.Error400BadRequest(fmt.Sprintf("Invalid type parameter: '%s' is not a valid distribution type", types[i]))
        }
    }
    filter.ConfigTypes = types  // plural, pass as slice
}

Tier 2: Named Composite Filters (Future Addition)

Add a preset or profile parameter for complex, curated queries:

?preset=official-recommended    # Servers from verified publishers
?preset=github-only            # Only servers with GitHub repos
?preset=production-ready       # Servers with semver, docs, tests
?preset=enterprise-safe        # Servers with security audits, SLAs

Why this tier?

  • Solves the "explosion of filters" problem through curation
  • Allows complex logic without exposing it in the API
  • Sub-registries can define their own presets
  • Can be optimized server-side (materialized views, caching)
  • Provides semantic meaning ("production-ready" is clearer than a complex filter expression)

Addressing the "Explosion of Filters" Concern

@tadasant raised a valid concern:

"If we do this for type, there's an explosion of other filters we would create a precedent for adding. For example uses_template_variables, has_git_repository, uses_semantic_versioning, etc etc."

My take: There are actually two categories of filters:

Category A: Core Structural Filters (Bounded Set)

These map to fundamental data model properties:

  • type - distribution type (6 values, rarely changes)
  • version - version filtering (already exists)
  • search - name search (already exists)
  • updated_since - temporal filtering (already exists)

These should be simple query params because:

  • They're intrinsic to what a server is
  • They're efficiently indexed in the database
  • They're stable across versions
  • They compose well: ?type=npm,pypi&version=latest&search=weather

Category B: Semantic/Quality Filters (Unbounded Set)

These represent judgments or complex conditions:

  • uses_template_variables - requires parsing package config
  • has_git_repository - requires checking repository field
  • production_ready - subjective, requires multiple conditions
  • enterprise_safe - highly subjective, requires external data

These should be named presets because:

  • They're subjective or complex
  • They may require external data or computation
  • They change over time as standards evolve
  • They benefit from curation

Concrete Proposal for This PR

For immediate merge:

  1. Enhance the type parameter to support multiple values:

    ?type=npm              # Single type
    ?type=npm,pypi,nuget   # Multiple types (OR logic)
    
  2. Document this as the "simple field filter" pattern:

    # In OpenAPI spec
    - name: type
      in: query
      description: |
        Filter by distribution type. Supports multiple comma-separated values for OR logic.
        
        Example values: 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb'
        
        Examples:
        - ?type=npm                    # Only npm packages
        - ?type=npm,pypi               # npm OR pypi packages
        - ?type=remote                 # Only remote servers
        - ?type=npm,pypi,nuget         # Any of npm, pypi, or nuget
  3. Add implementation note in docs/design/:

    "The registry supports two types of filtering: simple field filters (like type, version) that map directly to data model properties, and named presets (future) for complex composite queries. This keeps the API simple while allowing extensibility."

For follow-up issues:

  1. Create issue: "Design named preset/profile filter system"
  2. Create issue: "Should remote expand to sse,streamable-http?" (per review comment)

Performance and Scaling Considerations

One concern with generic filtering is performance. The predefined approach handles this well:

Simple field filters:

  • Map to database indexes
  • Can be optimized in SQL
  • Predictable query patterns

Named presets:

  • Can be materialized views
  • Can be cached aggressively
  • Can be optimized per-preset
  • Can evolve without API changes

Generic filter languages (JSONPath, RSQL):

  • Unpredictable query patterns
  • Difficult to index effectively
  • May require full table scans
  • Hard to rate-limit fairly

Given that the registry expects "a lot of API requests from ETL jobs," performance predictability is important.


Does This Meet @rdimitrov's Needs?

Looking at the original motivation:

"The following PR adds a query parameter for searching by the distribution type (remote, npm, nuget, etc). This is so registry clients supporting nuget can query just for those (less load on the registry server + less load on the client to go through all results)."

Yes, this approach fully solves the problem:

  • NuGet clients can use ?type=nuget
  • It's efficient (database-level filtering)
  • It reduces load on both server and client
  • It's simple and obvious

The enhancement to support ?type=npm,pypi,nuget makes it even more flexible for clients that support multiple ecosystems.


Comparison to Other Registry APIs

Let me check what precedents exist in practice:

npm Registry (search API):

  • Uses simple query params: ?text=search, ?size=20
  • No complex filtering - clients ETL and process locally

PyPI (JSON API):

  • Simple endpoints: /pypi/{package}/json
  • No filtering API - clients scrape the entire simple index
  • Focus on bulk downloads

Docker Hub:

  • Simple params: ?name=foo, ?is_official=true
  • No complex query language
  • Pagination-focused

NuGet:

  • Migrated away from OData in v3
  • Search API with simple parameters
  • Focus on text search and pagination

Pattern: Major package registries converged on simple parameters + client-side processing. None use complex filter languages at scale.


My Recommendation Summary

For this PR: ✅ Approve with enhancement

  • Add comma-separated value support: ?type=npm,pypi
  • Keep validation of individual types
  • Document the pattern clearly

Future direction: 🎯 Two-tier filtering

  • Tier 1: Simple field filters for core properties (type, version, search)
  • Tier 2: Named presets for complex queries (preset=production-ready)

Rationale:

  1. ✅ Solves immediate NuGet client need
  2. ✅ Aligns with ecosystem architecture (bulk ETL, not ad-hoc queries)
  3. ✅ Matches patterns from successful package registries
  4. ✅ Non-breaking and extensible
  5. ✅ Performance-predictable
  6. ✅ Simple developer experience
  7. ✅ Respects API freeze constraints

What I don't recommend:

  • ❌ Generic filter languages (JSONPath, RSQL, OData) - overkill and performance unpredictable
  • ❌ GraphQL - wrong tool for this use case
  • ❌ Unlimited field filtering - creates precedent problems

Implementation Notes

If you enhance this PR to support comma-separated types, here's what needs to change:

  1. Update internal/database/database.go - Change ConfigType *string to ConfigTypes []string
  2. Update internal/database/postgres.go - Use = ANY($n) instead of = $n in SQL
  3. Update internal/api/handlers/v0/servers.go - Parse comma-separated values
  4. Update tests - Add test cases for multiple types
  5. Update OpenAPI spec - Document comma-separated format

This is a small, isolated change that significantly improves composability without adding complexity.
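
A rough Go sketch of step 2, building the parameterized SQL without executing it. The table and column names (`servers`, `config_type`, `value`, `name`) and the function `buildTypeQuery` are hypothetical and are not taken from internal/database/postgres.go. It also assumes a driver that encodes a Go string slice as a Postgres array, as pgx does; with database/sql and lib/pq the slice would need to be wrapped in `pq.Array`.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTypeQuery returns a SQL string plus its positional arguments.
// When types is empty, no WHERE clause is added and all rows match.
func buildTypeQuery(types []string, limit int) (string, []any) {
	var sb strings.Builder
	sb.WriteString("SELECT value FROM servers")
	args := []any{}
	if len(types) > 0 {
		// The whole slice is bound as one array parameter.
		args = append(args, types)
		sb.WriteString(fmt.Sprintf(" WHERE config_type = ANY($%d)", len(args)))
	}
	args = append(args, limit)
	sb.WriteString(fmt.Sprintf(" ORDER BY name LIMIT $%d", len(args)))
	return sb.String(), args
}

func main() {
	query, args := buildTypeQuery([]string{"npm", "pypi"}, 30)
	fmt.Println(query)
	fmt.Println(len(args))
}
```

Binding the list as a single array parameter keeps the statement text identical regardless of how many types the caller passes.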


Does this analysis help clarify the path forward? I'm happy to dive deeper into any aspect or help with implementation details if needed.


@tadasant
Member

I don't agree with Claude's suggestion to split into 2 functionalities (would rather we just focus on filter as a key), FWIW.

@jonathanhefner
Member

Rather than type, perhaps category? Then servers could automatically be placed in zero or more categories (e.g., ["remote", "sse", "streamable-http"]). And we could add arbitrary categories in the future.

Though that does introduce the question of whether ?category= with multiple categories should be a union versus an intersection.
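
The union-versus-intersection question can be shown in a few lines of Go. This is only an illustrative sketch; `matchAny` and `matchAll` are hypothetical helper names, and the category values are the ones from the example above.

```go
package main

import "fmt"

// toSet turns a slice into a lookup set.
func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, item := range items {
		set[item] = true
	}
	return set
}

// matchAny reports whether the server has at least one wanted category (union).
func matchAny(have, want []string) bool {
	set := toSet(have)
	for _, w := range want {
		if set[w] {
			return true
		}
	}
	return false
}

// matchAll reports whether the server has every wanted category (intersection).
func matchAll(have, want []string) bool {
	set := toSet(have)
	for _, w := range want {
		if !set[w] {
			return false
		}
	}
	return true
}

func main() {
	have := []string{"remote", "sse"}
	want := []string{"remote", "streamable-http"}
	fmt.Println(matchAny(have, want), matchAll(have, want))
}
```

The same request, `?category=remote,streamable-http`, includes this server under union semantics and excludes it under intersection semantics, so the spec would have to pick one.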

@tadasant
Member

Rather than type, perhaps category? Then servers could automatically be placed in zero or more categories (e.g., ["remote", "sse", "streamable-http"]). And we could add arbitrary categories in the future.

Though that does introduce the question of whether ?category= with multiple categories should be a union versus an intersection.

@jonathanhefner did you see my suggestion of filter instead of type? Curious what you think of filter.

I don't love category personally, because category is typically a mutually exclusive designator (items can only be in one category). tags would be more typically like what you are proposing (item can be in multiple), but indeed the comma-separation being union or intersection is a very real problem.

I do think making the key filter and allowing custom values solves all these problems (a sub-registry could implement remote-and-npm as a filter just as easily as remote-or-npm). Maybe eventually (or even out of the gate) we make a list of "recommended filter values to implement".

@jonathanhefner
Member

did you see my suggestion of filter instead of type? Curious what you think of filter.

I don't love category personally, because category is typically a mutually exclusive designator (items can only be in one category).

I was interpreting filter as mutually exclusive. 😆 But also, I was interpreting filter as manually defined rather than automatically defined. For example, if we add support for a new package registry, would we automatically have a new filter for that registry? For category or tag, I would assume "yes". If the answer for filter is also "yes", then I think that sounds good! 😃

Development

Successfully merging this pull request may close these issues.

Discover remote servers (filter by type)

5 participants