Implement support for searching by type #787

rdimitrov · 2025-11-19T10:24:20Z

Motivation and Context

The following PR adds a query parameter for searching by the distribution type (remote, npm, nuget, etc). This is so registry clients supporting nuget can query just for those (less load on the registry server + less load on the client to go through all results).

Note that this is not a breaking change.

How Has This Been Tested?

Locally + unit tests

Breaking Changes

No

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

Fixes: #781

Signed-off-by: Radoslav Dimitrov <[email protected]>

domdomegg · 2025-11-19T12:42:17Z

docs/reference/api/openapi.yaml

+          description: |
+            Filter by distribution type. Allows discovering servers based on how they are made available to users.
+
+            Example values include: 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb'


Would it be worth changing remote to sse and streamable-http? I know that at Anthropic we maybe have plans to deprecate sse so being able to filter by the actual remote type might be useful - and it also aligns with filtering by the package type.

I think that's reasonable 👍 Would it be a problem if we continue to have remote though as an option that aggregates all remote servers?

maybe a slight preference for making this composable in some way, e.g. you can specify the type multiple times or you can comma-separate types or something? then we don't need remote, you can just specify streamable-http,sse
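
For reference, a minimal Go sketch of the composable option, accepting both forms mentioned above (a repeated `type` parameter and a comma-separated list). The helper name `parseTypes` is hypothetical and not part of this PR; it only normalizes the query string and does not validate values against the known distribution types.

```go
package main

import (
	"net/url"
	"strings"
)

// parseTypes collects distribution types from a raw query string.
// It accepts both ?type=a&type=b and ?type=a,b, trims whitespace,
// drops empty entries, and removes duplicates while keeping order.
func parseTypes(rawQuery string) ([]string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var types []string
	for _, v := range values["type"] {
		for _, part := range strings.Split(v, ",") {
			t := strings.TrimSpace(part)
			if t == "" || seen[t] {
				continue
			}
			seen[t] = true
			types = append(types, t)
		}
	}
	return types, nil
}
```

With this shape, `?type=streamable-http,sse` and `?type=streamable-http&type=sse` produce the same list, so an aggregate `remote` value would not be needed.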

domdomegg · 2025-11-19T13:09:51Z

Meta: I might want one extra reviewer to check this as well - adding an endpoint is something that is hard to take back later so better to double check. Maybe we can discuss later today in maintainers meeting?

rdimitrov · 2025-11-19T15:15:30Z

Meta: I might want one extra reviewer to check this as well - adding an endpoint is something that is hard to take back later so better to double check. Maybe we can discuss later today in maintainers meeting?

Just to chime in that this adds a property to an existing endpoint and not a new one 👍

tadasant

In concept, I am supportive of adding "filter" functionality to the registry spec. It's clear that for ETL processes pulling down entries from the Registry or a sub-registry, they will often just want to pull down a subset based on downstream support of features.

However, I'm not a fan of adding a type query parameter here. If we do this for type, there's an explosion of other filters we would create a precedent for adding. For example uses_template_variables, has_git_repository, uses_semantic_versioning, etc etc. It wouldn't be obvious to me why we would support type and not those others.

To land this, I think we need to make the feature more generic. Rather than encoding every possible "filter" into the spec, the caller should be able to express a filter that the server then applies to its response.

One option is to take a page from WoT Discovery and use JSONPath. The queries it would create are ugly, but it'd be a nice standardized way to allow consumers to filter whatever fields they care about. My main concern with this approach would be whether the server side implementation is very nonperformant, and whether we would need to lock down what is allowed. A solution for that: the generic schema could support JSONPath, but the public registry API could just support very specific predefined JSONPath queries (this type one being one of them).
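
A rough Go sketch of that last idea: the server accepts a JSONPath string, but only if it exactly matches one of a few predefined queries, each backed by a hand-written predicate instead of a JSONPath evaluator. The `Server` type, its fields, and the two allowed expressions are assumptions made for illustration and are not taken from the registry code.

```go
package main

import "fmt"

// Server is a hypothetical, trimmed-down registry entry.
type Server struct {
	Name         string
	PackageTypes []string // e.g. "npm", "nuget"
}

// hasType reports whether the server ships a package of the given type.
func hasType(s Server, want string) bool {
	for _, t := range s.PackageTypes {
		if t == want {
			return true
		}
	}
	return false
}

// allowedQueries maps each permitted JSONPath string to a hand-written
// predicate. No JSONPath is evaluated; the string is only a lookup key.
var allowedQueries = map[string]func(Server) bool{
	"$.packages[?(@.registryType=='nuget')]": func(s Server) bool { return hasType(s, "nuget") },
	"$.packages[?(@.registryType=='npm')]":   func(s Server) bool { return hasType(s, "npm") },
}

// filterByQuery rejects any expression outside the allowlist and otherwise
// returns the servers matching the predefined predicate.
func filterByQuery(servers []Server, query string) ([]Server, error) {
	pred, ok := allowedQueries[query]
	if !ok {
		return nil, fmt.Errorf("unsupported filter expression: %q", query)
	}
	var out []Server
	for _, s := range servers {
		if pred(s) {
			out = append(out, s)
		}
	}
	return out, nil
}
```

The trade-off is that callers get a standardized syntax while the server keeps a fixed, predictable set of query shapes.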

I think I'd be good with a JSONPath solution, though also am curious for more/alternative ideas that would fulfill the "flexible for consumers" kind of requirement here.

@claude do you have any ideas? Propose a few alternatives to JSONPath, and share your opinion on whether JSONPath is a good solution here for a generic filtering capability.

claude · 2025-11-19T17:22:14Z

Claude finished @rdimitrov's task

Generic Filtering Approaches for Registry API

I've analyzed the PR and filtering requirements. Here are several alternatives to JSONPath, along with my evaluation:

Proposed Alternatives

1. OData-style Query Parameters (Recommended)

Uses query string syntax inspired by OData but simplified for this use case.

Examples:

?filter=packages/any(p: p/registryType eq 'npm')
?filter=version eq '1.0.0' and remotes ne null
?filter=repository/source eq 'github'

Pros:

Industry standard (Microsoft, SAP, etc.)
Human-readable and URL-friendly
Easily validated and rate-limited
Can be incrementally adopted (start with simple equality, add operators later)
Good tooling support in many languages

Cons:

Custom parser needed (though libraries exist)
Slightly verbose syntax
Need to define supported operators carefully

2. GraphQL-style Field Selection

Allow clients to specify what they want using GraphQL-like syntax in query params.

Examples:

?where={packages:{registryType:"npm"}}
?where={repository:{source:"github"},version:"1.0.0"}

Pros:

Modern, widely understood syntax
JSON-based so familiar to developers
Easy to parse (just JSON)
Can express complex nested conditions

Cons:

Can become large in URLs
Requires URL encoding which hurts readability
May encourage overly complex queries
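
A small Go sketch of what the JSON-in-a-query-parameter approach could look like. Note that the examples above use unquoted keys, which strict JSON does not allow; this sketch assumes quoted keys. The `WhereClause` type and both helper functions are hypothetical.

```go
package main

import (
	"encoding/json"
	"net/url"
)

// PackageWhere and RepositoryWhere are hypothetical sub-filters.
type PackageWhere struct {
	RegistryType string `json:"registryType"`
}

type RepositoryWhere struct {
	Source string `json:"source"`
}

// WhereClause is an assumed shape for the ?where= parameter.
type WhereClause struct {
	Packages   *PackageWhere    `json:"packages,omitempty"`
	Repository *RepositoryWhere `json:"repository,omitempty"`
	Version    string           `json:"version,omitempty"`
}

// encodeWhere produces the URL-escaped form a client would send.
func encodeWhere(w WhereClause) (string, error) {
	b, err := json.Marshal(w)
	if err != nil {
		return "", err
	}
	return "where=" + url.QueryEscape(string(b)), nil
}

// parseWhere decodes the where parameter from a raw query string.
// A missing parameter yields the zero WhereClause.
func parseWhere(rawQuery string) (WhereClause, error) {
	var w WhereClause
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return w, err
	}
	raw := values.Get("where")
	if raw == "" {
		return w, nil
	}
	err = json.Unmarshal([]byte(raw), &w)
	return w, err
}
```

Every brace, quote, and colon is percent-encoded on the wire, which is the readability cost listed in the cons above.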

3. RSQL/FIQL (RESTful Service Query Language)

A query language designed specifically for REST APIs.

Examples:

?filter=packages.registryType==npm
?filter=version==1.0.0;repository.source==github
?filter=packages.registryType=in=(npm,pypi)

Pros:

Designed for REST/URLs specifically
Clean, compact syntax
Well-defined operators (==, !=, =lt=, =gt=, =in=, etc.)
Libraries available in multiple languages

Cons:

Less well-known than OData or GraphQL
Operators like =in= may look unusual to some developers
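
To give a sense of the parsing effort, here is a Go sketch of a deliberately tiny RSQL subset: conditions joined by `;` (AND), each using either `==` or `=in=(...)`, with a field allowlist. It is not a conforming RSQL/FIQL parser; OR, grouping, quoting, and other operators are left out, and the `Condition` type and `parseFilter` function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is one parsed comparison, e.g. packages.registryType=in=(npm,pypi).
type Condition struct {
	Field  string
	Op     string   // "==" or "=in="
	Values []string // one value for "==", one or more for "=in="
}

// parseFilter parses conditions joined by ';' and rejects any field that is
// not present in the allowed set.
func parseFilter(expr string, allowed map[string]bool) ([]Condition, error) {
	var conds []Condition
	for _, part := range strings.Split(expr, ";") {
		var c Condition
		if field, rest, ok := strings.Cut(part, "=in="); ok {
			// The value list must look like "(a,b,...)" with at least one character inside.
			if len(rest) < 3 || rest[0] != '(' || rest[len(rest)-1] != ')' {
				return nil, fmt.Errorf("malformed =in= list in %q", part)
			}
			c = Condition{Field: field, Op: "=in=", Values: strings.Split(rest[1:len(rest)-1], ",")}
		} else if field, value, ok := strings.Cut(part, "=="); ok && value != "" {
			c = Condition{Field: field, Op: "==", Values: []string{value}}
		} else {
			return nil, fmt.Errorf("unsupported condition %q", part)
		}
		if !allowed[c.Field] {
			return nil, fmt.Errorf("field %q is not filterable", c.Field)
		}
		conds = append(conds, c)
	}
	return conds, nil
}
```

The allowlist is what keeps the set of filterable fields under the server's control.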

4. Predefined Named Filters

Create a set of common filter presets that can be extended over time.

Examples:

?filter=nuget-only
?filter=npm-with-github
?filter=has-remotes

Pros:

Dead simple for common cases
Easy to implement and maintain
Can be optimized per filter
Great developer experience for common patterns
Backwards compatible when adding new filters

Cons:

Not flexible for arbitrary queries
Need to add new filters over time
Doesn't solve the "explosion of filters" problem, just organizes it
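
A minimal Go sketch of a named-filter table, using the three example names above. The `Server` fields and the exact meaning of each filter are assumptions; in particular, "nuget-only" is read here as "has at least one NuGet package".

```go
package main

import (
	"fmt"
	"sort"
)

// Server is a hypothetical, trimmed-down registry entry.
type Server struct {
	Name         string
	PackageTypes []string // e.g. "npm", "nuget"
	RepoSource   string   // e.g. "github"
	RemoteCount  int
}

func hasType(s Server, want string) bool {
	for _, t := range s.PackageTypes {
		if t == want {
			return true
		}
	}
	return false
}

// namedFilters maps a filter name to its predicate. Adding an entry is an
// additive change; existing names keep working.
var namedFilters = map[string]func(Server) bool{
	"nuget-only":      func(s Server) bool { return hasType(s, "nuget") },
	"npm-with-github": func(s Server) bool { return hasType(s, "npm") && s.RepoSource == "github" },
	"has-remotes":     func(s Server) bool { return s.RemoteCount > 0 },
}

// applyNamedFilter returns the servers matching the named filter, or an
// error that lists the known names when the filter does not exist.
func applyNamedFilter(servers []Server, name string) ([]Server, error) {
	pred, ok := namedFilters[name]
	if !ok {
		known := make([]string, 0, len(namedFilters))
		for k := range namedFilters {
			known = append(known, k)
		}
		sort.Strings(known)
		return nil, fmt.Errorf("unknown filter %q; known filters: %v", name, known)
	}
	var out []Server
	for _, s := range servers {
		if pred(s) {
			out = append(out, s)
		}
	}
	return out, nil
}
```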

5. Simplified JSONPath (Restricted Subset)

Use JSONPath but limit to simple expressions only.

Examples:

?filter=$.packages[?(@.registryType=='npm')]
?filter=$.repository.source=='github'

Pros:

Standard specification (RFC 9535)
Powerful when needed
Good library support

Cons:

Complex syntax, poor readability in URLs
Overkill for most filtering needs
Harder to validate and secure
Performance concerns with complex expressions
The examples from WoT are indeed ugly in practice

My Recommendation

I recommend a hybrid approach:

Short term (for this PR): Use simple query parameters for the most common filters:
- ?type=npm (current PR approach)
- ?type=npm,pypi (comma-separated for OR)
- Keep it simple and solve the immediate NuGet client need
Medium term (next few months): Implement RSQL/FIQL or simplified OData for generic filtering:
- Both are designed for REST APIs
- URL-friendly
- Can be incrementally adopted
- Easy to whitelist/blacklist certain fields or operators
- RSQL slight edge for simplicity; OData for familiarity
Include an escape hatch: For complex ETL needs, consider:
- Allowing bulk export/streaming endpoints
- Supporting filter presets for complex queries
- Providing client SDKs that handle complexity

Evaluation of JSONPath

Is JSONPath a good solution here?

No, I don't think so, for these reasons:

URL readability: JSONPath expressions become unwieldy in URLs. Compare:
- JSONPath: ?filter=$.packages[?(@.registryType=='npm')]
- RSQL: ?filter=packages.registryType==npm
- Simple: ?type=npm
Overkill: JSONPath's power (selecting array elements, recursive descent, complex predicates) isn't needed for filtering tabular results. The registry is returning a flat list of servers, not navigating complex document structures.
Security: JSONPath can be expensive to evaluate and hard to limit. RSQL/OData make it easier to whitelist operators and fields.
Developer experience: Most developers won't be familiar with JSONPath syntax, whereas simple equality/comparison operators are universal.
Performance: JSONPath expressions may encourage inefficient queries that scan entire datasets rather than using indexes.

Implementation Path Forward

For this specific PR, I suggest:

Keep the simple type parameter but enhance it to accept multiple values:

?type=npm           # Single type
?type=npm,pypi      # Multiple types (OR logic)

Document this as v0.1 of filtering with a note that more sophisticated filtering will come in future versions
Create a follow-up issue to design the generic filtering approach using RSQL or OData
In the meantime, clients like NuGet can use ?type=nuget to solve their immediate needs

This approach:

✅ Solves the immediate problem (NuGet filtering)
✅ Doesn't create bad precedent (multiple values via comma-separation is extensible)
✅ Respects the API freeze (additive change only)
✅ Buys time to design generic filtering properly
✅ Aligns with discussion in review comments about making it composable

References:

RSQL/FIQL: https://github.com/jirutka/rsql-parser
OData: https://www.odata.org/documentation/odata-version-3-0/url-conventions/
JSONPath RFC: https://datatracker.ietf.org/doc/rfc9535/

Analysis of Current PR Implementation

The current PR adds a simple type query parameter that filters by distribution type. The implementation:

✅ Is non-breaking (optional parameter)
✅ Validates input against known types
✅ Works at the database level for efficiency
✅ Solves the immediate NuGet client need
⚠️ Could be enhanced to support multiple types (comma-separated)
⚠️ Needs decision on whether to keep as-is or evolve to generic filtering

tadasant · 2025-11-20T14:49:48Z

After poking around some more, I can't find any good precedent for introducing something like what I'm suggesting into a major piece of ecosystem infra. In fact it looks like NuGet used to use OData and migrated away in their v3, and no other reasonable precedent exists.

I actually wonder if 4. Predefined Named Filters from Claude's response might be the best way to go. No performance concerns, still flexible and every sub-registry can have their own "predefined" sets that they care about. It's actually not that different from your original proposal conceptually, where you are introducing a new notion of type that could mean remote, could mean local npm, etc. We just tweak by:

Stepping up the abstraction layer (filter instead of type as the key)
Only define the key and not the values in the sub-registry spec (we don't need to make decisions on whether 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb' etc are the right values - any sub-registry can come up with any filters they please; maybe the official registry can offer some popular ones that folks might standardize around, but we could tweak them without breaking API changes over time)

What do you think @rdimitrov? Would that still fulfill your needs with the benefit of avoiding the explosion of modeling for us?

@claude what's your deeper take on this path for the MCP Registry?

claude · 2025-11-20T14:50:05Z

Claude finished @tadasant's task

Deep Analysis: Predefined Named Filters for MCP Registry

After reviewing the PR implementation and ecosystem architecture, I have strong opinions on the predefined filters approach. TL;DR: I recommend it highly, with some refinements.

Why Predefined Named Filters Work Well Here

The MCP Registry has unique characteristics that make predefined filters particularly suitable:

1. Subregistry ETL Is Core to the Architecture

From the ecosystem vision docs, the registry is explicitly designed for subregistries to ETL from it:

"Additionally, we expect clients pull from subregistries. These subregistries add value to the registry ecosystem by providing curation, or extending it with additional metadata. The Official MCP registry expects a lot of API requests from ETL jobs from these subregistries."

Key insight: Subregistries will be pulling the entire dataset or large subsets regularly. They're not doing one-off queries that need complex filtering. The pattern is:

ETL job runs periodically (hourly, daily)
Pulls entire dataset or broad category (e.g., "all npm packages")
Processes locally with their own custom logic
Adds curation, ratings, enhanced metadata

This means the registry doesn't need to support arbitrary complex queries. It needs to support efficient bulk operations with a few common filters.

2. Finite, Stable Set of Distribution Types

Looking at the current implementation, there are exactly 6 distribution types:

remote (SSE, streamable-http)
npm
pypi
oci
nuget
mcpb

This is not an unbounded problem space. Package ecosystems are stable and well-defined. We're not likely to add 50 more distribution types. Even if we do add a few over time (e.g., cargo, gem, composer), this remains manageable.

3. API Freeze Makes Extensibility Critical

You're in an API freeze (v0.1). The spec says:

"For the next month or more, the API will remain stable with no breaking changes, allowing integrators to confidently implement support."

Predefined filters are additive and non-breaking. You can:

Start with: ?filter=nuget-only
Add later: ?filter=npm-with-github, ?filter=has-remote-transport, ?filter=uses-semver
Never break existing clients

Generic filter languages (JSONPath, RSQL, OData) would be hard to evolve without breaking changes because:

Parser behavior changes are breaking
Adding/removing operator support is breaking
Field name changes break queries

Recommended Implementation: Hybrid Approach

I recommend a two-tier system that combines the benefits of both approaches:

Tier 1: Simple Field Filters (What This PR Does)

Keep simple query params for common, stable fields:

?type=npm              # Single type
?type=npm,pypi         # Multiple types (OR semantics)
?version=latest        # Already exists
?search=filesystem     # Already exists

Why this tier?

These fields are fundamental to the data model
They're stable (won't change)
They map directly to database queries (efficient)
They're composable with each other

Enhancement for this PR: Support comma-separated values for type:

// Handle type parameter with comma-separated support.
// Assumes "fmt" and "strings" are imported in the handler file.
if input.Type != "" {
    types := strings.Split(input.Type, ",")
    for i, t := range types {
        t = strings.TrimSpace(t)
        if !model.IsValidDistributionType(t) {
            return nil, huma.Error400BadRequest(fmt.Sprintf("Invalid type parameter: '%s' is not a valid distribution type", t))
        }
        types[i] = t  // write the trimmed value back; assigning to t alone leaves the slice untrimmed
    }
    filter.ConfigTypes = types  // plural, pass as a slice
}

Tier 2: Named Composite Filters (Future Addition)

Add a preset or profile parameter for complex, curated queries:

?preset=official-recommended    # Servers from verified publishers
?preset=github-only            # Only servers with GitHub repos
?preset=production-ready       # Servers with semver, docs, tests
?preset=enterprise-safe        # Servers with security audits, SLAs

Why this tier?

Solves the "explosion of filters" problem through curation
Allows complex logic without exposing it in the API
Sub-registries can define their own presets
Can be optimized server-side (materialized views, caching)
Provides semantic meaning ("production-ready" is clearer than a complex filter expression)

Addressing the "Explosion of Filters" Concern

@tadasant raised a valid concern:

"If we do this for type, there's an explosion of other filters we would create a precedent for adding. For example uses_template_variables, has_git_repository, uses_semantic_versioning, etc etc."

My take: There are actually two categories of filters:

Category A: Core Structural Filters (Bounded Set)

These map to fundamental data model properties:

type - distribution type (6 values, rarely changes)
version - version filtering (already exists)
search - name search (already exists)
updated_since - temporal filtering (already exists)

These should be simple query params because:

They're intrinsic to what a server is
They're efficiently indexed in the database
They're stable across versions
They compose well: ?type=npm,pypi&version=latest&search=weather

Category B: Semantic/Quality Filters (Unbounded Set)

These represent judgments or complex conditions:

uses_template_variables - requires parsing package config
has_git_repository - requires checking repository field
production_ready - subjective, requires multiple conditions
enterprise_safe - highly subjective, requires external data

These should be named presets because:

They're subjective or complex
They may require external data or computation
They change over time as standards evolve
They benefit from curation

Concrete Proposal for This PR

For immediate merge:

Enhance the type parameter to support multiple values:

?type=npm              # Single type
?type=npm,pypi,nuget   # Multiple types (OR logic)

Document this as the "simple field filter" pattern:

# In OpenAPI spec
- name: type
  in: query
  description: |
    Filter by distribution type. Supports multiple comma-separated values for OR logic.
    
    Example values: 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb'
    
    Examples:
    - ?type=npm                    # Only npm packages
    - ?type=npm,pypi               # npm OR pypi packages
    - ?type=remote                 # Only remote servers
    - ?type=npm,pypi,nuget         # Any of npm, pypi, or nuget

Add implementation note in docs/design/:

"The registry supports two types of filtering: simple field filters (like type, version) that map directly to data model properties, and named presets (future) for complex composite queries. This keeps the API simple while allowing extensibility."

For follow-up issues:

Create issue: "Design named preset/profile filter system"
Create issue: "Should remote expand to sse,streamable-http?" (per review comment)

Performance and Scaling Considerations

One concern with generic filtering is performance. The predefined approach handles this well:

Simple field filters:

Map to database indexes
Can be optimized in SQL
Predictable query patterns

Named presets:

Can be materialized views
Can be cached aggressively
Can be optimized per-preset
Can evolve without API changes

Generic filter languages (JSONPath, RSQL):

Unpredictable query patterns
Difficult to index effectively
May require full table scans
Hard to rate-limit fairly

Given that the registry expects "a lot of API requests from ETL jobs," performance predictability is important.

Does This Meet @rdimitrov's Needs?

Looking at the original motivation:

"The following PR adds a query parameter for searching by the distribution type (remote, npm, nuget, etc). This is so registry clients supporting nuget can query just for those (less load on the registry server + less load on the client to go through all results)."

Yes, this approach fully solves the problem:

NuGet clients can use ?type=nuget
It's efficient (database-level filtering)
It reduces load on both server and client
It's simple and obvious

The enhancement to support ?type=npm,pypi,nuget makes it even more flexible for clients that support multiple ecosystems.

Comparison to Other Registry APIs

Let me check what precedents exist in practice:

npm Registry:

Uses simple query params on its search endpoint: ?text=search, ?size=20
No complex filtering - clients ETL and process locally

PyPI (JSON API):

Simple endpoints: /pypi/{package}/json
No filtering API - clients scrape the entire simple index
Focus on bulk downloads

Docker Hub:

Simple params: ?name=foo, ?is_official=true
No complex query language
Pagination-focused

NuGet:

Migrated away from OData in v3
Search API with simple parameters
Focus on text search and pagination

Pattern: Major package registries converged on simple parameters + client-side processing. None use complex filter languages at scale.

My Recommendation Summary

For this PR: ✅ Approve with enhancement

Add comma-separated value support: ?type=npm,pypi
Keep validation of individual types
Document the pattern clearly

Future direction: 🎯 Two-tier filtering

Tier 1: Simple field filters for core properties (type, version, search)
Tier 2: Named presets for complex queries (preset=production-ready)

Rationale:

✅ Solves immediate NuGet client need
✅ Aligns with ecosystem architecture (bulk ETL, not ad-hoc queries)
✅ Matches patterns from successful package registries
✅ Non-breaking and extensible
✅ Performance-predictable
✅ Simple developer experience
✅ Respects API freeze constraints

What I don't recommend:

❌ Generic filter languages (JSONPath, RSQL, OData) - overkill and performance unpredictable
❌ GraphQL - wrong tool for this use case
❌ Unlimited field filtering - creates precedent problems

Implementation Notes

If you enhance this PR to support comma-separated types, here's what needs to change:

Update internal/database/database.go - Change ConfigType *string to ConfigTypes []string
Update internal/database/postgres.go - Use = ANY($n) instead of = $n in SQL
Update internal/api/handlers/v0/servers.go - Parse comma-separated values
Update tests - Add test cases for multiple types
Update OpenAPI spec - Document comma-separated format
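
As a sketch of the SQL change in the second item, the helper below builds a `= ANY($n)` clause for a list of types. The column name `config_type` and the function `buildTypeClause` are assumptions, not the actual code in internal/database/postgres.go. How the slice is encoded as a Postgres array depends on the driver: pgx accepts a `[]string` argument directly, while lib/pq needs it wrapped in `pq.Array`.

```go
package main

import "fmt"

// buildTypeClause returns a WHERE fragment and its single argument for
// filtering by one or more distribution types. argIndex is the positional
// placeholder number to use. An empty list means "no filter".
func buildTypeClause(types []string, argIndex int) (string, []any) {
	if len(types) == 0 {
		return "", nil
	}
	// One array-valued argument instead of one placeholder per type, so the
	// query text stays the same regardless of how many types are passed.
	return fmt.Sprintf("config_type = ANY($%d)", argIndex), []any{types}
}
```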

This is a small, isolated change that significantly improves composability without adding complexity.

Does this analysis help clarify the path forward? I'm happy to dive deeper into any aspect or help with implementation details if needed.

tadasant · 2025-11-20T14:53:31Z

I don't agree with Claude's suggestion to split into 2 functionalities (would rather we just focus on filter as a key), FWIW.

jonathanhefner · 2025-11-20T15:11:08Z

Rather than type, perhaps category? Then servers could automatically be placed in zero or more categories (e.g., ["remote", "sse", "streamable-http"]). And we could add arbitrary categories in the future.

Though that does introduce the question of whether ?category= with multiple categories should be a union versus an intersection.
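
To make the union versus intersection question concrete, here is a short Go sketch of both semantics over a server's category set. The function names are hypothetical.

```go
package main

// matchAny implements union semantics: the server matches if it has at
// least one of the wanted categories. An empty wanted list matches nothing.
func matchAny(serverCategories, wanted []string) bool {
	have := make(map[string]bool, len(serverCategories))
	for _, c := range serverCategories {
		have[c] = true
	}
	for _, w := range wanted {
		if have[w] {
			return true
		}
	}
	return false
}

// matchAll implements intersection semantics: the server matches only if it
// has every wanted category. An empty wanted list matches everything.
func matchAll(serverCategories, wanted []string) bool {
	have := make(map[string]bool, len(serverCategories))
	for _, c := range serverCategories {
		have[c] = true
	}
	for _, w := range wanted {
		if !have[w] {
			return false
		}
	}
	return true
}
```

For a server in categories `remote` and `sse`, a request for `sse,streamable-http` matches under `matchAny` but not under `matchAll`, so the choice has to be spelled out in the spec.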

tadasant · 2025-11-21T14:22:34Z

Rather than type, perhaps category? Then servers could automatically be placed in zero or more categories (e.g., ["remote", "sse", "streamable-http"]). And we could add arbitrary categories in the future.

Though that does introduce the question of whether ?category= with multiple categories should be a union versus an intersection.

@jonathanhefner did you see my suggestion of filter instead of type? Curious what you think of filter.

I don't love category personally, because category is typically a mutually exclusive designator (items can only be in one category). tags would be more typically like what you are proposing (item can be in multiple), but indeed the comma-separation being union or intersection is a very real problem.

I do think making the key filter and allowing custom values solves all these problems (a sub-registry could implement remote-and-npm as a filter just as easily as remote-or-npm). Maybe eventually (or even out of the gate) we make a list of "recommended filter values to implement".

jonathanhefner · 2025-11-21T16:14:00Z

did you see my suggestion of filter instead of type? Curious what you think of filter.

I don't love category personally, because category is typically a mutually exclusive designator (items can only be in one category).

I was interpreting filter as mutually exclusive. 😆 But also, I was interpreting filter as manually defined rather than automatically defined. For example, if we add support for a new package registry, would we automatically have a new filter for that registry? For category or tag, I would assume "yes". If the answer for filter is also "yes", then I think that sounds good! 😃

rdimitrov added 3 commits November 19, 2025 12:18

Implement support for searching by type

33c13f9

Signed-off-by: Radoslav Dimitrov <[email protected]>

Add support for searching by type in the reference API spec

456cf13

Signed-off-by: Radoslav Dimitrov <[email protected]>

Resolve linting errors

eec73f4

Signed-off-by: Radoslav Dimitrov <[email protected]>

rdimitrov requested a review from domdomegg November 19, 2025 10:59

domdomegg reviewed Nov 19, 2025

tadasant requested changes Nov 19, 2025

5 participants
