-
Notifications
You must be signed in to change notification settings - Fork 490
Implement support for searching by type #787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Radoslav Dimitrov <[email protected]>
Signed-off-by: Radoslav Dimitrov <[email protected]>
Signed-off-by: Radoslav Dimitrov <[email protected]>
| description: | | ||
| Filter by distribution type. Allows discovering servers based on how they are made available to users. | ||
| Example values include: 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth changing remote to sse and streamable-http? I know that at Anthropic we maybe have plans to deprecate sse so being able to filter by the actual remote type might be useful - and it also aligns with filtering by the package type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's reasonable 👍 Would it be a problem if we continue to have remote though as an option that aggregates all remote servers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a slight preference for making this composable in some way, e.g. you can specify the type multiple times or you can comma-separate types or something? then we don't need remote, you can just specify streamable-http,sse
|
Meta: I might want one extra reviewer to check this as well - adding an endpoint is something that is hard to take back later so better to double check. Maybe we can discuss later today in maintainers meeting? |
Just to chime in that this adds a property to an existing endpoint and not a new one 👍 |
tadasant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In concept, I am supportive of adding "filter" functionality to the registry spec. It's clear that for ETL processes pulling down entries from the Registry or a sub-registry, they will often just want to pull down a subset based on downstream support of features.
However, I'm not a fan of adding a type query parameter here. If we do this for type, there's an explosion of other filters we would create a precedent for adding. For example uses_template_variables, has_git_repository, uses_semantic_versioning, etc etc. It wouldn't be obvious to me why we would support type and not those others.
To land this, I think we need to make the feature more generic. Rather then encoding every possible "filter" into the spec, the caller should be able to express a filter that the server then applies on its response.
One option is to take a page from WoT Discovery and use JSONPath. The queries it would create are ugly, but it'd be a nice standardized way to allow consumers to filter whatever fields they care about. My main concern with this approach would be whether the server side implementation is very nonperformant, and whether we would need to lock down what is allowed. A solution for that: the generic schema could support JSONPath, but the public registry API could just support very specific predefined JSONPath queries (this type one being one of them).
I think I'd be good with a JSONPath solution, though also am curious for more/alternative ideas that would fulfill the "flexible for consumers" kind of requirement here.
@claude do you have any ideas? Propose a few alternatives to JSONPath, and share your opinion on whether JSONPath is a good solution here for a generic filtering capability.
|
Claude finished @rdimitrov's task —— View job Generic Filtering Approaches for Registry APII've analyzed the PR and filtering requirements. Here are several alternatives to JSONPath, along with my evaluation: Proposed Alternatives1. OData-style Query Parameters (Recommended)Uses query string syntax inspired by OData but simplified for this use case. Examples: Pros:
Cons:
2. GraphQL-style Field SelectionAllow clients to specify what they want using GraphQL-like syntax in query params. Examples: Pros:
Cons:
3. RSQL/FIQL (RESTful Service Query Language)A query language designed specifically for REST APIs. Examples: Pros:
Cons:
4. Predefined Named FiltersCreate a set of common filter presets that can be extended over time. Examples: Pros:
Cons:
5. Simplified JSONPath (Restricted Subset)Use JSONPath but limit to simple expressions only. Examples: Pros:
Cons:
My RecommendationI recommend a hybrid approach:
Evaluation of JSONPathIs JSONPath a good solution here? No, I don't think so, for these reasons:
Implementation Path ForwardFor this specific PR, I suggest:
This approach:
References:
Analysis of Current PR ImplementationThe current PR adds a simple
|
|
After poking around some more, I can't find any good precedent for introducing something like what I'm suggesting into a major piece of ecosystem infra. In fact it looks like NuGet used to use OData and migrated away in their v3, and no other reasonable precedent exists. I actually wonder if
What do you think @rdimitrov? Would that still fulfill your needs with the benefit of avoiding the explosion of modeling for us? @claude what's your deeper take on this path for the MCP Registry? |
|
Claude finished @tadasant's task —— View job Deep Analysis: Predefined Named Filters for MCP RegistryAfter reviewing the PR implementation and ecosystem architecture, I have strong opinions on the predefined filters approach. TL;DR: I recommend it highly, with some refinements. Why Predefined Named Filters Work Well HereThe MCP Registry has unique characteristics that make predefined filters particularly suitable: 1. Subregistry ETL Is Core to the ArchitectureFrom the ecosystem vision docs, the registry is explicitly designed for subregistries to ETL from it:
Key insight: Subregistries will be pulling the entire dataset or large subsets regularly. They're not doing one-off queries that need complex filtering. The pattern is:
This means the registry doesn't need to support arbitrary complex queries. It needs to support efficient bulk operations with a few common filters. 2. Finite, Stable Set of Distribution TypesLooking at the current implementation, there are exactly 6 distribution types:
This is not an unbounded problem space. Package ecosystems are stable and well-defined. We're not likely to add 50 more distribution types. Even if we do add a few over time (e.g., 3. API Freeze Makes Extensibility CriticalYou're in an API freeze (v0.1). The spec says:
Predefined filters are additive and non-breaking. You can:
Generic filter languages (JSONPath, RSQL, OData) would be hard to evolve without breaking changes because:
Recommended Implementation: Hybrid ApproachI recommend a two-tier system that combines the benefits of both approaches: Tier 1: Simple Field Filters (What This PR Does)Keep simple query params for common, stable fields: Why this tier?
Enhancement for this PR: Support comma-separated values for // Handle type parameter with comma-separated support
if input.Type != "" {
types := strings.Split(input.Type, ",")
for _, t := range types {
t = strings.TrimSpace(t)
if !model.IsValidDistributionType(t) {
return nil, huma.Error400BadRequest(fmt.Sprintf("Invalid type parameter: '%s' is not a valid distribution type", t))
}
}
filter.ConfigTypes = types // plural, pass as array
}Tier 2: Named Composite Filters (Future Addition)Add a Why this tier?
Addressing the "Explosion of Filters" Concern@tadasant raised a valid concern:
My take: There are actually two categories of filters: Category A: Core Structural Filters (Bounded Set)These map to fundamental data model properties:
These should be simple query params because:
Category B: Semantic/Quality Filters (Unbounded Set)These represent judgments or complex conditions:
These should be named presets because:
Concrete Proposal for This PRFor immediate merge:
For follow-up issues:
Performance and Scaling ConsiderationsOne concern with generic filtering is performance. The predefined approach handles this well: Simple field filters:
Named presets:
Generic filter languages (JSONPath, RSQL):
Given that the registry expects "a lot of API requests from ETL jobs," performance predictability is important. Does This Meet @rdimitrov's Needs?Looking at the original motivation:
Yes, this approach fully solves the problem:
The enhancement to support Comparison to Other Registry APIsLet me check what precedents exist in practice: NPM Registry (v3):
PyPI (JSON API):
Docker Hub:
NuGet:
Pattern: Major package registries converged on simple parameters + client-side processing. None use complex filter languages at scale. My Recommendation SummaryFor this PR: ✅ Approve with enhancement
Future direction: 🎯 Two-tier filtering
Rationale:
What I don't recommend:
Implementation NotesIf you enhance this PR to support comma-separated types, here's what needs to change:
This is a small, isolated change that significantly improves composability without adding complexity. Does this analysis help clarify the path forward? I'm happy to dive deeper into any aspect or help with implementation details if needed. |
|
I don't agree with Claude's suggestion to split into 2 functionalities (would rather we just focus on |
|
Rather than Though that does introduce the question of whether |
@jonathanhefner did you see my suggestion of I don't love category personally, because category is typically a mutually exclusive designator (items can only be in one category). I do think making the key |
I was interpreting |
Motivation and Context
The following PR adds a query parameter for searching by the distribution type (remote, npm, nuget, etc). This is so registry clients supporting nuget can query just for those (less load on the registry server + less load on the client to go through all results).
Note that this is not a breaking change.
How Has This Been Tested?
Locally + unit tests
Breaking Changes
No
Types of changes
Checklist
Additional context
Fixes: #781