Suboptimal Searches (Dev, Trial and any unindexed releases not to be included here) #1231

Open
@oalders


I'm opening this issue as a place to collect searches which could be improved. Individual searches can be broken into issues as they are tackled, but this is essentially a place to get the conversation started.

Activity

oalders

oalders commented on Jun 16, 2014

@oalders
MemberAuthor
changed the title from "Suboptimal Searches" to "Suboptimal Searches (Dev, Trial and any unindexed releases not to be included here)" on Jun 17, 2014
ribasushi

ribasushi commented on Jun 29, 2014

@ribasushi

Thanks for kicking this off

http://search.cpan.org/search?query=mop&mode=all vs https://metacpan.org/search?q=mop (known issue, but the most painful manifestation of it)

http://search.cpan.org/search?query=dbix+helper&mode=all vs https://metacpan.org/search?q=dbix+helper (note how the only thing coming up is the deprecated one)

oalders

oalders commented on Jun 30, 2014

@oalders
MemberAuthor

@ribasushi I think the main issue with the dbix+helper search is that the MetaCPAN search results are collapsed. If you follow through on the link for more results you get https://metacpan.org/search?q=distribution:DBIx-Class-Helpers+dbix%20helper which is much more helpful. I'm not invalidating your comment. I'm just trying to work through what we're seeing. Obviously showing a deprecated module as the first result is not helpful. We should look at tweaking the collapsed search in this kind of case.

One other problem may be that the search is for "helper" and not "helpers". The collapsed results for "helpers" look better: https://metacpan.org/search?q=dbix+helpers

ribasushi

ribasushi commented on Jul 1, 2014

@ribasushi

@oalders Does ES provide a way to calculate a "churn coefficient"? In other words - can it rank the entries by "most changes since" and thus give you a sane collapse criterion?

ghost

ghost commented on Jul 10, 2014

@ghost

You need a way to specify a search by module name -- you effectively have this for the search box autocomplete, but something like module:MooseX ought to give all dists with MooseX in the name rather than a full-text search. SCO has had this feature forever and it's a major gap in MetaCPAN.

ranguard

ranguard commented on Jul 10, 2014

@ranguard
Member

More smarts on start matching...

I want to find all Plack::Middleware::** modules that have 'time'

https://metacpan.org/search?q=plack%3A%3Amiddleware+time

This might be a new feature rather than a suboptimal search but thought I'd mention it here

oalders

oalders commented on Jul 10, 2014

@oalders
MemberAuthor

What @dagolden is proposing is something we can do relatively easily, so I think we should make that a priority. We'd just need to sort out the syntax. The single colon is part of Lucene's search syntax. Also we just need to advertise that you can use Lucene's syntax to constrain searches. A good example is https://metacpan.org/search?q=plack+author%3ADAGOLDEN
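
As a rough illustration of constraining a search with a field prefix, here is a small sketch that builds such a search URL with Python's standard library. The helper name `search_url` is made up for this example; only the `q` parameter and the `author:` prefix are taken from the links in this thread.

```python
from urllib.parse import urlencode

BASE = "https://metacpan.org/search"

def search_url(terms, **fields):
    """Build a search URL from free-text terms plus field:value constraints.

    Hypothetical helper for illustration; not part of any MetaCPAN client.
    """
    parts = [terms] + [f"{name}:{value}" for name, value in fields.items()]
    # urlencode escapes ':' as %3A and turns spaces into '+'
    return BASE + "?" + urlencode({"q": " ".join(parts)})

print(search_url("plack", author="DAGOLDEN"))
# https://metacpan.org/search?q=plack+author%3ADAGOLDEN
```

The same escaping explains why `plack::middleware time` shows up as `plack%3A%3Amiddleware+time` in the URL quoted earlier.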

tsibley

tsibley commented on Jul 11, 2014

@tsibley
Contributor

module.name:MooseX is accepted, but I don't get why it only returns "MooseX" and not any of the subclasses. I thought term queries/filters were contains not equals?

rwstauner

rwstauner commented on Jul 11, 2014

@rwstauner
Contributor

Putting field:val in the search box ends up doing a query_string search (that's what recognizes the operators), not a term filter.

To clarify term filters, they are for exact values (like not_analyzed strings).

The reference docs do use the word "contain" (which isn't very clear) but they also say "not_analyzed":

Matches documents that have fields that contain a term (not analyzed).

which means it won't be tokenized (hence the exact match requirement).

The book ("definitive guide") is slightly more specific:

"The term filter is used to filter by exact values, be they numbers, dates, booleans, or not_analyzed exact value string fields."

Also note that the "term" operator doesn't analyze the input, so for example
{"filter": {"term": {"file.module.name.analyzed": "MooseX"}}} returns no results, but
{"filter": {"term": {"file.module.name.analyzed": "moosex"}}} returns several relevant matches.

However you can't see that difference using the search box because of the query_string query (which does analyze the input). So, since we have several "fields" for module name, using an analyzed field can get you what you want: module.name.analyzed:MooseX
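
The difference described above can be mimicked with a toy inverted index. This is only a sketch: the `analyze` function (lowercase, split on non-alphanumerics) is an assumption standing in for whatever analyzer the real field uses, the module list is made up, and none of this calls Elasticsearch.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase, then split on non-alphanumerics.
    An assumption for illustration, not Elasticsearch's actual analysis chain."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

MODULES = ["MooseX", "MooseX::Types", "MooseX::Getopt", "Moose"]  # sample data

raw_index = {}       # like a not_analyzed field: the whole value is one term
analyzed_index = {}  # like an analyzed field: one entry per token
for name in MODULES:
    raw_index.setdefault(name, set()).add(name)
    for token in analyze(name):
        analyzed_index.setdefault(token, set()).add(name)

def term_filter(index, value):
    """Exact term lookup; the input is NOT analyzed."""
    return sorted(index.get(value, set()))

def analyzed_query(index, text):
    """Analyze the input first, then return the union of per-token matches."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)

print(term_filter(raw_index, "MooseX"))          # ['MooseX']
print(term_filter(analyzed_index, "MooseX"))     # []
print(term_filter(analyzed_index, "moosex"))     # ['MooseX', 'MooseX::Getopt', 'MooseX::Types']
print(analyzed_query(analyzed_index, "MooseX"))  # same three matches
```

In this model the exact lookup on the raw field only finds the module literally named `MooseX`, while the lookup against the analyzed field succeeds only when the input is already in its analyzed (lowercased) form, or when the query path analyzes it first.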

ghost

ghost commented on Jul 11, 2014

@ghost

This is not user friendly.

Instead of making us jump through hoops to know, understand and remember your data model and search engine behaviors, why not just intercept the search box contents before it goes to Lucene and create the right search for us?

module:Foo  → match module names containing "Foo"
module:^Foo  → match module names starting with "Foo"

Or, if you don't like colon separators, do something like DDG: !module Foo
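
One way the proposed syntax could be recognized before the query reaches the search engine is sketched below. The function names `parse_module_search` and `matches` are hypothetical, and the case-insensitive matching is an assumption; nothing here reflects how MetaCPAN actually handles the search box.

```python
import re

def parse_module_search(query):
    """Return (mode, name) for module:Foo, module:^Foo, or !module Foo; else None."""
    m = re.fullmatch(r"\s*(?:module:|!module\s+)(\^?)(\S+)\s*", query)
    if not m:
        return None  # not a module search; leave it to the normal full-text path
    anchored, name = m.groups()
    return ("prefix" if anchored else "contains", name)

def matches(mode, name, module):
    """Apply a parsed search to one module name (case-insensitive by assumption)."""
    module, name = module.lower(), name.lower()
    return module.startswith(name) if mode == "prefix" else name in module

names = ["MooseX::Types", "Foo::MooseX", "Moose"]  # sample data
mode, name = parse_module_search("module:^MooseX")
print([n for n in names if matches(mode, name, n)])  # ['MooseX::Types']
```

Returning `None` for anything that does not use the prefix keeps ordinary searches on their existing path.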

oalders

oalders commented on Jul 11, 2014

@oalders
MemberAuthor

My preference here would be to go with the colon separators because that's what people are used to. We could use some other character for stuff that people want to pass directly to ES/Lucene. Aside from the distribution search, I don't think we use this syntax at all. Nobody really seems to be aware of it and it would follow that really nobody is taking advantage of this. Also, you really need to know a fair bit about the internals to take advantage of this.

So, I'd say, let's make this as friendly as possible. If someone wants the old behaviour, they can preface the query with some syntax that doesn't get in the way.

rwstauner

rwstauner commented on Jul 11, 2014

@rwstauner
Contributor

I wasn't suggesting that people should know how to work that (or that it was good enough), I was just trying to clarify what Thomas was experiencing.

We actually do have some special casing for author: and dist: (and distribution:) and I agree we should add some more (like module:). Intercepting these is fairly easy, and any fields that we don't capture can keep passing through to Lucene as they do now.
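
A minimal sketch of that kind of interception, assuming a simple whitespace-separated query: known prefixes are pulled out as filters and everything else is left untouched for the query parser. The `KNOWN_FIELDS` mapping and `split_query` function are invented for this example and are not the site's actual code.

```python
# Hypothetical mapping of recognized prefixes to canonical filter names
KNOWN_FIELDS = {"author": "author", "dist": "distribution", "distribution": "distribution"}

def split_query(query):
    """Separate intercepted field:value tokens from the text passed through."""
    filters, passthrough = {}, []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and value and field.lower() in KNOWN_FIELDS:
            filters[KNOWN_FIELDS[field.lower()]] = value
        else:
            # Unknown fields and plain words go to the query parser unchanged
            passthrough.append(token)
    return filters, " ".join(passthrough)

print(split_query("plack author:DAGOLDEN module.name.analyzed:MooseX"))
print(split_query("dist:DBIx-Class-Helpers dbix helper"))
```

Because only tokens whose prefix is in the mapping are captured, a name such as `Plack::Middleware` or an uncaptured field like `module.name.analyzed:MooseX` still reaches the underlying parser as typed.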

rwstauner

rwstauner commented on Jul 11, 2014

@rwstauner
Contributor

There is also some DDG-like operator in there, but I'm not sure how that works.

We obviously could use a page to explain what's available and how it works.

@oalders FWIW, in the search results there's a link that says "search in distribution" which just redoes the current search with an added dist:blah on it. I'm not implying that anybody knows how to use it directly, but the site itself actually does make use of it :-)

oalders

oalders commented on Jul 11, 2014

@oalders
MemberAuthor

@rwstauner Yeah, that's what I meant with "Aside from the distribution search, I don't think we use this syntax at all". :)

23 remaining items

oalders

oalders commented on Apr 3, 2016

@oalders
MemberAuthor

[09:10:03] <kentnl> [07:16:19] https://metacpan.org/search?q=JSON&search_type=modules # I'm not sure what to say here, but for some reason, JSON::MaybeXS doesn't rank, despite having a 5-star review rating and 26 ++'s

pink-mist

pink-mist commented on Apr 12, 2016

@pink-mist

If you search for either perlvar or perlrun you get a result from PodSimplify from 1996 instead of the latest perl release as first result; perl's perlvar and perlrun pages are the second result for their respective searches.

Grinnz

Grinnz commented on Dec 8, 2016

@Grinnz
Contributor

https://metacpan.org/search?q=overload In a search for overload, the first result is the correct overload module in core, but its link https://metacpan.org/pod/overload goes to a very unrelated module.

moved this to Priority in PTS 2017 on Feb 21, 2024
