I'm opening this issue as a place to collect searches which could be improved. Individual searches can be broken into issues as they are tackled, but this is essentially a place to get the conversation started.
oalders commented on Jun 16, 2014
I'm looking for File::Temp.
https://metacpan.org/search?q=tmpfile
(result 8)
vs
http://search.cpan.org/search?query=tmpfile&mode=all
(result 2)
Title changed from "Suboptimal Searches" to "Suboptimal Searches (Dev, Trial and any unindexed releases not to be included here)"
ribasushi commented on Jun 29, 2014
Thanks for kicking this off
http://search.cpan.org/search?query=mop&mode=all vs https://metacpan.org/search?q=mop (known issue, but the most painful manifestation of it)
http://search.cpan.org/search?query=dbix+helper&mode=all vs https://metacpan.org/search?q=dbix+helper (note how the only thing coming up is the deprecated one)
oalders commented on Jun 30, 2014
@ribasushi I think the main issue with the dbix+helper search is that the MetaCPAN search results are collapsed. If you follow through on the link for more results you get https://metacpan.org/search?q=distribution:DBIx-Class-Helpers+dbix%20helper which is much more helpful. I'm not invalidating your comment. I'm just trying to work through what we're seeing. Obviously showing a deprecated module as the first result is not helpful. We should look at tweaking the collapsed search in this kind of case.
One other problem may be that the search is for "helper" and not "helpers". The collapsed results for "helpers" look better: https://metacpan.org/search?q=dbix+helpers
ribasushi commented on Jul 1, 2014
@oalders Does ES provide a way to calculate a "churn coefficient"? In other words - can it rank the entries by "most changes since" and thus give you a sane collapse criteria?
ghost commented on Jul 10, 2014
You need a way to specify a search by module name -- you effectively have this for the search box autocomplete, but something like module:MooseX ought to give all dists with MooseX in the name rather than a full-text search. SCO has had this feature forever and it's a major gap in MetaCPAN.
ranguard commented on Jul 10, 2014
More smarts on start matching...
I want to find all Plack::Middleware::** modules that have 'time'
https://metacpan.org/search?q=plack%3A%3Amiddleware+time
This might be a new feature rather than a suboptimal search but thought I'd mention it here
oalders commented on Jul 10, 2014
What @dagolden is proposing is something we can do relatively easily, so I think we should make that a priority. We'd just need to sort out the syntax. The single colon is part of lucene's search syntax. Also we just need to advertise that you can use lucene's syntax to constrain searches. A good example is https://metacpan.org/search?q=plack+author%3ADAGOLDEN
tsibley commented on Jul 11, 2014
module.name:MooseX is accepted, but I don't get why it only returns "MooseX" and not any of the subclasses. I thought term queries/filters were contains not equals?
rwstauner commented on Jul 11, 2014
Putting field:val in the search box ends up doing a query_string search (that's what recognizes the operators), not a term filter.
To clarify term filters, they are for exact values (like not_analyzed strings).
The reference docs do use the word "contain" (which isn't very clear) but they also say "not_analyzed":
"Matches documents that have fields that contain a term (not analyzed)."
which means it won't be tokenized (hence the exact match requirement).
The book ("definitive guide") is slightly more specific:
"The term filter is used to filter by exact values, be they numbers, dates, booleans, or not_analyzed exact value string fields."
Also note that the "term" operator doesn't analyze the input, so for example
{"filter": {"term": {"file.module.name.analyzed": "MooseX"}}}
returns no results, but
{"filter": {"term": {"file.module.name.analyzed": "moosex"}}}
returns several relevant matches.
However you can't see that difference using the search box because of the query_string query (which does analyze the input). So, since we have several "fields" for module name, using an analyzed field can get you what you want: module.name.analyzed:MooseX
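The difference between an unanalyzed term lookup and an analyzed query can be illustrated with a toy inverted index. This is only a sketch: the analyzer here (split on non-alphanumerics, then lowercase) is an assumption made for illustration, not MetaCPAN's actual field mapping, and `analyze`, `build_index`, `term_lookup`, and `analyzed_lookup` are hypothetical names rather than Elasticsearch APIs.

```python
import re


def analyze(text):
    """Toy analyzer (assumed): split on non-alphanumerics, lowercase each token."""
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]


def build_index(names):
    """Map each analyzed token to the set of names that produced it."""
    index = {}
    for name in names:
        for token in analyze(name):
            index.setdefault(token, set()).add(name)
    return index


def term_lookup(index, term):
    """Mimics a term filter: the input is looked up verbatim, with no analysis."""
    return sorted(index.get(term, set()))


def analyzed_lookup(index, text):
    """Mimics a query on an analyzed field: the input is analyzed first."""
    hits = [index.get(tok, set()) for tok in analyze(text)]
    return sorted(set.intersection(*hits)) if hits else []


MODULES = ["MooseX", "MooseX::Types", "MooseX::Getopt", "Moose"]
INDEX = build_index(MODULES)

print(term_lookup(INDEX, "MooseX"))      # raw term, never lowercased
print(term_lookup(INDEX, "moosex"))      # matches the stored lowercase token
print(analyzed_lookup(INDEX, "MooseX"))  # input is lowercased before lookup
```

Because the index only ever stores lowercased tokens, the verbatim lookup for "MooseX" finds nothing, while both "moosex" and the analyzed form of "MooseX" find the same entries.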
ghost commented on Jul 11, 2014
This is not user friendly.
Instead of making us jump through hoops to know, understand and remember your data model and search engine behaviors, why not just intercept the search box contents before it goes to Lucene and create the right search for us?
module:Foo → match module names containing "Foo"
module:^Foo → match module names starting with "Foo"
Or, if you don't like colon separators, do something like DDG:
!module Foo
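A rough sketch of what such interception might look like, assuming a module: prefix that means "contains", a ^ marker after the colon that means "starts with", and the DDG-style !module form as an alternative spelling. `ModuleQuery`, `parse_module_query`, and `matches` are hypothetical names, and the case-insensitive comparison is an assumption; nothing here reflects MetaCPAN's actual code.

```python
import re
from dataclasses import dataclass


@dataclass
class ModuleQuery:
    mode: str  # "contains" or "prefix"
    text: str


# Hypothetical shortcut syntax: module:Foo, module:^Foo, or "!module Foo".
_COLON = re.compile(r"^module:(\^?)(\S+)$")
_BANG = re.compile(r"^!module\s+(\^?)(\S+)$")


def parse_module_query(raw):
    """Return a ModuleQuery for a recognized shortcut, or None to pass through."""
    raw = raw.strip()
    match = _COLON.match(raw) or _BANG.match(raw)
    if match is None:
        return None
    anchored, text = match.groups()
    return ModuleQuery("prefix" if anchored else "contains", text)


def matches(query, module_name):
    """Apply the parsed intent to one module name (case-insensitive by assumption)."""
    name, text = module_name.lower(), query.text.lower()
    return name.startswith(text) if query.mode == "prefix" else text in name


print(parse_module_query("module:^MooseX"))
print(parse_module_query("plack author:DAGOLDEN"))  # not a shortcut, so None
```

Anything that returns None would be handed to the search engine unchanged, so existing field:value queries keep working.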
oalders commented on Jul 11, 2014
My preference here would be to go with the colon separators because that's what people are used to. We could use some other character for stuff that people want to pass directly to ES/Lucene. Aside from the distribution search, I don't think we use this syntax at all. Nobody really seems to be aware of it and it would follow that really nobody is taking advantage of this. Also, you really need to know a fair bit about the internals to take advantage of this.
So, I'd say, let's make this as friendly as possible. If someone wants the old behaviour, they can preface the query with some syntax that doesn't get in the way.
rwstauner commented on Jul 11, 2014
I wasn't suggesting that people should know how to work that (or that it was good enough), I was just trying to clarify what Thomas was experiencing.
We actually do have some special casing for author: and dist: (and distribution:) and I agree we should add some more (like module:). Intercepting these is fairly easy, and other fields that we don't capture will continue to pass through to Lucene and work as before.
rwstauner commented on Jul 11, 2014
There is also some DDG-like operator in there, but I'm not sure how that works.
We obviously could use a page to explain what's available and how it works.
@oalders FWIW, In the search results there's a link that says "search in distribution" which just redoes the current search with an added dist:blah on it. I'm not implying that anybody knows how to use it directly, but the site itself actually does make use of it :-)
oalders commented on Jul 11, 2014
@rwstauner Yeah, that's what I meant with "Aside from the distribution search, I don't think we use this syntax at all". :)
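For illustration, a link like "search in distribution" could be built roughly as follows. This is a sketch under assumptions: the thread only says the link adds a dist: filter to the current search, so appending it at the end of the query is a guess, and `search_in_distribution` is a hypothetical helper, not a function from the MetaCPAN codebase.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://metacpan.org/search"


def search_in_distribution(query, dist):
    """Build a search URL that repeats `query` with a dist: filter added.

    Appending the filter (rather than prepending it) is an assumption.
    """
    scoped = f"{query} dist:{dist}"
    return f"{SEARCH_URL}?{urlencode({'q': scoped})}"


print(search_in_distribution("dbix helper", "DBIx-Class-Helpers"))
```

`urlencode` percent-encodes the colon and turns spaces into `+`, which matches the shape of the author: example URL quoted earlier in the thread.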
oalders commented on Apr 3, 2016
[09:10:03] <kentnl> [07:16:19] https://metacpan.org/search?q=JSON&search_type=modules # I'm not sure what to say here, but for some reason, JSON::MaybeXS doesn't rank, despite having a 5-star review rating and 26 ++'s
pink-mist commented on Apr 12, 2016
If you search for either perlvar or perlrun you get a result from PodSimplify from 1996 instead of the latest perl release as first result; perl's perlvar and perlrun pages are the second result for their respective searches.
Grinnz commented on Dec 8, 2016
https://metacpan.org/search?q=overload In a search for overload, the first result is the correct overload module in core, but its link https://metacpan.org/pod/overload goes to a very unrelated module.