Unified getter for the relevance level #254

Open
@TheMrSheldon

Description

Is your feature request related to a problem? Please describe.
ir_datasets centralizes a lot of information about datasets. However, when using evaluation measures that assume binary relevance (like MAP, MRR, ...), one needs to know the minimum relevance level at which a document counts as relevant, and this is easy to overlook. Is it correct that ir_datasets currently does not track the minimum relevance level?

Describe the solution you'd like
Would it be possible to add a function document.get_relevance_level() -> int that returns the minimum relevance level for the dataset (e.g., 1 for TREC DL '19 doc and 2 for TREC DL '19 passage)? Some datasets (e.g., ANTIQUE) also recommend a remapping of the graded relevance labels. Could this be performed automatically? For example, during the download of ANTIQUE, the qrels could be remapped from the 1-4 range to 0-3, and the relevance level for ANTIQUE would then be returned as 2 (the standard relevance level of 3, also reduced by 1).
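To make the request concrete, here is a minimal sketch of what such a getter and remapping could look like. It is plain Python and not part of ir_datasets: the `MIN_RELEVANCE_LEVEL` table, the free function `get_relevance_level(dataset_id)`, and `remap_qrels` are hypothetical names chosen for illustration, and the levels are only the ones mentioned in this issue.

```python
from typing import Dict, Iterable, List, Tuple

# A qrel as a (query_id, doc_id, relevance) triple.
Qrel = Tuple[str, str, int]

# Hypothetical lookup table (not an ir_datasets API); the values are the
# examples given in this issue.
MIN_RELEVANCE_LEVEL: Dict[str, int] = {
    "msmarco-document/trec-dl-2019": 1,
    "msmarco-passage/trec-dl-2019": 2,
    "antique/test": 2,  # assumes labels were shifted from 1-4 down to 0-3
}


def get_relevance_level(dataset_id: str) -> int:
    """Return the minimum label at which a document counts as relevant."""
    try:
        return MIN_RELEVANCE_LEVEL[dataset_id]
    except KeyError:
        raise KeyError(f"no relevance level recorded for {dataset_id!r}") from None


def remap_qrels(qrels: Iterable[Qrel], offset: int) -> List[Qrel]:
    """Shift every relevance label by a constant offset (e.g. -1 for ANTIQUE)."""
    return [(query_id, doc_id, rel + offset) for query_id, doc_id, rel in qrels]


# Example: ANTIQUE-style labels in the 1-4 range are shifted to 0-3.
raw_qrels = [("q1", "d1", 4), ("q1", "d2", 1)]
shifted_qrels = remap_qrels(raw_qrels, offset=-1)
```

In this sketch the level is keyed by dataset ID rather than attached to an object as proposed above; either placement would let callers avoid hardcoding the value.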

Describe alternatives you've considered
To my knowledge, this currently has to be done manually.

Additional context
Such a function could then be used in conjunction with pyterrier or pytrec_eval so that the user does not need to manually find and hardcode the relevance_level for every dataset they use. Such a feature could greatly reduce the risk of incomparable evaluation results that arises when some people set the correct relevance_level and others forget to.
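As a small illustration of why the threshold matters, the sketch below computes the reciprocal rank of one ranking under two different relevance levels. It is plain Python written for this example and does not use the pyterrier or pytrec_eval APIs; the `reciprocal_rank` helper and the toy qrels are assumptions.

```python
from typing import Dict, List


def reciprocal_rank(ranking: List[str], qrels: Dict[str, int], relevance_level: int) -> float:
    """Reciprocal rank of the first document whose label is >= relevance_level."""
    for rank, doc_id in enumerate(ranking, start=1):
        # Unjudged documents are treated as non-relevant (label 0).
        if qrels.get(doc_id, 0) >= relevance_level:
            return 1.0 / rank
    return 0.0


# Toy graded judgments for a single query (made up for illustration).
qrels = {"d1": 1, "d2": 3}
ranking = ["d1", "d2"]

rr_level_1 = reciprocal_rank(ranking, qrels, relevance_level=1)  # d1 counts as relevant
rr_level_2 = reciprocal_rank(ranking, qrels, relevance_level=2)  # only d2 counts
```

The same ranking scores 1.0 with a threshold of 1 and 0.5 with a threshold of 2, so two papers that silently use different thresholds would report numbers that cannot be compared.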

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

