
Extend and refactor example-queries command, rename to benchmark-queries #126

Merged
hannahbast merged 65 commits into qlever-dev:main from tanmay-9:add-evaluation-web-app
Jul 9, 2025
Conversation


@tanmay-9 tanmay-9 commented Feb 3, 2025

  1. Support an input file (with SPARQL queries and a description for each) in both TSV and YML; refactor the parsing code accordingly
  2. Add an option to generate a YML result file suitable for processing with our evaluation web app; see New Web Evaluation Application #171
  3. Add unit tests
  4. Rename from example-queries to benchmark-queries because that is really what this command does; the old functionality, which was a special case, is still available via the --example-queries option

@tanmay-9 tanmay-9 force-pushed the add-evaluation-web-app branch from 93f30bc to 168ad9a on February 12, 2025 01:00
@tanmay-9 tanmay-9 force-pushed the add-evaluation-web-app branch from 168ad9a to aeb646f on February 20, 2025 00:34
@tanmay-9 tanmay-9 force-pushed the add-evaluation-web-app branch from 13e71ab to c51bd68 on April 7, 2025 22:04
@tanmay-9 tanmay-9 force-pushed the add-evaluation-web-app branch from 78d2d4b to 9ca6b99 on April 26, 2025 15:24
@hannahbast

@tanmay-9 Thanks a lot for the changes. One minor issue that is hopefully easy to fix: right now, a "warning sign" is shown when the result size for one engine differs from the result size of the majority. However, for many of our benchmarks, the queries have only one result, which is a count. It would be very useful if, for those queries, the warning sign also appeared when for one engine that count differs from the majority.

@hannahbast hannahbast marked this pull request as ready for review May 2, 2025 16:11
@hannahbast hannahbast left a comment

1-1 with Tanmay, reviewing until (and including) example_queries.py

Comment on lines 64 to 75
help="Command to get example queries as TSV (description, query)",
)
subparser.add_argument(
"--queries-file",
type=str,
help=(
"Path to a YAML file containing queries. "
"The YAML file should have a top-level "
"key called 'queries', which is a list of dictionaries. "
"Each dictionary should contain 'query' for the query name "
"and 'sparql' for the SPARQL query."
),

We agreed to have two arguments now (only one of which should be used at a time): --queries-yml, which reads from a YAML file, and --queries-tsv, which reads from a TSV file. The --get-queries-cmd option should go. Its functionality would still be there using --queries-tsv <(command that produces a TSV)
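
For reference, a minimal sketch of what a --queries-yml input could look like, following the structure described in the help text above (top-level 'queries' key, entries with 'query' and 'sparql'); the two queries themselves are made up for illustration:

    queries:
      - query: all-classes
        sparql: SELECT DISTINCT ?class WHERE { ?s a ?class }
      - query: number-of-triples
        sparql: SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }

The --queries-tsv counterpart would simply have one description<TAB>query line per query, which is what makes --queries-tsv <(command that produces a TSV) cover the old --get-queries-cmd use case.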

type=str,
default=None,
help=(
"Name that would be used for result yaml file. "

Suggested change
"Name that would be used for result yaml file. "
"Name that would be used for result YML file. "

Comment on lines 214 to 215
@staticmethod
def parse_queries_file(queries_file: str) -> dict[str, list[str, str]]:

Should be called parse_queries_yml, and there should be an analogous method parse_queries_tsv.

Please double-check that dict is always ordered and, if so, add a comment to clarify that.
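
A minimal sketch of the two parsers, assuming they return the same intermediate representation (a list of (description, sparql) tuples, as suggested further down); the function names follow the comment above, everything else is illustrative. On ordering: Python dicts preserve insertion order since 3.7, but returning a list avoids relying on that.

    import csv
    from pathlib import Path

    import yaml  # PyYAML


    def parse_queries_yml(queries_file: str) -> list[tuple[str, str]]:
        # Expects a top-level "queries" key whose entries have "query" and "sparql".
        data = yaml.safe_load(Path(queries_file).read_text())
        return [(q["query"], q["sparql"]) for q in data["queries"]]


    def parse_queries_tsv(queries_file: str) -> list[tuple[str, str]]:
        # One "description<TAB>sparql" line per query.
        with open(queries_file, newline="") as f:
            return [(row[0], row[1]) for row in csv.reader(f, delimiter="\t")]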


return data

def get_example_queries(

This should now have a different signature. It looks strange to me that this function calls the parse_... function (of which there are now two). Instead, the calling code should call one of the parse_... functions and then call this function, which should then have a different name (and signature). Or should this function be a helper function that is called by the parse_... functions?
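
For example, the calling code could look roughly like this (a sketch; filter_queries and run_benchmark_queries are placeholder names):

    # Parse into the common intermediate representation, then filter and run.
    if args.queries_yml:
        queries = parse_queries_yml(args.queries_yml)
    else:
        queries = parse_queries_tsv(args.queries_tsv)
    queries = filter_queries(queries, args.query_ids, args.query_regex)
    run_benchmark_queries(queries)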

log.error("Cannot have both --remove-offset-and-limit and --limit")
return False

dataset, engine = None, None

Add a short explanatory comment before this block (which, in particular, explains what dataset and engine are)

Comment on lines 789 to 799
def get_record_for_yaml(
self,
query: str,
sparql: str,
client_time: float,
result: str | dict[str, str],
result_size: int | None,
accept_header: str,
) -> dict[str, Any]:
"""
Construct a dictionary with query information for yaml file

This function should have a better name and comment. In particular, the input file can be a YML as well, so yaml is not a unique reference

from qlever.log import log, mute_log
from qlever.util import run_command, run_curl_command

MAX_RESULT_SIZE = 20

Should be an argument, e.g. max-results-output-file
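
A sketch of the corresponding argument, keeping the current value 20 as the default (the help text is illustrative):

    subparser.add_argument(
        "--max-results-output-file",
        type=int,
        default=20,
        help="Maximum number of result rows written to the result file",
    )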

Comment on lines 835 to 839
def get_query_results(
self, result_file: str, result_size: int, accept_header: str
) -> tuple[list[str], list[list[str]]]:
"""
Return headers and results as a tuple

Improve name and comment. Since we have rather many helper functions now, they should have good and consistent names and clear comments (which does not mean long comments)

Comment on lines 885 to 886
graph = Graph()
graph.parse(result_file, format="turtle")

Make the rdflib prefix explicit
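
That is, something along these lines (sketch):

    import rdflib

    graph = rdflib.Graph()
    graph.parse(result_file, format="turtle")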

Comment on lines 895 to 901
@staticmethod
def write_query_data_to_yaml(
query_data: dict[str, list[dict[str, Any]]], out_file: Path
) -> None:
"""
Write yaml record for all queries to output yaml file
"""

Ditto (regarding name and comment)

@hannahbast hannahbast left a comment

Another round of reviewing

Comment on lines 230 to 248
def construct_get_queries_cmd(
queries_file: str, query_ids: str, query_regex: str, ui_config: str
) -> str:
"""
Parse a YAML file and validate its structure.
Construct get_queries_cmd from queries_tsv file if present or use
example queries by using ui_config. Use query_ids and query_regex to
filter the queries
"""
get_queries_cmd = (
f"cat {queries_file}"
if queries_file
else f"curl -sv https://qlever.cs.uni-freiburg.de/"
f"api/examples/{ui_config}"
)
sed_arg = query_ids.replace(",", "p;").replace("-", ",") + "p"
get_queries_cmd += f" | sed -n '{sed_arg}'"
if query_regex:
get_queries_cmd += f" | grep -Pi {shlex.quote(query_regex)}"
return get_queries_cmd

We discussed that there should be an additional option --example-queries (and the whole command should be renamed to benchmark-queries), so that the code looks the same no matter whether you parse from TSV or YAML (just the parsing is different). The user should then *either* specify --example-queries *or* one of the --queries-... options, and there should be an informative error message otherwise.
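
A minimal sketch of that check (the option names are from the discussion above; the exact error wording is illustrative):

    num_sources = sum(
        1
        for option in (args.example_queries, args.queries_yml, args.queries_tsv)
        if option
    )
    if num_sources != 1:
        log.error(
            "Specify exactly one of --example-queries, "
            "--queries-yml, or --queries-tsv"
        )
        return False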

Comment on lines 325 to 346
tsv_queries = []
for query_idx in query_indices:
if query_idx >= total_queries:
log.error(
"Make sure --query-ids don't exceed the total "
"queries in the YML file"
)
return []
query = data["queries"][query_idx]

# Only include queries that match the query_regex if present
if query_regex:
pattern = re.compile(query_regex, re.IGNORECASE)
if not any(
[
pattern.search(query["query"]),
pattern.search(query["sparql"]),
]
):
continue

tsv_queries.append(f"{query['query']}\t{query['sparql']}")

This code should come after the parsing, to avoid code duplication. The two parsers provide the same intermediate representation of the queries, which can then be filtered based on --query-ids and --query-regex.
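
That is, once both parsers return the same list of (description, sparql) tuples, a single filter step could look roughly like this (sketch; filter_queries is the name suggested further below, parse_query_ids is a hypothetical helper for ranges like "1-3,7"):

    import re


    def filter_queries(
        queries: list[tuple[str, str]], query_ids: str, query_regex: str
    ) -> list[tuple[str, str]]:
        # Keep only the queries selected by --query-ids and --query-regex.
        if query_ids:
            selected = parse_query_ids(query_ids)  # hypothetical helper
            queries = [q for i, q in enumerate(queries, start=1) if i in selected]
        if query_regex:
            pattern = re.compile(query_regex, re.IGNORECASE)
            queries = [
                (description, sparql)
                for description, sparql in queries
                if pattern.search(description) or pattern.search(sparql)
            ]
        return queries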

Comment on lines 387 to 410
if accept_header == "text/tab-separated-values":
result_size = run_command(
f"sed 1d {result_file}", return_output=True
)
elif accept_header == "application/qlever-results+json":
try:
# sed cmd to get the number between 2nd and 3rd double_quotes
result_size = run_command(
f"jq '.res[0]' {result_file}"
" | sed 's/[^0-9]*\\([0-9]*\\).*/\\1/'",
return_output=True,
)
except Exception as e:
error_msg = get_json_error_msg(e)
else:
try:
result_size = run_command(
f'jq -r ".results.bindings[0]'
f" | to_entries[0].value.value"
f' | tonumber" {result_file}',
return_output=True,
)
except Exception as e:
error_msg = get_json_error_msg(e)

Looks like this should be a separate function + it should be tested.

Comment on lines 414 to 450
if (
accept_header == "text/tab-separated-values"
or accept_header == "text/csv"
):
result_size = run_command(
f"sed 1d {result_file} | wc -l", return_output=True
)
elif accept_header == "text/turtle":
result_size = run_command(
f"sed '1d;/^@prefix/d;/^\\s*$/d' {result_file} | wc -l",
return_output=True,
)
elif accept_header == "application/qlever-results+json":
result_size = run_command(
f'jq -r ".resultsize" {result_file}',
return_output=True,
)
else:
try:
result_size = int(
run_command(
f'jq -r ".results.bindings | length"'
f" {result_file}",
return_output=True,
).rstrip()
)
except Exception as e:
error_msg = get_json_error_msg(e)
if result_size == 1:
try:
single_int_result = int(
run_command(
f'jq -e -r ".results.bindings[0][] | .value"'
f" {result_file}",
return_output=True,
).rstrip()
)

Looks like this should be a separate function + it should be tested.
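
A rough sketch of such a helper, reusing the jq and sed commands from the block above (the function name and structure are illustrative):

    def get_result_size(result_file: str, accept_header: str) -> int:
        # Count the result rows, depending on the accept header of the response.
        if accept_header in ("text/tab-separated-values", "text/csv"):
            cmd = f"sed 1d {result_file} | wc -l"
        elif accept_header == "text/turtle":
            cmd = f"sed '1d;/^@prefix/d;/^\\s*$/d' {result_file} | wc -l"
        elif accept_header == "application/qlever-results+json":
            cmd = f'jq -r ".resultsize" {result_file}'
        else:
            # application/sparql-results+json
            cmd = f'jq -r ".results.bindings | length" {result_file}'
        return int(run_command(cmd, return_output=True).rstrip())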

@hannahbast hannahbast left a comment

1-1 with Tanmay, round 3. Some minor comments left. The web app should be factored out of this. Thanks a lot.

action="store_true",
default=False,
help=(
"Run the example-queries for the given --ui-config "

Suggested change
"Run the example-queries for the given --ui-config "
"Run the example queries for the given --ui-config "

default=False,
help=(
"Run the example-queries for the given --ui-config "
"instead of the benchmark queries from a tsv/yml file"

Suggested change
"instead of the benchmark queries from a tsv/yml file"
"instead of the benchmark queries from a TSV or YML file"

Comment on lines 296 to 297
Parse the queries_tsv file and return a list of tab-separated queries
(query_description, full_sparql_query)

The internal representation should be more abstract, like a list of tuples.

Comment on lines 242 to 243
Given a tab-separated list of queries, filter them and keep the
ones which are a part of query_ids or match with query_regex

The input should be a list of tuples, see below. Change name to filter_queries

Comment on lines 280 to 281
Execute the given bash command to fetch tsv queries and return a
list of tab-separated queries (query_description, full_sparql_query)

List of tuples as well

"""
When downloading the full result of a query with accept header as
application/sparql-results+json and result_size == 1, get the single
integer result value (if any)

Suggested change
integer result value (if any)
integer result value (if any).

@hannahbast hannahbast changed the title from "Add evaluation web app" to "Extend and refactor example-queries command, rename to benchmark-queries" on Jun 13, 2025
@hannahbast hannahbast left a comment

This looks very good now. I will wait for all the checks to become green, revise the description, and then merge this.

@hannahbast hannahbast merged commit 97f0855 into qlever-dev:main Jul 9, 2025
9 checks passed