Extend and refactor example-queries command, rename to benchmark-queries #126
hannahbast merged 65 commits into qlever-dev:main

Conversation
Force-pushed from 93f30bc to 168ad9a
…file in example-queries
…sults+json) and not qlever(tsv) in example-queries
…older (gitignored)
…ison when compareResults button is clicked
… checkboxes from comparison page. Fix the hover sparql text by escaping HTML text properly
Force-pushed from 168ad9a to aeb646f
…able-header sticky
Force-pushed from 13e71ab to c51bd68
Force-pushed from 78d2d4b to 9ca6b99
@tanmay-9 Thanks a lot for the changes. One minor issue that is hopefully easy to fix: Right now, a "warning sign" is shown when the result size for one engine differs from the result size of the majority. However, for many of our benchmarks, the queries have only one result, which is a count. It would be very useful if, for those queries, the warning sign would also appear when that count differs from the majority for one engine.
…yyaml instead of ruamel.yaml
hannahbast left a comment:

1-1 with Tanmay, reviewing until (and including) example_queries.py
| help="Command to get example queries as TSV (description, query)", | ||
| ) | ||
| subparser.add_argument( | ||
| "--queries-file", | ||
| type=str, | ||
| help=( | ||
| "Path to a YAML file containing queries. " | ||
| "The YAML file should have a top-level " | ||
| "key called 'queries', which is a list of dictionaries. " | ||
| "Each dictionary should contain 'query' for the query name " | ||
| "and 'sparql' for the SPARQL query." | ||
| ), |
We agreed to have two arguments now (only one of which should be used at a time): --queries-yml, which reads from a YAML file, and --queries-tsv, which reads from a TSV file. The --get-queries-cmd option should go. Its functionality would still be there using --queries-tsv <(command that produces a TSV)
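A minimal sketch of what this could look like (argument names and help texts here are illustrative only, not the final implementation):

    # Sketch: two mutually exclusive arguments replacing --get-queries-cmd.
    group = subparser.add_mutually_exclusive_group()
    group.add_argument(
        "--queries-yml",
        type=str,
        help="Path to a YAML file with a top-level 'queries' list of "
        "dicts with keys 'query' (name) and 'sparql' (SPARQL query)",
    )
    group.add_argument(
        "--queries-tsv",
        type=str,
        help="Path to a TSV file with one query per line "
        "(description and SPARQL query, tab-separated)",
    )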
            type=str,
            default=None,
            help=(
                "Name that would be used for result yaml file. "
| "Name that would be used for result yaml file. " | |
| "Name that would be used for result YML file. " |
    @staticmethod
    def parse_queries_file(queries_file: str) -> dict[str, list[str, str]]:
Should be called parse_queries_yml, and there should be an analogous method parse_queries_tsv.
Please double-check that dict is always ordered and, if yes, add a comment to clarify that.
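As an aside, the built-in dict preserves insertion order since Python 3.7, so a dict-based representation would keep the order from the file. A sketch of the two parsers, using the list-of-tuples representation suggested further down in this review (shown as free functions; implementation details are assumptions):

    import csv
    import yaml

    # Sketch: both parsers return the same intermediate representation,
    # a list of (query_name, sparql) tuples, in file order.
    def parse_queries_yml(queries_yml: str) -> list[tuple[str, str]]:
        with open(queries_yml) as f:
            data = yaml.safe_load(f)
        return [(q["query"], q["sparql"]) for q in data["queries"]]

    def parse_queries_tsv(queries_tsv: str) -> list[tuple[str, str]]:
        with open(queries_tsv, newline="") as f:
            return [(row[0], row[1]) for row in csv.reader(f, delimiter="\t")]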
        return data
    def get_example_queries(
This should now have a different signature. It looks strange to me that this function calls the parse_... function (of which there are now two). Instead, the calling code should call one of the parse_... functions and then call this function, which should then have a different name (and signature). Or should this function be a helper function that is called by the parse_... functions?
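The calling code could then look roughly like this (all names below are assumptions taken from the comments in this review):

    # Sketch of the proposed call structure.
    if args.queries_yml:
        queries = parse_queries_yml(args.queries_yml)
    else:
        queries = parse_queries_tsv(args.queries_tsv)
    queries = filter_queries(queries, args.query_ids, args.query_regex)
    run_benchmark_queries(queries)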
            log.error("Cannot have both --remove-offset-and-limit and --limit")
            return False

        dataset, engine = None, None
Add a short explanatory comment before this block (which, in particular, explains what dataset and engine are)
    def get_record_for_yaml(
        self,
        query: str,
        sparql: str,
        client_time: float,
        result: str | dict[str, str],
        result_size: int | None,
        accept_header: str,
    ) -> dict[str, Any]:
        """
        Construct a dictionary with query information for yaml file
This function should have a better name and comment. In particular, the input file can be a YML as well, so yaml is not a unique reference
from qlever.log import log, mute_log
from qlever.util import run_command, run_curl_command

MAX_RESULT_SIZE = 20
Should be an argument, e.g. max-results-output-file
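A sketch of the corresponding argument (the option name follows the comment above; the default of 20 mirrors the current constant):

    # Sketch: replace the hard-coded MAX_RESULT_SIZE with an argument.
    subparser.add_argument(
        "--max-results-output-file",
        type=int,
        default=20,
        help="Maximum number of result rows written to the output file",
    )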
    def get_query_results(
        self, result_file: str, result_size: int, accept_header: str
    ) -> tuple[list[str], list[list[str]]]:
        """
        Return headers and results as a tuple
Improve name and comment. Since we have rather many helper functions now, they should have good and consistent names and clear comments (which does not mean long comments)
        graph = Graph()
        graph.parse(result_file, format="turtle")
Make prefix rdflib explicit
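That is, roughly (assuming the current code imports Graph directly from rdflib):

    # Sketch: use the module prefix so the origin of Graph is explicit.
    import rdflib

    graph = rdflib.Graph()
    graph.parse(result_file, format="turtle")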
    @staticmethod
    def write_query_data_to_yaml(
        query_data: dict[str, list[dict[str, Any]]], out_file: Path
    ) -> None:
        """
        Write yaml record for all queries to output yaml file
        """
Ditto (regarding name and comment)
hannahbast left a comment:

Another round of reviewing
    def construct_get_queries_cmd(
        queries_file: str, query_ids: str, query_regex: str, ui_config: str
    ) -> str:
        """
        Parse a YAML file and validate its structure.
        Construct get_queries_cmd from queries_tsv file if present or use
        example queries by using ui_config. Use query_ids and query_regex to
        filter the queries
        """
        get_queries_cmd = (
            f"cat {queries_file}"
            if queries_file
            else f"curl -sv https://qlever.cs.uni-freiburg.de/"
            f"api/examples/{ui_config}"
        )
        sed_arg = query_ids.replace(",", "p;").replace("-", ",") + "p"
        get_queries_cmd += f" | sed -n '{sed_arg}'"
        if query_regex:
            get_queries_cmd += f" | grep -Pi {shlex.quote(query_regex)}"
        return get_queries_cmd
We discussed that there should be an additional option --example-queries (and the whole command should be renamed to benchmark-queries), so that the code looks the same no matter whether you parse from TSV or YAML (just the parsing is different). The user should then either specify --example-queries or one of the --queries-... options, and there should be an informative error message otherwise.
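A sketch of that check (option and attribute names are assumptions based on this discussion):

    # Sketch: exactly one source of queries must be given.
    given = [
        opt
        for opt, val in [
            ("--example-queries", args.example_queries),
            ("--queries-yml", args.queries_yml),
            ("--queries-tsv", args.queries_tsv),
        ]
        if val
    ]
    if len(given) != 1:
        log.error(
            "Specify exactly one of --example-queries, --queries-yml, "
            f"or --queries-tsv (got: {', '.join(given) or 'none'})"
        )
        return False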
        tsv_queries = []
        for query_idx in query_indices:
            if query_idx >= total_queries:
                log.error(
                    "Make sure --query-ids don't exceed the total "
                    "queries in the YML file"
                )
                return []
            query = data["queries"][query_idx]

            # Only include queries that match the query_regex if present
            if query_regex:
                pattern = re.compile(query_regex, re.IGNORECASE)
                if not any(
                    [
                        pattern.search(query["query"]),
                        pattern.search(query["sparql"]),
                    ]
                ):
                    continue

            tsv_queries.append(f"{query['query']}\t{query['sparql']}")
This code should come after the parsing, to avoid code duplication. The two parsers provide the same intermediate representation of the queries, which can then be filtered based on query-ids and query-regex.
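A sketch of that shared filtering step, operating on the list of (name, sparql) tuples returned by both parsers (the handling of query indices is simplified here and assumed):

    import re

    # Sketch: filter parsed queries by index and by regex.
    def filter_queries(
        queries: list[tuple[str, str]],
        query_indices: list[int],
        query_regex: str,
    ) -> list[tuple[str, str]]:
        if query_indices:
            queries = [queries[i] for i in query_indices if i < len(queries)]
        if query_regex:
            pattern = re.compile(query_regex, re.IGNORECASE)
            queries = [
                (name, sparql)
                for name, sparql in queries
                if pattern.search(name) or pattern.search(sparql)
            ]
        return queries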
            if accept_header == "text/tab-separated-values":
                result_size = run_command(
                    f"sed 1d {result_file}", return_output=True
                )
            if len(example_query_lines) == 0:
                return []
            example_query_lines = example_query_lines.splitlines()
            return example_query_lines
        except Exception as e:
            log.error(f"Failed to get example queries: {e}")
            return []
        return []
            elif accept_header == "application/qlever-results+json":
                try:
                    # sed cmd to get the number between 2nd and 3rd double quotes
                    result_size = run_command(
                        f"jq '.res[0]' {result_file}"
                        " | sed 's/[^0-9]*\\([0-9]*\\).*/\\1/'",
                        return_output=True,
                    )
                except Exception as e:
                    error_msg = get_json_error_msg(e)
            else:
                try:
                    result_size = run_command(
                        f'jq -r ".results.bindings[0]'
                        f" | to_entries[0].value.value"
                        f' | tonumber" {result_file}',
                        return_output=True,
                    )
                except Exception as e:
                    error_msg = get_json_error_msg(e)
Looks like this should be a separate function + it should be tested.
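For example, the extraction of the single count value could become a small helper that is easy to unit-test (a sketch; the sed/jq commands are taken from the code above, while the function name and error handling are assumptions):

    # Sketch: extract the single numeric value from the first result row,
    # depending on the accept header.
    def get_single_result_value(result_file: str, accept_header: str) -> int | None:
        if accept_header == "text/tab-separated-values":
            # Result file is expected to have a header line and one data row.
            cmd = f"sed 1d {result_file}"
        elif accept_header == "application/qlever-results+json":
            cmd = (
                f"jq '.res[0]' {result_file}"
                " | sed 's/[^0-9]*\\([0-9]*\\).*/\\1/'"
            )
        else:
            cmd = (
                f'jq -r ".results.bindings[0] | to_entries[0].value.value'
                f' | tonumber" {result_file}'
            )
        try:
            return int(run_command(cmd, return_output=True).rstrip())
        except Exception:
            return None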
        if (
            accept_header == "text/tab-separated-values"
            or accept_header == "text/csv"
        ):
            result_size = run_command(
                f"sed 1d {result_file} | wc -l", return_output=True
            )
        elif accept_header == "text/turtle":
            result_size = run_command(
                f"sed '1d;/^@prefix/d;/^\\s*$/d' {result_file} | wc -l",
                return_output=True,
            )
        elif accept_header == "application/qlever-results+json":
            result_size = run_command(
                f'jq -r ".resultsize" {result_file}',
                return_output=True,
            )
        else:
            try:
                result_size = int(
                    run_command(
                        f'jq -r ".results.bindings | length"'
                        f" {result_file}",
                        return_output=True,
                    ).rstrip()
                )
            except Exception as e:
                error_msg = get_json_error_msg(e)
            if result_size == 1:
                try:
                    single_int_result = int(
                        run_command(
                            f'jq -e -r ".results.bindings[0][] | .value"'
                            f" {result_file}",
                            return_output=True,
                        ).rstrip()
                    )
Looks like this should be a separate function + it should be tested.
hannahbast left a comment:

1-1 with Tanmay, round 3. Some minor comments left. The web app should be factored out of this. Thanks a lot.
            action="store_true",
            default=False,
            help=(
                "Run the example-queries for the given --ui-config "
| "Run the example-queries for the given --ui-config " | |
| "Run the example queries for the given --ui-config " |
            default=False,
            help=(
                "Run the example-queries for the given --ui-config "
                "instead of the benchmark queries from a tsv/yml file"
| "instead of the benchmark queries from a tsv/yml file" | |
| "instead of the benchmark queries from a TSV or YML file" |
        Parse the queries_tsv file and return a list of tab-separated queries
        (query_description, full_sparql_query)
The internal representation should be more abstract, like a list of tuples.
        Given a tab-separated list of queries, filter them and keep the
        ones which are a part of query_ids or match with query_regex
The input should be a list of tuples, see below. Change name to filter_queries
        Execute the given bash command to fetch tsv queries and return a
        list of tab-separated queries (query_description, full_sparql_query)
List of tuples as well
| """ | ||
| When downloading the full result of a query with accept header as | ||
| application/sparql-results+json and result_size == 1, get the single | ||
| integer result value (if any) |
Suggested change:
-        integer result value (if any)
+        integer result value (if any).
hannahbast left a comment:

This looks very good now. I will wait for all the checks to become green, revise the description, and then merge this.
Renamed example-queries to benchmark-queries because that is really what this command is doing; the old functionality, which was a special case, is still there using the --example-queries option.