
Extend and refactor example-queries command, rename to benchmark-queries #126

Merged
hannahbast merged 65 commits into qlever-dev:main from tanmay-9:add-evaluation-web-app
Jul 9, 2025
Conversation


@tanmay-9 tanmay-9 commented Feb 3, 2025

  1. Support an input file (with SPARQL queries and a description for each) in both TSV and YML; refactor the parsing code accordingly
  2. Add an option to generate a YML result file suitable for processing with our evaluation web app; see New Web Evaluation Application #171
  3. Add unit tests
  4. Rename from example-queries to benchmark-queries because that is really what this command does; the old functionality, which was a special case, is still available via the --example-queries option

@tanmay-9 tanmay-9 force-pushed the add-evaluation-web-app branch from 93f30bc to 168ad9a on February 12, 2025 01:00
@tanmay-9 tanmay-9 force-pushed the add-evaluation-web-app branch from 168ad9a to aeb646f on February 20, 2025 00:34
@tanmay-9 tanmay-9 force-pushed the add-evaluation-web-app branch from 13e71ab to c51bd68 on April 7, 2025 22:04
@tanmay-9 tanmay-9 force-pushed the add-evaluation-web-app branch from 78d2d4b to 9ca6b99 on April 26, 2025 15:24
@hannahbast

@tanmay-9 Thanks a lot for the changes. One minor issue that is hopefully easy to fix: right now, a "warning sign" is shown when the result size for one engine differs from the result size of the majority. However, for many of our benchmarks, the queries have only one result, which is a count. It would be very useful if, for those queries, the warning sign also appeared when for one engine that count differs from the majority.

@hannahbast hannahbast marked this pull request as ready for review May 2, 2025 16:11
@hannahbast hannahbast left a comment

1-1 with Tanmay, reviewing until (and including) example_queries.py

Comment on lines 64 to 75
help="Command to get example queries as TSV (description, query)",
)
subparser.add_argument(
"--queries-file",
type=str,
help=(
"Path to a YAML file containing queries. "
"The YAML file should have a top-level "
"key called 'queries', which is a list of dictionaries. "
"Each dictionary should contain 'query' for the query name "
"and 'sparql' for the SPARQL query."
),

We agreed to have two arguments now (only one of which should be used at a time): --queries-yml, which reads from a YAML file, and --queries-tsv, which reads from a TSV file. The --get-queries-cmd option should go. Its functionality would still be there using --queries-tsv <(command that produces a TSV)
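
For reference, a minimal sketch of what a --queries-yml input could look like, following the structure described in the help text above (top-level 'queries' key, entries with 'query' and 'sparql'); the two queries themselves are made up for illustration:

    queries:
      - query: all-classes
        sparql: SELECT DISTINCT ?class WHERE { ?s a ?class }
      - query: number-of-triples
        sparql: SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }

The --queries-tsv counterpart would simply have one description<TAB>query line per query, which is what makes --queries-tsv <(command that produces a TSV) cover the old --get-queries-cmd use case.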

type=str,
default=None,
help=(
"Name that would be used for result yaml file. "

Suggested change
"Name that would be used for result yaml file. "
"Name that would be used for result YML file. "

Comment on lines 214 to 215
@staticmethod
def parse_queries_file(queries_file: str) -> dict[str, list[str, str]]:

Should be called parse_queries_yml, and there should be an analogous method parse_queries_tsv.

Please double-check that dict is always ordered and, if so, add a comment to clarify that.
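
A minimal sketch of the two parsers, assuming they return the same intermediate representation (a list of (description, sparql) tuples, as suggested further down); the function names follow the comment above, everything else is illustrative. On ordering: Python dicts preserve insertion order since 3.7, but returning a list avoids relying on that.

    import csv
    from pathlib import Path

    import yaml  # PyYAML


    def parse_queries_yml(queries_file: str) -> list[tuple[str, str]]:
        # Expects a top-level "queries" key whose entries have "query" and "sparql".
        data = yaml.safe_load(Path(queries_file).read_text())
        return [(q["query"], q["sparql"]) for q in data["queries"]]


    def parse_queries_tsv(queries_file: str) -> list[tuple[str, str]]:
        # One "description<TAB>sparql" line per query.
        with open(queries_file, newline="") as f:
            return [(row[0], row[1]) for row in csv.reader(f, delimiter="\t")]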


return data

def get_example_queries(

This should now have a different signature. It looks strange to me that this function calls the parse_... function (of which there are now two). Instead, the calling code should call one of the parse_... functions and then call this function, which should then have a different name (and signature). Or should this function be a helper function that is called by the parse_... functions?
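
For example, the calling code could look roughly like this (a sketch; filter_queries and run_benchmark_queries are placeholder names):

    # Parse into the common intermediate representation, then filter and run.
    if args.queries_yml:
        queries = parse_queries_yml(args.queries_yml)
    else:
        queries = parse_queries_tsv(args.queries_tsv)
    queries = filter_queries(queries, args.query_ids, args.query_regex)
    run_benchmark_queries(queries)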

log.error("Cannot have both --remove-offset-and-limit and --limit")
return False

dataset, engine = None, None

Add a short explanatory comment before this block (which, in particular, explains what dataset and engine are)

Comment on lines 789 to 799
def get_record_for_yaml(
self,
query: str,
sparql: str,
client_time: float,
result: str | dict[str, str],
result_size: int | None,
accept_header: str,
) -> dict[str, Any]:
"""
Construct a dictionary with query information for yaml file

This function should have a better name and comment. In particular, the input file can be a YML as well, so yaml is not a unique reference

from qlever.log import log, mute_log
from qlever.util import run_command, run_curl_command

MAX_RESULT_SIZE = 20

Should be an argument, e.g. max-results-output-file
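
A sketch of the corresponding argument, keeping the current value 20 as the default (the help text is illustrative):

    subparser.add_argument(
        "--max-results-output-file",
        type=int,
        default=20,
        help="Maximum number of result rows written to the result file",
    )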

Comment on lines 835 to 839
def get_query_results(
self, result_file: str, result_size: int, accept_header: str
) -> tuple[list[str], list[list[str]]]:
"""
Return headers and results as a tuple

Improve name and comment. Since we have rather many helper functions now, they should have good and consistent names and clear comments (which does not mean long comments)

Comment on lines 885 to 886
graph = Graph()
graph.parse(result_file, format="turtle")

Make the rdflib prefix explicit
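
That is, something along these lines (sketch):

    import rdflib

    graph = rdflib.Graph()
    graph.parse(result_file, format="turtle")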

Comment on lines 895 to 901
@staticmethod
def write_query_data_to_yaml(
query_data: dict[str, list[dict[str, Any]]], out_file: Path
) -> None:
"""
Write yaml record for all queries to output yaml file
"""

Ditto (regarding name and comment)

@hannahbast hannahbast left a comment

Another round of reviewing

Comment on lines 230 to 248
def construct_get_queries_cmd(
queries_file: str, query_ids: str, query_regex: str, ui_config: str
) -> str:
"""
Parse a YAML file and validate its structure.
Construct get_queries_cmd from queries_tsv file if present or use
example queries by using ui_config. Use query_ids and query_regex to
filter the queries
"""
get_queries_cmd = (
f"cat {queries_file}"
if queries_file
else f"curl -sv https://qlever.cs.uni-freiburg.de/"
f"api/examples/{ui_config}"
)
sed_arg = query_ids.replace(",", "p;").replace("-", ",") + "p"
get_queries_cmd += f" | sed -n '{sed_arg}'"
if query_regex:
get_queries_cmd += f" | grep -Pi {shlex.quote(query_regex)}"
return get_queries_cmd

We discussed that there should be an additional option --example-queries (and the whole command should be renamed to benchmark-queries), so that the code looks the same no matter whether you parse from TSV or YAML (just the parsing is different). The user should then *either* specify --example-queries *or* one of the --queries-... options, and there should be an informative error message otherwise.
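
A minimal sketch of that check (the option names are from the discussion above; the exact error wording is illustrative):

    num_sources = sum(
        1
        for option in (args.example_queries, args.queries_yml, args.queries_tsv)
        if option
    )
    if num_sources != 1:
        log.error(
            "Specify exactly one of --example-queries, "
            "--queries-yml, or --queries-tsv"
        )
        return False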

Comment on lines 325 to 346
tsv_queries = []
for query_idx in query_indices:
if query_idx >= total_queries:
log.error(
"Make sure --query-ids don't exceed the total "
"queries in the YML file"
)
return []
query = data["queries"][query_idx]

# Only include queries that match the query_regex if present
if query_regex:
pattern = re.compile(query_regex, re.IGNORECASE)
if not any(
[
pattern.search(query["query"]),
pattern.search(query["sparql"]),
]
):
continue

tsv_queries.append(f"{query['query']}\t{query['sparql']}")

This code should come after the parsing, to avoid code duplication. The two parsers provide the same intermediate representation of the queries, which can then be filtered based on --query-ids and --query-regex.
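
That is, once both parsers return the same list of (description, sparql) tuples, a single filter step could look roughly like this (sketch; filter_queries is the name suggested further below, parse_query_ids is a hypothetical helper for ranges like "1-3,7"):

    import re


    def filter_queries(
        queries: list[tuple[str, str]], query_ids: str, query_regex: str
    ) -> list[tuple[str, str]]:
        # Keep only the queries selected by --query-ids and --query-regex.
        if query_ids:
            selected = parse_query_ids(query_ids)  # hypothetical helper
            queries = [q for i, q in enumerate(queries, start=1) if i in selected]
        if query_regex:
            pattern = re.compile(query_regex, re.IGNORECASE)
            queries = [
                (description, sparql)
                for description, sparql in queries
                if pattern.search(description) or pattern.search(sparql)
            ]
        return queries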

Comment on lines 387 to 410
if accept_header == "text/tab-separated-values":
result_size = run_command(
f"sed 1d {result_file}", return_output=True
)
elif accept_header == "application/qlever-results+json":
try:
# sed cmd to get the number between 2nd and 3rd double_quotes
result_size = run_command(
f"jq '.res[0]' {result_file}"
" | sed 's/[^0-9]*\\([0-9]*\\).*/\\1/'",
return_output=True,
)
except Exception as e:
error_msg = get_json_error_msg(e)
else:
try:
result_size = run_command(
f'jq -r ".results.bindings[0]'
f" | to_entries[0].value.value"
f' | tonumber" {result_file}',
return_output=True,
)
except Exception as e:
error_msg = get_json_error_msg(e)

Looks like this should be a separate function + it should be tested.

Comment on lines 414 to 450
if (
accept_header == "text/tab-separated-values"
or accept_header == "text/csv"
):
result_size = run_command(
f"sed 1d {result_file} | wc -l", return_output=True
)
elif accept_header == "text/turtle":
result_size = run_command(
f"sed '1d;/^@prefix/d;/^\\s*$/d' {result_file} | wc -l",
return_output=True,
)
elif accept_header == "application/qlever-results+json":
result_size = run_command(
f'jq -r ".resultsize" {result_file}',
return_output=True,
)
else:
try:
result_size = int(
run_command(
f'jq -r ".results.bindings | length"'
f" {result_file}",
return_output=True,
).rstrip()
)
except Exception as e:
error_msg = get_json_error_msg(e)
if result_size == 1:
try:
single_int_result = int(
run_command(
f'jq -e -r ".results.bindings[0][] | .value"'
f" {result_file}",
return_output=True,
).rstrip()
)

Looks like this should be a separate function + it should be tested.
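
A rough sketch of such a helper, reusing the jq and sed commands from the block above (the function name and structure are illustrative):

    def get_result_size(result_file: str, accept_header: str) -> int:
        # Count the result rows, depending on the accept header of the response.
        if accept_header in ("text/tab-separated-values", "text/csv"):
            cmd = f"sed 1d {result_file} | wc -l"
        elif accept_header == "text/turtle":
            cmd = f"sed '1d;/^@prefix/d;/^\\s*$/d' {result_file} | wc -l"
        elif accept_header == "application/qlever-results+json":
            cmd = f'jq -r ".resultsize" {result_file}'
        else:
            # application/sparql-results+json
            cmd = f'jq -r ".results.bindings | length" {result_file}'
        return int(run_command(cmd, return_output=True).rstrip())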

@hannahbast hannahbast left a comment

1-1 with Tanmay, round 3. Some minor comments left. The web app should be factored out of this. Thanks a lot.

action="store_true",
default=False,
help=(
"Run the example-queries for the given --ui-config "

Suggested change
"Run the example-queries for the given --ui-config "
"Run the example queries for the given --ui-config "

default=False,
help=(
"Run the example-queries for the given --ui-config "
"instead of the benchmark queries from a tsv/yml file"

Suggested change
"instead of the benchmark queries from a tsv/yml file"
"instead of the benchmark queries from a TSV or YML file"

Comment on lines 296 to 297
Parse the queries_tsv file and return a list of tab-separated queries
(query_description, full_sparql_query)

The internal representation should be more abstract, like a list of tuples.

Comment on lines 242 to 243
Given a tab-separated list of queries, filter them and keep the
ones which are a part of query_ids or match with query_regex

The input should be a list of tuples, see below. Change name to filter_queries

Comment on lines 280 to 281
Execute the given bash command to fetch tsv queries and return a
list of tab-separated queries (query_description, full_sparql_query)

List of tuples as well

"""
When downloading the full result of a query with accept header as
application/sparql-results+json and result_size == 1, get the single
integer result value (if any)

Suggested change
integer result value (if any)
integer result value (if any).

@hannahbast hannahbast changed the title from "Add evaluation web app" to "Extend and refactor example-queries command, rename to benchmark-queries" on Jun 13, 2025
@hannahbast hannahbast left a comment

This looks very good now. I will wait for all the checks to become green, revise the description, and then merge this.

@hannahbast hannahbast merged commit 97f0855 into qlever-dev:main Jul 9, 2025
9 checks passed