Enhancements to user interface when using QL with row MultiIndex #152

ilumsden · 2024-11-08T03:09:13Z

In #76, I added a new multi_index_mode parameter to GraphFrame.filter and the query language to allow us to apply queries to GraphFrames where we have a row MultiIndex. However, since then, there's been a lot of confusion about the parameter, what it does, and how to use it, especially in Thicket.

This PR improves the naming, simplifies the default use, and enhances the functionality of this feature. More specifically, this PR does four things:

Renames multi_index_mode to predicate_row_aggregator, which more clearly indicates that the argument is used to aggregate per-row outputs from predicates
Expands the set of values accepted by predicate_row_aggregator
Adds a new mechanism that allows the query classes (i.e., Query, ObjectQuery, StringQuery) to define a default aggregator
Moves logic for applying aggregators to QueryEngine, which allows us to bypass all of this if we don't have a row MultiIndex

With this PR, the predicate_row_aggregator argument now accepts the following:

None: tells Hatchet to use the default aggregator for the type of query
"off": tells Hatchet to not use any aggregators (note: this will result in errors if there is a row MultiIndex)
"all": applies an aggregator that returns true if and only if the predicate returned true for all rows associated with a node
"any": applies an aggregator that returns true if the predicate returned true for any row associated with a node
Callable that takes a pandas.Series of booleans as input and returns a boolean as output: applies the user-provided function as an aggregator
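As a rough illustration of how these values could map onto behavior, here is a minimal sketch. It is not the code from this PR: resolve_aggregator is a hypothetical helper, and the "all" fallback for None is an assumption that only matches the object and string dialects. The sketch uses Python's built-in all and any, so it runs on any iterable of booleans, including a pandas.Series.

```python
from typing import Callable, Iterable, Optional, Union

# An aggregator collapses the per-row predicate results for one node
# into a single boolean.
Aggregator = Callable[[Iterable[bool]], bool]


def resolve_aggregator(
    value: Union[None, str, Aggregator],
    default: str = "all",
) -> Optional[Aggregator]:
    """Hypothetical helper: map a predicate_row_aggregator value to a callable.

    Returns None for "off", meaning no aggregation is applied.
    """
    if value is None:
        # Fall back to the default for the query type (assumed "all" here).
        value = default
    if callable(value):
        return value
    if value == "off":
        return None
    if value == "all":
        return lambda rows: all(rows)
    if value == "any":
        return lambda rows: any(rows)
    raise ValueError(f"Unsupported predicate_row_aggregator: {value!r}")


# Per-row predicate results for one node across three profiles.
rows = [True, False, True]

all_result = resolve_aggregator("all")(rows)
any_result = resolve_aggregator("any")(rows)
# A user-provided callable: true if more than half of the rows matched.
majority_result = resolve_aggregator(lambda r: sum(r) > len(r) / 2)(rows)
```

With these rows, "all" rejects the node, while "any" and the majority callable accept it.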

When using predicate_row_aggregator=None, the aggregators used will be:

"off" if using a base syntax query (corresponds to the Query class)
"all" if using an object or string dialect query (corresponds to the ObjectQuery and StringQuery classes)
the default aggregators for each subquery if using a compound query
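The default selection described above could be sketched roughly as follows. The classes are stand-ins written for this example, and the default_aggregator attribute and effective_aggregators function are assumed names, not Hatchet's actual API.

```python
# Stand-in classes for illustration only; the attribute name
# `default_aggregator` is an assumption, not Hatchet's API.
class SketchQuery:
    # Base syntax: predicates are expected to aggregate rows themselves.
    default_aggregator = "off"


class SketchObjectQuery(SketchQuery):
    default_aggregator = "all"


class SketchStringQuery(SketchQuery):
    default_aggregator = "all"


class SketchCompoundQuery:
    def __init__(self, *subqueries):
        self.subqueries = subqueries


def effective_aggregators(query, predicate_row_aggregator=None):
    """Return the aggregator used for each (sub)query, as a list."""
    if isinstance(query, SketchCompoundQuery):
        # A compound query defers to each of its subqueries in turn.
        result = []
        for sub in query.subqueries:
            result.extend(effective_aggregators(sub, predicate_row_aggregator))
        return result
    if predicate_row_aggregator is None:
        return [query.default_aggregator]
    # An explicit value overrides the per-class default.
    return [predicate_row_aggregator]


compound = SketchCompoundQuery(SketchQuery(), SketchObjectQuery())
defaults = effective_aggregators(compound)
overridden = effective_aggregators(compound, "any")
```

Here defaults holds one entry per subquery, and passing an explicit value replaces every default.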

ilumsden · 2024-11-13T18:08:02Z

To clarify, the reason we need multi_index_mode/predicate_row_aggregator is that the graph-algorithm part of the query language needs predicates to provide a single boolean for each node. When we do not have a row MultiIndex (i.e., the standard case for Hatchet), this requirement is always satisfied. However, when we do have a row MultiIndex (i.e., the standard case for Thicket), this requirement is never satisfied because we have multiple rows in the DataFrame per node. As a result, predicates will return a pandas.Series of booleans when we have a row MultiIndex. The multi_index_mode/predicate_row_aggregator argument provides a mechanism to aggregate that Series of booleans into a single boolean.
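To make the one-boolean-per-node requirement concrete, here is a small standalone sketch. It uses a plain dict keyed by (node, profile) as a stand-in for a DataFrame with a row MultiIndex; the node names, the "time" metric, and the threshold are all made up for illustration.

```python
from collections import defaultdict

# Stand-in for a DataFrame with a (node, profile) row MultiIndex:
# each key is one row, each value is that row's "time" metric.
rows = {
    ("main", "profile_0"): 10.0,
    ("main", "profile_1"): 12.0,
    ("solve", "profile_0"): 4.0,
    ("solve", "profile_1"): 0.5,
}


def predicate(time):
    """Per-row predicate: does this row exceed the threshold?"""
    return time > 1.0


# Without aggregation, each node ends up with several booleans,
# one per profile, which the graph matching cannot consume directly.
per_node = defaultdict(list)
for (node, _profile), time in rows.items():
    per_node[node].append(predicate(time))

# Aggregating collapses those booleans into one value per node.
matches_all = {node: all(flags) for node, flags in per_node.items()}
matches_any = {node: any(flags) for node, flags in per_node.items()}
```

In this data, "solve" passes the predicate in only one of its two profiles, so it matches under "any" but not under "all".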

michaelmckinsey1 · 2024-11-13T23:03:28Z

Here is an example of where this aggregation argument is relevant. The following base-syntax query matches nodes with name "my_node"; aggregation does not need to be specified because the predicate already calls .all():

query = th.query.Query().match(
    "*",
    lambda row: row["name"].apply(
        lambda tn: tn == "my_node"
    ).all()
)
tkq = tk.query(query)

The equivalent string-dialect query, where specifying the aggregation is necessary:

query = """
MATCH ("*")->(n) WHERE n."name"="my_node"
"""
filt = tk.query(query, predicate_row_aggregator="all")

michaelmckinsey1 · 2024-11-15T17:36:55Z

Matching a single node with name "my_node":

query = th.query.Query().match(
    1,
    lambda row: row["name"].apply(
        lambda tn: tn == "my_node"
    ).all()
)
tkq = tk.query(query)

or

query = th.query.Query().match(
    ".",
    lambda row: row["name"].apply(
        lambda tn: tn == "my_node"
    ).all()
)
tkq = tk.query(query)

ilumsden · 2025-03-14T19:08:53Z

@slabasan I'm removing this from the upcoming release. The enhancements added by this PR complicate the process of building queries from the string dialect because I need to know whether or not the DataFrame has a multi-index. Given the other work I have to do, trying to get this done in time for the release is likely not feasible.

Replaces multi_index_mode in QL with a more customizable and easier-to-understand predicate_row_aggregator argument

12ed4d1

ilumsden self-assigned this Nov 8, 2024

ilumsden added 9 commits November 7, 2024 22:17

Fixes unit tests

f65fb5b

Formatting

8e07660

Removes MultiIndexModeMismatch

86da99c

Fixes logic for handling string values of predicate_row_aggregator

7ad292e

Formatting

9159da0

Fixes a condition to properly parse the default aggregators

3be6276

Fixes a few testing bugs

0c5c52b

Formatting

cae50b5

Restores special logic for multi-index in the string dialect

bad0410

ilumsden added this to the 2025.1.0 milestone Mar 11, 2025

ilumsden added 10 commits March 14, 2025 11:57

Updates GraphFrame.filter's docstring

db68dbc

Fixes formatting

42219a1

Restores multi_index_mode as a deprecated parameter to GraphFrame.filter so that existing code does not break

216afea

Yet more formatting

c851fde

Fixes unit tests to properly pass predicate_row_aggregator

994f317

Even more formatting because Black sucks

a1ccdf4

Removes some spaces that accidentally got left over

e1eb8b9

Fixes a few bugs in testing

0fcdb09

Formatting, yet again

241a823

More formatting that magically appeared

d96d8f2

ilumsden removed this from the 2025.1.0 milestone Mar 14, 2025
