Enhancements to user interface when using QL with row MultiIndex #152
base: develop
…o-understand predicate_row_aggregator argument
To clarify, the reason we need …

An example of where this aggregation argument is relevant.
Example base-syntax query to match nodes with name "my_node", where aggregation does not need to be specified due to …
Equivalent string syntax query, where specifying aggregation is necessary: …

Matching a single node with name or …
…ter so that existing code does not break
@slabasan I'm removing this from the upcoming release. The enhancements added by this PR complicate the process of building queries from the string dialect because I need to know whether or not the DataFrame has a multi-index. Given the other work I have to do, trying to get this done in time for the release is likely not feasible.
In #76, I added a new `multi_index_mode` parameter to `GraphFrame.filter` and the query language to allow us to apply queries to GraphFrames where we have a row `MultiIndex`. However, since then, there's been a lot of confusion about the parameter, what it does, and how to use it, especially in Thicket.

This PR improves naming, simplifies default use, and enhances functionality of this feature. More specifically, this PR does 3 things:

1. Renames `multi_index_mode` to `predicate_row_aggregator`, which more clearly indicates that the argument is used to aggregate per-row outputs from predicates
2. … `predicate_row_aggregator` … (`Query`, `ObjectQuery`, `StringQuery`) to define a default aggregator
3. … `QueryEngine`, which allows us to bypass all of this if we don't have a row `MultiIndex`

With this PR, the `predicate_row_aggregator` argument now accepts the following:

- `None`: tells Hatchet to use the default aggregator for the type of query
- `"off"`: tells Hatchet to not use any aggregators (note: this will result in errors if there is a row `MultiIndex`)
- `"all"`: applies an aggregator that returns true if and only if the predicate returned true for all rows associated with a node
- `"any"`: applies an aggregator that returns true if the predicate returned true for any row associated with a node
- A function that takes a `pandas.Series` of booleans as input and returns a boolean as output: applies the user-provided function as an aggregator

When using `predicate_row_aggregator=None`, the aggregators used will be:

- `"off"` if using a base syntax query (corresponds to the `Query` class)
- `"all"` if using an object or string dialect query (corresponds to the `ObjectQuery` and `StringQuery` classes)
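
To make the semantics of the accepted values concrete, here is a minimal sketch in plain pandas. It is not Hatchet's implementation: `resolve_aggregator`, `matching_nodes`, the `query_kind` strings, and the toy `(node, rank)` frame are all hypothetical names introduced for illustration. It only shows how per-row predicate results for one node might be reduced to a single boolean under each option.

```python
import pandas as pd


def resolve_aggregator(predicate_row_aggregator, query_kind):
    """Hypothetical helper: map the documented values to a callable (or None)."""
    if predicate_row_aggregator is None:
        # Defaults described in the PR: "off" for base syntax, "all" otherwise.
        predicate_row_aggregator = "off" if query_kind == "base" else "all"
    if predicate_row_aggregator == "off":
        return None  # no aggregation; only safe without a row MultiIndex
    if predicate_row_aggregator == "all":
        return lambda results: bool(results.all())
    if predicate_row_aggregator == "any":
        return lambda results: bool(results.any())
    if callable(predicate_row_aggregator):
        return predicate_row_aggregator  # user-provided Series -> bool function
    raise ValueError(f"unsupported aggregator: {predicate_row_aggregator!r}")


def matching_nodes(per_row, aggregator):
    """Reduce the per-row booleans of each node to one boolean; keep the matches."""
    return [
        node
        for node, rows in per_row.groupby(level="node")
        if aggregator(rows)
    ]


# Toy data: three nodes, three ranks each (assumed layout, for illustration).
index = pd.MultiIndex.from_product(
    [["my_node", "other", "third"], [0, 1, 2]], names=["node", "rank"]
)
df = pd.DataFrame(
    {"time": [1.0, 5.0, 6.0, 6.0, 7.0, 8.0, 1.0, 1.0, 5.0]}, index=index
)

# Per-row predicate output: one boolean per (node, rank) row.
per_row = df["time"] > 2.0


def majority(results):
    """Example user-provided aggregator: true if most rows satisfy the predicate."""
    return bool(results.mean() > 0.5)
```

With this data, `"all"` keeps only `other`, `"any"` keeps all three nodes, and the custom `majority` function keeps `my_node` and `other`. With `"off"` the helper returns `None`, which mirrors the note above that skipping aggregation only makes sense when there is a single row per node.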