From a5d144be31e426203af48f8299a21e4af636903b Mon Sep 17 00:00:00 2001 From: Ian Lumsden Date: Fri, 12 Mar 2021 16:35:31 -0500 Subject: [PATCH 1/7] Adds an expanded documentation section on the query language --- docs/index.rst | 1 + docs/query_lang.rst | 225 ++++++++++++++++++++++++++++++++++++++++ docs/source/hatchet.rst | 4 +- 3 files changed, 228 insertions(+), 2 deletions(-) create mode 100644 docs/query_lang.rst diff --git a/docs/index.rst b/docs/index.rst index 7a488e88..39e12c0e 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -43,6 +43,7 @@ If you are new to hatchet and want to start using it, see :doc:`Getting Started getting_started user_guide + query_lang analysis_examples diff --git a/docs/query_lang.rst b/docs/query_lang.rst new file mode 100644 index 00000000..f5b17516 --- /dev/null +++ b/docs/query_lang.rst @@ -0,0 +1,225 @@ +.. Copyright 2017-2021 Lawrence Livermore National Security, LLC and other + Hatchet Project Developers. See the top-level LICENSE file for details. + + SPDX-License-Identifier: MIT + +************** +Query Language +************** + +As of version 1.2.0, Hatchet has a filtering query language that allows users to filter GraphFrames based on caller-callee relationships between nodes in the Graph. This query language contains two APIs: a high-level API that is expressed using built-in Python data types (e.g., lists, dictionaries, strings) and a low-level API that is expressed using Python callables. + +Regardless of API, queries in Hatchet represent abstract paths, or path patterns, within the Graph being filtered. When filtering on a query, Hatchet will identify all paths in the Graph that match the query. Then, it will return a new GraphFrame object containing only the nodes contained in the matched paths. A query is represented as a list of *abstract graph nodes*. 
Each *abstract graph node* is made of two parts: + +- A wildcard that specifies the number of real nodes to match to the abstract node +- A filter that is used to determine whether a real node matches the abstract node + +The primary differences between the two APIs are the representation of filters, how wildcards and filters are combined into *abstract graph nodes*, and how *abstract graph nodes* are combined into a full query. + +The following sections will describe the specifications for queries in both APIs and provide examples of how to use the query language. + +High-Level API +============== + +The high-level API for Hatchet's query language is designed to allow users to quickly write simple queries. It has a simple syntax based on built-in Python data types (e.g., lists, dictionaries, strings). The following subsections will describe each component of high-level queries. After creating a query, it can be used to filter a GraphFrame by passing it to the :code:`GraphFrame.filter` function as follows: + +.. code-block:: python + + query = + filtered_gf = gf.filter(query) + +Wildcards +~~~~~~~~~ + +Wildcards in the high-level API are specified by one of four possible values: + +- The string :code:`"."`, which means "match 1 node" +- The string :code:`"*"`, which means "match 0 or more nodes" +- The string :code:`"+"`, which means "match 1 or more nodes" +- An integer, which means "match exactly that number of nodes" (integer 1 is equivalent to :code:`"."`) + +Filters +~~~~~~~ + +Filters in the high-level API are specified by Python dictionaries. These dictionaries are keyed on the names of *node attributes*. These attributes' names are the same as the column names from the DataFrame associated with the GraphFrame being filtered (which can be obtained with :code:`gf.dataframe`). 
There are also two special attribute names: + +- `depth`, which filters on the depth of the node in the Graph +- `node_id`, which filters on the node's unique identifier within the GraphFrame + +The values in a high-level API filter dictionary define the conditions that must be passed to pass the filter. Their data types depend on the data type of the corresponding attribute. The table below describes what value data types are valid for different attribute data types. + ++----------------------------+--------------------------+------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ +| Attribute Data Type | Example Attributes | Valid Filter Value Types | Description of Condition | ++============================+==========================+================================================================================================+================================================================================================================+ +| Real (integer or float) | `time` | Real (integer or float) | Attribute value exactly equals filter value | ++ + +------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ +| | `time (inc)` | String starting with comparison operator | Attribute value must pass comparison described in filter value | ++----------------------------+--------------------------+------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ +| String | `name` | Regex String (see `Python re module `_ for details) | Attribute must match filter value (passed to `re.match `_) | 
++----------------------------+--------------------------+------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ + +The values in a high-level API filter dictionary can also be iterables (e.g., lists, tuples) of the valid values defined in the table above. + +In the high-level API, all conditions (key-value pairs, including conditions contained in a list value) in a filter must pass for a real node to match the corresponding *abstract graph node*. + +Abstract Graph Nodes +~~~~~~~~~~~~~~~~~~~~ + +In the high-level API, *abstract graph nodes* are represented by Python tuples containing a single wildcard and a single filter. Alternatively, an *abstract graph node* can be represented by only a single wildcard or a single filter. When only providing a wildcard or a filter (and not both), the default is used for the other component. The defaults are as follows: + +- Wildcard: :code:`"."` (match 1 node) +- Filter: an "always-true" filter (any node passes this filter) + +Full Queries +~~~~~~~~~~~~ + +In the high-level API, a query is represented as a Python list of *abstract graph nodes*. In general, the following code can be used as a template to build a high-level query. + +.. code-block:: python + + query = [ + (wildcard1, filter1), + (wildcard2, filter2), + (wildcard3, filter3) + ] + filtered_gf = gf.filter(query) + +Low-Level API +============= + +The low-level API for Hatchet's query language is designed to allow users to perform more complex queries. Its syntax is based on Python callables (e.g., functions, lambdas). The following subsections will describe each component of low-level queries. Like high-level queries, low-level queries can be used to filter a GraphFrame by passing them to the :code:`GraphFrame.filter` function as follows: + +.. 
code-block:: python + + query = + filtered_gf = gf.filter(query) + +Wildcards +~~~~~~~~~ + +Wildcards in the low-level API are exactly the same as wildcards in the high-level API. The following values are currently allowed for wildcards: + +- The string :code:`"."`, which means "match 1 node" +- The string :code:`"*"`, which means "match 0 or more nodes" +- The string :code:`"+"`, which means "match 1 or more nodes" +- An integer, which means "match exactly that number of nodes" (integer 1 is equivalent to :code:`"."`) + +Filters +~~~~~~~ + +The biggest difference between the high-level and low-level APIs is how filters are represented. In the low-level API, filters are represented by Python callables. These callables should take one argument representing a node in the graph and should return a boolean stating whether or not the node satisfies the filter. The type of the argument to the callable depends on whether the :code:`GraphFrame.drop_index_levels` function was previously called. If this function was called, the type of the argument will be a :code:`pandas.Series`. This :code:`Series` will be the row representing a node in the internal :code:`pandas.DataFrame`. If the :code:`GraphFrame.drop_index_levels` function was not called, the type of the argument will be a :code:`pandas.DataFrame`. This :code:`DataFrame` will contain the rows of the internal :code:`pandas.DataFrame` representing a node. Multiple rows are passed in this case because the internal :code:`DataFrame` will contain one row for every thread and function call. + +For example, if you want to match nodes with an exclusive time (represented by the "time" column) greater than 2 and an inclusive time (represented by the "time (inc)" column) greater than 5, you could use the following filter. This filter assumes you have already called the :code:`GraphFrame.drop_index_levels` function. + +.. 
code-block:: python + + filter = lambda row: row["time"] > 2 and row["time (inc)"] > 5 + +Abstract Graph Nodes +~~~~~~~~~~~~~~~~~~~~ + +To build *abstract graph nodes* in the low-level API, you will first need to import Hatchet's :code:`QueryMatcher` class. This can be done with the following import. + +.. code-block:: python + + from hatchet import QueryMatcher + +The :code:`QueryMatcher` class has two functions that can be used to build *abstract graph nodes*. The first function is :code:`QueryMatcher.match`, which resets the query and constructs a new *abstract graph node* as the root of the query. The second function is :code:`QueryMatcher.rel`, which constructs a new *abstract graph node* and appends it to the query. Both of these functions take two arguments: a wildcard and a low-level filter. If either the wildcard or the filter is not provided, its default will be used. The defaults are as follows: + +- Wildcard: :code:`"."` (match 1 node) +- Filter: an "always-true" filter (any node passes this filter) + +Both of these functions also return the :code:`QueryMatcher` object itself (i.e., :code:`self`). This allows :code:`QueryMatcher.match` and :code:`QueryMatcher.rel` to be chained together. + +Full Queries +~~~~~~~~~~~~ + +Full queries in the low-level API are built by making successive calls to the :code:`QueryMatcher.match` and :code:`QueryMatcher.rel` functions. In general, the following code can be used as a template to build a low-level query. + +.. code-block:: python + + from hatchet import QueryMatcher + + query = (QueryMatcher().match(wildcard1, filter1) + .rel(wildcard2, filter2) + .rel(wildcard3, filter3)) + filtered_gf = gf.filter(query) + +Compound Queries +================ + +*Compound queries are currently a development feature.* + +Compound queries allow users to apply some operation on the results of one or more queries. 
Currently, the following compound queries are available directly from :code:`hatchet.query`: + +- :code:`AndQuery` and :code:`IntersectionQuery` +- :code:`OrQuery` and :code:`UnionQuery` +- :code:`XorQuery` and :code:`SymDifferenceQuery` + +Additionally, the compound query feature provides the following abstract base classes that can be used by users to implement their own compound queries: + +- :code:`AbstractQuery` +- :code:`NaryQuery` + +The following subsections will describe each of these compound query classes. + +AbstractQuery +~~~~~~~~~~~~~ + +:code:`AbstractQuery` is an interface (i.e., abstract base class with no implementation) that defines the basic requirements for a query in the Hatchet query language. All query types, including user-created compound queries, must inherit from this class. + +NaryQuery +~~~~~~~~~ + +:code:`NaryQuery` is an abstract base class that inherits from :code:`AbstractQuery`. It defines the basic functionality and requirements for compound queries that perform one or more subqueries, collect the results of the subqueries, and perform some subclass-defined operation to merge the results into a single result. Queries that inherit from :code:`NaryQuery` must implement the :code:`_perform_nary_op` function, which takes a list of results and should perform some operation on it. + +AndQuery +~~~~~~~~ + +The :code:`AndQuery` class can be used to perform two or more subqueries and compute the intersection of all the returned lists of matched nodes. To create an :code:`AndQuery`, simply create your subqueries (which can be high-level, low-level, or compound), and pass them to the :code:`AndQuery` constructor. The following code can be used as a template for creating an :code:`AndQuery`. + +.. 
code-block:: python + + from hatchet.query import AndQuery + + query1 = + query2 = + query3 = + and_query = AndQuery(query1, query2, query3) + filtered_gf = gf.filter(and_query) + +:code:`IntersectionQuery` is also provided as an alias (i.e., renaming) of :code:`AndQuery`. The two can be used interchangeably. + +OrQuery +~~~~~~~~ + +The :code:`OrQuery` class can be used to perform two or more subqueries and compute the union of all the returned lists of matched nodes. To create an :code:`OrQuery`, simply create your subqueries (which can be high-level, low-level, or compound), and pass them to the :code:`OrQuery` constructor. The following code can be used as a template for creating an :code:`OrQuery`. + +.. code-block:: python + + from hatchet.query import OrQuery + + query1 = + query2 = + query3 = + or_query = OrQuery(query1, query2, query3) + filtered_gf = gf.filter(or_query) + +:code:`UnionQuery` is also provided as an alias (i.e., renaming) of :code:`OrQuery`. The two can be used interchangeably. + +XorQuery +~~~~~~~~ + +The :code:`XorQuery` class can be used to perform two or more subqueries and compute the symmetric difference (set theory equivalent to XOR) of all the returned lists of matched nodes. To create an :code:`XorQuery`, simply create your subqueries (which can be high-level, low-level, or compound), and pass them to the :code:`XorQuery` constructor. The following code can be used as a template for creating an :code:`XorQuery`. + +.. code-block:: python + + from hatchet.query import XorQuery + + query1 = + query2 = + query3 = + xor_query = XorQuery(query1, query2, query3) + filtered_gf = gf.filter(xor_query) + +:code:`SymDifferenceQuery` is also provided as an alias (i.e., renaming) of :code:`XorQuery`. The two can be used interchangeably. 
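The merge step behind the three compound queries described above can be illustrated with plain Python sets. This is a minimal sketch and not Hatchet code: the node labels are hypothetical stand-ins for the lists of matched nodes that subqueries return, and the helper names :code:`intersect`, :code:`union`, and :code:`sym_difference` are illustrative only. They are assumed to mirror the set operations in the :code:`_perform_nary_op` implementations of :code:`hatchet/query.py` in this patch series.

```python
# Hypothetical node labels standing in for the lists of matched nodes
# returned by three subqueries; these are not real Hatchet results.
results = [
    ["main", "foo", "bar"],
    ["foo", "bar", "baz"],
    ["bar", "baz", "qux"],
]


def intersect(query_results):
    """AndQuery-style merge: keep nodes matched by every subquery."""
    return set(query_results[0]).intersection(*query_results[1:])


def union(query_results):
    """OrQuery-style merge: keep nodes matched by at least one subquery."""
    return set().union(*query_results)


def sym_difference(query_results):
    """XorQuery-style merge: fold the symmetric difference over all results."""
    xor_set = set()
    for res in query_results:
        xor_set = xor_set.symmetric_difference(set(res))
    return xor_set


and_nodes = intersect(results)
or_nodes = union(results)
xor_nodes = sym_difference(results)

print(sorted(and_nodes))  # ['bar']
print(sorted(or_nodes))   # ['bar', 'baz', 'foo', 'main', 'qux']
print(sorted(xor_nodes))  # ['bar', 'main', 'qux']
```

With three or more subqueries, the symmetric difference is folded one result at a time, so a node matched by an odd number of subqueries (such as :code:`bar` in this sketch) is kept, while a node matched by an even number is dropped.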
diff --git a/docs/source/hatchet.rst b/docs/source/hatchet.rst index 652d2e55..f6c028e4 100644 --- a/docs/source/hatchet.rst +++ b/docs/source/hatchet.rst @@ -47,10 +47,10 @@ hatchet.node module :undoc-members: :show-inheritance: -hatchet.query\_matcher module +hatchet.query module ----------------------------- -.. automodule:: hatchet.query_matcher +.. automodule:: hatchet.query :members: :undoc-members: :show-inheritance: From c3cea4f86795d0289864a5434931503de4383a29 Mon Sep 17 00:00:00 2001 From: ilumsden Date: Fri, 28 May 2021 13:31:20 -0400 Subject: [PATCH 2/7] Adds docstrings to compound queries --- hatchet/query.py | 78 ++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 73 insertions(+), 5 deletions(-) diff --git a/hatchet/query.py b/hatchet/query.py index 42478610..b2cc84bf 100644 --- a/hatchet/query.py +++ b/hatchet/query.py @@ -32,6 +32,14 @@ class AbstractQuery(ABC): @abstractmethod def apply(self, gf): + """Apply the query to a GraphFrame. + + Arguments: + gf (GraphFrame): the GraphFrame on which to apply the query. + + Returns: + (list): A list representing the set of nodes from paths that match this query. + """ pass def __and__(self, other): @@ -52,6 +60,11 @@ class NaryQuery(AbstractQuery): that acts on and merges N separate subqueries""" def __init__(self, *args): + """Create a new NaryQuery object. + + Arguments: + *args (tuple): the subqueries (high-level, low-level, or compound) to be performed. + """ self.subqueries = [] if isinstance(args[0], tuple) and len(args) == 1: args = args[0] @@ -67,10 +80,26 @@ def __init__(self, *args): ) @abstractmethod - def _perform_nary_op(self, query_results, gf): + def _perform_nary_op(self, query_results): + """Perform the NaryQuery subclass's designated operation on the results of the subqueries. + + Arguments: + query_results (list): the results of the subqueries. 
+ + Returns: + (list): A list of nodes representing the result of applying the subclass-designated operation to the results of the subqueries. + """ pass def apply(self, gf): + """Apply the NaryQuery to a GraphFrame. + + Arguments: + gf (GraphFrame): the GraphFrame on which to apply the query. + + Returns: + (list): A list of nodes representing the result of applying the subclass-designated operation to the results of the subqueries. + """ results = [] for query in self.subqueries: results.append(query.apply(gf)) @@ -376,7 +405,7 @@ def apply(self, gf): gf (GraphFrame): the GraphFrame on which to apply the query. Returns: - (list): A list of lists representing the set of paths that match this query. + (list): A list representing the set of nodes from paths that match this query. """ self.search_cache = {} matches = [] @@ -646,6 +675,11 @@ class AndQuery(NaryQuery): of the subqueries""" def __init__(self, *args): + """Create a new AndQuery object. + + Arguments: + *args (tuple): the subqueries (high-level, low-level, or compound) to be performed. + """ if sys.version_info[0] == 2: super(AndQuery, self).__init__(args) else: @@ -654,11 +688,19 @@ def __init__(self, *args): raise BadNumberNaryQueryArgs("AndQuery requires 2 or more subqueries") def _perform_nary_op(self, query_results, gf): + """Perform an intersection operation on the results of the subqueries. + + Arguments: + query_results (list): the results of the subqueries. + + Returns: + (list): A list of nodes representing the intersection of the results of the subqueries. + """ intersection_set = set(query_results[0]).intersection(*query_results[1:]) return list(intersection_set) -# Alias of AndQuery to signify the relationship to set Intersection +"""Alias of AndQuery to signify the relationship to set Intersection""" IntersectionQuery = AndQuery @@ -667,6 +709,11 @@ class OrQuery(NaryQuery): of the subqueries""" def __init__(self, *args): + """Create a new OrQuery object. 
+ + Arguments: + *args (tuple): the subqueries (high-level, low-level, or compound) to be performed. + """ if sys.version_info[0] == 2: super(OrQuery, self).__init__(args) else: @@ -675,11 +722,19 @@ def __init__(self, *args): raise BadNumberNaryQueryArgs("OrQuery requires 2 or more subqueries") def _perform_nary_op(self, query_results, gf): + """Perform a union operation on the results of the subqueries. + + Arguments: + query_results (list): the results of the subqueries. + + Returns: + (list): A list of nodes representing the union of the results of the subqueries. + """ union_set = set().union(*query_results) return list(union_set) -# Alias of OrQuery to signify the relationship to set Union +"""Alias of OrQuery to signify the relationship to set Union""" UnionQuery = OrQuery @@ -688,6 +743,11 @@ class XorQuery(NaryQuery): (i.e., set-based XOR) of the results of the subqueries""" def __init__(self, *args): + """Create a new XorQuery object. + + Arguments: + *args (tuple): the subqueries (high-level, low-level, or compound) to be performed. + """ if sys.version_info[0] == 2: super(XorQuery, self).__init__(args) else: @@ -696,13 +756,21 @@ def __init__(self, *args): raise BadNumberNaryQueryArgs("XorQuery requires 2 or more subqueries") def _perform_nary_op(self, query_results, gf): + """Perform a symmetric difference operation on the results of the subqueries. + + Arguments: + query_results (list): the results of the subqueries. + + Returns: + (list): A list of nodes representing the symmetric difference of the results of the subqueries. 
+ """ xor_set = set() for res in query_results: xor_set = xor_set.symmetric_difference(set(res)) return list(xor_set) -# Alias of XorQuery to signify the relationship to set Symmetric Difference +"""Alias of XorQuery to signify the relationship to set Symmetric Difference""" SymDifferenceQuery = XorQuery From 25c9d1a0ee8cd2068245c2ab4f1d2cd3e094d93e Mon Sep 17 00:00:00 2001 From: ilumsden Date: Fri, 28 May 2021 13:38:15 -0400 Subject: [PATCH 3/7] Updates docstrings to account for new features --- hatchet/query.py | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/hatchet/query.py b/hatchet/query.py index b2cc84bf..1866a76e 100644 --- a/hatchet/query.py +++ b/hatchet/query.py @@ -80,11 +80,12 @@ def __init__(self, *args): ) @abstractmethod - def _perform_nary_op(self, query_results): + def _perform_nary_op(self, query_results, gf): """Perform the NaryQuery subclass's designated operation on the results of the subqueries. Arguments: query_results (list): the results of the subqueries. + gf (GraphFrame): the GraphFrame on which the query is applied. Returns: (list): A list of nodes representing the result of applying the subclass-designated operation to the results of the subqueries. @@ -692,6 +693,7 @@ def _perform_nary_op(self, query_results, gf): Arguments: query_results (list): the results of the subqueries. + gf (GraphFrame): the GraphFrame on which the query is applied. Returns: (list): A list of nodes representing the intersection of the results of the subqueries. @@ -726,6 +728,7 @@ def _perform_nary_op(self, query_results, gf): Arguments: query_results (list): the results of the subqueries. + gf (GraphFrame): the GraphFrame on which the query is applied. Returns: (list): A list of nodes representing the union of the results of the subqueries. @@ -760,6 +763,7 @@ def _perform_nary_op(self, query_results, gf): Arguments: query_results (list): the results of the subqueries. 
+ gf (GraphFrame): the GraphFrame on which the query is applied. Returns: (list): A list of nodes representing the symmetric difference of the results of the subqueries. @@ -779,6 +783,11 @@ class NotQuery(NaryQuery): are not returned from the subquery.""" def __init__(self, *args): + """Create a new NotQuery object. + + Arguments: + *args (tuple): the subquery (high-level, low-level, or compound) to be performed. + """ if sys.version_info[0] == 2: super(NotQuery, self).__init__(args) else: @@ -787,6 +796,15 @@ def __init__(self, *args): raise BadNumberNaryQueryArgs("NotQuery requires exactly 1 subquery") def _perform_nary_op(self, query_results, gf): + """Collect all nodes in the graph not present in the query result. + + Arguments: + query_results (list): the result of the subquery. + gf (GraphFrame): the GraphFrame on which the query is applied. + + Returns: + (list): A list of all nodes not found in the subquery. + """ nodes = set(gf.graph.traverse()) query_nodes = set(query_results[0]) return list(nodes.difference(query_nodes)) From 181cc8971c4973be90fa9ade8a428faf33da03a8 Mon Sep 17 00:00:00 2001 From: ilumsden Date: Fri, 28 May 2021 13:49:01 -0400 Subject: [PATCH 4/7] Adds docstrings for compound query operators --- hatchet/query.py | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/hatchet/query.py b/hatchet/query.py index 1866a76e..97ddb949 100644 --- a/hatchet/query.py +++ b/hatchet/query.py @@ -43,15 +43,44 @@ def apply(self, gf): pass def __and__(self, other): + """Create an AndQuery with this query and another. + + Arguments: + other (AbstractQuery): the other query to use in the AndQuery. + + Returns: + (AndQuery): A query object representing the intersection of the two queries. + """ return AndQuery(self, other) def __or__(self, other): + """Create an OrQuery with this query and another. + + Arguments: + other (AbstractQuery): the other query to use in the OrQuery. 
+ + Returns: + (OrQuery): A query object representing the union of the two queries. + """ return OrQuery(self, other) def __xor__(self, other): + """Create a XorQuery with this query and another. + + Arguments: + other (AbstractQuery): the other query to use in the XorQuery. + + Returns: + (XorQuery): A query object representing the symmetric difference of the two queries. + """ return XorQuery(self, other) def __invert__(self): + """Create a NotQuery with this query. + + Returns: + (NotQuery): A query object representing all nodes that don't match this query. + """ return NotQuery(self) From 83ef2ada5460e778912551b7b5d884abe250a6b5 Mon Sep 17 00:00:00 2001 From: Ian Lumsden Date: Mon, 7 Jun 2021 13:30:45 -0400 Subject: [PATCH 5/7] Adds a table of features for query language levels, documentation for NotQuery, and some miscellaneous changes --- docs/query_lang.rst | 80 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 76 insertions(+), 4 deletions(-) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index f5b17516..858754d3 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -3,6 +3,11 @@ SPDX-License-Identifier: MIT +.. + TODO: Add color to the Checkmarks and X's +.. |check| unicode:: U+02713 .. CHECK MARK +.. |cross| unicode:: U+02717 .. BALLOT X + ************** Query Language ************** @@ -14,7 +19,45 @@ Regardless of API, queries in Hatchet represent abstract paths, or path patterns - A wildcard that specifies the number of real nodes to match to the abstract node - A filter that is used to determine whether a real node matches the abstract node -The primary differences between the two APIs are the representation of filters, how wildcards and filters are combined into *abstract graph nodes*, and how *abstract graph nodes* are combined into a full query. 
+The primary differences between the two APIs are the representation of filters, how wildcards and filters are combined into *abstract graph nodes*, and how *abstract graph nodes* are combined into a full query. Some of these differences are shown in the table below: + +|check| + ++-------------------------------------------------------+--------------+--------------+---------------+ +| Feature | High-Level | Middle-Level | Low-Level | ++=======================================================+==============+==============+===============+ +| Wildcards to specify number of nodes to match | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| String Attribute "Equivalence" Filters | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| String Attribute "Begins With" Filters | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| String Attribute "Ends With" Filters | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| String Attribute "Contains" Filters | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| String Attribute "Regex" Filters | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Number Attribute "Equivalence" Filters | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Number Attribute "Less-Than (or Equal-To)" Filters | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Number 
Attribute "Greater-Than (or Equal-To)" Filters | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Number Attribute "Is NaN" Filters | |cross| [1]_ | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Number Attribute "Is Not NaN" Filters | |cross| [1]_ | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Other Attribute Datatype Filters | |cross| | |cross| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Combine Filters with AND for 1 Node | |check| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Combine Filters with OR for 1 Node | |cross| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Negate Filters for 1 Node | |cross| | |check| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ +| Other Ways of Combining Filters for 1 Node | |cross| | |cross| | |check| | ++-------------------------------------------------------+--------------+--------------+---------------+ The following sections will describe the specifications for queries in both APIs and provide examples of how to use the query language. 
@@ -84,6 +127,11 @@ In the high-level API, a query is represented as a Python list of *abstract grap ] filtered_gf = gf.filter(query) +Middle-Level API +================ + +In Progress + Low-Level API ============= @@ -155,6 +203,7 @@ Compound queries allow users to apply some operation on the results of one or mo - :code:`AndQuery` and :code:`IntersectionQuery` - :code:`OrQuery` and :code:`UnionQuery` - :code:`XorQuery` and :code:`SymDifferenceQuery` +- :code:`NotQuery` Additionally, the compound query feature provides the following abstract base classes that can be used by users to implement their own compound queries: @@ -176,7 +225,7 @@ NaryQuery AndQuery ~~~~~~~~ -The :code:`AndQuery` class can be used to perform two or more subqueries and compute the intersection of all the returned lists of matched nodes. To create an :code:`AndQuery`, simply create your subqueries (which can be high-level, low-level, or compound), and pass them to the :code:`AndQuery` constructor. The following code can be used as a template for creating an :code:`AndQuery`. +The :code:`AndQuery` class can be used to perform two or more subqueries and compute the intersection of all the returned lists of matched nodes. To create an :code:`AndQuery`, simply create your subqueries (which can be of any type), and pass them to the :code:`AndQuery` constructor. The following code can be used as a template for creating an :code:`AndQuery`. .. code-block:: python @@ -188,12 +237,14 @@ The :code:`AndQuery` class can be used to perform two or more subqueries and com and_query = AndQuery(query1, query2, query3) filtered_gf = gf.filter(and_query) +:code:`AndQuery` objects can also be created from two (and only two) subqueries using the binary AND operator (:code:`&`). + :code:`IntersectionQuery` is also provided as an alias (i.e., renaming) of :code:`AndQuery`. The two can be used interchangeably. 
OrQuery ~~~~~~~~ -The :code:`OrQuery` class can be used to perform two or more subqueries and compute the union of all the returned lists of matched nodes. To create an :code:`OrQuery`, simply create your subqueries (which can be high-level, low-level, or compound), and pass them to the :code:`OrQuery` constructor. The following code can be used as a template for creating an :code:`OrQuery`. +The :code:`OrQuery` class can be used to perform two or more subqueries and compute the union of all the returned lists of matched nodes. To create an :code:`OrQuery`, simply create your subqueries (which can be of any type), and pass them to the :code:`OrQuery` constructor. The following code can be used as a template for creating an :code:`OrQuery`. .. code-block:: python @@ -205,12 +256,14 @@ The :code:`OrQuery` class can be used to perform two or more subqueries and comp or_query = OrQuery(query1, query2, query3) filtered_gf = gf.filter(or_query) +:code:`OrQuery` objects can also be created from two (and only two) subqueries using the binary OR operator (:code:`|`). + :code:`UnionQuery` is also provided as an alias (i.e., renaming) of :code:`OrQuery`. The two can be used interchangeably. XorQuery ~~~~~~~~ -The :code:`XorQuery` class can be used to perform two or more subqueries and compute the symmetric difference (set theory equivalent to XOR) of all the returned lists of matched nodes. To create an :code:`XorQuery`, simply create your subqueries (which can be high-level, low-level, or compound), and pass them to the :code:`XorQuery` constructor. The following code can be used as a template for creating an :code:`XorQuery`. +The :code:`XorQuery` class can be used to perform two or more subqueries and compute the symmetric difference (set theory equivalent to XOR) of all the returned lists of matched nodes. To create an :code:`XorQuery`, simply create your subqueries (which can be of any type), and pass them to the :code:`XorQuery` constructor. 
The following code can be used as a template for creating an :code:`XorQuery`. .. code-block:: python @@ -222,4 +275,23 @@ The :code:`XorQuery` class can be used to perform two or more subqueries and com xor_query = XorQuery(query1, query2, query3) filtered_gf = gf.filter(xor_query) +:code:`XorQuery` objects can also be created from two (and only two) subqueries using the binary XOR operator (:code:`^`). + :code:`SymDifferenceQuery` is also provided as an alias (i.e., renaming) of :code:`XorQuery`. The two can be used interchangably. + +NotQuery +~~~~~~~~ + +The :code:`NotQuery` class can be used to get all nodes not captured by the one (and only one) subquery. To create a :code:`NotQuery`, simply create your subquery (which can be of any type), and pass it to the :code:`NotQuery` constructor. The following code can be used as a template for creating a :code:`NotQuery`. + +.. code-block:: python + + from hatchet.query import NotQuery + + query = + not_query = NotQuery(query) + filtered_gf = gf.filter(not_query) + +:code:`NotQuery` objects can also be created from the subquery using the unary NOT operator (:code:`~`). + +.. [1] The High-Level API cannot check for NaN because, in Python, NaN does not equal NaN. From 99c33ed598645f869744f85e4b63013c1fafcdbf Mon Sep 17 00:00:00 2001 From: Ian Lumsden Date: Mon, 7 Jun 2021 13:35:08 -0400 Subject: [PATCH 6/7] Removes a small floating character that was misplaced --- docs/query_lang.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index 858754d3..05afa2d9 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -21,8 +21,6 @@ Regardless of API, queries in Hatchet represent abstract paths, or path patterns The primary differences between the two APIs are the representation of filters, how wildcards and filters are combined into *abstract graph nodes*, and how *abstract graph nodes* are combined into a full query.
Some of these differences are shown in the table below: -|check| - +-------------------------------------------------------+--------------+--------------+---------------+ | Feature | High-Level | Middle-Level | Low-Level | +=======================================================+==============+==============+===============+ From 16113ee31e7db098cb7d64ef8464e78d0c9ab18c Mon Sep 17 00:00:00 2001 From: Ian Lumsden Date: Tue, 8 Jun 2021 12:32:39 -0400 Subject: [PATCH 7/7] Current progress on middle-level docs --- docs/query_lang.rst | 40 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index 05afa2d9..df0852b9 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -128,12 +128,48 @@ In the high-level API, a query is represented as a Python list of *abstract grap Middle-Level API ================ -In Progress +The middle-level API for Hatchet's query language is designed to allow users to perform more complex queries than the high-level API allows, while still being simpler than the low-level API. Its syntax is a slightly modified subset of the `Cypher Query Language <https://www.opencypher.org/>`_ for property graph databases. As a result, the specification and ordering of wildcards and filters is different from the high- and low-level APIs. As with Cypher queries, middle-level queries have two components: +1. A Path Specification which defines and labels *abstract graph nodes*. Wildcards are specified here. +2. A Filter Specification which defines all the filters as one long boolean expression. + +The following subsections will describe each of these components and how to combine them into a full query. Like high-level queries, middle-level queries can be used to filter a GraphFrame by passing it to the :code:`GraphFrame.filter` function as follows: + +..
code-block:: python + + query = + filtered_gf = gf.filter(query) + +Path Specification +~~~~~~~~~~~~~~~~~~ + +The path specification defines the wildcards and order of abstract nodes. It also labels each abstract node with a variable name that will be used to refer to the node in the filter specification. This component of the query must start with the :code:`"MATCH"` keyword (case-sensitive). The rest of the specification consists of a series of abstract nodes and edges in the following general form: + +.. code-block:: cypher + + MATCH (...)-[...]->(...)-[...]->(...) + +In the above example, every instance of :code:`"..."` represents information for a particular abstract node. Each instance can be replaced with one of the following options: +1. Only a wildcard (same syntax as the high-level API) (e.g., :code:`"*"`) +2. A variable name to be used in the filter specification (e.g., :code:`p`) +3. Both a wildcard and a variable name, separated by a comma (e.g., :code:`"*", p`) + +Filter Specification +~~~~~~~~~~~~~~~~~~~~ + +The filter specification is used to specify all filters that define what real nodes can be matched to the abstract nodes defined in the path specification section. This is similar to the filters of the high- and low-level APIs, but, in the middle-level API, all the conditions are specified in a single boolean expression. Internally, the :code:`CypherQuery` class will convert the filter specification into individual filters. + +The filter specification starts with the :code:`"WHERE"` keyword (case-sensitive) followed by the individual filters. Each individual filter has the following form: + +Full Queries +~~~~~~~~~~~~ + +Grammar +~~~~~~~ Low-Level API ============= -The low-level API for Hatchet's query language is designed to allow users to perform more complex queries. It's syntax is based on Python callables (e.g., functions, lambdas). The following subsections will describe each component of low-level queries.
Like high-level queries, low-level queries can be used to filter a GraphFrame by passing it to the :code:`GraphFrame.filter` function as follows: +The low-level API for Hatchet's query language is designed to allow users to perform more complex queries. Its syntax is based on Python callables (e.g., functions, lambdas). The following subsections will describe each component of low-level queries. Like high-level queries, low-level queries can be used to filter a GraphFrame by passing it to the :code:`GraphFrame.filter` function as follows: .. code-block:: python