-
Notifications
You must be signed in to change notification settings - Fork 55
In Network Query Processing
This Page describes briefly the encoding of queries for the In-Network Query Processing component (INQP). The implementation can be found in wiselib.testing/algorithms/rdf/inqp/.
A query consists of the following:
- A numeric id (one byte, assumed to uniquely identify the query during its existence in the network)
- A set of operators
General things to note:
- Values are always 4 byte long and can be either integers, floats or strings (identified by hash value)
- For hash we use SDBM, see http://www.cse.yorku.ca/~oz/hash.html and wiselib.testing/algorithms/hash/sdbm.h
Operators and their parameters are encoded in a compact format and can be transmitted individually to the INQP communicator. INQP will start processing the query once its set of operators is complete. All operator types have a common set of base fields, described below. Depending on the operator type, each operator also features different options. Operators form a query tree, each operator has 2 input ports and one output (for some operators only the left input port is being used). Each operator implicitely also defines a subsequent projection (this way, unenecessary allocations for throw-away columns are avoided).
+----------+--------------+-----------+-----------------+-----------------+--------------------------+ | QUERY ID | OPERATOOR ID | TYPE | PARENT ID | PROJECTION INFO | Type-specific options... | +----------+--------------+-----------+-----------------+-----------------+--------------------------+ | 1 byte | 1 byte | 1 byte | 1 byte |4 bytes | | +----------+--------------+-----------+-----------------+-----------------+--------------------------+
Query Id
Numerical query id, see above
Operator Id
Numerical id of this operator. Must be unique in this query tree. 0 is reserved for special purposes, operators should be numbered such that operators higher in the tree have a higher operator id.
Type
Operator type, one of
====== ===========
Value Meaning
------ -----------
'g' Graph Pattern Selection
's' (General) Selection
'j' Simple Local Join
'c' Collect
'C' Construct
'a' Aggregate
'D' Delete
====== ===========
Parent Id
`````
The set of operators in a query form a directed tree, in general, data is input at the
leaves and output at the root.
The parent id specifies where to connect the output port of the current
operator, that is a target operator (by operator id) plus a port (left or right).
The MSB defines the parent port, the remaining 7 bits the parent operator id, i.e.::
parent_id = (parent_op_id & 0x7f) | (left ? 0 : 0x80)
By definition the root of the operator tree has no parent which is denoted by
parent_id = 0.
Projection Info
The projection info defines for each output column of the operator it is attached to, whether to keep it or to drop it. For easier processing, the projection info also re-states the type of each column it defines to be kept. Data in INQP can have three types: INTEGER, FLOAT or STRING.
For each column, 2 bits are stored, defining whether to drop/ignore the column or to keep it and if so, its type.
=== === ====== Hi Low Meaning
0 0 IGNORE 0 1 INTEGER 1 0 FLOAT 1 1 STRING === === ======
The first output column is defined by the two least-significant bits of the first byte of the projection info, the second column by the 3rd and 4th bit, etc...
Type Selection
Inputs 0
Outputs 1
Options length 4, 8 or 12 byte
+---------+-----------+-----------+-----------+ | BITMASK | (VALUE 1) | (VALUE 2) | (VALUE 3) | +---------+-----------+-----------+-----------+ | 1 byte | 4 byte | 4 byte | 4 byte | +---------+-----------+-----------+-----------+
The Graph Patterns Selection (GPS) is (currently) the only operator that is allowed to appear as a leaf in the query tree. It never has childs and receives its data directly from the tuple store. It will select all tuples that match the specified pattern. The selection pattern is specified as follows: BITMASK defines which attributes of the input data are to be compared (bit 3 = subject, bit 2 = predicate, bit 1 = LSB = object). According to that bitmask a number of comparison values follows that will be used to compare the according column to.
E.g::
(?foo 0x1234 0x5678) translates into
{ BIN(011), 0, 0, 0x12, 0x34, 0, 0, 0x56, 0x78 }
In contrast to other operators the projection info in GPS is not only used for (re-)stating the type of column to be output but can also be used to trigger a conversion from the string-only tuplestore.
Type Selection
Inputs 1
Outputs 1
Options length 2..? byte
+-----------+-----------+------------------------+ | #CRITERIA | #VALUES | Criteria... Values... | +-----------+-----------+------------------------+ | 1 byte | 1 byte | | +-----------+-----------+------------------------+
The general selection allows more sophisticated comparisons. It defines a number of critera, followed by a number of values.
The first criterion always refers to the first input column and defines how to compare it and to what. If the criterion does not set the AGAIN flag (MSB), the next criterion will refer to the following column. If it does set the AGAIN flag, it will refer to the same column.
+-------+-----------------+---------------+ | AGAIN | CRITERION | VALUE INDEX | +-------+-----------------+---------------+ | 1 bit | 4 bit | 1 bit | +-------+-----------------+---------------+ | MSB LSB | +-----------------------------------------+
The table belows gives possible values for a criterion field, that can be or'ed together (i.e. 0x20 | 0x80 | 3 means "the current attribute has to be > than the attribute with index 3, another criterion for this attribute will follow").
===== ======== 0x00 Ignore (always matches) 0x08 == 0x10 < 0x18 <= 0x20 > 0x28 >= 0x30 !=
0x80 AGAIN
0-7 Value index ===== ========
Type Join
Inputs 2
Outputs 1
Options length 1 byte
The simple local join simply compares an attribute of its left child with an attribute of its right child (for equality). Both attribute indices are identified by a single byte, the left attribute index in the high bits, the right one in the low bits, i.e.::
byte = (left_attr_index << 4) | right_attr_index;
Type Distributed
Inputs 1
Outputs 0
Options length 0 byte
The collect operator simple sends its input tuple to the sink for further processing (sorting, non-local joins etc...).
Type Distributed Aggregation
Inputs 1
Outputs 0
Options length 1..? byte
===== ======== 0 GROUP BY this attribute 1 SUM() 2 AVG() 3 COUNT() 4 MIN() 5 MAX()
0x80 AGAIN ===== ========
+------------+---------------------------+ | #OPERATORS | Aggregation Operations... | +------------+---------------------------+ | 1 byte | | +------------+---------------------------+