SQL Filter Expressions for Streams #14110
Draft: ansd wants to merge 1 commit into main from sql-stream
+6,108
−151
## What?

This commit allows AMQP 1.0 clients to define SQL-like filter expressions when consuming from streams, enabling server-side message filtering.
RabbitMQ will only dispatch messages that match the provided filter expression, reducing network traffic and client-side processing overhead.
SQL filter expressions are a more powerful alternative to the [AMQP Property Filter Expressions](https://www.rabbitmq.com/blog/2024/12/13/amqp-filter-expressions) introduced in RabbitMQ 4.1.

SQL filter expressions are based on the [JMS message selector syntax](https://jakarta.ee/specifications/messaging/3.1/jakarta-messaging-spec-3.1#message-selector-syntax) and support:
* Comparison operators (`=`, `<>`, `>`, `<`, `>=`, `<=`)
* Logical operators (`AND`, `OR`, `NOT`)
* Arithmetic operators (`+`, `-`, `*`, `/`)
* Special operators (`BETWEEN`, `LIKE`, `IN`, `IS NULL`)
* Access to the properties and application-properties sections, plus the `priority` field of the header section

**Examples**

Simple expression:
```sql
header.priority > 4
```

Complex expression:
```sql
order_type IN ('premium', 'express') AND
total_amount BETWEEN 100 AND 5000 AND
(customer_region LIKE 'EU-%' OR customer_region = 'US-CA') AND
properties.creation-time >= 1750772279000 AND
NOT cancelled
```

Like AMQP property filter expressions, SQL filter expressions can be combined with Bloom filters.
Combining both allows for highly customisable expressions (SQL) and extremely fast evaluation (Bloom filter) if only a subset of the chunks needs to be read from disk.

## Why?

Compared to AMQP property filter expressions, SQL filter expressions provide the following advantage:
* High expressiveness and flexibility in defining the filter

As with AMQP property filter expressions, the following advantages apply:
* No false positives (unlike Bloom filters)
* Multiple concurrent clients can attach to the same stream, each consuming only a specific subset of messages while preserving message order.
* Low network overhead as only messages that match the filter are transferred to the client
* Likewise, lower resource usage (CPU and memory) on clients since they don't need to deserialise messages that they are not interested in.
* If the SQL expression is simple, even the broker will save resources because it doesn't need to serialise and send messages that the client isn't interested in.

## How?

### JMS Message Selector Syntax vs. AMQP Extension Spec

The AMQP Filter Expressions Version 1.0 extension Working Draft 09 defines SQL Filter Expressions in Section 6.
This spec differs from the JMS message selector spec. Neither is a subset of the other. We can choose to follow either.
However, I think it makes most sense to follow the JMS spec because:
* The JMS spec is better defined
* The JMS spec is far more widespread than the AMQP Working Draft spec. (A slight variation of the AMQP Working Draft is used by Azure Service Bus: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-messaging-sql-filter)
* The JMS spec is mostly simpler (partly because it matches only on simple types)
* This will allow for a single SQL parser in RabbitMQ for both AMQP clients consuming from a stream and possibly, in future, for JMS clients consuming from queues or topics.

<details>
<summary>AMQP extension spec vs JMS spec</summary>

AMQP: `!=` is a synonym for `<>`
JMS: defines only `<>`
Conclusion: `<>` is sufficient

AMQP: Strings can be tested for “greater than”: “both operands are of type string or of type symbol (any combination is permitted) and the lexicographical rank of the left operand is greater than the lexicographical rank of the right operand”
JMS: “String and Boolean comparison is restricted to = and <>.”
Conclusion: The JMS behaviour is sufficient.

AMQP: `IN <set-expression>`, where set-expression can contain non-string literals
JMS: set-expression can contain only string literals
Conclusion: The JMS behaviour is sufficient.

AMQP: `EXISTS` predicate to check for composite types
JMS: Only simple types
Conclusion: We want to match only on simple types, i.e. allow matching only against values in the application-properties section, the properties section, and the priority field of the header section.

AMQP: Modulo operator `%`
Conclusion: JMS doesn't define the modulo operator. Let's start without it. We can decide in future to add support since it can actually be useful, for example for two receivers that each want to process every other message.

AMQP: The `+` operator can concatenate string and symbol values
Conclusion: Such string concatenation isn't defined in JMS. We don't need it.

AMQP: Defines NAN and INF
JMS: “Approximate literals use the Java floating-point literal syntax.” Examples include "7."
Conclusion: We can go with the JMS spec given that it needs to be implemented anyway for JMS support. Scientific notation is supported in both the AMQP spec and the JMS spec.

AMQP: String literals can be surrounded by single or double quotation marks
JMS: A string literal is enclosed in single quotes
Conclusion: Supporting single quotes is good enough.

AMQP: “A binary constant is a string of pairs of hexadecimal digits prefixed by ‘0x’ that are not enclosed in quotation marks”
Conclusion: JMS doesn't support binary constants. We can start without binary constants. Matching against binary values is still supported if these binary values can be expressed as UTF-8 strings.

AMQP: Functions DATE, UTC, SUBSTRING, LOWER, UPPER, LEFT, RIGHT, and vendor-specific functions
Conclusion: JMS doesn't define such functions. We can start without those functions.

AMQP: `<field><array_element_reference>` and `<field>‘.’<composite_type_reference>` to access map and array elements
Conclusion: Same as above: We want to match only on simple types, i.e. allow matching only against values in the application-properties section, the properties section, and the priority field of the header section.

AMQP: allows for delimited identifiers
JMS: Java identifier part characters
Conclusion: We can go with the Java identifiers, extending the allowed characters by `.` and `-` to reference field names such as `properties.group-id`.

JMS: `BETWEEN` operator
Conclusion: The BETWEEN operator isn't supported in the AMQP spec. Let's support it as a convenience since it's already available in JMS.

</details>

### Filter Name

The client provides a filter named `sql-filter` instead of `jms-selector` so that JMS clients can be differentiated from other native AMQP 1.0 clients using SQL expressions.
This way, we can also optionally extend the SQL grammar in future.

### Identifiers

JMS message selectors allow identifiers to contain some well-known JMS headers that map to well-known AMQP fields, for example:
```erl
jms_header_to_amqp_field_name(<<"JMSDeliveryMode">>) -> durable;
jms_header_to_amqp_field_name(<<"JMSPriority">>) -> priority;
jms_header_to_amqp_field_name(<<"JMSMessageID">>) -> message_id;
jms_header_to_amqp_field_name(<<"JMSTimestamp">>) -> creation_time;
jms_header_to_amqp_field_name(<<"JMSCorrelationID">>) -> correlation_id;
jms_header_to_amqp_field_name(<<"JMSType">>) -> subject;
%% amqp-bindmap-jms-v1.0-wd10 § 3.2.2 JMS-defined ’JMSX’ Properties
jms_header_to_amqp_field_name(<<"JMSXUserID">>) -> user_id;
jms_header_to_amqp_field_name(<<"JMSXGroupID">>) -> group_id;
jms_header_to_amqp_field_name(<<"JMSXGroupSeq">>) -> group_sequence;
```

This commit does a similar mapping from `header.` and `properties.` prefixed identifiers to field names in the AMQP header and properties sections.
The only field in the AMQP header section that can be filtered on is `priority`, that is, identifier `header.priority`.
By default, as described in the AMQP extension spec, if an identifier is not prefixed, it refers to a key in the application-properties section.
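The prefix rules above can be sketched roughly as follows. This is only an illustration, not the code from this commit: the module name, function name and return shapes are hypothetical, the field name is kept as a binary rather than mapped to an atom, and rejecting `header.` identifiers other than `header.priority` is an assumption about how unsupported header fields might be handled.

```erlang
-module(sql_identifier_sketch).
-export([resolve/1]).

%% Hypothetical helper: decide which AMQP message section an identifier
%% in a SQL filter expression refers to.
-spec resolve(binary()) ->
    {header, priority} |
    {properties, binary()} |
    {application_properties, binary()} |
    {error, unsupported_header_field}.
resolve(<<"header.priority">>) ->
    %% The only header field that can be filtered on.
    {header, priority};
resolve(<<"header.", _/binary>>) ->
    %% Assumption: any other header field is rejected.
    {error, unsupported_header_field};
resolve(<<"properties.", Field/binary>>) when Field =/= <<>> ->
    %% For example <<"properties.group-id">> yields <<"group-id">>.
    {properties, Field};
resolve(Key) when is_binary(Key) ->
    %% Unprefixed identifiers are keys in the application-properties section.
    {application_properties, Key}.
```

With this sketch, `resolve(<<"order_type">>)` yields `{application_properties, <<"order_type">>}`, while `resolve(<<"properties.creation-time">>)` yields `{properties, <<"creation-time">>}`.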
All identifiers prefixed with `header.` or `properties.` therefore have special meanings and MUST be avoided by applications unless they want to refer to those specific fields.
Azure Service Bus uses the `sys.` and `user.` prefixes for well-known field names and arbitrary application-provided keys, respectively.

### SQL lexer, parser and evaluator

This commit implements the SQL lexer and parser in files rabbit_jms_selector_lexer.xrl and rabbit_jms_selector_parser.yrl, respectively.

Advantages:
* Both the definitions in the lexer and the grammar in the parser are defined **declaratively**.
* In total, the entire SQL syntax and grammar is defined in only 240 lines.
* Therefore, lexer and parser are simple to maintain.

The idea of this commit is to use the same lexer and parser for native AMQP clients consuming from streams (this commit) as for JMS clients (in the future).
All native AMQP client vs JMS client bits are then manipulated after the Abstract Syntax Tree (AST) has been created by the parser.

For example, this commit transforms the AST specifically for native AMQP clients by mapping `properties.` prefixed identifiers (field names) to atoms.
A JMS client's mapping from `JMS` prefixed headers can transform the AST differently.

Likewise, this commit transforms the AST to compile a regex for complex LIKE expressions when consuming from a stream, while a future version might not want to compile a regex when consuming from quorum queues.

Module `rabbit_jms_ast` provides such AST helper functions.

The lexer and parser are not performance critical as this work happens when receivers attach to the stream.
The evaluator, however, is performance critical as message evaluation happens on the hot path.

### LIKE expressions

The evaluator has been optimised to only compile a regex when necessary.
If the LIKE expression-value contains no wildcard or only a single `%` wildcard, Erlang pattern matching is used as it's more efficient.
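A minimal sketch of that fast path, under stated assumptions: the module and function names are hypothetical, `ESCAPE` handling is left out entirely, and this is not the evaluator from this commit. It classifies a LIKE pattern once and then matches subjects with binary pattern matching when no regex is needed.

```erlang
-module(sql_like_sketch).
-export([compile/1, match/2]).

%% Classify a LIKE pattern (hypothetical helper, no ESCAPE support).
%% Patterns without wildcards or with a single `%` avoid a regex.
compile(Pattern) when is_binary(Pattern) ->
    HasUnderscore = binary:match(Pattern, <<"_">>) =/= nomatch,
    case {HasUnderscore, binary:split(Pattern, <<"%">>, [global])} of
        {false, [Exact]} ->
            %% No wildcard at all: plain equality.
            {exact, Exact};
        {false, [Prefix, Suffix]} ->
            %% Exactly one `%`: match on prefix and suffix.
            {prefix_suffix, Prefix, Suffix};
        _ ->
            %% `_` or several `%`: a regex would be needed (not shown).
            regex_needed
    end.

%% Match a subject against a classified pattern.
%% There is deliberately no clause for regex_needed in this sketch.
match({exact, Exact}, Subject) ->
    Subject =:= Exact;
match({prefix_suffix, Prefix, Suffix}, Subject) ->
    PLen = byte_size(Prefix),
    SLen = byte_size(Suffix),
    case Subject of
        <<Prefix:PLen/binary, Rest/binary>> when byte_size(Rest) >= SLen ->
            %% The remainder must end with the suffix.
            binary:longest_common_suffix([Rest, Suffix]) =:= SLen;
        _ ->
            false
    end.
```

For the pattern `'EU-%'` from the complex example, `compile/1` returns `{prefix_suffix, <<"EU-">>, <<>>}`, so each message costs one prefix comparison instead of a regex run.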
Since `_` can match any single UTF-8 character, a regex will be compiled with the `[unicode]` option.

### Filter errors

Any error upon a receiver attaching to a stream causes the filter to not become active.
RabbitMQ will log a warning describing the reason and will omit the named filter in its attach reply frame.
The client library is responsible for detaching the link as explained in the AMQP spec:
> The receiving endpoint sets its desired filter, the sending endpoint sets the filter actually in place (including any filters defaulted at the node). The receiving endpoint MUST check that the filter in place meets its needs and take responsibility for detaching if it does not.

This applies to lexer and parser errors.

Errors during message evaluation will result in an unknown value.
Conditional operators on unknown are described in the JMS spec.
If the entire selector condition is unknown, the message does not match, and will therefore not be delivered to the client.

## Clients

Support for passing the SQL expression from app to broker is provided by the Java client in rabbitmq/rabbitmq-amqp-java-client#216
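Returning to the unknown value described under Filter errors: the JMS spec defines SQL-style three-valued logic for `AND`, `OR` and `NOT`. The following is a minimal sketch of those truth tables, assuming unknown is represented by the atom `unknown`; the module, function names and representation are hypothetical and not taken from this commit.

```erlang
-module(sql_unknown_sketch).
-export([and3/2, or3/2, not3/1, matches/1]).

%% Three-valued AND: false dominates, otherwise unknown propagates.
and3(false, _) -> false;
and3(_, false) -> false;
and3(true, true) -> true;
and3(_, _) -> unknown.

%% Three-valued OR: true dominates, otherwise unknown propagates.
or3(true, _) -> true;
or3(_, true) -> true;
or3(false, false) -> false;
or3(_, _) -> unknown.

%% NOT leaves unknown untouched.
not3(true) -> false;
not3(false) -> true;
not3(unknown) -> unknown.

%% A message is delivered only if the whole condition is true;
%% both false and unknown mean "does not match".
matches(true) -> true;
matches(_) -> false.
```

For example, if `total_amount` is missing from a message, `total_amount > 100` is unknown, so `NOT (total_amount > 100)` is unknown as well and the message is not delivered, whereas `total_amount > 100 OR cancelled` can still match when `cancelled` is true.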