SQL Filter Expressions for Streams #14110

ansd · 2025-06-23T14:08:23Z

What?

This commit allows AMQP 1.0 clients to define SQL-like filter expressions when consuming from streams, enabling server-side message filtering.
RabbitMQ will only dispatch messages that match the provided filter expression, reducing network traffic and client-side processing overhead.
SQL filter expressions are a more powerful alternative to the AMQP Property Filter Expressions introduced in RabbitMQ 4.1.

SQL filter expressions are based on the JMS message selector syntax and support:

Comparison operators (=, <>, >, <, >=, <=)
Logical operators (AND, OR, NOT)
Arithmetic operators (+, -, *, /)
Special operators (BETWEEN, LIKE, IN, IS NULL)
Access to the properties and application-properties sections, and to the priority field of the header section

Examples

Simple expression:

header.priority > 4

Complex expression:

order_type IN ('premium', 'express') AND
total_amount BETWEEN 100 AND 5000 AND
(customer_region LIKE 'EU-%' OR customer_region = 'US-CA') AND
properties.creation-time >= 1750772279000 AND
NOT cancelled
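To make the semantics of the complex expression concrete, here is a rough Python equivalent. It is a hand-written illustration, not how RabbitMQ evaluates filters: the function name `matches` and the representation of the message sections as plain dicts are assumptions made for this sketch.

```python
def matches(app_props: dict, props: dict) -> bool:
    """Hand-written equivalent of the complex expression (illustration only)."""
    try:
        return (
            # order_type IN ('premium', 'express')
            app_props["order_type"] in ("premium", "express")
            # total_amount BETWEEN 100 AND 5000 (both bounds inclusive)
            and 100 <= app_props["total_amount"] <= 5000
            # customer_region LIKE 'EU-%' OR customer_region = 'US-CA'
            and (
                app_props["customer_region"].startswith("EU-")
                or app_props["customer_region"] == "US-CA"
            )
            # properties.creation-time >= 1750772279000 (milliseconds since epoch)
            and props["creation-time"] >= 1750772279000
            # NOT cancelled
            and app_props["cancelled"] is False
        )
    except (KeyError, TypeError, AttributeError):
        # A missing field or a type mismatch never makes the condition true,
        # so the message does not match.
        return False


order = {
    "order_type": "premium",
    "total_amount": 250,
    "customer_region": "EU-DE",
    "cancelled": False,
}
print(matches(order, {"creation-time": 1750772280000}))
```

Unprefixed identifiers are looked up in the application properties, while `properties.creation-time` is looked up in the properties section.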

Like AMQP property filter expressions, SQL filter expressions can be
combined with Bloom filters. Combining both allows for highly customisable
expressions (SQL) and extremely fast evaluation (Bloom filter) if only a
subset of the chunks need to be read from disk.
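The combination can be pictured as a two-stage pipeline. The sketch below is a simplified model under stated assumptions: each chunk carries a plain Python set standing in for its Bloom filter (a real Bloom filter may report false positives, a set does not), and `two_stage_filter` is a hypothetical name, not a RabbitMQ function.

```python
from typing import Callable, Iterable, Iterator

Message = dict  # application properties only, for illustration


def two_stage_filter(
    chunks: Iterable[tuple[set, list[Message]]],
    bloom_value: str,
    predicate: Callable[[Message], bool],
) -> Iterator[Message]:
    """Skip whole chunks cheaply, then evaluate the precise predicate per message."""
    for chunk_filter, messages in chunks:
        # Stage 1: per-chunk test. Chunks that cannot contain the value are skipped
        # without being read.
        if bloom_value not in chunk_filter:
            continue
        # Stage 2: per-message test, standing in for the SQL filter expression.
        for message in messages:
            if predicate(message):
                yield message


chunks = [
    ({"eu"}, [{"region": "eu", "amount": 5}, {"region": "eu", "amount": 500}]),
    ({"us"}, [{"region": "us", "amount": 500}]),
]
selected = list(two_stage_filter(chunks, "eu", lambda m: m["amount"] > 100))
print(selected)
```

Only the first chunk is inspected message by message; the second is skipped by the stage 1 check.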

Why?

Compared to AMQP property filter expressions, SQL filter expressions provide the following advantage:

High expressiveness and flexibility in defining the filter

As with AMQP property filter expressions, the following advantages apply:

No false positives (unlike Bloom filters, which can produce false positives)
Multiple concurrent clients can attach to the same stream, each consuming
only a specific subset of messages while preserving message order.
Low network overhead as only messages that match the filter are
transferred to the client
Likewise, lower resource usage (CPU and memory) on clients since they
don't need to deserialise messages that they are not interested in.
If the SQL expression is simple, even the broker will save resources
because it doesn't need to serialise and send messages that the client
isn't interested in.

How?

JMS Message Selector Syntax vs. AMQP Extension Spec

The AMQP Filter Expressions Version 1.0 extension Working Draft 09 defines SQL Filter Expressions in Section 6.
This spec differs from the JMS message selector spec. Neither is a subset of the other. We can choose to follow either.
However, I think it makes most sense to follow the JMS spec because:

The JMS spec is better defined
The JMS spec is far more widespread than the AMQP Working Draft spec. (A slight variation of the AMQP Working
Draft is used by Azure Service Bus: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-messaging-sql-filter)
The JMS spec is mostly simpler (partly because it matches only on simple types)
This will allow for a single SQL parser in RabbitMQ for both AMQP clients consuming from a stream and possibly in future for JMS clients consuming from queues or topics.

AMQP extension spec vs JMS spec

AMQP

!= is synonym for <>

JMS

defines only <>

Conclusion

<> is sufficient

AMQP

Strings can be tested for “greater than”

“both operands are of type string or of type symbol (any combination is permitted) and

the lexicographical rank of the left operand is greater than the lexicographical rank of the right operand”

JMS

“String and Boolean comparison is restricted to = and <>.”

Conclusion

The JMS behaviour is sufficient.

AMQP

IN <set-expression>

set-expression can contain non-string literals

JMS:

set-expression can contain only string literals

Conclusion

The JMS behaviour is sufficient.

AMQP

EXISTS predicate to check for composite types

JMS

Only simple types

Conclusion

We want to match only for simple types, i.e. allowing matching only against values in the application-properties, properties sections and priority field of the header section.

AMQP:

Modulo operator %

Conclusion

JMS doesn't define the modulo operator. Let's start without it.

We can decide in future to add support since it can actually be useful, for example for two receivers who want to process every other message.
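The use case can be sketched as follows. This is purely hypothetical: the grammar in this PR does not accept `%`, and `seq` is an assumed application property carrying a per-message sequence number.

```python
def hypothetical_modulo_filter(sequence: int, receivers: int, index: int) -> bool:
    """Mimics a filter such as `seq % 2 = 0` (not supported by this PR)."""
    return sequence % receivers == index


sequences = range(6)
# Receiver 0 would use `seq % 2 = 0`, receiver 1 would use `seq % 2 = 1`.
receiver_0 = [s for s in sequences if hypothetical_modulo_filter(s, 2, 0)]
receiver_1 = [s for s in sequences if hypothetical_modulo_filter(s, 2, 1)]
print(receiver_0, receiver_1)
```

Each message is selected by exactly one of the two filters, so the receivers split the stream without coordination.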

AMQP:

The ‘+’ operator can concatenate string and symbol values

Conclusion

Such string concatenation isn't defined in JMS. We don't need it.

AMQP:

Define NAN and INF

JMS:

“Approximate literals use the Java floating-point literal syntax.”

Examples include "7."

Conclusion

We can go with the JMS spec given that it needs to be implemented anyway
for JMS support.
Scientific notations are supported in both the AMQP spec and JMS spec.

AMQP

String literals can be surrounded by single or double quotation marks

JMS

A string literal is enclosed in single quotes

Conclusion

Supporting single quotes is good enough.

AMQP

“A binary constant is a string of pairs of hexadecimal digits prefixed by ‘0x’ that are not enclosed in quotation marks”

Conclusion

JMS doesn't support binary constants. We can start without binary constants.

Matching against binary values is still supported if these binary values can be expressed as UTF-8 strings.

AMQP

Functions DATE, UTC, SUBSTRING, LOWER, UPPER, LEFT, RIGHT

Vendor specific functions

Conclusion

JMS doesn't define such functions. We can start without those functions.

AMQP

<field><array_element_reference>

<field>‘.’<composite_type_reference>

to access map and array elements

Conclusion

Same as above:

We want to match only for simple types, i.e. allowing matching only against values in the application-properties, properties sections and priority field of the header section.

AMQP

allows for delimited identifiers

JMS

Java identifier part characters

Conclusion

We can go with the Java identifiers extending the allowed characters by
. and - to reference field names such as properties.group-id.
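A rough approximation of such an identifier rule is shown below. It is an ASCII-only sketch: Java identifiers also admit Unicode letters, reserved words such as `AND` or `NOT` are not excluded here, and the actual rule in the lexer file may differ.

```python
import re

# Java-style start character (letter, `_` or `$`), followed by Java identifier
# part characters extended with `.` and `-`.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$.\-]*")


def is_identifier(text: str) -> bool:
    """True if the whole string looks like an identifier under this sketch."""
    return IDENTIFIER.fullmatch(text) is not None


for candidate in ("properties.group-id", "order_type", "9lives", "a b"):
    print(candidate, is_identifier(candidate))
```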

JMS:

BETWEEN operator

Conclusion

The BETWEEN operator isn't supported in the AMQP spec. Let's support it as a convenience since it's already available in JMS.

Filter Name

The client provides a filter with name sql-filter instead of name
jms-selector to distinguish between JMS clients and other
native AMQP 1.0 clients using SQL expressions. This way, we can also
optionally extend the SQL grammar in future.

Identifiers

JMS message selectors allow identifiers to contain some well known JMS headers that match to well known AMQP fields, for example:

jms_header_to_amqp_field_name(<<"JMSDeliveryMode">>) -> durable;
jms_header_to_amqp_field_name(<<"JMSPriority">>) -> priority;
jms_header_to_amqp_field_name(<<"JMSMessageID">>) -> message_id;
jms_header_to_amqp_field_name(<<"JMSTimestamp">>) -> creation_time;
jms_header_to_amqp_field_name(<<"JMSCorrelationID">>) -> correlation_id;
jms_header_to_amqp_field_name(<<"JMSType">>) -> subject;
%% amqp-bindmap-jms-v1.0-wd10 § 3.2.2 JMS-defined ’JMSX’ Properties
jms_header_to_amqp_field_name(<<"JMSXUserID">>) -> user_id;
jms_header_to_amqp_field_name(<<"JMSXGroupID">>) -> group_id;
jms_header_to_amqp_field_name(<<"JMSXGroupSeq">>) -> group_sequence;

This commit does a similar mapping from identifiers prefixed with header. and properties. to field names in the AMQP header and properties sections.
The only field in the AMQP header section that can be filtered on is priority, i.e. identifier header.priority.

By default, as described in the AMQP extension spec, if an identifier is not prefixed, it refers to a key in the application-properties section.

Hence, all identifiers prefixed with header. and properties. have special meanings and MUST be avoided by applications unless they want to refer to those specific fields.
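The resolution rules can be summarised in a few lines. This is a sketch with a hypothetical function name `resolve`; in particular, raising an error for unsupported header fields is an assumption of the sketch, not a statement of how the broker reacts.

```python
def resolve(identifier: str) -> tuple[str, str]:
    """Map an identifier to (message section, field name or key)."""
    if identifier.startswith("header."):
        field = identifier[len("header."):]
        if field != "priority":
            # Assumption: only header.priority is supported for filtering.
            raise ValueError(f"unsupported header field: {field}")
        return ("header", field)
    if identifier.startswith("properties."):
        return ("properties", identifier[len("properties."):])
    # Unprefixed identifiers refer to keys in the application-properties section.
    return ("application-properties", identifier)


print(resolve("header.priority"))
print(resolve("properties.group-id"))
print(resolve("order_type"))
```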

Azure Service Bus uses the sys. and user. prefixes for well known field names and arbitrary application-provided keys, respectively.

SQL lexer, parser and evaluator

This commit implements the SQL lexer and parser in files rabbit_jms_selector_lexer.xrl and
rabbit_jms_selector_parser.yrl, respectively.

Advantages:

Both the definitions in the lexer and the grammar in the parser are defined declaratively.
In total, the entire SQL syntax and grammar is defined in only 240 lines.
Therefore, lexer and parser are simple to maintain.

The idea of this commit is to use the same lexer and parser for native AMQP clients consuming
from streams (this commit) as for JMS clients (in the future).
All native AMQP client vs JMS client bits are then manipulated after
the Abstract Syntax Tree (AST) has been created by the parser.

For example, this commit transforms the AST specifically for native AMQP clients
by mapping properties. prefixed identifiers (field names) to atoms.
A JMS client's mapping from JMS prefixed headers can transform the AST
differently.

Likewise, this commit transforms the AST to compile a regex for complex LIKE
expressions when consuming from a stream while a future version
might not want to compile a regex when consuming from quorum queues.

Module rabbit_jms_ast provides such AST helper methods.

The lexer and parser are not performance critical as this work happens
upon receivers attaching to the stream.

The evaluator however is performance critical as message evaluation
happens on the hot path.

LIKE expressions

The evaluator has been optimised to only compile a regex when necessary.
If the LIKE expression-value contains no wildcard or only a single %
wildcard, Erlang pattern matching is used as it's more efficient.
Since _ can match any UTF-8 character, a regex will be compiled with
the [unicode] option.
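The idea behind this optimisation can be sketched in Python. This is not the Erlang implementation: `compile_like` is a hypothetical name, plain string operations stand in for Erlang pattern matching, and the optional ESCAPE clause is left out for brevity.

```python
import re
from typing import Callable


def compile_like(pattern: str) -> Callable[[str], bool]:
    """Return a matcher for a LIKE pattern, using a regex only when needed."""
    if "_" not in pattern:
        if "%" not in pattern:
            # No wildcard at all: plain equality.
            return lambda value: value == pattern
        if pattern.count("%") == 1:
            # A single %: prefix and suffix checks are enough.
            prefix, suffix = pattern.split("%")
            min_len = len(prefix) + len(suffix)
            return lambda value: (
                len(value) >= min_len
                and value.startswith(prefix)
                and value.endswith(suffix)
            )
    # General case: % matches any sequence, _ matches exactly one character.
    parts = (".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern)
    regex = re.compile("".join(parts), re.DOTALL)
    return lambda value: regex.fullmatch(value) is not None


is_eu = compile_like("EU-%")
print(is_eu("EU-DE"), is_eu("US-CA"))
```

The length check in the single-wildcard branch prevents the prefix and suffix from overlapping, so that `'ab%ba'` does not match `'aba'`.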

Filter errors

Any error upon a receiver attaching to a stream prevents the filter from
becoming active. RabbitMQ will log a warning describing the reason and
will omit the named filter in its attach reply frame. The client lib is
responsible for detaching the link as explained in the AMQP spec:

The receiving endpoint sets its desired filter, the sending endpoint sets the filter actually in place
(including any filters defaulted at the node). The receiving endpoint MUST check that the filter in
place meets its needs and take responsibility for detaching if it does not.

This applies to lexer and parser errors.
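On the client side, the check amounts to comparing the requested filters with those echoed back by the broker. The sketch below models both filter sets as dicts keyed by filter name; the function name `must_detach` is hypothetical and does not correspond to any client library API.

```python
def must_detach(desired_filters: dict, filters_in_place: dict) -> bool:
    """True if any requested filter is missing from the attach reply."""
    return any(name not in filters_in_place for name in desired_filters)


desired = {"sql-filter": "header.priority > 4"}
# The broker omitted the filter, e.g. because the expression failed to parse.
print(must_detach(desired, {}))
# The broker accepted the filter and echoed it back.
print(must_detach(desired, {"sql-filter": "header.priority > 4"}))
```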

Errors during message evaluation will result in an unknown value.
Conditional operators on unknown are described in the JMS spec. If the
entire selector condition is unknown, the message does not match, and
will therefore not be delivered to the client.
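The handling of unknown follows SQL three-valued logic. A minimal sketch, using Python's `None` for unknown (the helper names are illustrative and not taken from the evaluator):

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None for "unknown"


def and3(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False  # FALSE dominates AND, even next to unknown
    if a is None or b is None:
        return None
    return True


def or3(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True  # TRUE dominates OR, even next to unknown
    if a is None or b is None:
        return None
    return False


def not3(a: Tri) -> Tri:
    return None if a is None else not a


def delivers(result: Tri) -> bool:
    """A message is delivered only if the whole condition is TRUE."""
    return result is True


print(delivers(or3(True, None)), delivers(and3(True, None)))
```

An unknown sub-expression therefore does not necessarily reject the message: `TRUE OR unknown` is still TRUE, whereas `TRUE AND unknown` is unknown and the message is not delivered.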

Clients

Support for passing the SQL expression from app to broker is provided by
the Java client in rabbitmq/rabbitmq-amqp-java-client#216


ansd added this to the 4.2.0 milestone Jun 23, 2025

mergify bot added the make label Jun 23, 2025

ansd force-pushed the sql-stream branch 4 times, most recently from 07a03f3 to 923f097 Compare June 24, 2025 13:46

ansd force-pushed the sql-stream branch from 923f097 to 8df3dad Compare June 24, 2025 16:09
