Skip to content

SQL Filter Expressions for Streams #14110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

SQL Filter Expressions for Streams #14110

wants to merge 1 commit into from

Conversation

ansd
Copy link
Member

@ansd ansd commented Jun 23, 2025

What?

This commit allows AMQP 1.0 clients to define SQL-like filter expressions when consuming from streams, enabling server-side message filtering.
RabbitMQ will only dispatch messages that match the provided filter expression, reducing network traffic and client-side processing overhead.
SQL filter expressions are a more powerful alternative to the AMQP Property Filter Expressions introduced in RabbitMQ 4.1.

SQL filter expressions are based on the JMS message selector syntax and support:

  • Comparison operators (=, <>, >, <, >=, <=)
  • Logical operators (AND, OR, NOT)
  • Arithmetic operators (+, -, *, /)
  • Special operators (BETWEEN, LIKE, IN, IS NULL)
  • Access to the properties and application-properties sections

Examples

Simple expression:

header.priority > 4

Complex expression:

order_type IN ('premium', 'express') AND
total_amount BETWEEN 100 AND 5000 AND
(customer_region LIKE 'EU-%' OR customer_region = 'US-CA') AND
properties.creation-time >= 1750772279000 AND
NOT cancelled

Like AMQP property filter expressions, SQL filter expressions can be
combined with Bloom filters. Combining both allows for highly customisable
expressions (SQL) and extremely fast evaluation (Bloom filter) if only a
subset of the chunks need to be read from disk.

Why?

Compared to AMQP property filter expressions, SQL filter expressions provide the following advantage:

  • High expressiveness and flexibility in defining the filter

Like for AMQP property filter expressions, the following advantages apply:

  • No false positives (as is the case for Bloom filters)
  • Multiple concurrent clients can attach to the same stream each consuming
    only a specific subset of messages while preserving message order.
  • Low network overhead as only messages that match the filter are
    transferred to the client
  • Likewise, lower resource usage (CPU and memory) on clients since they
    don't need to deserialise messages that they are not interested in.
  • If the SQL expression is simple, even the broker will save resources
    because it doesn't need to serialse and send messages that the client
    isn't interested in.

How?

JMS Message Selector Syntax vs. AMQP Extension Spec

The AMQP Filter Expressions Version 1.0 extension Working Draft 09 defines SQL Filter Expressions in Section 6.
This spec differs from the JMS message selector spec. Neither is a subset of the other. We can choose to follow either.
However, I think it makes most sense to follow the JMS spec because:

  • The JMS spec is better defined
  • The JMS spec is far more widespread than the AMQP Working Draft spec. (A slight variation of the AMQP Working
    Draft is used by Azure Service Bus: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-messaging-sql-filter)
  • The JMS spec is mostly simpler (partly because matching on only simple types)
  • This will allow for a single SQL parser in RabbitMQ for both AMQP clients consuming from a stream and possibly in future for JMS clients consuming from queues or topics.
AMQP extension spec vs JMS spec

AMQP

!= is synonym for <>

JMS

defines only <>

Conclusion

<> is sufficient

AMQP

Strings can be tested for “greater than”

“both operands are of type string or of type symbol (any combination is permitted) and

the lexicographical rank of the left operand is greater than the lexicographical rank of the right operand”

JMS

“String and Boolean comparison is restricted to = and <>.”

Conclusion

The JMS behaviour is sufficient.

AMQP

IN

set-expression can contain non-string literals

JMS:

set-expression can contain only string literals

Conclusion

The JMS behaviour is sufficient.

AMQP

EXISTS predicate to check for composite types

JMS

Only simple types

Conclusion

We want to match only for simple types, i.e. allowing matching only against values in the application-properties, properties sections and priority field of the header section.

AMQP:

Modulo operator %

Conclusion

JMS doesn't define the modulo operator. Let's start without it.

We can decide in future to add support since it can actually be useful, for example for two receivers who want to process every other message.

AMQP:

The ‘+’ operator can concatenate string and symbol values

Conclusion

Such string concatenation isn't defined in JMS. We don't need it.

AMQP:

Define NAN and INF

JMS:

“Approximate literals use the Java floating-point literal syntax.”

Examples include "7."

Conclusion

We can go with the JMS spec given that needs to be implemented anyway
for JMS support.
Scientific notations are supported in both the AMQP spec and JMS spec.

AMQP

String literals can be surrounded by single or double quotation marks

JMS

A string literal is enclosed in single quotes

Conclusion

Supporting single quotes is good enough.

AMQP

“A binary constant is a string of pairs of hexadecimal digits prefixed by ‘0x’ that are not enclosed in quotation marks”

Conclusion

JMS doesn't support binary constants. We can start without binary constants.

Matching against binary values are still supported if these binary values can be expressed as UTF-8 strings.

AMQP

Functions DATE, UTC, SUBSTRING, LOWER, UPPER, LEFT, RIGHT

Vendor specific functions

Conclusion

JMS doesn't define such functions. We can start without those functions.

AMQP

<array_element_reference>

‘.’<composite_type_reference>

to access map and array elements

Conclusion

Same as above:

We want to match only for simple types, i.e. allowing matching only against values in the application-properties, properties sections and priority field of the header section.

AMQP

allows for delimited identifiers

JMS

Java identifier part characters

Conclusion

We can go with the Java identifiers extending the allowed characters by
. and - to reference field names such as properties.group-id.

JMS:

BETWEEN operator

Conclusion

The BETWEEN operator isn't supported in the AMQP spec. Let's support it as convenience since it's already available in JMS.

Filter Name

The client provides a filter with name sql-filter instead of name
jms-selector to allow to differentiate between JMS clients and other
native AMQP 1.0 clients using SQL expressions. This way, we can also
optionally extend the SQL grammar in future.

Identifiers

JMS message selectors allow identifiers to contain some well known JMS headers that match to well known AMQP fields, for example:

jms_header_to_amqp_field_name(<<"JMSDeliveryMode">>) -> durable;
jms_header_to_amqp_field_name(<<"JMSPriority">>) -> priority;
jms_header_to_amqp_field_name(<<"JMSMessageID">>) -> message_id;
jms_header_to_amqp_field_name(<<"JMSTimestamp">>) -> creation_time;
jms_header_to_amqp_field_name(<<"JMSCorrelationID">>) -> correlation_id;
jms_header_to_amqp_field_name(<<"JMSType">>) -> subject;
%% amqp-bindmap-jms-v1.0-wd10 § 3.2.2 JMS-defined ’JMSX’ Properties
jms_header_to_amqp_field_name(<<"JMSXUserID">>) -> user_id;
jms_header_to_amqp_field_name(<<"JMSXGroupID">>) -> group_id;
jms_header_to_amqp_field_name(<<"JMSXGroupSeq">>) -> group_sequence;

This commit does a similar matching for header. and properties. prefixed identifiers to field names in the AMQP property section.
The only field that is supported to filter on in the AMQP header section is priority, that is identifier header.priority.

By default, as described in the AMQP extension spec, if an identifier is not prefixed, it refers to a key in the application-properties section.

Hence, all identifiers prefixed with header., and properties. have special meanings and MUST be avoided by applications unless they want to refer to those specific fields.

Azure Service Bus uses the sys. and user. prefixes for well known field names and arbitrary application-provided keys, respectively.

SQL lexer, parser and evaluator

This commit implements the SQL lexer and parser in files rabbit_jms_selector_lexer.xrl and
rabbit_jms_selector_parser.yrl, respectively.

Advantages:

  • Both the definitions in the lexer and the grammar in the parser are defined declaratively.
  • In total, the entire SQL syntax and grammar is defined in only 240 lines.
  • Therefore, lexer and parser are simple to maintain.

The idea of this commit is to use the same lexer and parser for native AMQP clients consumings
from streams (this commit) as for JMS clients (in the future).
All native AMQP client vs JMS client bits are then manipulated after
the Abstract Syntax Tree (AST) has been created by the parser.

For example, this commit transforms the AST specifically for native AMQP clients
by mapping properties. prefixed identifiers (field names) to atoms.
A JMS client's mapping from JMS prefixed headers can transform the AST
differently.

Likewise, this commit transforms the AST to compile a regex for complex LIKE
expressions when consuming from a stream while a future version
might not want to compile a regex when consuming from quorum queues.

Module rabbit_jms_ast provides such AST helper methods.

The lexer and parser are not performance critical as this work happens
upon receivers attaching to the stream.

The evaluator however is performance critical as message evaluation
happens on the hot path.

LIKE expressions

The evaluator has been optimised to only compile a regex when necessary.
If the LIKE expression-value contains no wildcard or only a single %
wildcard, Erlang pattern matching is used as it's more efficient.
Since _ can match any UTF-8 character, a regex will be compiled with
the [unicode] options.

Filter errors

Any errors upon a receiver attaching to a stream causes the filter to
not become active. RabbitMQ will log a warning describing the reason and
will omit the named filter in its attach reply frame. The client lib is
responsible for detaching the link as explained in the AMQP spec:

The receiving endpoint sets its desired filter, the sending endpoint sets the filter actually in place
(including any filters defaulted at the node). The receiving endpoint MUST check that the filter in
place meets its needs and take responsibility for detaching if it does not.

This applies to lexer and parser errors.

Errors during message evaluation will result in an unknown value.
Conditional operators on unknown are described in the JMS spec. If the
entire selector condition is unknown, the message does not match, and
will therefore not be delivered to the client.

Clients

Support for passing the SQL expression from app to broker is provided by
the Java client in rabbitmq/rabbitmq-amqp-java-client#216

@ansd ansd added this to the 4.2.0 milestone Jun 23, 2025
@mergify mergify bot added the make label Jun 23, 2025
@ansd ansd force-pushed the sql-stream branch 4 times, most recently from 07a03f3 to 923f097 Compare June 24, 2025 13:46
 ## What?

This commit allows AMQP 1.0 clients to define SQL-like filter expressions when consuming from streams, enabling server-side message filtering.
RabbitMQ will only dispatch messages that match the provided filter expression, reducing network traffic and client-side processing overhead.
SQL filter expressions are a more powerful alternative to the [AMQP Property Filter Expressions](https://www.rabbitmq.com/blog/2024/12/13/amqp-filter-expressions) introduced in RabbitMQ 4.1.

SQL filter expressions are based on the [JMS message selector syntax](https://jakarta.ee/specifications/messaging/3.1/jakarta-messaging-spec-3.1#message-selector-syntax) and support:
* Comparison operators (`=`, `<>`, `>`, `<`, `>=`, `<=`)
* Logical operators (`AND`, `OR`, `NOT`)
* Arithmetic operators (`+`, `-`, `*`, `/`)
* Special operators (`BETWEEN`, `LIKE`, `IN`, `IS NULL`)
* Access to the properties and application-properties sections

**Examples**

Simple expression:

```sql
header.priority > 4
```

Complex expression:

```sql
order_type IN ('premium', 'express') AND
total_amount BETWEEN 100 AND 5000 AND
(customer_region LIKE 'EU-%' OR customer_region = 'US-CA') AND
properties.creation-time >= 1750772279000 AND
NOT cancelled
```

Like AMQP property filter expressions, SQL filter expressions can be
combined with Bloom filters. Combining both allows for highly customisable
expressions (SQL) and extremely fast evaluation (Bloom filter) if only a
subset of the chunks need to be read from disk.

 ## Why?

Compared to AMQP property filter expressions, SQL filter expressions provide the following advantage:
* High expressiveness and flexibility in defining the filter

Like for AMQP property filter expressions, the following advantages apply:
* No false positives (as is the case for Bloom filters)
* Multiple concurrent clients can attach to the same stream each consuming
  only a specific subset of messages while preserving message order.
* Low network overhead as only messages that match the filter are
  transferred to the client
* Likewise, lower resource usage (CPU and memory) on clients since they
  don't need to deserialise messages that they are not interested in.
* If the SQL expression is simple, even the broker will save resources
  because it doesn't need to serialse and send messages that the client
  isn't interested in.

 ## How?

 ### JMS Message Selector Syntax vs. AMQP Extension Spec

The AMQP Filter Expressions Version 1.0 extension Working Draft 09 defines SQL Filter Expressions in Section 6.
This spec differs from the JMS message selector spec. Neither is a subset of the other. We can choose to follow either.
However, I think it makes most sense to follow the JMS spec because:
* The JMS spec is better defined
* The JMS spec is far more widespread than the AMQP Working Draft spec. (A slight variation of the AMQP Working
  Draft is used by Azure Service Bus: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-messaging-sql-filter)
* The JMS spec is mostly simpler (partly because matching on only simple types)
* This will allow for a single SQL parser in RabbitMQ for both AMQP clients consuming from a stream and possibly in future for JMS clients consuming from queues or topics.

<details>
<summary>AMQP extension spec vs JMS spec</summary>

AMQP

!= is synonym for <>

JMS

defines only <>

Conclusion

<> is sufficient

AMQP

Strings can be tested for “greater than”

“both operands are of type string or of type symbol (any combination is permitted) and

the lexicographical rank of the left operand is greater than the lexicographical rank of the right operand”

JMS

“String and Boolean comparison is restricted to = and <>.”

Conclusion

The JMS behaviour is sufficient.

AMQP

IN <set-expression>

set-expression can contain non-string literals

JMS:

set-expression can contain only string literals

Conclusion

The JMS behaviour is sufficient.

AMQP

EXISTS predicate to check for composite types

JMS

Only simple types

Conclusion

We want to match only for simple types, i.e. allowing matching only against values in the application-properties, properties sections and priority field of the header section.

AMQP:

Modulo operator %

Conclusion

JMS doesn't define the modulo operator. Let's start without it.

We can decide in future to add support since it can actually be useful, for example for two receivers who want to process every other message.

AMQP:

The ‘+’ operator can concatenate string and symbol values

Conclusion

Such string concatenation isn't defined in JMS. We don't need it.

AMQP:

Define NAN and INF

JMS:

“Approximate literals use the Java floating-point literal syntax.”

Examples include "7."

Conclusion

We can go with the JMS spec given that needs to be implemented anyway
for JMS support.
Scientific notations are supported in both the AMQP spec and JMS spec.

AMQP

String literals can be surrounded by single or double quotation marks

JMS

A string literal is enclosed in single quotes

Conclusion

Supporting single quotes is good enough.

AMQP

“A binary constant is a string of pairs of hexadecimal digits prefixed by ‘0x’ that are not enclosed in quotation marks”

Conclusion

JMS doesn't support binary constants. We can start without binary constants.

Matching against binary values are still supported if these binary values can be expressed as UTF-8 strings.

AMQP

Functions DATE, UTC, SUBSTRING, LOWER, UPPER, LEFT, RIGHT

Vendor specific functions

Conclusion

JMS doesn't define such functions. We can start without those functions.

AMQP

<field><array_element_reference>

<field>‘.’<composite_type_reference>

to access map and array elements

Conclusion

Same as above:

We want to match only for simple types, i.e.  allowing matching only against values in the application-properties, properties sections and priority field of the header section.

AMQP

allows for delimited identifiers

JMS

Java identifier part characters

Conclusion

We can go with the Java identifiers extending the allowed characters by
`.` and `-` to reference field names such as `properties.group-id`.

JMS:

BETWEEN operator

Conclusion

The BETWEEN operator isn't supported in the AMQP spec. Let's support it as convenience since it's already available in JMS.

</details>

  ### Filter Name

The client provides a filter with name `sql-filter` instead of name
`jms-selector` to allow to differentiate between JMS clients and other
native AMQP 1.0 clients using SQL expressions. This way, we can also
optionally extend the SQL grammar in future.

  ### Identifiers

JMS message selectors allow identifiers to contain some well known JMS headers that match to well known AMQP fields, for example:
 ```erl
jms_header_to_amqp_field_name(<<"JMSDeliveryMode">>) -> durable;
jms_header_to_amqp_field_name(<<"JMSPriority">>) -> priority;
jms_header_to_amqp_field_name(<<"JMSMessageID">>) -> message_id;
jms_header_to_amqp_field_name(<<"JMSTimestamp">>) -> creation_time;
jms_header_to_amqp_field_name(<<"JMSCorrelationID">>) -> correlation_id;
jms_header_to_amqp_field_name(<<"JMSType">>) -> subject;
%% amqp-bindmap-jms-v1.0-wd10 § 3.2.2 JMS-defined ’JMSX’ Properties
jms_header_to_amqp_field_name(<<"JMSXUserID">>) -> user_id;
jms_header_to_amqp_field_name(<<"JMSXGroupID">>) -> group_id;
jms_header_to_amqp_field_name(<<"JMSXGroupSeq">>) -> group_sequence;
```

This commit does a similar matching for `header.` and `properties.` prefixed identifiers to field names in the AMQP property section.
The only field that is supported to filter on in the AMQP header section is `priority`, that is identifier `header.priority`.

By default, as described in the AMQP extension spec, if an identifier is not prefixed, it refers to a key in the application-properties section.

Hence, all identifiers prefixed with `header.`, and `properties.` have special meanings and MUST be avoided by applications unless they want to refer to those specific fields.

Azure Service Bus uses the `sys.` and `user.` prefixes for well known field names and arbitrary application-provided keys, respectively.

  ### SQL lexer, parser and evaluator

This commit implements the SQL lexer and parser in files rabbit_jms_selector_lexer.xrl and
rabbit_jms_selector_parser.yrl, respectively.

Advantages:
* Both the definitions in the lexer and the grammar in the parser are defined **declaratively**.
* In total, the entire SQL syntax and grammar is defined in only 240 lines.
* Therefore, lexer and parser are simple to maintain.

The idea of this commit is to use the same lexer and parser for native AMQP clients consumings
from streams (this commit) as for JMS clients (in the future).
All native AMQP client vs JMS client bits are then manipulated after
the Abstract Syntax Tree (AST) has been created by the parser.

For example, this commit transforms the AST specifically for native AMQP clients
by mapping `properties.` prefixed identifiers (field names) to atoms.
A JMS client's mapping from `JMS` prefixed headers can transform the AST
differently.

Likewise, this commit transforms the AST to compile a regex for complex LIKE
expressions when consuming from a stream while a future version
might not want to compile a regex when consuming from quorum queues.

Module `rabbit_jms_ast` provides such AST helper methods.

The lexer and parser are not performance critical as this work happens
upon receivers attaching to the stream.

The evaluator however is performance critical as message evaluation
happens on the hot path.

 ### LIKE expressions

The evaluator has been optimised to only compile a regex when necessary.
If the LIKE expression-value contains no wildcard or only a single `%`
wildcard, Erlang pattern matching is used as it's more efficient.
Since `_` can match any UTF-8 character, a regex will be compiled with
the `[unicode]` options.

  ### Filter errors

Any errors upon a receiver attaching to a stream causes the filter to
not become active. RabbitMQ will log a warning describing the reason and
will omit the named filter in its attach reply frame. The client lib is
responsible for detaching the link as explained in the AMQP spec:
> The receiving endpoint sets its desired filter, the sending endpoint sets the filter actually in place
(including any filters defaulted at the node). The receiving endpoint MUST check that the filter in
place meets its needs and take responsibility for detaching if it does not.

This applies to lexer and parser errors.

Errors during message evaluation will result in an unknown value.
Conditional operators on unknown are described in the JMS spec. If the
entire selector condition is unknown, the message does not match, and
will therefore not be delivered to the client.

 ## Clients

Support for passing the SQL expression from app to broker is provided by
the Java client in rabbitmq/rabbitmq-amqp-java-client#216
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant