Allow to cancel hs_scan*() #139

@rschu1ze

We (ClickHouse) recently encountered some patterns that are extremely expensive to evaluate with vectorscan/hyperscan, for example bounded repeats such as "x{n,m}" (which the documentation also describes as expensive). As a mitigation, we now check patterns on a best-effort basis and reject those that are likely to be expensive.
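To make the "best-effort check" concrete, here is a minimal sketch of what such a pre-filter could look like. It is not the actual ClickHouse implementation; the function name `hasLargeBoundedRepeat` and the `limit` threshold are hypothetical, and the check only looks for `{n}`, `{n,}` and `{n,m}` quantifiers with a large explicit bound.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (illustration only): returns true if `pattern` contains
// a bounded repeat {n}, {n,} or {n,m} whose largest explicit bound exceeds `limit`.
// Best-effort: it skips escaped characters such as "\{" but does not understand
// character classes, \Q...\E quoting, or any other regex syntax.
bool hasLargeBoundedRepeat(std::string_view pattern, unsigned limit)
{
    const std::size_t size = pattern.size();
    for (std::size_t i = 0; i < size; ++i)
    {
        if (pattern[i] == '\\')
        {
            ++i; // skip the escaped character
            continue;
        }
        if (pattern[i] != '{')
            continue;

        std::size_t pos = i + 1;
        bool too_large = false;
        bool seen_digit = false;

        // Parse "n", then optionally ",m".
        for (int part = 0; part < 2 && pos < size; ++part)
        {
            unsigned long long value = 0;
            while (pos < size && std::isdigit(static_cast<unsigned char>(pattern[pos])))
            {
                seen_digit = true;
                if (value <= limit) // stop accumulating once over the limit
                    value = value * 10 + static_cast<unsigned long long>(pattern[pos] - '0');
                ++pos;
            }
            too_large = too_large || value > limit;

            if (part == 0 && pos < size && pattern[pos] == ',')
                ++pos; // continue with the upper bound
            else
                break;
        }

        // Only treat it as a quantifier if it is well-formed and closed by '}'.
        if (seen_digit && pos < size && pattern[pos] == '}' && too_large)
            return true;
    }
    return false;
}
```

A caller would reject or fall back to another engine when `hasLargeBoundedRepeat(pattern, 100)` returns true. Such a filter produces both false positives and false negatives, which is why it is only a mitigation.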

A better solution would be to either

  • add a new function to vectorscan/hyperscan that predicts runtime cost (a coarse "fast"/"slow" answer would be sufficient), or
  • (the preferred alternative) allow canceling the scan. The hs_scan*() functions (*) accept a callback that can stop the scan, but it is only called when a match is found. Ideally, a second callback could be provided that is called regularly (every N "steps", whatever that means in the context of vectorscan). I know that vectorscan attempts to stay API-compatible with hyperscan, so these callbacks could be added as new parameters with a default value.

EDIT: I just noticed that it is pattern compilation, i.e. hs_compile_multi(), that becomes slow, not the scan. A callback for canceling hs_compile_*() would be great.

(*) ClickHouse actually only uses block mode, not streaming or vector modes.

Labels

enhancement (New feature or request)
