feat(logs): add Rust FFI tokenizer bridge with FlatBuffers #49216

Draft
DDuongNguyen wants to merge 1 commit into 04-10-feat_logs_add_stateful_encoding_module_plumbing_and_token_types from 04-10-feat_logs_add_rust_ffi_tokenizer_bridge_with_flatbuffers
DDuongNguyen commented Apr 10, 2026

What does this PR do?

Adds the Rust FFI tokenizer bridge that connects the Go agent to the patterns Rust library via cgo + FlatBuffers:

  • Rust tokenizer wrapper (pkg/logs/patterns/tokenizer/rust/): cgo bindings to libpatterns, FlatBuffers schema for zero-copy token serialization, TokenConversion to map Rust tokens → Go token.Token types
  • Vendor libraries: Pre-built libpatterns.{dylib,so} for darwin/linux × amd64/arm64, plus the C header
  • Build tasks (tasks/patterns.py): Invoke tasks for compiling the Rust library, running benchmarks, and managing vendor artifacts
  • Build tag integration (tasks/build_tags.py, tasks/agent.py): gates the bridge behind the rust_patterns build tag

This is PR 2/6 in a stack. Depends on PR 1 (#49215) for the token.Token types and Tokenizer interface this bridge implements.
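To make the token mapping concrete, here is a minimal pure-Go sketch of what a conversion layer of this kind could look like. Every name in it (TokenType, rustKind, rustToken, bridge, the numeric kind values) is a hypothetical stand-in chosen for illustration; the real token.Token types come from PR 1 and the real wire format is the FlatBuffers schema in this PR, neither of which is reproduced here. The decode field stands in for the cgo call plus FlatBuffers decoding.

```go
package main

import "fmt"

// TokenType is a hypothetical stand-in for the Go-side token kinds from PR 1.
type TokenType int

const (
	TokenUnknown TokenType = iota
	TokenWord
	TokenNumeric
	TokenWhitespace
)

// Token is a hypothetical stand-in for token.Token.
type Token struct {
	Type  TokenType
	Value string
}

// Tokenizer is a hypothetical stand-in for the interface the bridge implements.
type Tokenizer interface {
	Tokenize(log string) ([]Token, error)
}

// rustKind mirrors an assumed numeric enum emitted by the Rust side.
type rustKind uint8

const (
	rustWord       rustKind = 1
	rustNumeric    rustKind = 2
	rustWhitespace rustKind = 3
)

// rustToken is one decoded record as it might look after reading the buffer.
type rustToken struct {
	Kind rustKind
	Text string
}

// convertKind maps a Rust-side kind to the Go-side type. Unknown kinds fall
// back to TokenUnknown so a newer library does not break an older agent.
func convertKind(k rustKind) TokenType {
	switch k {
	case rustWord:
		return TokenWord
	case rustNumeric:
		return TokenNumeric
	case rustWhitespace:
		return TokenWhitespace
	default:
		return TokenUnknown
	}
}

// convertTokens converts a decoded slice in one pass, preallocating the output.
func convertTokens(in []rustToken) []Token {
	out := make([]Token, 0, len(in))
	for _, rt := range in {
		out = append(out, Token{Type: convertKind(rt.Kind), Value: rt.Text})
	}
	return out
}

// bridge adapts a decode function (standing in for the FFI call) to Tokenizer.
type bridge struct {
	decode func(log string) ([]rustToken, error)
}

func (b bridge) Tokenize(log string) ([]Token, error) {
	raw, err := b.decode(log)
	if err != nil {
		return nil, fmt.Errorf("decode tokens: %w", err)
	}
	return convertTokens(raw), nil
}

// Compile-time check that bridge satisfies the interface.
var _ Tokenizer = bridge{}

func main() {
	b := bridge{decode: func(string) ([]rustToken, error) {
		return []rustToken{{rustWord, "GET"}, {rustWhitespace, " "}, {rustNumeric, "200"}}, nil
	}}
	tokens, err := b.Tokenize("GET 200")
	if err != nil {
		panic(err)
	}
	for _, t := range tokens {
		fmt.Printf("%d %q\n", t.Type, t.Value)
	}
}
```

Keeping the kind mapping in one switch with an explicit default is one way to tolerate enum values added on the Rust side later; whether the actual TokenConversion does this is not stated in the description.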

Motivation

The Rust patterns library provides high-performance log tokenization (signature extraction, pattern detection). This bridge lets the Go agent call into it without re-implementing the tokenizer in Go, using FlatBuffers to minimize serialization overhead.

Describe how you validated your changes

  • Unit tests for token conversion and roundtrip tokenization (rust_*_test.go)
  • FlatBuffers schema compiles and generates correct Go accessors
  • tasks/patterns.py successfully builds and vendors the library on macOS arm64

How to Review this PR

  1. Start with tokenizer.go: the main RustTokenizer struct and its Tokenize() method
  2. Review token_conversion.go for the FlatBuffers → Go token mapping
  3. Skim flatbuffers/patterns_tokenizer.fbs for the serialization schema
  4. Read tasks/patterns.py for the build/vendor workflow
  5. The vendor/ directory contains pre-built binaries; check its README for provenance

Additional Notes

tokenizer_stub.go provides a no-op implementation when the rust_patterns build tag is disabled, so builds that do not opt in are unaffected.
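For readers unfamiliar with the stub pattern, the sketch below shows the general shape of a build-tag-gated fallback file. It assumes the tag is named rust_patterns (the name that appears in the tasks/build_tags.py excerpt in this thread). The Token type, the NewRustTokenizer constructor, and the ErrPatternsDisabled sentinel are hypothetical; in particular, returning a sentinel error is only one possible reading of "no-op", and the real stub may simply return nothing.

```go
//go:build !rust_patterns

// This file is compiled only when the rust_patterns tag is NOT set, so it
// never needs cgo or libpatterns. A sibling file guarded by the opposite
// constraint would hold the real cgo-backed implementation.
package main

import (
	"errors"
	"fmt"
)

// ErrPatternsDisabled is a hypothetical sentinel that lets callers detect
// the stub and fall back to another tokenizer.
var ErrPatternsDisabled = errors.New("rust patterns tokenizer not compiled in")

// Token is a hypothetical stand-in for token.Token.
type Token struct {
	Value string
}

// RustTokenizer has the same name and method set as the real type, so call
// sites compile unchanged regardless of the build tag.
type RustTokenizer struct{}

// NewRustTokenizer returns the stub; it allocates no native resources.
func NewRustTokenizer() *RustTokenizer {
	return &RustTokenizer{}
}

// Tokenize does no work and reports that the bridge is unavailable.
func (t *RustTokenizer) Tokenize(log string) ([]Token, error) {
	return nil, ErrPatternsDisabled
}

func main() {
	_, err := NewRustTokenizer().Tokenize("GET /health 200")
	fmt.Println(err)
}
```

Because the stub and the real file expose identical identifiers under mutually exclusive constraints, exactly one of them is compiled for any given tag set.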


DDuongNguyen force-pushed the 04-10-feat_logs_add_stateful_encoding_module_plumbing_and_token_types branch from bf7d7b9 to aa058a0 on April 14, 2026 17:49
DDuongNguyen force-pushed the 04-10-feat_logs_add_rust_ffi_tokenizer_bridge_with_flatbuffers branch from ff26d66 to 9957229 on April 14, 2026 17:49
@DDuongNguyen

@codex review

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9957229d8f


Comment thread on tasks/build_tags.py
"otlp",
"podman",
"python",
"rust_patterns",

[P1] Exclude rust_patterns from default Windows agent tags

Adding rust_patterns to the default AGENT_TAGS enables the Rust tokenizer bridge on Windows builds, but this commit does not vendor a Windows libpatterns artifact: the new vendor/README.md marks windows_amd64 as "Not yet built", while tokenizer.go still links -lpatterns from vendor/windows_amd64. As a result, default Windows agent builds can fail at link time with "cannot find -lpatterns" unless developers manually prepare platform-specific artifacts. This tag should be gated off on Windows, or enabled only when a Windows binary is present.
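The review suggests gating the tag in the Python task definitions. A complementary Go-side guard, sketched below as an assumption rather than as what the PR does, would be to tighten the build constraint on the cgo file to something like `rust_patterns && !windows`, so the bridge is skipped on Windows even if the tag is set. The sketch uses the standard library's go/build/constraint package to evaluate such an expression for a few GOOS values; bridgeEnabled and the constraint string are hypothetical, and the evaluation only models GOOS plus custom tags, not the full set of tags the go tool considers.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// bridgeEnabled reports whether a file guarded by the given build constraint
// line would be selected for the given GOOS and set of custom tags.
func bridgeEnabled(line, goos string, tags map[string]bool) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	// A tag is satisfied if it names the target OS or is an enabled custom tag.
	return expr.Eval(func(tag string) bool {
		return tag == goos || tags[tag]
	}), nil
}

func main() {
	// Hypothetical guard for the cgo-backed tokenizer file.
	const guard = "//go:build rust_patterns && !windows"
	tags := map[string]bool{"rust_patterns": true}

	for _, goos := range []string{"linux", "darwin", "windows"} {
		ok, err := bridgeEnabled(guard, goos, tags)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s bridge compiled: %v\n", goos, ok)
	}
}
```

If this route were taken, the stub file's constraint would need the inverse expression (`!rust_patterns || windows`) so that exactly one implementation is compiled on every platform.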



Labels

long review: PR is complex, plan time to review it
