alexcrocha commented Aug 28, 2025

Description

This PR introduces the ruby-rbs crate, which provides safe Rust bindings for the RBS parser. It builds on top of the existing ruby-rbs-sys FFI bindings to offer an idiomatic Rust API.

What this PR does

The ruby-rbs crate includes a build script that automatically generates Rust struct definitions from config.yml:

  • Reads all 68 AST node definitions from the existing config
  • Generates a Rust module hierarchy matching the RBS AST structure
  • Handles Rust naming conventions (snake_case modules, PascalCase structs)
  • Manages reserved keywords (Use → UseDirective, Self → SelfType)
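The naming rules in the list above can be sketched roughly as follows. This is only an illustration, not the actual build script: the `RBS::AST::...` input format, the helper names `struct_name` and `rust_path`, and the choice to drop the leading `RBS` segment are all assumptions made for the example.

```rust
/// Rename struct names that would collide with Rust keywords.
/// Only the two renames mentioned in the PR description are covered.
fn struct_name(name: &str) -> String {
    match name {
        "Use" => "UseDirective".to_string(),
        "Self" => "SelfType".to_string(),
        other => other.to_string(),
    }
}

/// Split a node name into (module path, struct name).
/// Assumes names look like `RBS::AST::Declarations::Class` (hypothetical format).
fn rust_path(node_name: &str) -> (Vec<String>, String) {
    let mut segments: Vec<&str> = node_name.split("::").collect();
    // Drop the leading `RBS` namespace; the crate itself plays that role here.
    if segments.first() == Some(&"RBS") {
        segments.remove(0);
    }
    let last = segments.pop().unwrap_or("");
    // Lowercasing is enough for single-word segments; multi-word segments
    // would need a real snake_case conversion.
    let modules = segments.iter().map(|s| s.to_ascii_lowercase()).collect();
    (modules, struct_name(last))
}
```

With these assumptions, `RBS::AST::Declarations::Module::Self` maps to the module path `ast::declarations::module` and the struct name `SelfType`, which matches the nesting shown in the generated code example.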

Example generated code

pub mod ast {
    pub struct Annotation {}

    pub mod declarations {
        pub struct Class {}

        pub mod module {
            pub struct SelfType {}
        }
    }
}

What's next

This is step 2 of 4:

  1. ✅ ruby-rbs-sys crate with FFI bindings (completed - base PR)
  2. ✅ Generate empty struct definitions (this PR)
  3. ⏳ Create the Location struct and add location() method to all nodes
  4. ⏳ Implement node-specific fields from config.yml

Each step will build incrementally on the previous one, keeping the PRs small and reviewable.

Base

This builds on top of ruby-rbs-sys (PR) which provides the unsafe FFI bindings to the C library.


alexcrocha commented Oct 16, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@alexcrocha alexcrocha changed the title [Draft] Add ruby-rbs Rust crate Add ruby-rbs Rust crate Oct 16, 2025
@alexcrocha alexcrocha marked this pull request as ready for review October 16, 2025 11:28
@alexcrocha alexcrocha requested a review from vinistock October 16, 2025 11:29

@vinistock vinistock left a comment

Ignoring most of the non-boilerplate code in build.rs, since we greatly simplified it in the next PR in the stack.

@alexcrocha alexcrocha mentioned this pull request Oct 22, 2025
This was referenced Nov 27, 2025
alexcrocha and others added 29 commits January 14, 2026 11:52
Since `bool` is a primitive type with direct FFI mapping between C and
Rust, we don't need a wrapper struct like we do for complex types
(`rbs_string_t`, etc.).

Symbol fields in RBS AST nodes store their values as constant IDs that
need to be resolved through the parser's constant pool. This safe
Rust wrapper (`RBSSymbol`) maintains a reference to the parser and
provides access to the symbol's name bytes, similar to how `RBSString`
handles string types.

The build script now generates accessors for `rbs_ast_symbol` fields
that properly pass both the symbol pointer and parser reference to
enable constant pool lookups.
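To illustrate why the symbol wrapper needs a parser reference, here is a minimal pure-Rust sketch. The `Parser` and `SymbolNode` types below are hypothetical stand-ins, not the crate's `RBSSymbol`, and the constant pool is modelled as a plain vector indexed by ID, which may differ from how the C parser stores constants.

```rust
/// Stand-in for the parser; its constant pool maps IDs to name bytes.
struct Parser {
    constant_pool: Vec<Vec<u8>>,
}

/// Thin wrapper: stores only the constant ID plus a borrow of the parser.
struct SymbolNode<'a> {
    constant_id: usize,
    parser: &'a Parser,
}

impl<'a> SymbolNode<'a> {
    /// Resolve the ID through the parser's constant pool.
    /// The returned bytes borrow from the parser, not from the wrapper.
    fn name(&self) -> &'a [u8] {
        let parser: &'a Parser = self.parser;
        parser.constant_pool[self.constant_id].as_slice()
    }
}
```

Because the wrapper borrows the parser, the compiler rejects any use of a symbol's name after the parser has been dropped.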
Refactor node structs to use pointer-based access and add NodeList iterator

Changes node generation from storing individual fields to holding a single
pointer to the C struct. This avoids duplicating data in Rust structs and
matches the pattern used in Prism's bindings. We just maintain a thin
wrapper around the C pointer and dereference it in accessor methods.

Adds NodeList/NodeListIter to enable idiomatic Rust iteration over RBS's
linked list structures, and implements a Node::new() factory method that
type-checks the C node pointer and constructs the appropriate Rust variant
with proper pointer casting.
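A linked-list iterator of the kind described above might look roughly like this. The `RawList` and `RawListNode` layouts are invented for the sketch and carry an `i32` payload instead of a node pointer; the real C structs and the generated `NodeList` type will differ.

```rust
use std::marker::PhantomData;

/// Hypothetical C-style singly linked list node.
#[repr(C)]
struct RawListNode {
    value: i32,
    next: *mut RawListNode,
}

/// Hypothetical C-style list header.
#[repr(C)]
struct RawList {
    head: *mut RawListNode,
}

/// Iterator that walks the `next` pointers until it reaches NULL.
struct NodeListIter<'a> {
    current: *mut RawListNode,
    // Ties the iterator's lifetime to the list it was created from.
    _marker: PhantomData<&'a RawList>,
}

impl<'a> Iterator for NodeListIter<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.current.is_null() {
            return None;
        }
        // SAFETY: `current` is non-null and points into a list that is
        // borrowed for 'a, so the node is still alive.
        let node = unsafe { &*self.current };
        self.current = node.next;
        Some(node.value)
    }
}

fn iter_list(list: &RawList) -> NodeListIter<'_> {
    NodeListIter { current: list.head, _marker: PhantomData }
}
```

Implementing `Iterator` means callers get `for` loops, `map`, `collect`, and the rest of the adapter methods without touching raw pointers themselves.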

Also adds a convert_name() helper to generate C identifiers from RBS node
names (snake_case_t for types, UPPER_CASE for enum constants).
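One way such a name conversion could work is sketched below. This is a guess at the behaviour, not the helper from the commit: the function names `c_type_name` and `c_enum_constant`, the `RBS::AST::...` input format, and the acronym handling are assumptions.

```rust
/// Join `::`-separated segments with underscores, splitting words inside a
/// segment where a lowercase letter or digit is followed by an uppercase one.
/// Runs of capitals such as `AST` stay together.
fn snake_case_path(name: &str) -> String {
    let mut out = String::new();
    for segment in name.split("::") {
        if !out.is_empty() {
            out.push('_');
        }
        let mut prev_lower = false;
        for ch in segment.chars() {
            if ch.is_ascii_uppercase() && prev_lower {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
            prev_lower = ch.is_ascii_lowercase() || ch.is_ascii_digit();
        }
    }
    out
}

/// C struct type name, e.g. `..._t`.
fn c_type_name(name: &str) -> String {
    format!("{}_t", snake_case_path(name))
}

/// C enum constant name, upper-cased with no suffix.
fn c_enum_constant(name: &str) -> String {
    snake_case_path(name).to_ascii_uppercase()
}
```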
Many AST nodes in `config.yml` have location fields (`rbs_location`,
`rbs_location_list`). This change adds the necessary wrapper structs
(`RBSLocation`, `RBSLocationList`) and updates `build.rs` to generate
accessors for these fields.

The `RBSLocation` wrapper includes a reference to the parser to support
future functionality like source extraction.

Enable nested AST traversal by exposing rbs_node and rbs_node_list fields

Nested structure traversal (e.g., class members, constant types) depends on access to rbs_node and rbs_node_list fields. Making these fields accessible aligns the Rust bindings with the C API. Fields named "type" are accessible via type_ to avoid a Rust keyword collision.

Adds `test_parse_integer()` which parses an integer literal type alias
and traverses the AST (`TypeAlias` -> `LiteralType` -> `Integer`) using
pattern matching to verify node types and extract values.

This validates that the generated node wrappers enable AST traversal in
pure Rust with proper type safety.

Also adds `Debug` derives and refactors memory management by returning
`SignatureNode` instead of a raw pointer, with a `Drop` impl to free the parser.
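The ownership change described here follows the usual RAII pattern for FFI handles. The sketch below uses a Rust-allocated `RawParser` and made-up `parser_new`/`parser_free` functions in place of the real C allocation and free calls; the `FREED` counter exists only so the effect of `Drop` can be observed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many parsers have been freed (for demonstration only).
static FREED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the C parser struct.
struct RawParser {
    _buffer: Vec<u8>,
}

/// Stand-in for the C allocation function.
fn parser_new() -> *mut RawParser {
    Box::into_raw(Box::new(RawParser { _buffer: Vec::new() }))
}

/// Stand-in for the C free function.
unsafe fn parser_free(parser: *mut RawParser) {
    // SAFETY: the caller guarantees `parser` came from `parser_new`
    // and has not been freed yet.
    unsafe { drop(Box::from_raw(parser)) };
    FREED.fetch_add(1, Ordering::SeqCst);
}

/// Owning handle returned to callers instead of a raw pointer.
struct SignatureNode {
    parser: *mut RawParser,
}

impl Drop for SignatureNode {
    fn drop(&mut self) {
        // SAFETY: `parser` was allocated by `parser_new` and is freed exactly once.
        unsafe { parser_free(self.parser) };
    }
}

fn parse() -> SignatureNode {
    SignatureNode { parser: parser_new() }
}
```

Callers never free anything by hand: the parser is released when the returned handle goes out of scope.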
Refactor the previous implementation of `Symbol`/`Keyword` handling to
treat them as first-class nodes in the build configuration.

`Keyword` and `Symbol` represent identifiers (interned strings), not
traditional AST nodes. However, the C parser defines them in
`rbs_node_type` (as `RBS_KEYWORD` and `RBS_AST_SYMBOL`) and treats them
as nodes (`rbs_node_t*`) in many contexts (lists, hashes).

Instead of manually defining `RBSSymbol`/`RBSKeyword` structs, we now
inject them into the `config.yml` node list in `build.rs`. This allows
them to be generated as `SymbolNode`/`KeywordNode` variants in the
`Node` enum, enabling polymorphic handling (in Node lists and Hashes).

Add support for RBS hashes (`rbs_hash_t`), which are used in Record
types and Function keyword arguments.

Enable walking the AST by generating a `Visit` trait with per-node
visitor methods. It uses double dispatch to route each node type to its
corresponding visitor method. This avoids consumers needing to manually
match on Node variants and allows overriding specific visits while
inheriting default behaviour for others.

Some C struct pointer fields can be NULL (super_class when no parent
class, comment when no doc comment). This metadata allows our Rust
codegen to generate Option<T> return types for these accessors instead
of unconditionally wrapping potentially NULL pointers.

Read `optional: true` annotations from `config.yml` and generate
`Option<T>` return types with null checks, so we don't crash at runtime.

The extracted helper function centralizes the accessor generation logic for
pointer-based field types.
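An accessor for an optional pointer field could take roughly the following shape. The struct layouts and names (`RawClass`, `RawComment`, `ClassNode`, `CommentNode`) are hypothetical and are not the generated code; the point is only the null check that turns a possibly NULL pointer into an `Option`.

```rust
use std::marker::PhantomData;

/// Hypothetical C structs: `comment` may be NULL.
#[repr(C)]
struct RawComment {
    line: u32,
}

#[repr(C)]
struct RawClass {
    comment: *mut RawComment,
}

struct CommentNode<'a> {
    raw: &'a RawComment,
}

impl<'a> CommentNode<'a> {
    fn line(&self) -> u32 {
        self.raw.line
    }
}

/// Thin wrapper around the C pointer, as in the pointer-based node design.
struct ClassNode<'a> {
    pointer: *mut RawClass,
    _marker: PhantomData<&'a mut RawClass>,
}

impl<'a> ClassNode<'a> {
    fn new(raw: &'a mut RawClass) -> Self {
        ClassNode { pointer: raw, _marker: PhantomData }
    }

    /// Accessor for a field marked optional: NULL becomes `None`.
    fn comment(&self) -> Option<CommentNode<'a>> {
        // SAFETY: `pointer` was created from a live `&'a mut RawClass`.
        let comment_ptr = unsafe { (*self.pointer).comment };
        // SAFETY: `as_ref` returns `None` for NULL; otherwise the pointee
        // is assumed to live as long as the class node.
        unsafe { comment_ptr.as_ref() }.map(|raw| CommentNode { raw })
    }
}
```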
The Visit trait added in #69 provided the scaffolding for AST traversal,
but the visitor functions were empty stubs that didn't recurse into
child nodes. Without this, the visitor pattern is incomplete as we'd
have to manually write traversal logic every time we want to walk the
tree.

This commit adds the generation of visitor functions for child node
traversal. We handle four field types:
- `rbs_node`: single child node
- `rbs_node_list`: list of child nodes
- `rbs_hash`: key-value pairs of nodes
- Wrapper types (`rbs_type_name`, `rbs_namespace`, etc): each with its
own visitor method

Each case handles optional fields to safely skip NULL pointers.
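The combination of double dispatch and default child traversal can be sketched with a toy AST. The node types below are simplified stand-ins with owned children rather than C pointers, and `walk_class_node` and `IntegerSum` are names made up for this example.

```rust
struct IntegerNode {
    value: i64,
}

struct ClassNode {
    super_class: Option<Box<Node>>, // optional single child
    members: Vec<Node>,             // list of children
}

enum Node {
    Class(ClassNode),
    Integer(IntegerNode),
}

trait Visit {
    /// First dispatch: route on the enum variant.
    fn visit(&mut self, node: &Node) {
        match node {
            Node::Class(n) => self.visit_class_node(n),
            Node::Integer(n) => self.visit_integer_node(n),
        }
    }

    /// Default behaviour: recurse into the children.
    fn visit_class_node(&mut self, node: &ClassNode) {
        walk_class_node(self, node);
    }

    /// Leaf node: nothing to recurse into.
    fn visit_integer_node(&mut self, _node: &IntegerNode) {}
}

/// Free function holding the traversal, so overriding visitors can still call it.
fn walk_class_node<V: Visit + ?Sized>(visitor: &mut V, node: &ClassNode) {
    // Optional field: skipped when absent.
    if let Some(super_class) = &node.super_class {
        visitor.visit(super_class);
    }
    for member in &node.members {
        visitor.visit(member);
    }
}

/// Example visitor: overrides one method and inherits the rest.
struct IntegerSum {
    total: i64,
}

impl Visit for IntegerSum {
    fn visit_integer_node(&mut self, node: &IntegerNode) {
        self.total += node.value;
    }
}
```

Keeping the traversal in a free function rather than only in the trait default lets a visitor override `visit_class_node`, do its own work, and then call `walk_class_node` to continue into the children.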
Each node already has location data in its C struct, but it wasn't
exposed through the Rust API. This adds a generated `location()` method
to every node type, making it easy to get source ranges for any part of
the AST.

Also removes `parser` from the location structs, as it is not needed.

Addressing some linting warnings

Adds a `location()` accessor to the `Node` enum, delegating to
each variant's `location()` method.

A previous commit added `location()` to individual node types 
but missed the enum itself. This allows getting the location of the 
entire node definition when working with the `Node` enum directly.
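Delegation from the enum to its variants is a plain `match`. The sketch below uses a made-up `Location` with two integer fields and only two variants; the generated enum covers every node type and the real location type may look different.

```rust
/// Hypothetical source range.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Location {
    start: u32,
    end: u32,
}

struct ClassNode {
    loc: Location,
}

struct TypeAliasNode {
    loc: Location,
}

impl ClassNode {
    fn location(&self) -> Location {
        self.loc
    }
}

impl TypeAliasNode {
    fn location(&self) -> Location {
        self.loc
    }
}

enum Node {
    Class(ClassNode),
    TypeAlias(TypeAliasNode),
}

impl Node {
    /// Delegate to whichever variant this node holds.
    fn location(&self) -> Location {
        match self {
            Node::Class(node) => node.location(),
            Node::TypeAlias(node) => node.location(),
        }
    }
}
```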
Reorder lib.rs structs alphabetically

Improve bindings code formatting

Adds lifetimes to make borrowing relationships clearer so the
Rust compiler can validate and enforce them.

Replaced `*mut T` with `NonNull<T>` for the parser pointer to make the
‘never null’ assumption explicit.

`NonNull<T>` represents a non-null raw pointer (a wrapper around `*mut
T`) that guarantees the pointer is never null.
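A small sketch of the pattern, using a hypothetical `RawParser` struct and `ParserHandle` wrapper rather than the crate's actual types: the null check happens once at construction, and every later use can rely on the pointer being non-null.

```rust
use std::ptr::NonNull;

/// Hypothetical stand-in for the C parser struct.
#[repr(C)]
struct RawParser {
    position: u32,
}

/// Wrapper whose pointer is known to be non-null by construction.
struct ParserHandle {
    raw: NonNull<RawParser>,
}

impl ParserHandle {
    /// The only place a null check is needed.
    fn from_raw(ptr: *mut RawParser) -> Option<Self> {
        NonNull::new(ptr).map(|raw| ParserHandle { raw })
    }

    fn position(&self) -> u32 {
        // SAFETY: `raw` is non-null; the caller of `from_raw` is assumed to
        // keep the pointee alive for as long as the handle is used.
        unsafe { self.raw.as_ref().position }
    }
}
```

`NonNull<T>` only rules out null; it says nothing about whether the pointee is still alive, so the lifetime annotations remain necessary. A side benefit is that `Option<NonNull<T>>` has the same size as a raw pointer.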
Some nodes don't use their parser field, but conditionally omitting it
adds significant complexity. Keep parser on all nodes and suppress the
warning on the parser field.

TypeApplicationAnnotation, InstanceVariableAnnotation,
ClassAliasAnnotation, and ModuleAliasAnnotation also need rust_name
fields for Rust binding code generation.