Experiment: Port MySQL-on-SQLite to LALR(1) parser #432
Draft
JanJakes wants to merge 9 commits into
🤖 Lexer benchmark
Changes to lexer-related files were detected and triggered a benchmark.
Note: Hosted runners are noisy, and absolute numbers vary. Treat the results with caution and verify them locally. To reproduce locally:
The parse table is data, not a class, so the Composer autoloader does not cover it. Expose its path as a class constant, so consumers can load it without depending on the package file layout: new WP_MySQL_Parser( require WP_MySQL_Parser::PARSE_TABLE_PATH ).
A reduction with no children carries no information: empty optionals (opt_*) and Bison's mid-rule action rules ($@n) only add noise to the tree. Produce no node for them, so consumers see an optional clause only when it is present.
Left-recursive grammar list rules nest through their own rule name
("list: list ',' item | item"). The new accessor collects child nodes
of the whole nested chain in source order, as if the list were flat,
which is how AST consumers want to iterate list items.
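The flattening described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `Sketch_Node` and `sketch_flatten_list()` are hypothetical names, tokens are modelled as plain strings, and the sketch assumes that a child with the same rule name as the list is always the nested list.

```php
<?php
// Hypothetical minimal node type for illustration; not the package's WP_Parser_Node.
final class Sketch_Node {
	public $rule_name;
	public $children;

	public function __construct( string $rule_name, array $children ) {
		$this->rule_name = $rule_name;
		$this->children  = $children;
	}
}

/**
 * Collect the child nodes of a left-recursive list in source order.
 *
 * Walks down the chain of nested lists iteratively, so a long list does
 * not cause deep recursion. Tokens (plain strings here) are skipped.
 */
function sketch_flatten_list( Sketch_Node $list ): array {
	$levels = array();
	$node   = $list;
	while ( null !== $node ) {
		$nested = null;
		$own    = array();
		foreach ( $node->children as $child ) {
			if ( ! $child instanceof Sketch_Node ) {
				continue; // A token such as ','.
			}
			if ( null === $nested && $child->rule_name === $list->rule_name ) {
				$nested = $child; // The list nests through its own rule name.
			} else {
				$own[] = $child;
			}
		}
		$levels[] = $own;
		$node     = $nested;
	}

	// The deepest level holds the first items in source order.
	$items = array();
	foreach ( array_reverse( $levels ) as $own ) {
		foreach ( $own as $item ) {
			$items[] = $item;
		}
	}
	return $items;
}

// "list: list ',' item | item" applied to the input "a, b, c".
$list = new Sketch_Node(
	'list',
	array(
		new Sketch_Node(
			'list',
			array(
				new Sketch_Node( 'list', array( new Sketch_Node( 'item', array( 'a' ) ) ) ),
				',',
				new Sketch_Node( 'item', array( 'b' ) ),
			)
		),
		',',
		new Sketch_Node( 'item', array( 'c' ) ),
	)
);

$names = array();
foreach ( sketch_flatten_list( $list ) as $item ) {
	$names[] = $item->children[0];
}
echo implode( ', ', $names ), "\n"; // Prints "a, b, c".
```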
With the ANSI_QUOTES SQL mode, MySQL treats double-quoted text as a quoted identifier instead of a string literal. Emit an identifier token for it, so identifier positions accept double-quoted names.
Replace the hand-written recursive parser with the table-driven LALR(1) parser generated from MySQL's official grammar, consumed as a Composer dependency:
- Require wordpress/mysql-parser, resolved from the monorepo sibling package via a Composer path repository, and load it through the Composer autoloader in the driver loader.
- Drop the old parser machinery (WP_Parser, WP_Parser_Grammar, the lexer, the parse tree classes, and mysql-grammar.php), all provided by the parser package now, and the native parser fork, which is bound to the old grammar contract.
- Parse multi-statement input by splitting the token stream on top-level ';' separators, as the grammar parses a single statement (this is how MySQL clients split multi-statement input).
- Re-key the statement dispatch to the sql_yacc.yy rule names and map keyword token constants to the grammar keyword table.
The translation layer still needs to be ported to the new AST shapes.
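Splitting on ';' separators could look roughly like the sketch below. It is illustrative only: the token shape (an array with 'type' and 'text' keys) and the function name are assumptions, empty statements are skipped by choice, and compound statement bodies are out of scope. The sketch relies on the lexer having already folded any ';' inside a quoted literal into a single string token.

```php
<?php
/**
 * Split a token stream into one token list per statement.
 *
 * Hypothetical token shape: array( 'type' => string, 'text' => string ).
 * Only tokens of type 'semicolon' separate statements; a ';' inside a
 * string literal is part of that literal's token and is left alone.
 */
function sketch_split_statements( array $tokens ): array {
	$statements = array();
	$current    = array();
	foreach ( $tokens as $token ) {
		if ( 'semicolon' === $token['type'] ) {
			if ( count( $current ) > 0 ) {
				$statements[] = $current; // Skip empty statements such as ";;".
			}
			$current = array();
			continue;
		}
		$current[] = $token;
	}
	if ( count( $current ) > 0 ) {
		$statements[] = $current; // The last statement may lack a trailing ';'.
	}
	return $statements;
}

// Tokens for: SELECT 'a;b'; SELECT 2
$tokens = array(
	array( 'type' => 'keyword', 'text' => 'SELECT' ),
	array( 'type' => 'string', 'text' => "'a;b'" ),
	array( 'type' => 'semicolon', 'text' => ';' ),
	array( 'type' => 'keyword', 'text' => 'SELECT' ),
	array( 'type' => 'number', 'text' => '2' ),
);

$statements = sketch_split_statements( $tokens );
echo count( $statements ), "\n"; // Prints "2".
```

Each resulting token list can then be handed to the single-statement parser in turn.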
Re-key the SQL-to-SQLite translation from the old hand-written grammar to the sql_yacc.yy rule names and tree shapes:
- Rewrite the translate() special cases and per-statement handlers (SELECT, INSERT/REPLACE, UPDATE, DELETE, DDL, SHOW, SET, USE, transactions and locking, administration statements).
- Iterate grammar lists with the flattened child node accessor, as lists are left-recursive in the new grammar.
- Walk JOINs recursively when building the table reference map, as joins nest through the left operand in the new grammar.
- Retry parsing with the ANSI_QUOTES SQL mode when a query fails to parse. MySQL rejects double-quoted identifiers without ANSI_QUOTES, but WordPress relies on them (dbDelta can produce double-quoted index names) and the previous parser accepted them.
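The ANSI_QUOTES retry can be pictured with the following sketch. It is not the driver's code: the function name, the callable signature (query string plus a list of SQL mode names, returning null on a parse failure), and the stub parser are all assumptions made for illustration.

```php
<?php
/**
 * Parse a query, retrying with ANSI_QUOTES only if the first attempt fails.
 *
 * $parse is a hypothetical callable: function ( string $query, array $sql_modes )
 * that returns a parse result, or null when the query does not parse.
 */
function sketch_parse_with_ansi_quotes_fallback( callable $parse, string $query ) {
	$result = $parse( $query, array() );
	if ( null !== $result ) {
		return $result;
	}
	// Double-quoted text becomes a quoted identifier in this mode.
	return $parse( $query, array( 'ANSI_QUOTES' ) );
}

// Stub parser for the example. It pretends that any double-quoted text sits
// in an identifier position, so the query parses only with ANSI_QUOTES.
$stub_parse = function ( string $query, array $sql_modes ) {
	$has_double_quotes = false !== strpos( $query, '"' );
	if ( $has_double_quotes && ! in_array( 'ANSI_QUOTES', $sql_modes, true ) ) {
		return null;
	}
	return array(
		'query'     => $query,
		'sql_modes' => $sql_modes,
	);
};

$plain  = sketch_parse_with_ansi_quotes_fallback( $stub_parse, 'SELECT 1' );
$quoted = sketch_parse_with_ansi_quotes_fallback( $stub_parse, 'CREATE INDEX "idx" ON t (a)' );

echo count( $plain['sql_modes'] ), "\n"; // Prints "0".
echo $quoted['sql_modes'][0], "\n";      // Prints "ANSI_QUOTES".
```

Trying the default mode first means a query that already parses keeps its double-quoted text as string literals; the ANSI_QUOTES interpretation is used only for queries that would otherwise be rejected.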
Re-key CREATE TABLE, ALTER TABLE, and index statement analysis to the sql_yacc.yy rule names and tree shapes. The recorded information schema rows are unchanged: a battery of DDL statements covering all supported data types, constraints, indexes, and table options produces the exact same rows as the previous parser and builder. Multi-column ADD COLUMN (a INT, b INT) is now recorded correctly; the previous builder crashed on it.
The lexer, parser, token data, and parse tree classes are tested in the wordpress/mysql-parser package now:
- Remove the lexer and parser test suites from the driver package (the corpus data stays here; the parser package corpus test reads it from the sibling package and skips when it is not available).
- Move the parse tree node tests to the parser package and cover the new flattened child node accessor.
- Remove the native parser extension tests and tools, which are bound to the old grammar contract.
- Update the AST dump and benchmark tools to the new parser API.
The SQLite driver now loads the MySQL parser as a Composer dependency, and the native parser extension bound to the old grammar is gone:
- Install the driver Composer dependencies in the WordPress test setup and mount the package vendor directory and the parser package into the WordPress containers.
- Bundle the driver's production Composer dependencies into the plugin zip, resolving the path-repository symlink into a real copy of the parser package.
- Run the driver test workflow against changes to the parser package and drop the native parser extension jobs and setup scripts.
- Install the driver Composer dependencies in the lexer benchmark workflow.
Stacked on #429. This experiment ports the MySQL-on-SQLite driver end-to-end from the hand-written recursive parser to the LALR(1) parser, consumed as a proper Composer dependency. All driver tests pass: 543 tests / 7,172 assertions in mysql-on-sqlite, and 41 tests / 1,051 assertions in mysql-parser, including the 69,577-query MySQL server corpus pin. Net diff: +1,611 / −7,180 lines.

Composer-based package reuse

The wordpress/mysql-parser package gets a classmap Composer autoloader (the WordPress-style file naming rules out PSR-4) and exposes WP_MySQL_Parser::PARSE_TABLE_PATH, since the generated parse table is data the autoloader cannot cover. The wordpress/mysql-on-sqlite package requires it through a Composer path repository pointing at the monorepo sibling, so development has a single source of truth (a vendor symlink) and nothing is duplicated: the driver's old parser machinery (grammar, lexer, parse tree classes, and the native Rust parser fork bound to the old grammar contract) is removed entirely.

Parser changes the port surfaced

- Empty optionals (opt_*) and Bison's mid-rule action rules ($@N) carry no information, so they no longer appear in the tree, and consumers see an optional clause only when it is present.
- WP_Parser_Node::get_flattened_child_nodes() iterates left-recursive grammar lists (list: list ',' item) as if they were flat.

Driver port

The statement dispatch, the query translation layer, and the information schema builder are re-keyed to the official sql_yacc.yy rule names and tree shapes. Multi-statement input is split on top-level ';' separators, as the grammar parses a single statement (this is how MySQL clients split multi-statement input); the old create_parser() / next_query() API is replaced by parse_mysql_query().

The information schema builder was verified byte-exact against the old parser and builder over a DDL battery covering all supported data types, constraints, indexes, and table options. Multi-column ADD COLUMN (a INT, b INT) is now recorded correctly; the previous builder crashed on it.

Deployment and CI

The WordPress Docker environments install the driver's Composer dependencies and mount the package vendor directory and the parser package into the containers. The plugin zip build bundles the driver's production dependencies, resolving the path-repository symlink into a real, pruned copy of the parser package. The driver test workflow now also triggers on parser package changes, and the native parser extension jobs and scripts are removed (packages/php-ext-wp-mysql-parser is orphaned by this branch).

Testing