-
Notifications
You must be signed in to change notification settings - Fork 0
19927: fix: change token consumption to pick to test on EOF in parser #202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
If we use `next_token(...)` then in the case of multi-statement query it consumes a semicolon token which leads to the parse error on the correct queries like: ```sql CREATE EXTERNAL TABLE t1 ...; CREATE EXTERNAL TABLE t2 ...; ```
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refines the SQL parser's token handling mechanism to correctly identify the end of statements. Previously, the parser would consume the next token when checking for an end-of-file or semicolon, which could lead to incorrect parsing of subsequent statements. By switching to peeking at the next token instead of consuming it, the parser can now accurately determine statement boundaries without altering the token stream, thereby enhancing the reliability of multi-statement SQL parsing. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
WalkthroughThe parser logic in the SQL module was modified to use lookahead instead of token consumption in two locations: COPY statement parsing and CREATE EXTERNAL TABLE parsing. This change affects how end-of-statement detection works when encountering EOF or semicolon characters. Additionally, a new test was added to validate multi-statement parsing behavior. The test parses a sequence of COPY TO, CREATE EXTERNAL TABLE, and RESET statements separated by semicolons and verifies the resulting AST matches the expected statements. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Pull Request ReviewSummaryThis PR fixes a parser bug where ✅ Code Quality: GoodPositive Aspects:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request correctly addresses an issue where the parser was prematurely consuming tokens when checking for EOF or SemiColon. By switching from next_token() to peek_token(), the parser can now inspect the token without consuming it, ensuring correct behavior for multi-statement parsing and error handling. The addition of test_multistatement is a valuable contribution, providing a comprehensive test case that validates the parser's ability to handle sequences of different SQL statements, including COPY, CREATE EXTERNAL TABLE, and RESET.
| } | ||
| } else { | ||
| let token = self.parser.next_token(); | ||
| let token = self.parser.peek_token(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing next_token() to peek_token() here is a crucial correctness fix. The original implementation would consume the token regardless of whether it was EOF or SemiColon, potentially leading to incorrect parsing or error messages if the token was not one of those. peek_token() allows for inspection without consumption, which is the correct approach in this control flow.
| let token = self.parser.peek_token(); | |
| let token = self.parser.peek_token(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:annoying; category:bug; feedback:The Gemini AI reviewer is not correct to suggest this code change because it will lose the indentation and the code will be harder to read. And most probably cargo fmt --check will complain about it.
| } | ||
| } else { | ||
| let token = self.parser.next_token(); | ||
| let token = self.parser.peek_token(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change from next_token() to peek_token() mirrors the fix on line 661 and addresses the same correctness issue within the parse_create_external_table function. It prevents premature token consumption, ensuring that the parser's state remains consistent when checking for statement delimiters.
| let token = self.parser.peek_token(); | |
| let token = self.parser.peek_token(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:annoying; category:bug; feedback:The Gemini AI reviewer is not correct to suggest this code change because it will lose the indentation and the code will be harder to read. And most probably cargo fmt --check will complain about it.
| fn test_multistatement() { | ||
| let sql = "COPY foo TO bar STORED AS CSV; \ | ||
| CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 'foo.csv'; \ | ||
| RESET var;"; | ||
| let statements = DFParser::parse_sql(sql).unwrap(); | ||
| assert_eq!( | ||
| statements, | ||
| vec![ | ||
| Statement::CopyTo(CopyToStatement { | ||
| source: object_name("foo"), | ||
| target: "bar".to_string(), | ||
| partitioned_by: vec![], | ||
| stored_as: Some("CSV".to_owned()), | ||
| options: vec![], | ||
| }), | ||
| { | ||
| let name = ObjectName::from(vec![Ident::from("t")]); | ||
| let display = None; | ||
| Statement::CreateExternalTable(CreateExternalTable { | ||
| name: name.clone(), | ||
| columns: vec![make_column_def("c1", DataType::Int(display))], | ||
| file_type: "CSV".to_string(), | ||
| location: "foo.csv".into(), | ||
| table_partition_cols: vec![], | ||
| order_exprs: vec![], | ||
| if_not_exists: false, | ||
| or_replace: false, | ||
| temporary: false, | ||
| unbounded: false, | ||
| options: vec![], | ||
| constraints: vec![], | ||
| }) | ||
| }, | ||
| { | ||
| let name = ObjectName::from(vec![Ident::from("var")]); | ||
| Statement::Reset(ResetStatement::Variable(name)) | ||
| } | ||
| ] | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of test_multistatement is excellent. It provides good coverage for the parser's ability to handle multiple statements, which is directly impacted by the peek_token changes. This test ensures the parser correctly identifies and processes a sequence of COPY, CREATE EXTERNAL TABLE, and RESET statements.
🤖 Augment PR SummarySummary: This PR fixes multi-statement parsing by avoiding consumption of statement terminators while parsing certain statements. Changes:
Technical Notes: Leaving 🤖 Was this summary useful? React with 👍 or 👎 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review completed. No suggestions at this time.
Comment augment review to trigger a new review at any time.
19927: To review by AI