@martin-augment
Owner

19927: To review by AI

If we use `next_token(...)`, then in a multi-statement query it consumes
the semicolon token, which leads to a parse error on valid queries
like:

```sql
CREATE EXTERNAL TABLE t1 ...;
CREATE EXTERNAL TABLE t2 ...;
```
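
The difference between the two calls can be sketched with a toy token stream. This is only an illustration under stated assumptions: `Tok`, `Stream`, and `demo` are hypothetical names made up for this sketch, not the sqlparser or DataFusion types, and the real methods carry more information (for example, source spans).

```rust
/// Hypothetical, simplified token type; not the real sqlparser `Token`.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    SemiColon,
    Eof,
}

/// Toy token stream offering the two access patterns discussed above.
struct Stream {
    toks: Vec<Tok>,
    pos: usize,
}

impl Stream {
    fn new(toks: Vec<Tok>) -> Self {
        Stream { toks, pos: 0 }
    }

    /// Return the current token without advancing the position.
    fn peek_token(&self) -> Tok {
        self.toks.get(self.pos).cloned().unwrap_or(Tok::Eof)
    }

    /// Return the current token and advance past it.
    fn next_token(&mut self) -> Tok {
        let tok = self.peek_token();
        if self.pos < self.toks.len() {
            self.pos += 1;
        }
        tok
    }
}

/// Returns what each stream sees after a single peek versus a single next.
fn demo() -> (Tok, Tok) {
    // A statement terminator followed by the start of the next statement.
    let toks = vec![Tok::SemiColon, Tok::Word("CREATE".to_string())];

    let peeking = Stream::new(toks.clone());
    let _ = peeking.peek_token();
    let after_peek = peeking.peek_token(); // the semicolon is still there

    let mut consuming = Stream::new(toks);
    let _ = consuming.next_token();
    let after_next = consuming.peek_token(); // the semicolon is gone

    (after_peek, after_next)
}
```

After the peek, the semicolon is still available to whoever reads the stream next; after `next_token`, the following reader starts directly at `CREATE`.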
@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the SQL parser's token handling mechanism to correctly identify the end of statements. Previously, the parser would consume the next token when checking for an end-of-file or semicolon, which could lead to incorrect parsing of subsequent statements. By switching to peeking at the next token instead of consuming it, the parser can now accurately determine statement boundaries without altering the token stream, thereby enhancing the reliability of multi-statement SQL parsing.

Highlights

  • Parser Token Consumption Fix: Changed the parser's logic from consuming the next token (next_token()) to merely inspecting it (peek_token()) when checking for EOF or SemiColon to determine statement boundaries. This prevents premature token consumption and ensures correct multi-statement parsing.
  • Multi-statement Parsing Test: Added a new unit test, test_multistatement, to validate the parser's ability to correctly handle and parse multiple SQL statements (e.g., COPY, CREATE EXTERNAL TABLE, RESET) separated by semicolons.

@coderabbitai

coderabbitai bot commented Jan 22, 2026

Walkthrough

The parser logic in the SQL module was modified to use lookahead instead of token consumption in two locations: COPY statement parsing and CREATE EXTERNAL TABLE parsing. This change affects how end-of-statement detection works when encountering EOF or semicolon characters. Additionally, a new test was added to validate multi-statement parsing behavior. The test parses a sequence of COPY TO, CREATE EXTERNAL TABLE, and RESET statements separated by semicolons and verifies the resulting AST matches the expected statements.

@claude

claude bot commented Jan 22, 2026

Pull Request Review

Summary

This PR fixes a parser bug where next_token() consumed the token while checking for EOF or a semicolon in the parse_copy() and parse_create_external_table() methods. The fix switches to peek_token(), which leaves the token in the stream, and adds a test case for multi-statement parsing.


✅ Code Quality: Good

Positive Aspects:

  1. Correct Bug Fix: The change from next_token() to peek_token() is the right approach. When checking for EOF or semicolon to determine loop termination, the token should only be peeked at, not consumed.

  2. Consistent Pattern: The fix maintains consistency with the existing pattern used elsewhere in the file (line 473 already uses peek_token() for EOF checking in parse_statements()).

  3. Comprehensive Test: The new test_multistatement() test validates that the fix works correctly for multiple statements separated by semicolons, covering all three statement types (COPY, CREATE EXTERNAL TABLE, and RESET).

  4. Minimal Change: The PR makes the smallest possible change to fix the issue, affecting only 2 lines of actual code.


⚠️ Potential Issues: Minor Concerns

1. Test Coverage (Severity: Low)

The test only covers the happy path with trailing semicolons. Consider additional test cases:

  • Multiple statements without trailing semicolons
  • Statements with trailing whitespace
  • Edge cases with just EOF

Recommendation: Add a test case like:

#[test]
fn test_multistatement_no_trailing_semicolon() {
    let sql = "COPY foo TO bar STORED AS CSV; CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 'foo.csv'";
    let statements = DFParser::parse_sql(sql).unwrap();
    assert_eq!(statements.len(), 2);
}
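
The other cases in the list could be collected into one table-driven check. The sketch below is an assumption-laden illustration: `count_statements` is a hypothetical stand-in that only counts non-empty segments between semicolons, so that the example compiles without DataFusion. In the real test suite the closure would presumably wrap `DFParser::parse_sql(sql)` and return the length of the result, and the expected counts are what one would expect from the parser, not verified output.

```rust
/// Hypothetical stand-in for a real parser entry point. It counts the
/// non-empty segments between semicolons and does not parse SQL at all.
fn count_statements(sql: &str) -> usize {
    sql.split(';').filter(|s| !s.trim().is_empty()).count()
}

/// Edge cases from the list above, each paired with the expected
/// number of statements.
fn edge_cases() -> Vec<(&'static str, usize)> {
    vec![
        // Multiple statements without a trailing semicolon.
        (
            "COPY foo TO bar STORED AS CSV; \
             CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 'foo.csv'",
            2,
        ),
        // A statement followed by trailing whitespace.
        ("COPY foo TO bar STORED AS CSV;  \n", 1),
        // Nothing but EOF.
        ("", 0),
    ]
}

/// Runs every case through `parse` and returns the inputs whose
/// statement count differs from the expectation.
fn failing_cases(parse: impl Fn(&str) -> usize) -> Vec<&'static str> {
    let mut failed = Vec::new();
    for (sql, want) in edge_cases() {
        if parse(sql) != want {
            failed.push(sql);
        }
    }
    failed
}
```

Returning the failing inputs instead of asserting inside the loop keeps the report readable when more than one case regresses.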

2. Documentation (Severity: Low)

The code doesn't explain why peek_token() is necessary here. Future maintainers might not understand the subtle difference.

Recommendation: Add a comment at lines 661 and 1082:

// Use peek_token() instead of next_token() to check for statement
// terminators without consuming them from the token stream
let token = self.parser.peek_token();

🔍 Specific Code Analysis

Lines 661 and 1082 (the fix):

// Before:
let token = self.parser.next_token();

// After:
let token = self.parser.peek_token();

Analysis: This is the correct fix. The original code would consume the EOF or semicolon token, which could cause issues when parsing multiple statements. The loop should only check for these tokens, not consume them. The parse_statements() method at line 469 is responsible for consuming semicolons.
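
The interaction between the two loops can be modelled in a few lines. This is a simplified sketch, not the DataFusion implementation: `Tok`, `Stream`, `lex`, `parse_one`, and the `consume_terminator` flag are hypothetical, and the outer loop only approximates a `parse_statements()` that skips semicolons and requires a delimiter between statements.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    SemiColon,
    Eof,
}

struct Stream {
    toks: Vec<Tok>,
    pos: usize,
}

impl Stream {
    fn peek_token(&self) -> Tok {
        self.toks.get(self.pos).cloned().unwrap_or(Tok::Eof)
    }

    fn next_token(&mut self) -> Tok {
        let tok = self.peek_token();
        if self.pos < self.toks.len() {
            self.pos += 1;
        }
        tok
    }
}

/// Toy lexer: whitespace-separated words, with ';' as its own token.
fn lex(sql: &str) -> Vec<Tok> {
    sql.replace(';', " ; ")
        .split_whitespace()
        .map(|w| if w == ";" { Tok::SemiColon } else { Tok::Word(w.to_string()) })
        .collect()
}

/// Models one statement parser. `consume_terminator == true` mimics the
/// old next_token() check; `false` mimics the peek_token() fix.
fn parse_one(s: &mut Stream, consume_terminator: bool) -> String {
    let mut words = Vec::new();
    while let Tok::Word(w) = s.peek_token() {
        words.push(w);
        s.next_token();
    }
    // The end-of-statement check under review.
    let _terminator = if consume_terminator { s.next_token() } else { s.peek_token() };
    words.join(" ")
}

/// Models the outer loop: it alone consumes semicolons, and it expects
/// a delimiter between two consecutive statements.
fn parse_statements(sql: &str, consume_terminator: bool) -> Result<Vec<String>, String> {
    let mut s = Stream { toks: lex(sql), pos: 0 };
    let mut out = Vec::new();
    let mut expecting_delimiter = false;
    loop {
        while s.peek_token() == Tok::SemiColon {
            s.next_token();
            expecting_delimiter = false;
        }
        if s.peek_token() == Tok::Eof {
            break;
        }
        if expecting_delimiter {
            return Err(format!("expected end of statement, found {:?}", s.peek_token()));
        }
        out.push(parse_one(&mut s, consume_terminator));
        expecting_delimiter = true;
    }
    Ok(out)
}
```

In this model, a single statement parses either way, which may be why the problem went unnoticed: the consumed semicolon is followed by EOF. With two statements, the consuming variant hides the delimiter from the outer loop, which then rejects the second `CREATE`.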

Test at lines 2029-2069:

The test is well-structured and validates:

  • ✅ COPY statement parsing
  • ✅ CREATE EXTERNAL TABLE parsing
  • ✅ RESET statement parsing
  • ✅ Proper separation by semicolons

Other Considerations

Performance: ✅ Good

  • No negative performance impact
  • Peeking at a single token instead of consuming it is not expected to make a measurable difference

Security: ✅ None

  • No security implications identified

Best Practices: ✅ Mostly Followed

  • Small, focused change
  • Test included
  • Consistent with existing code patterns

Overall Assessment: 8/10

This is a solid bug fix that correctly addresses the token consumption issue. The fix is minimal, correct, and includes a test.

Recommendations:

  1. Consider: Adding one more test case for multi-statement parsing without a trailing semicolon
  2. Nice to Have: Add a brief comment explaining why peek_token() is used
  3. Nice to Have: Update PR description with more context about the bug's impact

Conclusion:

This PR should be merged. The suggested improvements are minor and could be addressed in a follow-up if preferred.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request correctly addresses an issue where the parser was prematurely consuming tokens when checking for EOF or SemiColon. By switching from next_token() to peek_token(), the parser can now inspect the token without consuming it, ensuring correct behavior for multi-statement parsing and error handling. The addition of test_multistatement is a valuable contribution, providing a comprehensive test case that validates the parser's ability to handle sequences of different SQL statements, including COPY, CREATE EXTERNAL TABLE, and RESET.

     }
 } else {
-    let token = self.parser.next_token();
+    let token = self.parser.peek_token();

high

Changing next_token() to peek_token() here is a crucial correctness fix. The original implementation would consume the token regardless of whether it was EOF or SemiColon, potentially leading to incorrect parsing or error messages if the token was not one of those. peek_token() allows for inspection without consumption, which is the correct approach in this control flow.

Suggested change
-    let token = self.parser.peek_token();
+let token = self.parser.peek_token();

Owner Author

value:annoying; category:bug; feedback: The Gemini AI reviewer's suggested change is incorrect: applying it would drop the indentation and make the code harder to read, and `cargo fmt --check` would most likely complain about it.

     }
 } else {
-    let token = self.parser.next_token();
+    let token = self.parser.peek_token();

high

This change from next_token() to peek_token() mirrors the fix on line 661 and addresses the same correctness issue within the parse_create_external_table function. It prevents premature token consumption, ensuring that the parser's state remains consistent when checking for statement delimiters.

Suggested change
-    let token = self.parser.peek_token();
+let token = self.parser.peek_token();

Owner Author

value:annoying; category:bug; feedback: The Gemini AI reviewer's suggested change is incorrect: applying it would drop the indentation and make the code harder to read, and `cargo fmt --check` would most likely complain about it.

Comment on lines +2030 to +2069
fn test_multistatement() {
    let sql = "COPY foo TO bar STORED AS CSV; \
               CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 'foo.csv'; \
               RESET var;";
    let statements = DFParser::parse_sql(sql).unwrap();
    assert_eq!(
        statements,
        vec![
            Statement::CopyTo(CopyToStatement {
                source: object_name("foo"),
                target: "bar".to_string(),
                partitioned_by: vec![],
                stored_as: Some("CSV".to_owned()),
                options: vec![],
            }),
            {
                let name = ObjectName::from(vec![Ident::from("t")]);
                let display = None;
                Statement::CreateExternalTable(CreateExternalTable {
                    name: name.clone(),
                    columns: vec![make_column_def("c1", DataType::Int(display))],
                    file_type: "CSV".to_string(),
                    location: "foo.csv".into(),
                    table_partition_cols: vec![],
                    order_exprs: vec![],
                    if_not_exists: false,
                    or_replace: false,
                    temporary: false,
                    unbounded: false,
                    options: vec![],
                    constraints: vec![],
                })
            },
            {
                let name = ObjectName::from(vec![Ident::from("var")]);
                Statement::Reset(ResetStatement::Variable(name))
            }
        ]
    );
}

medium

The addition of test_multistatement is excellent. It provides good coverage for the parser's ability to handle multiple statements, which is directly impacted by the peek_token changes. This test ensures the parser correctly identifies and processes a sequence of COPY, CREATE EXTERNAL TABLE, and RESET statements.

@augmentcode

augmentcode bot commented Jan 22, 2026

🤖 Augment PR Summary

Summary: This PR fixes multi-statement parsing by avoiding consumption of statement terminators while parsing certain statements.

Changes:

  • Switched internal end-of-statement checks from consuming tokens (`next_token`) to peeking (`peek_token`) in COPY and CREATE EXTERNAL TABLE parsing loops
  • Added a regression test that parses multiple semicolon-delimited statements (COPY, CREATE EXTERNAL TABLE, RESET) in one SQL string

Technical Notes: Leaving ;/EOF unconsumed allows the outer parse_statements loop to consistently handle delimiters and termination.

@augmentcode augmentcode bot left a comment

Review completed. No suggestions at this time.
