19927: fix: change token consumption to peek to test on EOF in parser #202

martin-augment · 2026-01-22T10:59:07Z

19927: To review by AI

If we use `next_token(...)`, then in a multi-statement query it consumes the semicolon token, which leads to a parse error on correct queries such as:

```sql
CREATE EXTERNAL TABLE t1 ...;
CREATE EXTERNAL TABLE t2 ...;
```

gemini-code-assist · 2026-01-22T10:59:21Z

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the SQL parser's token handling mechanism to correctly identify the end of statements. Previously, the parser would consume the next token when checking for an end-of-file or semicolon, which could lead to incorrect parsing of subsequent statements. By switching to peeking at the next token instead of consuming it, the parser can now accurately determine statement boundaries without altering the token stream, thereby enhancing the reliability of multi-statement SQL parsing.

Highlights

Parser Token Consumption Fix: Changed the parser's logic from consuming the next token (next_token()) to merely inspecting it (peek_token()) when checking for EOF or SemiColon to determine statement boundaries. This prevents premature token consumption and ensures correct multi-statement parsing.
Multi-statement Parsing Test: Added a new unit test, test_multistatement, to validate the parser's ability to correctly handle and parse multiple SQL statements (e.g., COPY, CREATE EXTERNAL TABLE, RESET) separated by semicolons.
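The difference between inspecting and consuming a token can be sketched with the standard library's `Peekable` iterator. This is only an illustration: the `Tok` enum and `at_statement_end` helper are hypothetical names, not the sqlparser or DataFusion API, and an exhausted iterator stands in for EOF.

```rust
use std::iter::Peekable;
use std::vec::IntoIter;

/// Hypothetical token type for illustration; not the sqlparser `Token` enum.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(&'static str),
    SemiColon,
}

/// Returns true if the next token ends a statement, without consuming it.
/// An exhausted stream (`None`) plays the role of EOF here.
fn at_statement_end(tokens: &mut Peekable<IntoIter<Tok>>) -> bool {
    matches!(tokens.peek(), None | Some(Tok::SemiColon))
}

#[test]
fn peek_leaves_the_terminator_in_place() {
    let mut tokens = vec![Tok::SemiColon, Tok::Word("RESET")]
        .into_iter()
        .peekable();

    // The check sees the semicolon...
    assert!(at_statement_end(&mut tokens));
    // ...and the semicolon is still available for the caller to consume.
    assert_eq!(tokens.next(), Some(Tok::SemiColon));
    assert_eq!(tokens.next(), Some(Tok::Word("RESET")));
}
```

Had the helper called `next()` instead of `peek()`, the first `tokens.next()` in the test would already return `RESET`, because the semicolon would be gone.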

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai · 2026-01-22T11:00:10Z

Walkthrough

The parser logic in the SQL module was modified to use lookahead instead of token consumption in two locations: COPY statement parsing and CREATE EXTERNAL TABLE parsing. This change affects how end-of-statement detection works when encountering EOF or semicolon characters. Additionally, a new test was added to validate multi-statement parsing behavior. The test parses a sequence of COPY TO, CREATE EXTERNAL TABLE, and RESET statements separated by semicolons and verifies the resulting AST matches the expected statements.


claude · 2026-01-22T11:00:42Z

Pull Request Review

Summary

This PR fixes a parser bug where next_token() was incorrectly consuming tokens when checking for EOF or semicolon in the parse_copy() and parse_create_external_table() methods. The fix changes to peek_token() to avoid consuming tokens, and adds a test case for multi-statement parsing.

✅ Code Quality: Good

Positive Aspects:

Correct Bug Fix: The change from next_token() to peek_token() is the right approach. When checking for EOF or semicolon to determine loop termination, the token should only be peeked at, not consumed.
Consistent Pattern: The fix maintains consistency with the existing pattern used elsewhere in the file (line 473 already uses peek_token() for EOF checking in parse_statements()).
Comprehensive Test: The new test_multistatement() test validates that the fix works correctly for multiple statements separated by semicolons, covering all three statement types (COPY, CREATE EXTERNAL TABLE, and RESET).
Minimal Change: The PR makes the smallest possible change to fix the issue, affecting only 2 lines of actual code.

⚠️ Potential Issues: Minor Concerns

1. Test Coverage (Severity: Low)

The test only covers the happy path with trailing semicolons. Consider additional test cases:

Multiple statements without trailing semicolons
Statements with trailing whitespace
Edge cases with just EOF

Recommendation: Add a test case like:

#[test]
fn test_multistatement_no_trailing_semicolon() {
    let sql = "COPY foo TO bar STORED AS CSV; CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 'foo.csv'";
    let statements = DFParser::parse_sql(sql).unwrap();
    assert_eq!(statements.len(), 2);
}

2. Documentation (Severity: Low)

The code doesn't explain why peek_token() is necessary here. Future maintainers might not understand the subtle difference.

Recommendation: Add a comment at lines 661 and 1082:

// Use peek_token() instead of next_token() to check for statement
// terminators without consuming them from the token stream
let token = self.parser.peek_token();

🔍 Specific Code Analysis

Lines 661 and 1082 (the fix):

// Before:
let token = self.parser.next_token();

// After:
let token = self.parser.peek_token();

Analysis: This is the correct fix. The original code would consume the EOF or semicolon token, which could cause issues when parsing multiple statements. The loop should only check for these tokens, not consume them. The parse_statements() method at line 469 is responsible for consuming semicolons.
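To make the interaction concrete, here is a toy model of an outer statement loop that owns the delimiters and an inner statement parser that either consumes or only peeks at the terminator. Everything in it is an assumption for illustration: `Cursor`, `Tok`, the `consume_terminator` flag, and the exact control flow of the outer loop are hypothetical and are not copied from `datafusion/sql/src/parser.rs`.

```rust
/// Hypothetical token type for illustration; not the sqlparser `Token` enum.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    SemiColon,
    Eof,
}

/// Toy cursor offering the two operations the PR is about.
struct Cursor {
    tokens: Vec<Tok>,
    pos: usize,
}

impl Cursor {
    fn new(sql: &str) -> Self {
        let mut tokens = Vec::new();
        for part in sql.split_whitespace() {
            // Split a trailing `;` off a word such as `t1;`.
            match part.strip_suffix(';') {
                Some(word) => {
                    if !word.is_empty() {
                        tokens.push(Tok::Word(word.to_string()));
                    }
                    tokens.push(Tok::SemiColon);
                }
                None => tokens.push(Tok::Word(part.to_string())),
            }
        }
        Cursor { tokens, pos: 0 }
    }

    /// Inspect the next token without advancing.
    fn peek_token(&self) -> Tok {
        self.tokens.get(self.pos).cloned().unwrap_or(Tok::Eof)
    }

    /// Return the next token and advance past it.
    fn next_token(&mut self) -> Tok {
        let tok = self.peek_token();
        if self.pos < self.tokens.len() {
            self.pos += 1;
        }
        tok
    }
}

/// Inner parser: collects words up to a terminator.
/// `consume_terminator = true` models the old `next_token()` behaviour.
fn parse_statement(cur: &mut Cursor, consume_terminator: bool) -> Vec<String> {
    let mut words = Vec::new();
    while let Tok::Word(w) = cur.peek_token() {
        cur.next_token();
        words.push(w);
    }
    // The next token is now `;` or EOF.
    if consume_terminator {
        cur.next_token();
    }
    words
}

/// Outer loop: it alone is supposed to consume `;` between statements.
fn parse_statements(sql: &str, consume_terminator: bool) -> Result<Vec<Vec<String>>, String> {
    let mut cur = Cursor::new(sql);
    let mut stmts = Vec::new();
    let mut expecting_delimiter = false;
    loop {
        while cur.peek_token() == Tok::SemiColon {
            cur.next_token();
            expecting_delimiter = false;
        }
        if cur.peek_token() == Tok::Eof {
            return Ok(stmts);
        }
        if expecting_delimiter {
            return Err(format!("expected end of statement, found {:?}", cur.peek_token()));
        }
        stmts.push(parse_statement(&mut cur, consume_terminator));
        expecting_delimiter = true;
    }
}

#[test]
fn consuming_the_terminator_breaks_multi_statement_input() {
    let sql = "CREATE t1; CREATE t2;";
    // Old behaviour: the inner parser swallows `;`, so the outer loop never sees it.
    assert!(parse_statements(sql, true).is_err());
    // New behaviour: the inner parser only peeks, and both statements parse.
    assert_eq!(parse_statements(sql, false).unwrap().len(), 2);
}
```

In this model the failure appears only when a second statement follows, which would explain why single-statement tests did not catch it.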

Test at lines 2029-2069:

The test is well-structured and validates:

✅ COPY statement parsing
✅ CREATE EXTERNAL TABLE parsing
✅ RESET statement parsing
✅ Proper separation by semicolons

Other Considerations

Performance: ✅ Good

No negative performance impact
May be slightly faster since it avoids unnecessary token consumption

Security: ✅ None

No security implications identified

Best Practices: ✅ Mostly Followed

Small, focused change
Test included
Consistent with existing code patterns

Overall Assessment: 8/10

This is a solid bug fix that correctly addresses the token consumption issue. The fix is minimal, correct, and includes a test.

Recommendations:

Consider: Adding one more test case for multi-statement parsing without a trailing semicolon
Nice to Have: Add a brief comment explaining why peek_token() is used
Nice to Have: Update PR description with more context about the bug's impact

Conclusion:

✅ This PR should be merged. The suggested improvements are minor and could be addressed in a follow-up if preferred.

gemini-code-assist

Code Review

The pull request correctly addresses an issue where the parser was prematurely consuming tokens when checking for EOF or SemiColon. By switching from next_token() to peek_token(), the parser can now inspect the token without consuming it, ensuring correct behavior for multi-statement parsing and error handling. The addition of test_multistatement is a valuable contribution, providing a comprehensive test case that validates the parser's ability to handle sequences of different SQL statements, including COPY, CREATE EXTERNAL TABLE, and RESET.

gemini-code-assist · 2026-01-22T11:00:56Z

datafusion/sql/src/parser.rs

                }
            } else {
-                let token = self.parser.next_token();
+                let token = self.parser.peek_token();


Changing next_token() to peek_token() here is a crucial correctness fix. The original implementation would consume the token regardless of whether it was EOF or SemiColon, potentially leading to incorrect parsing or error messages if the token was not one of those. peek_token() allows for inspection without consumption, which is the correct approach in this control flow.

Suggested change

let token = self.parser.peek_token();

let token = self.parser.peek_token();

value:annoying; category:bug; feedback:The Gemini AI reviewer's suggested change is incorrect: it drops the indentation, which makes the code harder to read and would most likely cause cargo fmt --check to complain.

gemini-code-assist · 2026-01-22T11:00:56Z

datafusion/sql/src/parser.rs

                }
            } else {
-                let token = self.parser.next_token();
+                let token = self.parser.peek_token();


This change from next_token() to peek_token() mirrors the fix on line 661 and addresses the same correctness issue within the parse_create_external_table function. It prevents premature token consumption, ensuring that the parser's state remains consistent when checking for statement delimiters.

Suggested change

let token = self.parser.peek_token();

let token = self.parser.peek_token();

value:annoying; category:bug; feedback:The Gemini AI reviewer's suggested change is incorrect: it drops the indentation, which makes the code harder to read and would most likely cause cargo fmt --check to complain.

gemini-code-assist · 2026-01-22T11:00:57Z

datafusion/sql/src/parser.rs

+    fn test_multistatement() {
+        let sql = "COPY foo TO bar STORED AS CSV; \
+             CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 'foo.csv'; \
+             RESET var;";
+        let statements = DFParser::parse_sql(sql).unwrap();
+        assert_eq!(
+            statements,
+            vec![
+                Statement::CopyTo(CopyToStatement {
+                    source: object_name("foo"),
+                    target: "bar".to_string(),
+                    partitioned_by: vec![],
+                    stored_as: Some("CSV".to_owned()),
+                    options: vec![],
+                }),
+                {
+                    let name = ObjectName::from(vec![Ident::from("t")]);
+                    let display = None;
+                    Statement::CreateExternalTable(CreateExternalTable {
+                        name: name.clone(),
+                        columns: vec![make_column_def("c1", DataType::Int(display))],
+                        file_type: "CSV".to_string(),
+                        location: "foo.csv".into(),
+                        table_partition_cols: vec![],
+                        order_exprs: vec![],
+                        if_not_exists: false,
+                        or_replace: false,
+                        temporary: false,
+                        unbounded: false,
+                        options: vec![],
+                        constraints: vec![],
+                    })
+                },
+                {
+                    let name = ObjectName::from(vec![Ident::from("var")]);
+                    Statement::Reset(ResetStatement::Variable(name))
+                }
+            ]
+        );
+    }


The addition of test_multistatement is excellent. It provides good coverage for the parser's ability to handle multiple statements, which is directly impacted by the peek_token changes. This test ensures the parser correctly identifies and processes a sequence of COPY, CREATE EXTERNAL TABLE, and RESET statements.

augmentcode · 2026-01-22T11:02:54Z

🤖 Augment PR Summary

Summary: This PR fixes multi-statement parsing by avoiding consumption of statement terminators while parsing certain statements.

Changes:

Switched internal end-of-statement checks from consuming tokens (`next_token`) to peeking (`peek_token`) in COPY and CREATE EXTERNAL TABLE parsing loops
Added a regression test that parses multiple semicolon-delimited statements (COPY, CREATE EXTERNAL TABLE, RESET) in one SQL string

Technical Notes: Leaving ;/EOF unconsumed allows the outer parse_statements loop to consistently handle delimiters and termination.


augmentcode

Review completed. No suggestions at this time.
