-
Notifications
You must be signed in to change notification settings - Fork 0
20001: docs: Automatically update DataFusion version in docs #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request streamlines the maintenance of DataFusion version numbers across the project's documentation. By modifying an existing Python script, it automates the process of updating version strings in critical user-facing files. This change ensures that installation instructions and code examples in the documentation consistently reflect the latest DataFusion release, thereby improving accuracy and reducing the potential for outdated information. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
WalkthroughThis pull request updates the version management tooling and documentation references for DataFusion. The ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request automates the process of updating the DataFusion version within documentation files by enhancing a Python script. The changes introduce new regular expressions to accommodate various version formats and expand the list of documentation files to be updated. My review provides suggestions to improve the robustness of the regular expressions and to refactor the script for better maintainability.
| content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content) | ||
| content = re.sub(r'datafusion = { version = "(.+?)"', f'datafusion = {{ version = "{new_version}"', content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two re.sub calls can be combined into a single, more robust call. This new regex will handle optional whitespace and both version specification formats (datafusion = "..." and datafusion = { version = "..." }). Using a lambda function for the replacement makes the code clearer and avoids potential escaping issues. This simplifies the code and makes it more resilient to formatting variations.
| content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content) | |
| content = re.sub(r'datafusion = { version = "(.+?)"', f'datafusion = {{ version = "{new_version}"', content) | |
| content = re.sub(r'(datafusion\s*=\s*(?:{\s*version\s*=\s*)?)"(.+?)"', lambda m: f'{m.group(1)}"{new_version}"', content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:good-but-wont-fix; category:bug; feedback:The Gemini AI reviewer is correct that the same could be achieved with a single regex pattern but it will be more complex to read and maintain! The regex is used is a script that is executed once per release, so it is not important to be very optimized.
| update_docs("README.md", new_version) | ||
| update_docs("docs/source/download.md", new_version) | ||
| update_docs("docs/source/user-guide/example-usage.md", new_version) | ||
| update_docs("docs/source/user-guide/crate-configuration.md", new_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repeated calls to update_docs can be refactored by putting the file paths into a list and iterating over it. This improves readability and makes it easier to add or remove documentation files in the future.
| update_docs("README.md", new_version) | |
| update_docs("docs/source/download.md", new_version) | |
| update_docs("docs/source/user-guide/example-usage.md", new_version) | |
| update_docs("docs/source/user-guide/crate-configuration.md", new_version) | |
| doc_files_to_update = [ | |
| "README.md", | |
| "docs/source/download.md", | |
| "docs/source/user-guide/example-usage.md", | |
| "docs/source/user-guide/crate-configuration.md", | |
| ] | |
| for doc_file in doc_files_to_update: | |
| update_docs(doc_file, new_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:good-to-have; category:bug; feedback:The Gemini AI reviewer is correct! Using a list of file names and iterating over them in a loop to update each of them would be more easier to read and maintain!
Code Review for PR #208: Automatically Update DataFusion Version in DocsSummaryThis PR enhances the update_datafusion_versions.py script to automatically update version numbers in documentation files. The changes are focused and appropriate for their purpose. Code Quality and Best PracticesStrengths:
Critical Bug - File truncation missing: Potential Bugs or Issues
Performance ConsiderationsThe changes have minimal performance impact. Regex operations are simple and efficient, and adding a few more file updates won't noticeably affect script runtime. Security ConcernsLow risk overall, but consider:
Test CoverageMajor concern: No automated tests
Recommendations:
Additional Observations
RecommendationApprove with modifications - The core changes are sound and improve version update automation. However, please address the file truncation bug before merging. Priority Fixes
Co-Authored-By: Claude Sonnet 4.5 [email protected] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dev/update_datafusion_versions.py (1)
113-120: Missingfd.truncate()after write could leave stale content.When using
r+mode withseek(0)andwrite(), if the new content is shorter than the original, leftover bytes from the old content will remain at the end of the file. While unlikely with version string replacements (versions typically grow), it's safer to truncate.Proposed fix
def update_docs(path: str, new_version: str): print(f"updating docs in {path}") with open(path, 'r+') as fd: content = fd.read() fd.seek(0) content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content) content = re.sub(r'datafusion = { version = "(.+?)"', f'datafusion = {{ version = "{new_version}"', content) fd.write(content) + fd.truncate()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| fd.seek(0) | ||
| content = re.sub(r'datafusion = "(.+)"', f'datafusion = "{new_version}"', content) | ||
| content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content) | ||
| content = re.sub(r'datafusion = { version = "(.+?)"', f'datafusion = {{ version = "{new_version}"', content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing file truncation causes file corruption
High Severity
The update_docs function opens files with 'r+' mode, reads content, seeks to the beginning, and writes modified content without calling fd.truncate(). When the replacement makes content shorter (e.g., "latest_version" to "52.0.0" in example-usage.md loses 8 characters), leftover bytes from the original file remain at the end, corrupting the file. This PR adds a call to process example-usage.md which triggers this corruption.
Additional Locations (1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The file should be truncated before or after writing the new content. Otherwise it may leave some extra characters at the end from the old content.
🤖 Augment PR SummarySummary: Extends the version update script to also refresh DataFusion version snippets in the docs. 🤖 Was this summary useful? React with 👍 or 👎 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| content = re.sub(r'datafusion = "(.+)"', f'datafusion = "{new_version}"', content) | ||
| content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content) | ||
| content = re.sub(r'datafusion = { version = "(.+?)"', f'datafusion = {{ version = "{new_version}"', content) | ||
| fd.write(content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! The file should be truncated before or after writing the new content. Otherwise it may leave some extra characters at the end from the old content.
| content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content) | ||
| content = re.sub(r'datafusion = { version = "(.+?)"', f'datafusion = {{ version = "{new_version}"', content) | ||
| fd.write(content) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback:The Claude AI reviewer is correct! The file should be truncated before or after writing the new content. Otherwise it may leave some extra characters at the end from the old content. |
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The file should be truncated before or after writing the new content. Otherwise it may leave some extra characters at the end from the old content. |
20001: To review by AI