7 changes: 5 additions & 2 deletions dev/update_datafusion_versions.py
@@ -48,7 +48,6 @@
'datafusion-benchmarks': 'benchmarks/Cargo.toml',
'datafusion-cli': 'datafusion-cli/Cargo.toml',
'datafusion-examples': 'datafusion-examples/Cargo.toml',
'datafusion-docs': 'docs/Cargo.toml',
}

def update_workspace_version(new_version: str):
@@ -116,7 +115,8 @@ def update_docs(path: str, new_version: str):
with open(path, 'r+') as fd:
content = fd.read()
fd.seek(0)
content = re.sub(r'datafusion = "(.+)"', f'datafusion = "{new_version}"', content)
content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content)
content = re.sub(r'datafusion = { version = "(.+?)"', f'datafusion = {{ version = "{new_version}"', content)
Comment on lines +118 to +119

medium

The two re.sub calls can be combined into a single, more robust call. This new regex will handle optional whitespace and both version specification formats (datafusion = "..." and datafusion = { version = "..." }). Using a lambda function for the replacement makes the code clearer and avoids potential escaping issues. This simplifies the code and makes it more resilient to formatting variations.

Suggested change
content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content)
content = re.sub(r'datafusion = { version = "(.+?)"', f'datafusion = {{ version = "{new_version}"', content)
content = re.sub(r'(datafusion\s*=\s*(?:{\s*version\s*=\s*)?)"(.+?)"', lambda m: f'{m.group(1)}"{new_version}"', content)

Owner Author

value:good-but-wont-fix; category:bug; feedback:The Gemini AI reviewer is correct that the same could be achieved with a single regex pattern, but it would be more complex to read and maintain. The regex is used in a script that is executed once per release, so it does not need to be highly optimized.
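
For reference, a minimal sketch comparing the two approaches discussed in this thread on the two `Cargo.toml` forms that appear in the docs. The helper names `bump_two_calls` and `bump_single_call` and the `sample` text are hypothetical, introduced only for illustration, and the brace is escaped here for explicitness.

```python
import re


def bump_two_calls(content: str, new_version: str) -> str:
    """Two separate substitutions, one per dependency form (as in the PR)."""
    # Plain form: datafusion = "41.0.0"
    content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content)
    # Table form: datafusion = { version = "31.0.0", ... }
    content = re.sub(
        r'datafusion = \{ version = "(.+?)"',
        f'datafusion = {{ version = "{new_version}"',
        content,
    )
    return content


def bump_single_call(content: str, new_version: str) -> str:
    """One substitution with an optional `{ version =` prefix (as suggested)."""
    pattern = r'(datafusion\s*=\s*(?:\{\s*version\s*=\s*)?)"(.+?)"'
    # Keep the matched prefix (group 1) and swap only the quoted version.
    return re.sub(pattern, lambda m: f'{m.group(1)}"{new_version}"', content)


# Hypothetical sample combining both forms found in the docs.
sample = (
    'datafusion = "41.0.0"\n'
    'datafusion = { version = "31.0.0", features = ["backtrace"]}\n'
)
```

On this sample both helpers rewrite only the version strings and leave the `features` list untouched; the single-pattern variant additionally tolerates extra whitespace around `=` and `{`.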

Missing file truncation causes file corruption

High Severity

The update_docs function opens files with 'r+' mode, reads content, seeks to the beginning, and writes modified content without calling fd.truncate(). When the replacement makes content shorter (e.g., "latest_version" to "52.0.0" in example-usage.md loses 8 characters), leftover bytes from the original file remain at the end, corrupting the file. This PR adds a call to process example-usage.md which triggers this corruption.

Owner Author

value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The file should be truncated before or after writing the new content. Otherwise it may leave some extra characters at the end from the old content.
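
A minimal sketch of the fix described in this thread, assuming the body of `update_docs` is otherwise as shown in the diff. Calling `fd.truncate()` after the write cuts the file at the current position, so no bytes of the old, longer content survive. This is an illustration, not the committed change.

```python
import re


def update_docs(path: str, new_version: str) -> None:
    """Rewrite the datafusion version in a docs file, in place."""
    with open(path, 'r+') as fd:
        content = fd.read()
        content = re.sub(r'datafusion = "(.+?)"', f'datafusion = "{new_version}"', content)
        content = re.sub(
            r'datafusion = \{ version = "(.+?)"',
            f'datafusion = {{ version = "{new_version}"',
            content,
        )
        fd.seek(0)         # rewind to the start before overwriting
        fd.write(content)  # may be shorter than the original content
        fd.truncate()      # drop any leftover bytes past the new end
```

An alternative with the same effect is to read the file, close it, and reopen it with mode `'w'`, which truncates on open.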

fd.write(content)

update_docs opens files with r+, seeks to 0, and writes the new content but never truncates; if the replacement ever makes the file shorter, stale trailing bytes will remain at the end of the file.

Owner Author

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The file should be truncated before or after writing the new content. Otherwise it may leave some extra characters at the end from the old content.

The regex pattern r'datafusion = { version = ...' contains an unescaped {, which is a special regex character in Python and can raise re.error at runtime (or match unexpectedly).
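
A quick way to check this concern, as a sketch: in CPython's `re` module, a `{` that does not begin a valid `{m}` or `{m,n}` quantifier is treated as a literal character, so the pattern from the PR compiles and matches as written. Escaping the brace matches the same text and makes the intent explicit. The `line` value below is taken from `crate-configuration.md`; the variable names are hypothetical.

```python
import re

line = 'datafusion = { version = "31.0.0", features = ["backtrace"]}'

# `{` is followed by a space rather than digits and `}`, so it cannot start
# a quantifier such as `{2}` or `{1,3}`; it is matched as a literal brace.
unescaped = re.compile(r'datafusion = { version = "(.+?)"')

# Escaped variant: same match, but the literal brace is unambiguous.
escaped = re.compile(r'datafusion = \{ version = "(.+?)"')

unescaped_version = unescaped.search(line).group(1)
escaped_version = escaped.search(line).group(1)
```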

@@ -144,6 +144,9 @@ def main():
update_downstream_versions(cargo_toml, new_version)

update_docs("README.md", new_version)
update_docs("docs/source/download.md", new_version)
update_docs("docs/source/user-guide/example-usage.md", new_version)
update_docs("docs/source/user-guide/crate-configuration.md", new_version)
Comment on lines 146 to +149

medium

The repeated calls to update_docs can be refactored by putting the file paths into a list and iterating over it. This improves readability and makes it easier to add or remove documentation files in the future.

Suggested change
update_docs("README.md", new_version)
update_docs("docs/source/download.md", new_version)
update_docs("docs/source/user-guide/example-usage.md", new_version)
update_docs("docs/source/user-guide/crate-configuration.md", new_version)
doc_files_to_update = [
"README.md",
"docs/source/download.md",
"docs/source/user-guide/example-usage.md",
"docs/source/user-guide/crate-configuration.md",
]
for doc_file in doc_files_to_update:
update_docs(doc_file, new_version)

Owner Author

value:good-to-have; category:bug; feedback:The Gemini AI reviewer is correct! Putting the file names in a list and iterating over them to update each one would be easier to read and maintain.



if __name__ == "__main__":
2 changes: 1 addition & 1 deletion docs/source/download.md
@@ -26,7 +26,7 @@ For example:

```toml
[dependencies]
datafusion = "41.0.0"
datafusion = "52.0.0"
```

While DataFusion is distributed via [crates.io] as a convenience, the
2 changes: 1 addition & 1 deletion docs/source/user-guide/crate-configuration.md
@@ -155,7 +155,7 @@ By default, Datafusion returns errors as a plain text message. You can enable mo
such as backtraces by enabling the `backtrace` feature to your `Cargo.toml` file like this:

```toml
datafusion = { version = "31.0.0", features = ["backtrace"]}
datafusion = { version = "52.0.0", features = ["backtrace"]}
```

Set environment [variables](https://doc.rust-lang.org/std/backtrace/index.html#environment-variables)
2 changes: 1 addition & 1 deletion docs/source/user-guide/example-usage.md
@@ -29,7 +29,7 @@ Find latest available Datafusion version on [DataFusion's
crates.io] page. Add the dependency to your `Cargo.toml` file:

```toml
datafusion = "latest_version"
datafusion = "52.0.0"
tokio = { version = "1.0", features = ["rt-multi-thread"] }
```
