-
Notifications
You must be signed in to change notification settings - Fork 0
19924: Support JSON arrays reader/parse for datafusion #211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
590f97e
00692af
a74b812
796d2c2
9db814a
2c727d2
98aa48e
7ac7bc4
3cf0546
b4ecc69
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3065,6 +3065,25 @@ config_namespace! { | |||||||||||||||||||||||||||||||||||||||||||||||
| /// If not specified, the default level for the compression algorithm is used. | ||||||||||||||||||||||||||||||||||||||||||||||||
| pub compression_level: Option<u32>, default = None | ||||||||||||||||||||||||||||||||||||||||||||||||
| pub schema_infer_max_rec: Option<usize>, default = None | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// The JSON format to use when reading files. | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// When `true` (default), expects newline-delimited JSON (NDJSON): | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// ```text | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// {"key1": 1, "key2": "val"} | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// {"key1": 2, "key2": "vals"} | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// ``` | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// When `false`, expects JSON array format: | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// ```text | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// [ | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// {"key1": 1, "key2": "val"}, | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// {"key1": 2, "key2": "vals"} | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// ] | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// ``` | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// Note: JSON array format requires loading the entire file into memory. | ||||||||||||||||||||||||||||||||||||||||||||||||
| /// For large files, newline-delimited format is recommended. | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+3084
to
+3085
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The comment Consider rephrasing to be more precise. For example, you could mention that streaming sources might be buffered. The documentation in
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback: The Gemini AI reviewer is correct! The new version of the proposed changes still loads the complete JSON array file in the memory for GetResultPayload::Stream and this may lead to out of memory errors for huge files. The file is loaded as a stream from the ObjectStore, so it should be processed as a stream. Prevents out of memory failures when loading a big JSON array file. |
||||||||||||||||||||||||||||||||||||||||||||||||
| pub newline_delimited: bool, default = true | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+3068
to
+3086
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Clarify the JSON array memory note. The comment says JSON array format requires loading the entire file into memory. If the new ✏️ Suggested doc tweak- /// Note: JSON array format requires loading the entire file into memory.
- /// For large files, newline-delimited format is recommended.
+ /// Note: JSON array parsing can be more memory-intensive than NDJSON.
+ /// For large files, NDJSON is recommended.📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The new version of the proposed changes still loads the complete JSON array file in the memory and this may lead to out of memory errors for huge files. The file is loaded as a stream from the ObjectStore, so it should be processed as a stream. Prevents out of memory failures when loading a big JSON array file. |
||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs state JSON array format "requires loading the entire file into memory", but the implementation uses streaming conversion for
GetResultPayload::Filepaths; this comment may be misleading for users trying to choose between formats.🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The new version of the proposed changes still loads the complete JSON array file in the memory for GetResultPayload::Stream and this may lead to out of memory errors for huge files. The file is loaded as a stream from the ObjectStore, so it should be processed as a stream. Prevents out of memory failures when loading a big JSON array file.