
Commit f5c1048

Add json_extract_columns function (#10)

1 parent: d6b8103

3 files changed: 469 additions, 21 deletions


README.md: 29 additions, 0 deletions
```diff
@@ -13,6 +13,7 @@ This extension provides a set of utility functions to work with JSON data, focus

 - **`json_flatten(json[, separator])`**: Recursively flattens nested JSON objects and arrays into a single-level object with path keys (default separator: `.`).
 - **`json_add_prefix(json, text)`**: Adds a string prefix to every top-level key in a JSON object.
+- **`json_extract_columns(json, columns[, separator])`**: Pulls selected root keys into a struct of `VARCHAR` fields using regex patterns.
 - **`json_group_merge(json [ORDER BY ...])`**: Streams JSON patches with RFC 7396 merge semantics without materializing intermediate lists.

 ## Quick Start
```
```diff
@@ -124,6 +125,33 @@ SELECT json_add_prefix('{"user": {"name": "Alice"}, "count": 5}', 'data_');

 **Note:** This function requires the input to be a JSON object. It will raise an error if given a JSON array or primitive value.

+### `json_extract_columns(json, columns[, separator]) -> struct`
+
+Extracts selected root-level fields into a struct of `VARCHAR` columns. The first argument must be a JSON object value (not an array or primitive). `columns` must be a constant JSON object mapping output column names to RE2 regex patterns evaluated against each top-level key (partial matches by default; add anchors to tighten). Patterns are case-sensitive unless you supply inline flags such as `(?i)`. Output columns follow the mapping order.
+
+`separator` defaults to `''` and is inserted between multiple matches for the same column in the order keys appear in the input object. It can be empty but cannot be `NULL` (even when the JSON input is `NULL`). Columns with no matches return `NULL`.
+
+Values are stringified: strings pass through unquoted; arrays, objects, numbers, booleans, and `null` become their JSON text.
+
+**Examples:**
+```sql
+SELECT (json_extract_columns('{"id": 5, "name": "duck"}',
+'{"id":"^id$","name":"^name$"}', ',')).id AS id;
+-- Result: 5
+
+SELECT (json_extract_columns('{"a":1,"a2":2,"b":3}',
+'{"a":"^a","b":"^b$"}', '|')).a AS a_values;
+-- Result: 1|2
+
+SELECT (json_extract_columns('{"Key": "Value"}',
+'{"k":"(?i)^key$"}', ',')).k AS case_insensitive;
+-- Result: Value
+
+SELECT (json_extract_columns('{"x":"a","xx":"b"}',
+'{"col":"x"}')).col AS default_separator;
+-- Result: ab
+```
+
 ### `json_group_merge(json_expr [, treat_null_values] [ORDER BY ...]) -> json`

 Applies a sequence of JSON patches using [RFC 7396](https://datatracker.ietf.org/doc/html/rfc7396) merge semantics. Inputs can be `JSON` values or `VARCHAR` text that parses as JSON. SQL `NULL` rows are skipped, and the aggregate returns `'{}'::json` when no non-null inputs are provided.
```
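The matching and joining rules added for `json_extract_columns` may be easier to follow as a small reference model. The Python sketch below is not the extension's implementation (the extension is a DuckDB function using RE2); `extract_columns` and `_stringify` are hypothetical helpers that mirror the documented behavior: each pattern is searched (partial match) against every top-level key, matching values are joined with `separator` in input key order, unmatched columns yield `None` (standing in for SQL `NULL`), and non-string values become JSON text. It assumes Python's `re` behaves like RE2 for simple patterns like these, and it emits compact JSON via `json.dumps(..., separators=(",", ":"))`, which may differ from the extension's exact text for arrays, objects, or floating-point numbers.

```python
import json
import re
from typing import Any, Dict, Optional


def _stringify(value: Any) -> str:
    """Strings pass through unquoted; everything else becomes compact JSON text."""
    if isinstance(value, str):
        return value
    # Assumption: compact separators approximate the extension's JSON text.
    return json.dumps(value, separators=(",", ":"))


def extract_columns(json_text: str, columns_text: str, separator: str = "") -> Dict[str, Optional[str]]:
    """Hypothetical reference model of json_extract_columns (not the extension's code)."""
    obj = json.loads(json_text)
    if not isinstance(obj, dict):
        raise ValueError("input must be a JSON object, not an array or primitive")
    columns = json.loads(columns_text)  # output column name -> regex pattern
    result: Dict[str, Optional[str]] = {}
    for name, pattern in columns.items():  # output order follows the mapping order
        regex = re.compile(pattern)  # Python re stands in for RE2 here
        # re.search gives partial matching; anchors such as ^ and $ tighten it.
        matches = [_stringify(value) for key, value in obj.items() if regex.search(key)]
        # Join in input key order; no match means NULL (None here).
        result[name] = separator.join(matches) if matches else None
    return result


if __name__ == "__main__":
    print(extract_columns('{"a":1,"a2":2,"b":3}', '{"a":"^a","b":"^b$"}', "|"))
    # {'a': '1|2', 'b': '3'}
```

Run against the four README examples, this model returns `"5"`, `"1|2"`, `"Value"`, and `"ab"` for the selected columns, which is a quick way to sanity-check an understanding of the partial-match and ordering rules.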
```diff
@@ -175,6 +203,7 @@ FROM (VALUES ('{"keep":1}'::json, 1), ('{"keep":null}'::json, 2)) AS t(patch, ts

 - `json_flatten()` returns an error for malformed JSON
 - `json_add_prefix()` requires a JSON object (not array or primitive value)
+- `json_extract_columns()` requires a JSON object input and a constant JSON object of string regex patterns; it raises on invalid regexes, NULL separators, non-string object keys, or mismatched input shapes
 - `json_group_merge()` surfaces DuckDB JSON parse errors for invalid text and raises on merge buffers that exceed DuckDB limits
 - Maximum nesting depth: 1000 levels
 - Empty objects (`{}`) and arrays (`[]`) are omitted from flattened output
```
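The error conditions listed for `json_extract_columns()` can be read as a set of up-front argument checks. The sketch below is a hypothetical Python model of those checks only (`validate_extract_args` is an invented name, not part of the extension): the column mapping must be a JSON object whose values are strings that compile as regexes, the separator must not be `None` even when the JSON input is `None`, and the input must be a JSON object. Several points are assumptions: Python's `re` replaces RE2, so the set of rejected patterns can differ; the constant-argument requirement has no Python analogue and is not modeled; and a `None` input is assumed to short-circuit to `None` after the separator check, which the README does not state.

```python
import json
import re
from typing import Any, Dict, Optional


def validate_extract_args(
    json_text: Optional[str], columns_text: str, separator: Optional[str]
) -> Optional[Dict[str, Any]]:
    """Hypothetical model of the argument checks described for json_extract_columns()."""
    columns = json.loads(columns_text)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(columns, dict):
        raise ValueError("columns must be a JSON object")
    for name, pattern in columns.items():
        if not isinstance(pattern, str):
            raise ValueError(f"pattern for column {name!r} must be a string")
        try:
            re.compile(pattern)  # the extension uses RE2; Python re stands in here
        except re.error as exc:
            raise ValueError(f"invalid regex for column {name!r}: {exc}") from exc
    if separator is None:
        # Checked before the input, so a NULL input does not bypass it.
        raise ValueError("separator cannot be NULL")
    if json_text is None:
        return None  # assumption: a NULL input yields NULL
    obj = json.loads(json_text)
    if not isinstance(obj, dict):
        raise ValueError("input must be a JSON object, not an array or primitive")
    return obj
```

Malformed JSON in either argument surfaces as `json.JSONDecodeError`, which subclasses `ValueError`, so a caller of this model can catch a single exception type for every failure listed above.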
