Commit 194cd32

ZetaSQL Team authored and KimiWaRokkuWoKikanai committed
Export of internal ZetaSQL changes.
-- Change by ZetaSQL Team <[email protected]>: Updated the instructions for running execute_query with docker on MacOS with M1/M2 chips.
-- Change by ZetaSQL Team <[email protected]>: Add a note about MacOS users seeing the error `execute_query_macos cannot be opened because the developer cannot be verified.`
-- Change by ZetaSQL Team <[email protected]>: Refactoring in preparation for UPDATE constructor.
-- Change by ZetaSQL Team <[email protected]>: Change the ZetaSQL Dockerfile to support different build modes.
-- Change by Jeff Shute <[email protected]>: Add tests that check that a sql file runs successfully in execute_query.
-- Change by ZetaSQL Team <[email protected]>: add a new TO_JSON signature that supports arg `unsupported_fiels`.
-- Change by Jeff Shute <[email protected]>: Add some more example queries in examples/pipe_queries.
-- Change by Brandon Dolphin <[email protected]>: Begin adding Measure type to TypeProto.
-- Change by ZetaSQL Team <[email protected]>: Add per column OPTIONS and WITH COLUMN OPTIONS to analyzer.
-- Change by ZetaSQL Team <[email protected]>: Handle lambda functions directly in the BuiltinFunctionRegistry scalar function APIs.
-- Change by ZetaSQL Team <[email protected]>: Add per column OPTIONS and WITH COLUMN OPTIONS to analyzer.
-- Change by ZetaSQL Team <[email protected]>: Add optional_ref library.
-- Change by John Fremlin <[email protected]>: Add a testcase for deeply nested structs and arrays in JSON
-- Change by ZetaSQL Team <[email protected]>: Update the ZetaSQL documentation:
-- Change by John Fremlin <[email protected]>: Truncate output for deeply nested array expressions in unparser
-- Change by ZetaSQL Team <[email protected]>: Update pipe syntax docs with TW peer review edits
-- Change by Jeff Shute <[email protected]>: Fix execute_query command line help.
-- Change by ZetaSQL Team <[email protected]>: add a new named arg `unsupported_fiels` for the TO_JSON function.
-- Change by John Fremlin <[email protected]>: Truncate output for deeply nested CASE expressions in unparser
-- Change by ZetaSQL Team <[email protected]>: Add MAP_REPLACE signatures, and reference implementation for KV pairs version
-- Change by ZetaSQL Team <[email protected]>: Remove unnecessarily explicit function registrations from reference_impl/function.cc
-- Change by Jeff Shute <[email protected]>: Adjust text area size so results are more visible.
-- Change by ZetaSQL Team <[email protected]>: Disable formatting of SQL inside non-multiline string literals.
-- Change by ZetaSQL Team <[email protected]>: Fixed issue with formatting SQL inside string literals when input string contains \r\n line endings.
-- Change by ZetaSQL Team <[email protected]>: Format textproto inside annotated string literal.
-- Change by ZetaSQL Team <[email protected]>: Disambiguate between open and close brackets annotations for braced constructor syntax.
-- Change by Jeff Shute <[email protected]>: Improve multi-statement output in execute_query web.
-- Change by ZetaSQL Team <[email protected]>: add a new named arg `unsupported_fiels` for the TO_JSON function.
-- Change by ZetaSQL Team <[email protected]>: add a new built-in enum `UnsupportedFields` to be used by TO_JSON.
-- Change by ZetaSQL Team <[email protected]>: Record parse location for OrderByItem iff record type is not PARSE_LOCATION_RECORD_NONE.
-- Change by ZetaSQL Team <[email protected]>: Unify Lambda and non-lambda AlgebrizeFunctionCall codepaths
-- Change by ZetaSQL Team <[email protected]>: small formatting updates for named arguments
-- Change by ZetaSQL Team <[email protected]>: Fix the example Docker image name in the ZetaSQL doc.
-- Change by ZetaSQL Team <[email protected]>: Refactor the parse AST and the grammar to use postfix table operators (e.g. TABLESAMPLE) on ASTTableExpression.
-- Change by ZetaSQL Team <[email protected]>: Fix ZetaSQL documentation.

GitOrigin-RevId: a68e25b308dadf3e78c4d22ec41adf72f8b08e5b
Change-Id: I586b6974dbdb4e2bb4c99ba641ef96916ec33ba6
1 parent f30c319 commit 194cd32

220 files changed: +5902 -6180 lines changed

Dockerfile

Lines changed: 9 additions & 9 deletions
@@ -52,17 +52,17 @@ USER zetasql
 
 ENV BAZEL_ARGS="--config=g++"
 
-# Pre-build the binary for execute_query so that users can try out zetasql
-# directly. Users can modify the target in the docker file or enter the
-# container and build other targets as needed.
-RUN cd zetasql && \
-    CC=/usr/bin/gcc CXX=/usr/bin/g++ \
-    bazel build ${BAZEL_ARGS} -c opt //zetasql/tools/execute_query:execute_query
-
-# Create a shortcut for execute_query.
 ENV HOME=/home/zetasql
 RUN mkdir -p $HOME/bin
-RUN ln -s /zetasql/bazel-bin/zetasql/tools/execute_query/execute_query $HOME/bin/execute_query
+
+# Supported MODE:
+# - `build` (default): Builds all ZetaSQL targets.
+# - `execute_query`: Installs the `execute_query` tool only. Erases all other
+#   build artifacts.
+ARG MODE=build
+
+RUN cd zetasql && ./docker_build.sh $MODE
+
 ENV PATH=$PATH:$HOME/bin
 
 WORKDIR /zetasql
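
Since `MODE` is declared with `ARG`, it can presumably be chosen at image build time with Docker's `--build-arg` flag. The sketch below only assembles and prints the command instead of running Docker. The helper name `zetasql_build_cmd` is hypothetical, and the image tag `my-zetasql-image` is borrowed from the README changes in this commit.

```bash
# Hypothetical helper (not part of the commit): prints the `docker build`
# command for a given MODE. It does not invoke Docker itself.
zetasql_build_cmd() {
  local mode="${1:-build}"   # The Dockerfile's default MODE is `build`.
  case "$mode" in
    build|execute_query) ;;  # The two modes documented in the Dockerfile.
    *)
      echo "Unknown mode: $mode" >&2
      return 1
      ;;
  esac
  echo "sudo docker build . -t my-zetasql-image -f Dockerfile --build-arg MODE=$mode"
}

# Print the command for the smaller, execute_query-only image.
zetasql_build_cmd execute_query
```

Printing the command keeps the example safe to run on a machine without Docker; the printed line can be pasted into a shell from the repository root.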

README.md

Lines changed: 47 additions & 19 deletions
@@ -11,21 +11,24 @@ giving errors for unuspported features.
 ZetaSQL's compliance test suite can be used to validate query engine
 implementations are correct and consistent.
 
-ZetaSQL implements the ZetaSQL language, which is used across several of
+ZetaSQL implements the GoogleSQL language, which is used across several of
 Google's SQL products, both publicly and internally, including BigQuery,
 Spanner, F1, BigTable, Dremel, Procella, and others.
 
-ZetaSQL and ZetaSQL have been described in these publications:
+GoogleSQL and ZetaSQL have been described in these publications:
 
-* (CDMS 2022) [ZetaSQL: A SQL Language as a Component](https://cdmsworkshop.github.io/2022/Slides/Fri_C2.5_DavidWilhite.pptx) (Slides)
+* (CDMS 2022) [GoogleSQL: A SQL Language as a Component](https://cdmsworkshop.github.io/2022/Slides/Fri_C2.5_DavidWilhite.pptx) (Slides)
 * (SIGMOD 2017) [Spanner: Becoming a SQL System](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46103.pdf) -- See section 6.
-* (VLDB 2024) [SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL](https://research.google/pubs/pub1005959/) -- Describes ZetaSQL's new pipe query syntax.
+* (VLDB 2024) [SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL](https://research.google/pubs/pub1005959/) -- Describes GoogleSQL's new pipe query syntax.
 
 Some other documentation:
 
 * [ZetaSQL Language Reference](docs/README.md)
 * [ZetaSQL Resolved AST](docs/resolved_ast.md), documenting the intermediate representation produced by the ZetaSQL analyzer.
 * [ZetaSQL Toolkit](https://github.com/GoogleCloudPlatform/zetasql-toolkit), a project using ZetaSQL to analyze and understand queries against BigQuery, and other ZetaSQL engines.
+* Pipe query syntax
+  * See the [reference documentation](https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md) and [research paper](https://research.google/pubs/pub1005959/).
+  * See some [example scripts](zetasql/examples/pipe_queries) and [TPC-H queries](zetasql/examples/tpch).
 
 ## Project Overview
 

@@ -62,7 +65,8 @@ You can run it using binaries from
 instructions below.
 
 There are some runnable example queries in
-[tpch examples](../zetasql/examples/tpch/README.md).
+[`zetasql/examples/tpch`](zetasql/examples/tpch) and
+[`zetasql/examples/pipe_queries`](zetasql/examples/pipe_queries).
 
 ### Getting and Running `execute_query`
 #### Pre-built Binaries
@@ -72,14 +76,20 @@ the [Releases](https://github.com/google/zetasql/releases) page. You can run
 the downloaded binary like:
 
 ```bash
+chmod +x execute_query_linux
 ./execute_query_linux --web
 ```
 
+MacOS users may see the error `execute_query_macos cannot be opened because the developer cannot be verified.`.
+You can right click the `execute_query_macos` file, click "open", and then you
+should be able to run the binary.
+
 Note the prebuilt binaries require GCC-9+ and tzdata. If you run into dependency
-issues, you can try running `execute_query` with Docker. See the
-[Run with Docker](#run-with-docker) section.
+issues or if the binary is incompatible with your platform, you can try running
+`execute_query` with Docker. See the [Run with Docker](#run-with-docker)
+section.
 
-#### Running from a bazel build
+#### Running from a Bazel Build
 
 You can build `execute_query` with Bazel from source and run it by:
 
@@ -89,14 +99,29 @@ bazel run zetasql/tools/execute_query:execute_query -- --web
 
 #### Run with Docker
 
-You can run `execute_query` using Docker. First download the pre-built Docker
-image `zetasql` or build your own from Dockerfile. See the instructions in the
-[Build With Docker](#build-with-docker) section.
+You can run `execute_query` using Docker. Download the pre-built Docker image
+file `zetasql_docker.tar.gz` from the
+[Releases](https://github.com/google/zetasql/releases) page, and load the image
+using:
+
+```bash
+sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar.gz
+```
+
+The Docker image name is `zetasql`. (You can also build a Docker image locally
+using the instructions in the [Build with Docker](#build-with-docker) section.)
 
-Assuming your Docker image name is MyZetaSQLImage, run:
+You can then run `execute_query` using:
 
 ```bash
-sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetasqlImage execute_query --web
+sudo docker run --init -it -h=$(hostname) -p 8080:8080 zetasql execute_query --web
+```
+
+If you are using MacOS with an Apple M1/M2 chip, add the additional argument
+`--platform=linux/amd64`:
+
+```bash
+sudo docker run --init -it -h=$(hostname) -p 8080:8080 --platform linux/amd64 zetasql execute_query --web
 ```
 
 Argument descriptions:
@@ -106,6 +131,7 @@ Argument descriptions:
 * `-h=$(hostname)`: Makes the hostname of the container the same as that of the
   host.
 * `-p 8080:8080`: Sets up port forwarding.
+* `zetasql`: The docker image name.
 
 `-h=$(hostname)` and `-p 8080:8080` together make the URL address of the
 web server accessible from the host machine.
@@ -114,7 +140,7 @@ Alternatively, you can run this to start a bash shell, and then run
 `execute_query` inside:
 
 ```bash
-sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetasqlImage
+sudo docker run --init -it -h=$(hostname) -p 8080:8080 my-zetasql-image
 
 # Inside the container bash shell
 execute_query --web
@@ -149,7 +175,7 @@ bazel build ...
 bazel run //zetasql/tools/execute_query:execute_query -- --web
 
 # The built binary can be found under bazel-bin and run directly.
-bazel-bin/tools/execute_query:execute_query --web
+bazel-bin/zetasql/tools/execute_query/execute_query --web
 
 # Build and run a test.
 bazel test //zetasql/parser:parser_set_test
@@ -165,28 +191,30 @@ version can be found in the `zetasql_deps_step_2.bzl` file.
 ZetaSQL also provides a `Dockerfile` which configures all the dependencies so
 that users can build ZetaSQL more easily across different platforms.
 
-To build the Docker image locally (called MyZetaSQLImage here), run:
+To build the Docker image locally (called `my-zetasql-image` here), run:
 
 ```bash
-sudo docker build . -t MyZetaSQLImage -f Dockerfile
+sudo docker build . -t my-zetasql-image -f Dockerfile
 ```
 
 Alternatively, ZetaSQL provides pre-built Docker images named `zetasql`. See the
 [Releases](https://github.com/google/zetasql/releases) page. You can load the
 downloaded image by:
 
 ```bash
-sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar
+sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar.gz
 ```
 
 To run builds or other commands inside the Docker environment, run this command
 to open a bash shell inside the container:
 
 ```bash
 # Start a bash shell running inside the Docker container.
-sudo docker run -it MyZetaSQLImage
+sudo docker run -it my-zetasql-image
 ```
 
+Replace `my-zetasql-image` with `zetasql` if you use the pre-built Docker image.
+
 Then you can run the commands from the [Build with Bazel](#build-with-bazel)
 section above.
 

docker_build.sh

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+#!/bin/bash
+#
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -e
+set -x
+
+MODE=$1
+
+CC=/usr/bin/gcc
+CXX=/usr/bin/g++
+
+if [ "$MODE" = "build" ]; then
+  # Build everything.
+  bazel build ${BAZEL_ARGS} -c opt ...
+elif [ "$MODE" = "execute_query" ]; then
+  # Install the execute_query tool.
+  bazel build ${BAZEL_ARGS} -c opt --dynamic_mode=off //zetasql/tools/execute_query:execute_query
+  # Move the generated binary to the home directory so that users can run it
+  # directly.
+  cp /zetasql/bazel-bin/zetasql/tools/execute_query/execute_query $HOME/bin/execute_query
+  # Remove the downloaded and generated artifacts to keep the image small.
+  bazel clean --expunge
+else
+  echo "Unknown mode: $MODE"
+  echo "Supported modes are: build, execute_query"
+  exit 1
+fi
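
The old Dockerfile symlinked `execute_query` out of `bazel-bin`, while this script copies it and then runs `bazel clean --expunge`. One plausible reason for the switch, not stated in the commit, is that a symlink would dangle once the build output is deleted. The sketch below illustrates that behavior in a throwaway directory. All paths in it are stand-ins, and `rm -rf` merely plays the role of the Bazel clean step.

```bash
# Illustration only: a temporary directory stands in for the Bazel tree.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/bazel-bin" "$demo_dir/bin"
echo 'stand-in binary' > "$demo_dir/bazel-bin/execute_query"

ln -s "$demo_dir/bazel-bin/execute_query" "$demo_dir/bin/linked"  # old approach
cp "$demo_dir/bazel-bin/execute_query" "$demo_dir/bin/copied"     # new approach

# Stands in for `bazel clean --expunge`, which deletes the build outputs.
rm -rf "$demo_dir/bazel-bin"

# `-e` follows symlinks, so a dangling link counts as missing.
[ -e "$demo_dir/bin/linked" ] && linked_ok=yes || linked_ok=no
[ -e "$demo_dir/bin/copied" ] && copied_ok=yes || copied_ok=no
echo "symlink usable: $linked_ok, copy usable: $copied_ok"

rm -rf "$demo_dir"
```

Only the copy survives the cleanup, which is what lets the `execute_query` mode discard the rest of the build and keep the image small.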

docs/README.md

Lines changed: 9 additions & 8 deletions
@@ -7,14 +7,15 @@
 The topics in this section provide the reference information you need to work
 with ZetaSQL:
 
-* [Lexical Structure and Syntax](https://github.com/google/zetasql/blob/master/docs/lexical.md)
-* [Expressions, Functions, and Operators](https://github.com/google/zetasql/blob/master/docs/functions-and-operators.md)
-* [Data Types](https://github.com/google/zetasql/blob/master/docs/data-types.md)
-* [Query Syntax](https://github.com/google/zetasql/blob/master/docs/query-syntax.md)
-* [Data Manipulation Language Reference](https://github.com/google/zetasql/blob/master/docs/data-manipulation-language.md)
-* [Data Model](https://github.com/google/zetasql/blob/master/docs/data-model.md)
-* [Data Definition Language Reference](https://github.com/google/zetasql/blob/master/docs/data-definition-language.md)
-* [Modules](https://github.com/google/zetasql/blob/master/docs/modules.md)
+* [Lexical Structure and Syntax](lexical.md)
+* [Expressions, Functions, and Operators](functions-and-operators.md)
+* [Data Types](data-types.md)
+* [Query Syntax](query-syntax.md)
+* [Pipe Query Syntax](pipe-syntax.md)
+* [Data Manipulation Language Reference](data-manipulation-language.md)
+* [Data Model](data-model.md)
+* [Data Definition Language Reference](data-definition-language.md)
+* [Modules](modules.md)
 
 ## License
 

docs/aggregate-dp-functions.md

Lines changed: 22 additions & 23 deletions
@@ -186,7 +186,7 @@ determine the optimal privacy parameters for your dataset and organization.
 WITH DIFFERENTIAL_PRIVACY ...
 AVG(
   expression,
-  [contribution_bounds_per_group => (lower_bound, upper_bound)]
+  [ contribution_bounds_per_group => (lower_bound, upper_bound) ]
 )
 ```
 

@@ -201,9 +201,9 @@ and can support the following arguments:
 
 + `expression`: The input expression. This can be any numeric input type,
   such as `INT64`.
-+ `contribution_bounds_per_group`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each group separately before performing intermediate
++ `contribution_bounds_per_group`: A named argument with a
+  [contribution bound][dp-clamped-named].
+  Performs clamping for each group separately before performing intermediate
   grouping on the privacy unit column.
 
 **Return type**
**Return type**
@@ -330,7 +330,7 @@ noise, see [Remove noise][dp-noise].
 WITH DIFFERENTIAL_PRIVACY ...
 COUNT(
   *,
-  [contribution_bounds_per_group => (lower_bound, upper_bound)]
+  [ contribution_bounds_per_group => (lower_bound, upper_bound) ]
 )
 ```
 

@@ -343,9 +343,9 @@ is an aggregation across a privacy unit column.
 This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax]
 and can support the following argument:
 
-+ `contribution_bounds_per_group`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each group separately before performing intermediate
++ `contribution_bounds_per_group`: A named argument with a
+  [contribution bound][dp-clamped-named].
+  Performs clamping for each group separately before performing intermediate
   grouping on the privacy unit column.
 
 **Return type**
**Return type**
@@ -468,9 +468,9 @@ and can support these arguments:
 
 + `expression`: The input expression. This expression can be any
   numeric input type, such as `INT64`.
-+ `contribution_bounds_per_group`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each group separately before performing intermediate
++ `contribution_bounds_per_group`: A named argument with a
+  [contribution bound][dp-clamped-named].
+  Performs clamping per each group separately before performing intermediate
   grouping on the privacy unit column.
 
 **Return type**
**Return type**
@@ -609,9 +609,9 @@ and can support these arguments:
   such as `INT64`. `NULL` values are always ignored.
 + `percentile`: The percentile to compute. The percentile must be a literal in
   the range `[0, 1]`.
-+ `contribution_bounds_per_row`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each row separately before performing intermediate
++ `contribution_bounds_per_row`: A named argument with a
+  [contribution bounds][dp-clamped-named].
+  Performs clamping for each row separately before performing intermediate
   grouping on the privacy unit column.
 
 `NUMERIC` and `BIGNUMERIC` arguments are not allowed.
@@ -689,7 +689,7 @@ GROUP BY item;
 WITH DIFFERENTIAL_PRIVACY ...
 SUM(
   expression,
-  [contribution_bounds_per_group => (lower_bound, upper_bound)]
+  [ contribution_bounds_per_group => (lower_bound, upper_bound) ]
 )
 ```
 

@@ -703,10 +703,9 @@ and can support these arguments:
 
 + `expression`: The input expression. This can be any numeric input type,
   such as `INT64`. `NULL` values are always ignored.
-+ `contribution_bounds_per_group`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each group separately before performing intermediate
-  grouping on the privacy unit column.
++ `contribution_bounds_per_group`: A named argument with a
+  [contribution bound][dp-clamped-named]. Performs clamping for each group
+  separately before performing intermediate grouping on the privacy unit column.
 
 **Return type**
 

@@ -830,7 +829,7 @@ noise, see [Use differential privacy][dp-noise].
 WITH DIFFERENTIAL_PRIVACY ...
 VAR_POP(
   expression,
-  [contribution_bounds_per_row => (lower_bound, upper_bound)]
+  [ contribution_bounds_per_row => (lower_bound, upper_bound) ]
 )
 ```
 

@@ -847,9 +846,9 @@ can support these arguments:
 
 + `expression`: The input expression. This can be any numeric input type,
   such as `INT64`. `NULL`s are always ignored.
-+ `contribution_bounds_per_row`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each row separately before performing intermediate
++ `contribution_bounds_per_row`: A named argument with a
+  [contribution bound][dp-clamped-named].
+  Performs clamping for each row separately before performing intermediate
   grouping on individual user values.
 
 `NUMERIC` and `BIGNUMERIC` arguments are not allowed.

docs/aggregate-function-calls.md

Lines changed: 4 additions & 4 deletions
@@ -4,10 +4,10 @@
 
 # Aggregate function calls
 
-An aggregate function is a function that summarizes the rows of a group into a
-single value. When an aggregate function is used with the `OVER` clause, it
-becomes a window function, which computes values over a group of rows and then
-returns a single result for each row.
+An aggregate function summarizes the rows of a group into a single value. When
+an aggregate function is used with the `OVER` clause, it becomes a window
+function, which computes values over a group of rows and then returns a single
+result for each row.
 
 ## Aggregate function call syntax
 
