
Commit 475aa36

Superset: Setup tutorials
... pulling more content from the community forum.
1 parent 8946a4b commit 475aa36


3 files changed, +284 -6 lines changed

docs/integrate/superset/index.md

Lines changed: 24 additions & 6 deletions
@@ -74,14 +74,28 @@ crate://<username>:<password>@<clustername>.cratedb.net:4200/?ssl=true
 
 ::::{grid}
 
+:::{grid-item-card} Tutorial: Set up Apache Superset with CrateDB
+:link: superset-tutorial
+:link-type: ref
+Learn how to install Apache Superset, and how to connect it with CrateDB.
+:::
+
+::::
+
+
+:::{rubric} Blog
+:::
+
+::::{grid}
+
 :::{grid-item-card} Blog: Open‑source data warehousing and visualization
 :link: https://cratedb.com/blog/use-cratedb-and-apache-superset-for-open-source-data-warehousing-and-visualization
 :link-type: url
 Use CrateDB and Apache Superset for open-source data warehousing and visualization.
 :::
 
-:::{grid-item-card} Blog: Time‑series visualization
-:link: https://preset.io/blog/timeseries-cratedb-superset/
+:::{grid-item-card} Blog: Introduction to time-series visualization
+:link: https://cratedb.com/blog/introduction-to-time-series-visualization-in-cratedb-and-superset
 :link-type: url
 Introduction to time‑series visualization in CrateDB and Apache Superset.
 :::
@@ -143,8 +157,7 @@ from the time-series dataset.
 
 :::{rubric} Development
 :::
-- [Set up Apache Superset with CrateDB]
-- [Set up an Apache Superset development sandbox with CrateDB]
+- {ref}`superset-sandbox`
 - [Verify Apache Superset with CrateDB]
 
 
@@ -153,6 +166,13 @@ from the time-series dataset.
 [CrateDB and Apache Superset]
 ```
 
+:::{toctree}
+:maxdepth: 1
+:hidden:
+Tutorial <tutorial>
+Sandbox <sandbox>
+:::
+
 
 [Apache Superset]: https://superset.apache.org/
 [CrateDB and Apache Superset]: https://cratedb.com/integrations/cratedb-and-apache-superset
@@ -162,6 +182,4 @@ from the time-series dataset.
 [how to install database drivers in Docker Images]: https://superset.apache.org/docs/configuration/databases#installing-drivers-in-docker-images
 [Preset]: https://preset.io/
 [Preset Cloud]: https://preset.io/product/
-[Set up Apache Superset with CrateDB]: https://community.cratedb.com/t/set-up-apache-superset-with-cratedb/1716
-[Set up an Apache Superset development sandbox with CrateDB]: https://community.cratedb.com/t/set-up-an-apache-superset-development-sandbox-with-cratedb/1163
 [Verify Apache Superset with CrateDB]: https://github.com/crate/cratedb-examples/tree/main/application/apache-superset

docs/integrate/superset/sandbox.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
(superset-sandbox)=
# Set up an Apache Superset development sandbox with CrateDB

## Introduction
This walkthrough shows how to quickly spawn a development sandbox with Apache Superset, so that you can work on the CrateDB Python driver with live code reloading.

## Prerequisites
You will need Bash, Docker, Git, and Python installed on your workstation. The setup steps also use `curl`, and the HTTP API example additionally uses [HTTPie](https://httpie.io/docs/cli) and `jq`. All other prerequisites will be installed into your working tree.

## Setup

### CrateDB

Start CrateDB using Docker.
```console
docker run --rm --publish=4200:4200 --publish=5432:5432 --name=cratedb --env CRATE_HEAP_SIZE=1g crate:latest -Cdiscovery.type=single-node
```
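The container needs a few seconds to boot. If you script the following steps, you can wait for the node before continuing. The sketch below is a minimal example. It assumes that CrateDB's HTTP endpoint on `localhost:4200` answers a plain `GET /` with a 2xx status once the node is up. The function name `wait_for_cratedb` is hypothetical.

```python
import http.client
import time
import urllib.request


def wait_for_cratedb(url: str = "http://localhost:4200/", timeout: float = 60.0,
                     interval: float = 1.0) -> bool:
    """Poll CrateDB's HTTP endpoint until it answers with a 2xx status.

    Returns True as soon as a request succeeds, False once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # urlopen follows redirects and raises HTTPError (an OSError) for 4xx/5xx.
            with urllib.request.urlopen(url, timeout=interval) as response:
                if 200 <= response.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: the node is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A `False` result after the timeout is a hint to inspect the container output, for example with `docker logs cratedb`.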

Create an example table and insert a single record.
```console
docker run --interactive --rm --network=host crate:latest crash <<EOF
CREATE TABLE IF NOT EXISTS testdrive (
    ts TIMESTAMP,
    tstz TIMESTAMPTZ,
    val INTEGER CHECK (val >= 0),
    str TEXT NOT NULL,
    str2 TEXT,
    PRIMARY KEY(val, str, str2)
);
INSERT INTO testdrive (ts, tstz, val, str, str2) VALUES (now(), now(), 42, 'foobar', 'bazqux');
EOF
```
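You can also send SQL statements without crash, using CrateDB's HTTP endpoint `/_sql`. It accepts a JSON body with a `stmt` and an optional `args` list that binds the `?` placeholders. The sketch below is a minimal example that assumes CrateDB on `localhost:4200`, as started above. The helpers `build_sql_request` and `run_sql` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_sql_request(stmt: str, args: Optional[List[Any]] = None,
                      url: str = "http://localhost:4200/_sql") -> urllib.request.Request:
    """Prepare a POST request for CrateDB's HTTP `_sql` endpoint."""
    body: Dict[str, Any] = {"stmt": stmt}
    if args is not None:
        # Values for the positional `?` placeholders in `stmt`.
        body["args"] = args
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt: str, args: Optional[List[Any]] = None,
            url: str = "http://localhost:4200/_sql") -> Dict[str, Any]:
    """Send the statement and return the decoded JSON answer (`cols`, `rows`, `rowcount`, ...)."""
    with urllib.request.urlopen(build_sql_request(stmt, args, url)) as response:
        return json.load(response)


# Another record; val=43 avoids a duplicate of the primary key (42, 'foobar', 'bazqux').
insert_request = build_sql_request(
    "INSERT INTO testdrive (ts, tstz, val, str, str2) VALUES (now(), now(), ?, ?, ?)",
    [43, "foobar", "bazqux"],
)
```

With CrateDB running, `run_sql("SELECT * FROM testdrive")` would return the table contents as JSON.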

If you need more data to explore, follow [how to load 2.6M records from the NYC Yellowcab dataset into CrateDB](https://community.cratedb.com/t/quickly-starting-cratedb-with-2-5m-records-of-the-nyc-yellowcab-dataset/1162) instead.


### Sandbox

#### Install Apache Superset from source
You can copy this whole section verbatim into your terminal.
```console
# Acquire sources.
git clone https://github.com/apache/superset --depth=1
cd superset

# Create Python virtualenv.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements/local.txt

# Set up Node.js 16 with NPM 7.
export NODEJS_VERSION=16.15.1
export NPM_VERSION=7
source /dev/stdin <<<"$(curl -s https://raw.githubusercontent.com/cicerops/supernode/main/supernode)"

# Run provisioning steps for Apache Superset.
superset db upgrade
superset fab create-admin --username=admin --password=admin --firstname=admin --lastname=admin --email=admin@example.org
superset init
```

#### Link the SQLAlchemy dialect for CrateDB
To link your local checkout of the Python driver into the sandbox environment, install the package in ["editable" mode](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs).
```console
pip install --editable=/path/to/sqlalchemy-cratedb
```
If you don't have the sources yet, you can obtain them from the Git repository using `git clone https://github.com/crate/sqlalchemy-cratedb`.
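You can confirm that the sandbox imports your working copy and not a released wheel by checking where Python resolves the module. The sketch below is a minimal example. It assumes that the import name of the `sqlalchemy-cratedb` distribution is `sqlalchemy_cratedb`, and it must run with the sandbox virtualenv's interpreter. The helpers `module_origin` and `is_from_checkout` are hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module or package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_from_checkout(name: str, checkout: str) -> bool:
    """True when `name` resolves to a file below the `checkout` directory."""
    origin = module_origin(name)
    if origin is None:
        return False
    return Path(checkout).resolve() in origin.parents
```

For example, `is_from_checkout("sqlalchemy_cratedb", "/path/to/sqlalchemy-cratedb")` should be `True` after the editable install.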

#### Start backend
With the `--reload` option, changes to the Python code are picked up automatically.
```console
# Invoke development web server with code reload machinery.
FLASK_ENV=development superset run -p 8088 --with-threads --reload --debugger
```

#### Build and start frontend
In another terminal, within the same virtualenv, build the frontend and start its development web server.
```console
source .venv/bin/activate
cd superset-frontend
npm install
npm run dev-server
```


## Usage

### User interface
You should be ready to go. Now:

- Navigate to `http://localhost:4200/#!/console` to explore the CrateDB Admin UI.
- Navigate to `http://localhost:9000/superset/sqllab/`, served by the frontend development server, to explore your data in Apache Superset. Log in with admin/admin.

### Create a database connection
To create a database connection to CrateDB in Apache Superset, you can use either the user interface or the HTTP API. The following steps use the HTTP API of the backend on port 8088, saving a few clicks and keystrokes.
```console
# Authenticate and acquire a JWT token.
AUTH_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/login username=admin password=admin provider=db | jq -r .access_token)

# Acquire a CSRF token.
CSRF_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/csrf_token/ Authorization:"Bearer ${AUTH_TOKEN}" | jq -r .result)

# Create a data source item / database connection.
http --session=superset http://localhost:8088/api/v1/database/ database_name="CrateDB Testdrive" engine=crate sqlalchemy_uri=crate://crate@localhost:4200 Authorization:"Bearer ${AUTH_TOKEN}" X-CSRFToken:"${CSRF_TOKEN}"
```
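If you prefer Python over HTTPie, the sketch below prepares the same three requests with the standard library. It assumes the backend on `localhost:8088` and the endpoints used above. A cookie jar stands in for HTTPie's `--session` option, because, as far as we know, Superset binds the CSRF token to the session cookie. `SupersetClient` is a hypothetical helper, not part of Superset.

```python
import http.cookiejar
import json
import urllib.request
from typing import Any, Dict, Optional


class SupersetClient:
    """Tiny JSON client for Superset's HTTP API that keeps cookies between calls."""

    def __init__(self, base_url: str = "http://localhost:8088") -> None:
        self.base_url = base_url.rstrip("/")
        self.headers: Dict[str, str] = {"Content-Type": "application/json"}
        # The cookie jar keeps the session cookie the CSRF token belongs to.
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar())
        )

    def build(self, path: str, payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
        """Prepare a GET request (no payload) or a POST request (JSON payload)."""
        data = None if payload is None else json.dumps(payload).encode("utf-8")
        return urllib.request.Request(
            self.base_url + path, data=data, headers=dict(self.headers),
            method="GET" if payload is None else "POST",
        )

    def call(self, path: str, payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Send the request and decode the JSON answer."""
        with self.opener.open(self.build(path, payload)) as response:
            return json.load(response)

    def login(self, username: str = "admin", password: str = "admin") -> None:
        """Acquire a JWT access token, then a CSRF token, and remember both as headers."""
        token = self.call("/api/v1/security/login", {
            "username": username, "password": password, "provider": "db",
        })["access_token"]
        self.headers["Authorization"] = f"Bearer {token}"
        self.headers["X-CSRFToken"] = self.call("/api/v1/security/csrf_token/")["result"]

    def create_database(self, name: str, sqlalchemy_uri: str) -> Dict[str, Any]:
        """Register a database connection, mirroring the HTTPie call above."""
        return self.call("/api/v1/database/", {
            "database_name": name, "engine": "crate", "sqlalchemy_uri": sqlalchemy_uri,
        })
```

With the backend running, `client = SupersetClient()`, `client.login()`, and `client.create_database("CrateDB Testdrive", "crate://crate@localhost:4200")` would perform the same steps as the commands above.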

### Hacking
Now you can edit the code of the CrateDB Python driver on your workstation. The backend notices your changes and reloads automatically. Watch the output of the first terminal, where `superset run` is running, for errors or stack traces.

## Clean up
1. Stop both development web servers of Apache Superset (backend and frontend) by pressing `CTRL+C`.
2. Stop and remove the CrateDB container by invoking `docker rm cratedb --force`.
3. Delete the metadata database of Apache Superset, where user accounts and database connections are stored, by invoking `rm ~/.superset/superset.db`.

docs/integrate/superset/tutorial.md

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
(superset-tutorial)=
# Set up Apache Superset with CrateDB

## Introduction
This walkthrough guides you through quickly setting up a Python environment with [Apache Superset](https://superset.apache.org/), loading data into [CrateDB](https://cratedb.com/product), and connecting the two.

It is derived from a [corresponding recipe](https://github.com/crate/cratedb-examples/tree/main/application/apache-superset) that we use on our CI systems to verify connectivity between Apache Superset and CrateDB.

## Prerequisites
You will need Bash, Docker, and Python installed on your workstation. The HTTP API section also uses `jq`. All other prerequisites will be installed into your working tree.

## Install

Set up a Python environment, then install and configure Apache Superset. You can place the installation in any folder on your workstation, for example `~/dev/cratedb-superset`.
```console
# Create and activate Python virtualenv.
python3 -m venv .venv
source .venv/bin/activate

# Install Apache Superset, CrateDB driver, and HTTPie HTTP client.
pip install apache-superset sqlalchemy-cratedb httpie
```

Create a `superset_config.py` file that configures a unique `SECRET_KEY` for your application.
```console
echo "SECRET_KEY = '$(docker run --rm alpine/openssl rand -base64 42)'" > superset_config.py
```
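Python's standard library can also produce an equivalent key. The sketch below Base64-encodes 42 random bytes, mirroring `openssl rand -base64 42`, and writes the file into the current directory like the shell command above. The function names are hypothetical.

```python
import base64
import secrets
from pathlib import Path


def make_secret_key(num_bytes: int = 42) -> str:
    """Return `num_bytes` cryptographically strong random bytes, Base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def write_superset_config(path: str = "superset_config.py") -> Path:
    """Write a minimal Superset config file that only sets SECRET_KEY."""
    target = Path(path)
    # repr() emits a quoted Python string literal for the key.
    target.write_text(f"SECRET_KEY = {make_secret_key()!r}\n", encoding="utf-8")
    return target
```

Generate the key once and keep it. Superset uses it to encrypt stored connection credentials, so changing it later means those credentials have to be re-encrypted.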

The following commands initialize the metadata database at `~/.superset/superset.db` and provision a superuser account.
```console
# Configure and initialize Apache Superset.
export FLASK_APP=superset
export SUPERSET_CONFIG_PATH=superset_config.py
superset db upgrade
superset fab create-admin --username=admin --password=admin --firstname=admin --lastname=admin --email=admin@example.org
superset init
```
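The location of the metadata database depends on Superset's configuration. As far as we know, the stock `superset/config.py` places it in the directory named by the `SUPERSET_HOME` environment variable, falls back to `~/.superset`, and lets `SQLALCHEMY_DATABASE_URI` in `superset_config.py` override both. The sketch below mirrors that default under this assumption. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_metadata_db(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Assumed default: $SUPERSET_HOME/superset.db, else ~/.superset/superset.db."""
    env = os.environ if environ is None else environ
    if "SUPERSET_HOME" in env:
        data_dir = env["SUPERSET_HOME"]
    else:
        data_dir = os.path.expanduser("~/.superset")
    return Path(data_dir) / "superset.db"


def default_metadata_uri(environ: Optional[Mapping[str, str]] = None) -> str:
    """The SQLite SQLAlchemy URI that points at that file."""
    return "sqlite:///" + str(default_metadata_db(environ))
```

This is also the file that the clean-up step at the end of this tutorial deletes.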

## Start services

Start CrateDB using Docker. The container runs in the foreground, so use a separate terminal for each of the following steps.
```console
docker run --interactive --rm --pull=always \
  --publish=4200:4200 --publish=5432:5432 \
  --name=cratedb \
  --env CRATE_HEAP_SIZE=2g \
  crate:latest -Cdiscovery.type=single-node
```

In another terminal, change into the installation folder, activate the virtualenv, and export `FLASK_APP` and `SUPERSET_CONFIG_PATH` as above. Then run the Superset server.
```console
superset run --port=9000 --with-threads
```


## Load data
Import roughly six million records from the venerable NYC Yellowcab taxi ride dataset. Depending on the speed of the connection between your database instance and AWS S3, where the data is loaded from, this may take about one minute.

The following one-shot command runs the [crash](https://cratedb.com/docs/crate/crash/) database shell in a Docker container. It issues a DDL statement to create the table schema, and a `COPY FROM` statement to import data from a gzip-compressed JSON file located on AWS S3.
```console
docker run --interactive --rm --network=host crate:latest crash <<EOF
DROP TABLE IF EXISTS yellowcab;
CREATE TABLE yellowcab (
    "pickup" geo_point,
    "dropoff" geo_point,
    "congestion_surcharge" REAL,
    "dolocationid" INTEGER,
    "extra" REAL,
    "fare_amount" REAL,
    "improvement_surcharge" REAL,
    "mta_tax" REAL,
    "passenger_count" INTEGER,
    "payment_type" INTEGER,
    "pickup_datetime" TIMESTAMP WITH TIME ZONE,
    "pulocationid" INTEGER,
    "ratecodeid" INTEGER,
    "store_and_fwd_flag" TEXT,
    "tip_amount" REAL,
    "tolls_amount" REAL,
    "total_amount" REAL,
    "trip_distance" REAL,
    "vendorid" INTEGER,
    "month" AS date_format('%Y-%c', pickup_datetime)
) CLUSTERED INTO 12 SHARDS PARTITIONED BY (month);

COPY yellowcab
FROM 'https://s3.amazonaws.com/crate.sampledata/nyc.yellowcab/yc.2019.07.gz'
WITH ("compression"='gzip', "format"='json')
RETURN SUMMARY;

REFRESH TABLE yellowcab;
SELECT COUNT(*) FROM yellowcab;

EOF
```
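`COPY FROM ... RETURN SUMMARY` reports one row per node and source file, with counts of imported and failed records. If you send the import through CrateDB's HTTP `_sql` endpoint instead of crash, the result arrives as JSON with `cols` and `rows`. The sketch below is a minimal example that totals those counts. It assumes the summary columns `node`, `uri`, `success_count`, `error_count`, and `errors`, as described in the CrateDB reference for `COPY FROM`. `summarize_copy` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def summarize_copy(response: Dict[str, Any]) -> Tuple[int, int, List[str]]:
    """Total a COPY FROM ... RETURN SUMMARY result as returned by the `_sql` endpoint.

    Returns (imported records, failed records, URIs that reported any error).
    """
    cols = response["cols"]
    uri_idx = cols.index("uri")
    ok_idx = cols.index("success_count")
    err_idx = cols.index("error_count")
    errors_idx = cols.index("errors")
    imported = failed = 0
    failing_uris: List[str] = []
    for row in response["rows"]:
        # Counts are treated as 0 when null, e.g. if a file could not be read (assumption).
        ok = row[ok_idx] or 0
        err = row[err_idx] or 0
        imported += ok
        failed += err
        if err or row[errors_idx]:
            failing_uris.append(row[uri_idx])
    return imported, failed, failing_uris
```

A complete import should give an `imported` total that matches the `SELECT COUNT(*)` result after `REFRESH TABLE`.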


## Usage

You can operate CrateDB and Superset interactively, using their web-based user interfaces. Alternatively, you can use their HTTP APIs.

### Web user interface
You should be ready to go. Now you can explore the loaded data through the user interfaces of CrateDB and Apache Superset.

- Navigate to `http://localhost:4200/#!/console` to explore the CrateDB Admin UI.
- Navigate to `http://localhost:9000/sqllab/` to explore your data in Apache Superset. Log in with admin/admin.

Before you can work with the data in Apache Superset and create dashboards, you need to establish connectivity between Apache Superset and CrateDB. To do that, [connect a database instance](https://superset.apache.org/docs/databases/db-connection-ui/) and [register a database table](https://superset.apache.org/docs/creating-charts-dashboards/creating-your-first-dashboard/#registering-a-new-table) as a dataset.
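When you connect the database instance through the user interface, Superset asks for a SQLAlchemy URI. For the local container, that URI is `crate://crate@localhost:4200`, the same one used in the HTTP API example. For CrateDB Cloud, it takes the form `crate://<username>:<password>@<clustername>.cratedb.net:4200/?ssl=true`. The sketch below is a minimal example that assembles such URIs. It percent-encodes the credentials, so passwords containing `@`, `:` or `/` do not break URI parsing. `crate_uri` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def crate_uri(host: str = "localhost", port: int = 4200, username: str = "crate",
              password: Optional[str] = None, ssl: bool = False) -> str:
    """Assemble a SQLAlchemy URI for the CrateDB dialect (`crate://`)."""
    # Percent-encode credentials so reserved characters survive URI parsing.
    credentials = quote(username, safe="")
    if password:
        credentials += ":" + quote(password, safe="")
    uri = f"crate://{credentials}@{host}:{port}"
    if ssl:
        uri += "/?" + urlencode({"ssl": "true"})
    return uri
```

Paste the result into the SQLAlchemy URI field of Superset's database connection dialog.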


### HTTP API
Using [Apache Superset's HTTP API](https://superset.apache.org/docs/api), you can automate the provisioning process. The commands below use [HTTPie](https://httpie.io/docs/cli) for that purpose, saving a few clicks and keystrokes.

**Connect a database instance**
```console
# Authenticate and acquire a JWT token.
AUTH_TOKEN=$(http --session=superset http://localhost:9000/api/v1/security/login username=admin password=admin provider=db | jq -r .access_token)

# Acquire a CSRF token.
CSRF_TOKEN=$(http --session=superset http://localhost:9000/api/v1/security/csrf_token/ Authorization:"Bearer ${AUTH_TOKEN}" | jq -r .result)

# Create a data source item / database connection.
http --session=superset http://localhost:9000/api/v1/database/ \
  database_name="CrateDB Testdrive" engine=crate \
  sqlalchemy_uri=crate://crate@localhost:4200 \
  Authorization:"Bearer ${AUTH_TOKEN}" \
  X-CSRFToken:"${CSRF_TOKEN}"
```

**Register a database table**
```console
# Register database table as dataset.
http --session=superset http://localhost:9000/api/v1/dataset/ \
  Authorization:"Bearer ${AUTH_TOKEN}" \
  X-CSRFToken:"${CSRF_TOKEN}" \
  database=1 schema=doc table_name=yellowcab
```
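The dataset request above hard-codes `database=1`. In a script, you can take the ID from the answer to the database-creation request instead. As far as we know, Superset's `POST /api/v1/database/` responds with a JSON object that carries the new `id`, and `POST /api/v1/dataset/` does the same for datasets. The sketch below uses hypothetical helper names. It builds the dataset payload from such a response and derives the explore URL.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def dataset_payload(database_response: Dict[str, Any], table_name: str,
                    schema: str = "doc") -> Dict[str, Any]:
    """Build the JSON body for POST /api/v1/dataset/ from the database-creation response."""
    return {
        "database": int(database_response["id"]),
        "schema": schema,
        "table_name": table_name,
    }


def explore_url(dataset_id: int, base_url: str = "http://localhost:9000") -> str:
    """URL that opens the chart explorer for a table-backed dataset."""
    query = urlencode({"datasource_type": "table", "datasource_id": dataset_id})
    return f"{base_url.rstrip('/')}/explore/?{query}"
```

`schema="doc"` matches CrateDB's default schema, which is also used in the HTTPie command.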

Now, navigate to the Superset web UI to explore your newly created dataset and build a dashboard. The `database=1` parameter above and the URL below both assume that the CrateDB connection and the dataset were the first ones created in a fresh metadata database, so each has the ID `1`.

- `http://localhost:9000/explore/?datasource_type=table&datasource_id=1`


## Clean up
1. Stop the development web server of Apache Superset by pressing `CTRL+C`.
2. Stop the CrateDB container by pressing `CTRL+C` in its terminal, or by invoking `docker rm cratedb --force`.
3. Delete the metadata database of Apache Superset, where user accounts and database connections are stored, by invoking `rm ~/.superset/superset.db`.
