Commit 5f1e30f

Update ADR 29 on generic database connections to current state (#840)
* Update ADR 29 on generic database connections to current state
* fixup link
1 parent f94808f commit 5f1e30f

1 file changed

Lines changed: 27 additions & 171 deletions

File tree

modules/contributor/pages/adr/ADR029-database-connection.adoc

@@ -17,6 +17,8 @@ Technical Story: https://github.com/stackabletech/issues/issues/238
 
 NOTE: We might want to incorporate changes to address https://github.com/stackabletech/issues/issues/681, maybe as V2?
 
+NOTE: Parts of this document might be out of date. The source of truth is in https://github.com/stackabletech/operator-rs/tree/main/crates/stackable-operator/src/database_connections[the finished implementation in operator-rs]
+
 == Context and Problem Statement
 
 Many products supported by the Stackable Data Platform require databases to store metadata. Currently there is no uniform, consistent way to define database connections. In addition, some Stackable operators define database credentials to be provided inline and in plain text in the cluster definitions.
@@ -179,16 +181,14 @@ NOTE: This proposal was rejected because for the same reason as the first propos
 
 === (accepted) Product supported and a generic DB specifications.
 
-It seems that an unique, platform wide mechanism to describe database connections that also fulfills all acceptance criteria is not feasable. Database drivers and product configurations are too diverse and cannot be forced into a type safe specification.
+It seems that an unique, platform wide mechanism to describe database connections that also fulfills all acceptance criteria is not feasible. Database drivers and product configurations are too diverse and cannot be forced into a type safe specification.
 
 Thus the single, global connection manifest needs to split into two different categories, each covering a subset of the acceptance criteria:
 
 1. A database specific mechanism. This allows to catch misconfigurations early, it promotes good documentation and uniformity inside the platform.
 2. An operator specific mechanism. This is a wildcard that can be used to configure database connections that are not officially supported by the products but that can still be partially validated early.
 
-The first mechanism requires the operator framwork to provide predefined structures and supporting functions for widely available database systems such as: PostgreSQL, MySQL, MariaDB, Oracle, SQLite, Derby, Redis and so on. This doesn't mean that all products can be configured with all DB implementations. The product definitions will only allow the subset that is officially supported by the products.
-
-The second mechanism is operator/product specific and it contains mostly a pass-through list of relevant **product properties**. There is at least one exception, and that is the handling of user credentials which still need to be provisioned in a secure way (as long as the product supports it).
+The first mechanism requires the operator framework to provide predefined structures and supporting functions for widely available database systems such as: PostgreSQL, MySQL, MariaDB, Oracle, SQLite, Derby, Redis and so on. This doesn't mean that all products can be configured with all DB implementations. The product definitions will only allow the subset that is officially supported by the products. For that, every product operator defines a complex enum of exactly the databases it supports.
 
 ==== Database specific manifests
 
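The "complex enum of exactly the databases it supports" can be sketched as follows. This is an illustration only, under assumptions: the type and method names (`DruidMetadataDb`, `storage_type`) are made up for this sketch and are not the actual operator-rs API.

```rust
// Hypothetical sketch: a product operator (here Druid) declares an enum of
// exactly the database back-ends it supports, so unsupported combinations
// are rejected at CRD deserialization time rather than at runtime.
#[derive(Debug)]
pub enum DruidMetadataDb {
    Postgresql { host: String, port: u16, database: String, credentials_secret_name: String },
    Mysql { host: String, port: u16, database: String, credentials_secret_name: String },
    Derby { location: String },
}

impl DruidMetadataDb {
    /// Map the chosen variant onto the product property
    /// `druid.metadata.storage.type` (values taken from the Druid example below).
    pub fn storage_type(&self) -> &'static str {
        match self {
            DruidMetadataDb::Postgresql { .. } => "postgresql",
            DruidMetadataDb::Mysql { .. } => "mysql",
            DruidMetadataDb::Derby { .. } => "derby",
        }
    }
}

fn main() {
    let db = DruidMetadataDb::Derby { location: "/tmp/my-database/".into() };
    println!("druid.metadata.storage.type={}", db.storage_type());
}
```

Because the enum is per operator, a product that does not support, say, Derby simply omits that variant and invalid manifests never reach the reconciler.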
@@ -198,189 +198,45 @@ Support for the following database systems is planned. Additional systems may be
 
 [source,yaml]
 postgresql:
-  host: postgresql # mandatory
+  host: my-airflow.default.svc.cluster.local # mandatory
+  database: my_database # mandatory
   port: 5432 # optional, default is 5432
-  instance: my-database # mandatory
-  credentials: my-application-credentials # mandatory. key username and password
-  parameters: {} # optional
-  tls: secure-connection-class-name # optional
-  auth: authentication-class-name # optional. authentication class to use.
+  credentialsSecretName: airflow-postgresql-credentials # mandatory
+  parameters:
+    createDatabaseIfNotExist: true
+    foo: bar
 
 PostgreSQL supports multiple authentication mechanisms as described https://www.postgresql.org/docs/9.1/auth-pg-hba-conf.html[here].
 
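As an illustration of how the new manifest shape might be consumed, here is a hedged sketch that renders `host`, `port`, `database`, and the pass-through `parameters` map into a PostgreSQL connection URL. The helper name `postgres_url` and the exact URL shape are assumptions for this sketch, not part of the ADR or of operator-rs; parameter keys such as `createDatabaseIfNotExist` are passed through unvalidated, as in the manifest.

```rust
use std::collections::BTreeMap;

// Hypothetical helper: render the database-specific manifest fields into a
// connection URL, applying the documented default port of 5432.
fn postgres_url(host: &str, port: Option<u16>, database: &str, parameters: &BTreeMap<String, String>) -> String {
    let port = port.unwrap_or(5432); // "port" is optional, default is 5432
    let mut url = format!("postgresql://{host}:{port}/{database}");
    if !parameters.is_empty() {
        // Pass-through parameters become query-string entries.
        let query: Vec<String> = parameters.iter().map(|(k, v)| format!("{k}={v}")).collect();
        url.push('?');
        url.push_str(&query.join("&"));
    }
    url
}

fn main() {
    let mut params = BTreeMap::new();
    params.insert("createDatabaseIfNotExist".to_string(), "true".to_string());
    println!("{}", postgres_url("my-airflow.default.svc.cluster.local", None, "my_database", &params));
    // → postgresql://my-airflow.default.svc.cluster.local:5432/my_database?createDatabaseIfNotExist=true
}
```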
 2.) MySQL
 
 [source,yaml]
 mysql:
-  host: mysql # mandatory
+  host: my-airflow.default.svc.cluster.local
+  database: my_database
   port: 3306 # optional, default is 3306
-  instance: my-database # mandatory
-  credentials: my-application-credentials # mandatory. key username and password
-  parameters: {} # optional
-  tls: secure-connection-class-name # optional
-  auth: authentication-class-name # optional. authentication class to use.
+  credentialsSecretName: airflow-mysql-credentials # mandatory
+  parameters:
+    createDatabaseIfNotExist: true
+    foo: bar
 
 MySQL supports multiple authentication mechanisms as described https://dev.mysql.com/doc/refman/8.0/en/socket-pluggable-authentication.html[here].
 
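The `credentialsSecretName` field replaces inline credentials: it names a Kubernetes Secret that, per the manifests above, must contain the keys `username` and `password`. A hedged sketch of how an operator might turn the decoded Secret data into product properties (the function name is invented; the property names are copied from the Druid Java-properties example elsewhere in this diff):

```rust
use std::collections::BTreeMap;

// Sketch under assumptions: `secret_data` stands in for the already-decoded
// data of the Secret named by `credentialsSecretName`. Missing keys are
// reported early instead of producing a broken product configuration.
fn credential_properties(secret_data: &BTreeMap<String, String>) -> Result<Vec<(String, String)>, String> {
    let user = secret_data.get("username").ok_or("Secret is missing key 'username'")?;
    let pass = secret_data.get("password").ok_or("Secret is missing key 'password'")?;
    Ok(vec![
        ("druid.metadata.storage.connector.user".to_string(), user.clone()),
        ("druid.metadata.storage.connector.password".to_string(), pass.clone()),
    ])
}

fn main() {
    let mut data = BTreeMap::new();
    data.insert("username".to_string(), "druid".to_string());
    data.insert("password".to_string(), "druid".to_string());
    for (k, v) in credential_properties(&data).unwrap() {
        println!("{k}={v}");
    }
}
```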
-3.) Derby
-
-Derby is used often as an embedded database for testing and prototyping ideas and implementations. It's not recommended for production use-cases.
-
-[source,yaml]
-derby:
-  location: /tmp/my-database/ # optional, defaults to /tmp/derby-<some-suffix>/derby.db
-
-
-==== Product specific manifests
-
-1.) Apache Druid
-
-Apache Druid clusters can be configured any of the DB specific manifests from above. In addition, a DB generic configuration can pe specified:
-
-The following example shows how to configure the metadata storage for a Druid cluster using either one of the supported back-ends or a generic system. In a production setting only the PostgreSQL or MySQL manifests should be used.
-
-[source,yaml]
-generic:
-  driver: postgresql # mandatory
-  uri: jdbc:postgresql://<host>/druid?foo;bar # mandatory
-  credentialsSecret: my-secret # mandatory. key username + password
-
-The above is translated into the following Java properties:
-
-[source]
-druid.metadata.storage.type=postgresql
-druid.metadata.storage.connector.connectURI=jdbc:postgresql://<host>/druid?foo;bar
-druid.metadata.storage.connector.user=druid
-druid.metadata.storage.connector.password=druid
-
-2.) Apache Superset
+3.) Redis
 
-NOTE: Superset supports a very wide range of database systems as described https://superset.apache.org/user-docs/databases/#installing-database-drivers[here]. Not all of them are suitable for metadata storage.
-
-Connections to Apache Hive, Apache Druid and Trino clusters deployed as part of the SDP platform can be automated by using discovery configuration maps. In this case, the only attribute to configure is the name of the discovery config map of the appropriate system.
-
-In addition, a generic way to configure a database connection looks as follows:
+We need Redis e.g. for celery brokers or result databases.
 
 [source,yaml]
-generic:
-  secret: superset-metadata-secret # mandatory. A secret naming with one entry called "key". Used to encrypt metadata and session cookies.
-  template: postgresql://{{SUPERSET_DB_USER}}:{{SUPERSET_DB_PASS}}@postgres.default.svc.local/superset&param1=value1&param2=value2 # mandatory
-  templateSecret: my-secret # optional
-    SUPERSET_DB_USER: ...
-    SUPERSET_DB_PASS: ...
-
-The template attribute allows to specify the full connection string as required by Superset (and the underlying SQLAlchemy framework). Variables in the template are specified within `{{` and `}}` markers and their contents is replaced with the corresponding field in the `templateSecret` object.
-
-3.) Apache Hive
-
-For production environments, we recommend PostgreSQL back-end and for development, Derby.
-
-A generic connection can be configured as follows:
-
-[source,yaml]
-generic:
-  driver: org.postgresql.Driver # mandatory
-  uri: jdbc:postgresql://postgresql.us-west-2.rds.amazonaws.com:5432/mypgdb # mandatory
-  credentialsSecret: my-secret # mandatory (?). key username + password
-
-4.) Apache Airflow
+redis:
+  host: my-redis # mandatory
+  port: 6379 # optional, default is 6379
+  databaseId: 13 # optional, defaults to 0
+  credentialsSecretName: redis-credentials # mandatory
 
-A generic Airflow database connection can be configured in a similar fashion with Superset:
+4.) Derby
 
-[source,yaml]
-generic:
-  template: postgresql://{{AIRFLOW_DB_USER}}:{{AIRFLOW_DB_PASS}}@postgres.default.svc.local/superset&param1=value1&param2=value2 # mandatory
-  templateSecret: my-secret # optional
-    AIRFLOW_DB_USER: ...
-    AIRFLOW_DB_PASS: ...
-
-The resulting CRDs look like:
+Derby is used often as an embedded database for testing and prototyping ideas and implementations. It's not recommended for production use-cases.
 
 [source,yaml]
-----
-kind: DruidCluster
-spec:
-  clusterConfig:
-    metadataDatabase:
-      postgresql:
-        host: postgresql # mandatory
-        port: 5432 # defaults to some port number - depending on whether tls is enabled
-        database: druid # mandatory
-        credentials: postgresql-credentials # mandatory. key username and password
-        parameters: {} # optional BTreeMap<String, String>
-      mysql:
-        host: mysql # mandatory
-        port: 3306 # defaults to some port number - depending on whether tls is enabled
-        database: druid # mandatory
-        credentials: mysql-credentials # mandatory. key username and password
-        parameters: {} # optional BTreeMap<String, String>
-      derby:
-        location: /tmp/derby/ # optional, defaults to /tmp/derby-<some-suffix>/derby.db
-      generic:
-        driver: postgresql # mandatory
-        uri: jdbc:postgresql://<host>/druid?foo;bar # mandatory
-        credentialsSecret: my-secret # mandatory. key username + password
-        # druid.metadata.storage.type=postgresql
-        # druid.metadata.storage.connector.connectURI=jdbc:postgresql://<host>/druid
-        # druid.metadata.storage.connector.user=druid
-        # druid.metadata.storage.connector.password=druid
----
-kind: SupersetCluster
-spec:
-  clusterConfig:
-    metadataDatabase:
-      postgresql:
-        host: postgresql # mandatory
-        port: 5432 # defaults to some port number - depending on whether tls is enabled
-        database: superset # mandatory
-        credentials: postgresql-credentials # mandatory. key username and password
-        parameters: {} # optional BTreeMap<String, String>
-      mysql:
-        host: mysql # mandatory
-        port: 3306 # defaults to some port number - depending on whether tls is enabled
-        database: superset # mandatory
-        credentials: mysql-credentials # mandatory. key username and password
-        parameters: {} # optional BTreeMap<String, String>
-      sqlite:
-        location: /tmp/sqlite/ # optional, defaults to /tmp/sqlite-<some-suffix>/derby.db
-      generic:
-        uriSecret: my-secret # mandatory. key uri
-        # postgresql://{username}:{password}@{host}:{port}/{database}?sslmode=require
-kind: HiveCluster
-spec:
-  clusterConfig:
-    metadataDatabase:
-      postgresql:
-        host: postgresql # mandatory
-        port: 5432 # defaults to some port number - depending on whether tls is enabled
-        database: druid # mandatory
-        credentials: postgresql-credentials # mandatory. key username and password
-        parameters: {} # optional BTreeMap<String, String>
-      derby:
-        location: /tmp/derby/ # optional, defaults to /tmp/derby-<some-suffix>/derby.db
-      # Missing: MS-SQL server, Oracle
-      generic:
-        driver: org.postgresql.Driver # mandatory
-        uri: jdbc:postgresql://postgresql.us-west-2.rds.amazonaws.com:5432/mypgdb # mandatory
-        credentialsSecret: my-secret # mandatory (?). key username + password
-        # <property>
-        #   <name>javax.jdo.option.ConnectionURL</name>
-        #   <value>jdbc:postgresql://postgresql.us-west-2.rds.amazonaws.com:5432/mypgdb</value>
-        #   <description>PostgreSQL JDBC driver connection URL</description>
-        # </property>
-        # <property>
-        #   <name>javax.jdo.option.ConnectionDriverName</name>
-        #   <value>org.postgresql.Driver</value>
-        #   <description>PostgreSQL metastore driver class name</description>
-        # </property>
-        # <property>
-        #   <name>javax.jdo.option.ConnectionUserName</name>
-        #   <value>database_username</value>
-        #   <description>the username for the DB instance</description>
-        # </property>
-        # <property>
-        #   <name>javax.jdo.option.ConnectionPassword</name>
-        #   <value>database_password</value>
-        #   <description>the password for the DB instance</description>
-        # </property>
-----
+derby:
+  location: /tmp/my-database/ # optional, defaults to /tmp/derby/{unique_database_name}/derby.db
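The diff above removes a template-based mechanism for Superset and Airflow, in which variables wrapped in `{{` and `}}` markers inside the `template` connection string were replaced with the matching entries of the `templateSecret`. For reference, that substitution can be sketched in a few lines; the function name `render_template` is invented for this sketch and was never part of the implementation.

```rust
use std::collections::BTreeMap;

// Minimal sketch of the removed template mechanism: every `{{KEY}}` marker in
// the template string is replaced with the matching templateSecret entry.
fn render_template(template: &str, secret: &BTreeMap<String, String>) -> String {
    let mut rendered = template.to_string();
    for (key, value) in secret {
        // Build the literal marker "{{KEY}}" and substitute its value.
        rendered = rendered.replace(&format!("{{{{{key}}}}}"), value);
    }
    rendered
}

fn main() {
    let mut secret = BTreeMap::new();
    secret.insert("AIRFLOW_DB_USER".to_string(), "airflow".to_string());
    secret.insert("AIRFLOW_DB_PASS".to_string(), "s3cr3t".to_string());
    let template = "postgresql://{{AIRFLOW_DB_USER}}:{{AIRFLOW_DB_PASS}}@postgres.default.svc.local/airflow";
    println!("{}", render_template(template, &secret));
    // → postgresql://airflow:s3cr3t@postgres.default.svc.local/airflow
}
```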
