Commit 5607858

Author: Bhooshan Mogal
Merge pull request #73 from melburnerodrigues/postgres-plugin
Added CloudSQL PostgreSQL source, sink and action plugins.
2 parents f30c56b + b9d4628 commit 5607858

18 files changed, +1703 -0 lines changed
Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# PostgreSQL Action


Description
-----------
Action that runs a PostgreSQL command on a CloudSQL PostgreSQL instance.


Use Case
--------
The action can be used whenever you want to run a PostgreSQL command before or after a data pipeline.
For example, you may want to run a SQL update command on a database before the pipeline source pulls data from tables.


Properties
----------
**Driver Name:** Name of the JDBC driver to use.

**Database Command:** Database command to execute.

**Database:** PostgreSQL database name.

**Connection Name:** The CloudSQL instance to connect to, in the format `<PROJECT_ID>:<REGION>:<INSTANCE_NAME>`.
It can be found on the instance overview page.

**CloudSQL Instance Type:** Whether the CloudSQL instance to connect to is private or public. Defaults to 'Public'.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.

**Connection Timeout:** The timeout value used for socket connect operations. If connecting to the server takes longer
than this value, the connection is broken. The timeout is specified in seconds; a value of zero disables it.
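
To make the relationship between these properties concrete, here is a minimal Python sketch of how they might be combined into a JDBC connection string. It is an illustration only, not the plugin's code: the function `build_jdbc_url` is hypothetical, and it assumes the URL parameters documented by the Cloud SQL JDBC socket factory (`cloudSqlInstance`, `socketFactory`, `ipTypes`) and the PostgreSQL JDBC driver (`connectTimeout`). The instance name in the usage line is a made-up placeholder.

```python
from urllib.parse import urlencode

# Assumption: these parameter names come from the Cloud SQL JDBC socket factory
# and the PostgreSQL JDBC driver; the plugin itself may assemble its URL differently.
SOCKET_FACTORY = "com.google.cloud.sql.postgres.SocketFactory"


def build_jdbc_url(database, connection_name, instance_type="Public",
                   connection_arguments=None, connection_timeout=None):
    """Combine the plugin properties into one JDBC connection string (hypothetical helper)."""
    params = {
        "cloudSqlInstance": connection_name,   # Connection Name property
        "socketFactory": SOCKET_FACTORY,
        "ipTypes": instance_type.upper(),      # "Public" -> PUBLIC, "Private" -> PRIVATE
    }
    if connection_timeout is not None:
        # Seconds; zero means the timeout is disabled.
        params["connectTimeout"] = str(connection_timeout)
    # Connection Arguments are passed through as extra key/value pairs.
    params.update(connection_arguments or {})
    # Keep ':' readable in the instance connection name.
    return f"jdbc:postgresql:///{database}?{urlencode(params, safe=':')}"


# Placeholder values for illustration only.
url = build_jdbc_url("prod", "my-project:us-central1:my-instance",
                     instance_type="Private", connection_timeout=10)
print(url)
```

Connection arguments are applied last in this sketch, so a user-supplied key would override a derived one; whether the plugin behaves the same way is not stated in this document.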


Examples
--------
**Connecting to a public CloudSQL PostgreSQL instance**

Suppose you want to execute a query against a CloudSQL PostgreSQL database named "prod" as the "postgres" user with the
password "postgres". Get the latest version of the CloudSQL socket factory jar with driver and dependencies from the
[releases page](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:

```
Driver Name: "cloudsql-postgresql"
Database Command: "UPDATE table_name SET price = 20 WHERE ID = 6"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Public"
Database: "prod"
Username: "postgres"
Password: "postgres"
```


**Connecting to a private CloudSQL PostgreSQL instance**

If you want to connect to a private CloudSQL PostgreSQL instance, create a Compute Engine VM that runs the CloudSQL Proxy
docker image using the following commands:

```
# Set the environment variables
export PROJECT=[project_id]
export REGION=[vm-region]
export ZONE=$(gcloud compute zones list --filter="name=${REGION}" --limit 1 \
  --uri --project=${PROJECT} | sed 's/.*\///')
export SUBNET=[vpc-subnet-name]
export NAME=[gce-vm-name]
export POSTGRESQL_CONN=[postgresql-instance-connection-name]

# Create a Compute Engine VM that starts the CloudSQL Proxy container on boot.
# The proxy is exposed on port 3306 here (PostgreSQL's conventional port is 5432).
gcloud beta compute --project=${PROJECT} instances create ${NAME} \
  --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address \
  --metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 \
  gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy \
  -instances=${POSTGRESQL_CONN}=tcp:0.0.0.0:3306" --maintenance-policy=MIGRATE \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --image=cos-69-10895-385-0 --image-project=cos-cloud
```

Optionally, you can promote the internal IP address of the VM running the Proxy image to a static IP using:

```
# Get the VM internal IP
export IP=$(gcloud compute instances describe ${NAME} --zone ${ZONE} |
  grep "networkIP" | awk '{print $2}')

# Promote the VM internal IP to static IP
gcloud compute addresses create postgresql-proxy --addresses ${IP} \
  --region ${REGION} --subnet ${SUBNET}
```

Get the latest version of the CloudSQL socket factory jar with driver and dependencies from the
[releases page](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:

```
Driver Name: "cloudsql-postgresql"
Database Command: "UPDATE table_name SET price = 20 WHERE ID = 6"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Private"
Database: "prod"
Username: "postgres"
Password: "postgres"
```
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
# CloudSQL PostgreSQL Batch Sink


Description
-----------
Writes records to a CloudSQL PostgreSQL table. Each record will be written to a row in the table.


Use Case
--------
This sink is used whenever you need to write to a CloudSQL PostgreSQL table.
Suppose you periodically build a recommendation model for products on your online store.
The model is stored in a GCS bucket and you want to export the contents
of the bucket to a CloudSQL PostgreSQL table where it can be served to your users.

Column names are auto-detected from the input schema.
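
As a rough illustration of what deriving column names from the input schema can look like, the Python sketch below builds a parameterized INSERT statement from a list of field names. The helper `build_insert_statement` and the example field names are hypothetical; the sink's real statement generation is not shown in this document and may differ (for example, in identifier quoting).

```python
def build_insert_statement(table_name, field_names):
    """Build a JDBC-style parameterized INSERT from schema field names (hypothetical helper)."""
    if not field_names:
        raise ValueError("the input schema must contain at least one field")
    columns = ", ".join(field_names)
    # One '?' placeholder per column; each record then supplies one value per placeholder.
    placeholders = ", ".join("?" for _ in field_names)
    return f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders})"


# Example field names are made up for illustration.
statement = build_insert_statement("users", ["id", "name", "email"])
print(statement)
```

Under this assumption, each input record maps to one execution of the statement, which matches the description that each record is written to one row.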

Properties
----------
**Reference Name:** Name used to uniquely identify this sink for lineage, annotating metadata, etc.

**Driver Name:** Name of the JDBC driver to use.

**Database:** CloudSQL PostgreSQL database name.

**Connection Name:** The CloudSQL instance to connect to, in the format `<PROJECT_ID>:<REGION>:<INSTANCE_NAME>`.
It can be found on the instance overview page.

**CloudSQL Instance Type:** Whether the CloudSQL instance to connect to is private or public. Defaults to 'Public'.

**Table Name:** Name of the table to export to.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Transaction Isolation Level:** Transaction isolation level for queries run by this sink.

**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.

**Connection Timeout:** The timeout value used for socket connect operations. If connecting to the server takes longer
than this value, the connection is broken. The timeout is specified in seconds; a value of zero disables it.
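
The Connection Name format described above is easy to get wrong, so a small pre-flight check can help. The following Python sketch splits a value into its three parts and rejects anything else. The names `ConnectionName` and `parse_connection_name` are hypothetical, and the check deliberately assumes exactly three colon-separated parts, so it would reject legacy domain-scoped project IDs that themselves contain a colon.

```python
from typing import NamedTuple


class ConnectionName(NamedTuple):
    project_id: str
    region: str
    instance_name: str


def parse_connection_name(value):
    """Split '<PROJECT_ID>:<REGION>:<INSTANCE_NAME>' into its parts (hypothetical helper)."""
    parts = value.strip().split(":")
    # Assumption: exactly three non-empty parts; domain-scoped project IDs are not handled.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <PROJECT_ID>:<REGION>:<INSTANCE_NAME>, got {value!r}")
    return ConnectionName(*parts)


# Placeholder value for illustration only.
parsed = parse_connection_name("my-project:us-central1:my-instance")
print(parsed.project_id, parsed.region, parsed.instance_name)
```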


Examples
--------
**Connecting to a public CloudSQL PostgreSQL instance**

Suppose you want to write output records to the "users" table of a CloudSQL PostgreSQL database named "prod" as the
"postgres" user with the password "postgres". Get the latest version of the CloudSQL socket factory jar with driver and
dependencies from the [releases page](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases),
then configure the plugin with:

```
Reference Name: "sink1"
Driver Name: "cloudsql-postgresql"
Database: "prod"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Public"
Table Name: "users"
Username: "postgres"
Password: "postgres"
```


**Connecting to a private CloudSQL PostgreSQL instance**

If you want to connect to a private CloudSQL PostgreSQL instance, create a Compute Engine VM that runs the CloudSQL Proxy
docker image using the following commands:

```
# Set the environment variables
export PROJECT=[project_id]
export REGION=[vm-region]
export ZONE=$(gcloud compute zones list --filter="name=${REGION}" --limit 1 \
  --uri --project=${PROJECT} | sed 's/.*\///')
export SUBNET=[vpc-subnet-name]
export NAME=[gce-vm-name]
export POSTGRESQL_CONN=[postgresql-instance-connection-name]

# Create a Compute Engine VM that starts the CloudSQL Proxy container on boot.
# The proxy is exposed on port 3306 here (PostgreSQL's conventional port is 5432).
gcloud beta compute --project=${PROJECT} instances create ${NAME} \
  --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address \
  --metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 \
  gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy \
  -instances=${POSTGRESQL_CONN}=tcp:0.0.0.0:3306" --maintenance-policy=MIGRATE \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --image=cos-69-10895-385-0 --image-project=cos-cloud
```

Optionally, you can promote the internal IP address of the VM running the Proxy image to a static IP using:

```
# Get the VM internal IP
export IP=$(gcloud compute instances describe ${NAME} --zone ${ZONE} |
  grep "networkIP" | awk '{print $2}')

# Promote the VM internal IP to static IP
gcloud compute addresses create postgresql-proxy --addresses ${IP} \
  --region ${REGION} --subnet ${SUBNET}
```

Get the latest version of the CloudSQL socket factory jar with driver and dependencies from the
[releases page](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:

```
Reference Name: "sink1"
Driver Name: "cloudsql-postgresql"
Database: "prod"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Private"
Table Name: "users"
Username: "postgres"
Password: "postgres"
```


Data Types Mapping
------------------
All PostgreSQL-specific data types are mapped to string. They can have multiple input formats and one 'canonical' output form.
Refer to the PostgreSQL data types documentation for the proper formats.

| PostgreSQL Data Type                                | CDAP Schema Data Type | Comment                                      |
|-----------------------------------------------------|-----------------------|----------------------------------------------|
| bigint                                              | long                  |                                              |
| bit(n)                                              | string                | string with '0' and '1' chars exact n length |
| bit varying(n)                                      | string                | string with '0' and '1' chars max n length   |
| boolean                                             | boolean               |                                              |
| bytea                                               | bytes                 |                                              |
| character                                           | string                |                                              |
| character varying                                   | string                |                                              |
| double precision                                    | double                |                                              |
| integer                                             | int                   |                                              |
| numeric(precision, scale)/decimal(precision, scale) | decimal               |                                              |
| real                                                | float                 |                                              |
| smallint                                            | int                   |                                              |
| text                                                | string                |                                              |
| date                                                | date                  |                                              |
| time [ (p) ] [ without time zone ]                  | time                  |                                              |
| time [ (p) ] with time zone                         | string                |                                              |
| timestamp [ (p) ] [ without time zone ]             | timestamp             |                                              |
| timestamp [ (p) ] with time zone                    | timestamp             | stored in UTC format in database             |
| xml                                                 | string                |                                              |
| tsquery                                             | string                |                                              |
| tsvector                                            | string                |                                              |
| uuid                                                | string                |                                              |
| box                                                 | string                |                                              |
| cidr                                                | string                |                                              |
| circle                                              | string                |                                              |
| inet                                                | string                |                                              |
| interval                                            | string                |                                              |
| json                                                | string                |                                              |
| jsonb                                               | string                |                                              |
| line                                                | string                |                                              |
| lseg                                                | string                |                                              |
| macaddr                                             | string                |                                              |
| macaddr8                                            | string                |                                              |
| money                                               | string                |                                              |
| path                                                | string                |                                              |
| point                                               | string                |                                              |
| polygon                                             | string                |                                              |
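
The table above can be read as a lookup with a string fallback. The Python sketch below encodes the non-string rows and normalizes type modifiers such as `(n)` or `without time zone` before the lookup. The names `POSTGRES_TO_CDAP` and `cdap_type_for` are hypothetical, and falling back to `string` for any type not listed is an assumption drawn from the statement that PostgreSQL-specific types map to string.

```python
import re

# Rows from the table that do not map to string; everything else falls back to "string".
POSTGRES_TO_CDAP = {
    "bigint": "long",
    "boolean": "boolean",
    "bytea": "bytes",
    "double precision": "double",
    "integer": "int",
    "numeric": "decimal",
    "decimal": "decimal",
    "real": "float",
    "smallint": "int",
    "date": "date",
    "time": "time",
    "timestamp": "timestamp",
    "timestamp with time zone": "timestamp",
}


def cdap_type_for(pg_type):
    """Return the CDAP schema type for a PostgreSQL type name (hypothetical helper)."""
    base = pg_type.strip().lower()
    # Drop modifiers such as (n), (p) or (precision, scale).
    base = re.sub(r"\s*\(.*?\)", "", base)
    # "without time zone" is the default for time and timestamp.
    base = re.sub(r"\s+without time zone$", "", base)
    return POSTGRES_TO_CDAP.get(base, "string")


for name in ["numeric(10, 2)", "timestamp(3) with time zone", "time with time zone", "jsonb"]:
    print(name, "->", cdap_type_for(name))
```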
