|
| 1 | +# CloudSQL PostgreSQL Batch Sink |
| 2 | + |
| 3 | + |
| 4 | +Description |
| 5 | +----------- |
| 6 | +Writes records to a CloudSQL PostgreSQL table. Each record will be written to a row in the table. |
| 7 | + |
| 8 | + |
| 9 | +Use Case |
| 10 | +-------- |
| 11 | +This sink is used whenever you need to write to a CloudSQL PostgreSQL table. |
| 12 | +Suppose you periodically build a recommendation model for products on your online store. |
| 13 | +The model is stored in a GCS bucket and you want to export the contents |
| 14 | +of the bucket to a CloudSQL PostgreSQL table where it can be served to your users. |
| 15 | + |
| 16 | +Column names are detected automatically from the input schema. |
| 17 | + |
| 18 | +Properties |
| 19 | +---------- |
| 20 | +**Reference Name:** Name used to uniquely identify this sink for lineage, annotating metadata, etc. |
| 21 | + |
| 22 | +**Driver Name:** Name of the JDBC driver to use. |
| 23 | + |
| 24 | +**Database:** CloudSQL PostgreSQL database name. |
| 25 | + |
| 26 | +**Connection Name:** The CloudSQL instance to connect to, in the format \<PROJECT_ID>:\<REGION>:\<INSTANCE_NAME>. |
| 27 | +It can be found on the instance overview page. |
| 28 | + |
| 29 | +**CloudSQL Instance Type:** Whether the CloudSQL instance to connect to is private or public. Defaults to 'Public'. |
| 30 | + |
| 31 | +**Table Name:** Name of the table to export to. |
| 32 | + |
| 33 | +**Username:** User identity for connecting to the specified database. |
| 34 | + |
| 35 | +**Password:** Password to use to connect to the specified database. |
| 36 | + |
| 37 | +**Transaction Isolation Level:** Transaction isolation level for queries run by this sink. |
| 38 | + |
| 39 | +**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments |
| 40 | +will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations. |
| 41 | + |
| 42 | +**Connection Timeout:** The timeout value used for socket connect operations. If connecting to the server takes longer |
| 43 | +than this value, the connection is broken. The timeout is specified in seconds; a value of zero means that it is |
| 44 | +disabled. |
| 45 | + |
| 46 | + |
| 47 | +Examples |
| 48 | +-------- |
| 49 | +**Connecting to a public CloudSQL PostgreSQL instance** |
| 50 | + |
| 51 | +Suppose you want to write output records to the "users" table of a CloudSQL PostgreSQL database named "prod", as the |
| 52 | +"postgres" user with the password "postgres". Get the latest version of the CloudSQL socket factory jar with driver and |
| 53 | +dependencies [here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with: |
| 54 | + |
| 55 | + |
| 56 | +``` |
| 57 | +Reference Name: "sink1" |
| 58 | +Driver Name: "cloudsql-postgresql" |
| 59 | +Database: "prod" |
| 60 | +Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME] |
| 61 | +CloudSQL Instance Type: "Public" |
| 62 | +Table Name: "users" |
| 63 | +Username: "postgres" |
| 64 | +Password: "postgres" |
| 65 | +``` |
| 66 | + |
| 67 | + |
| 68 | +**Connecting to a private CloudSQL PostgreSQL instance** |
| 69 | + |
| 70 | +To connect to a private CloudSQL PostgreSQL instance, create a Compute Engine VM that runs the CloudSQL Proxy |
| 71 | +Docker image using the following commands: |
| 72 | + |
| 73 | +``` |
| 74 | +# Set the environment variables (replace each bracketed placeholder with your own value) |
| 75 | +export PROJECT=[project_id] |
| 76 | +export REGION=[vm-region] |
| 77 | +export ZONE=`gcloud compute zones list --filter="name=${REGION}" --limit \ |
| 78 | +1 --uri --project=${PROJECT}| sed 's/.*\///'` |
| 79 | +export SUBNET=[vpc-subnet-name] |
| 80 | +export NAME=[gce-vm-name] |
| 81 | +export POSTGRESQL_CONN=[postgresql-instance-connection-name] |
| 82 | + |
| 83 | +# Create a Compute Engine VM whose startup script runs the CloudSQL Proxy container |
| 84 | +gcloud beta compute --project=${PROJECT} instances create ${NAME} \ |
| 85 | +--zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address \ |
| 86 | +--metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 \ |
| 87 | +gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy \ |
| 88 | +-instances=${POSTGRESQL_CONN}=tcp:0.0.0.0:3306" --maintenance-policy=MIGRATE \ |
| 89 | +--scopes=https://www.googleapis.com/auth/cloud-platform \ |
| 90 | +--image=cos-69-10895-385-0 --image-project=cos-cloud |
| 91 | +``` |
| 92 | + |
| 93 | +Optionally, you can promote the internal IP address of the VM running the Proxy image to a static IP: |
| 94 | + |
| 95 | +``` |
| 96 | +# Get the VM internal IP |
| 97 | +export IP=`gcloud compute instances describe ${NAME} --zone ${ZONE} | |
| 98 | +grep "networkIP" | awk '{print $2}'` |
| 99 | + |
| 100 | +# Promote the VM internal IP to static IP |
| 101 | +gcloud compute addresses create postgresql-proxy --addresses ${IP} --region \ |
| 102 | +${REGION} --subnet ${SUBNET} |
| 103 | +``` |
| 104 | + |
| 105 | +Get the latest version of the CloudSQL socket factory jar with driver and dependencies |
| 106 | +[here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with: |
| 107 | + |
| 108 | +``` |
| 109 | +Reference Name: "sink1" |
| 110 | +Driver Name: "cloudsql-postgresql" |
| 111 | +Database: "prod" |
| 112 | +Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME] |
| 113 | +CloudSQL Instance Type: "Private" |
| 114 | +Table Name: "users" |
| 115 | +Username: "postgres" |
| 116 | +Password: "postgres" |
| 117 | +``` |
| 118 | + |
| 119 | + |
| 120 | +Data Types Mapping |
| 121 | +------------------ |
| 122 | +All PostgreSQL-specific data types are mapped to string; they can have multiple input formats and one 'canonical' output form. |
| 123 | +Refer to the PostgreSQL data types documentation for the proper formats. |
| 124 | + |
| 125 | +| PostgreSQL Data Type | CDAP Schema Data Type | Comment | |
| 126 | +|-----------------------------------------------------|-----------------------|----------------------------------------------| |
| 127 | +| bigint | long | | |
| 128 | +| bit(n) | string | string with '0' and '1' chars exact n length | |
| 129 | +| bit varying(n) | string | string with '0' and '1' chars max n length | |
| 130 | +| boolean | boolean | | |
| 131 | +| bytea | bytes | | |
| 132 | +| character | string | | |
| 133 | +| character varying | string | | |
| 134 | +| double precision | double | | |
| 135 | +| integer | int | | |
| 136 | +| numeric(precision, scale)/decimal(precision, scale) | decimal | | |
| 137 | +| real | float | | |
| 138 | +| smallint | int | | |
| 139 | +| text | string | | |
| 140 | +| date | date | | |
| 141 | +| time [ (p) ] [ without time zone ] | time | | |
| 142 | +| time [ (p) ] with time zone | string | | |
| 143 | +| timestamp [ (p) ] [ without time zone ] | timestamp | | |
| 144 | +| timestamp [ (p) ] with time zone | timestamp | stored in UTC format in database | |
| 145 | +| xml | string | | |
| 146 | +| tsquery | string | | |
| 147 | +| tsvector | string | | |
| 148 | +| uuid | string | | |
| 149 | +| box | string | | |
| 150 | +| cidr | string | | |
| 151 | +| circle | string | | |
| 152 | +| inet | string | | |
| 153 | +| interval | string | | |
| 154 | +| json | string | | |
| 155 | +| jsonb | string | | |
| 156 | +| line | string | | |
| 157 | +| lseg | string | | |
| 158 | +| macaddr | string | | |
| 159 | +| macaddr8 | string | | |
| 160 | +| money | string | | |
| 161 | +| path | string | | |
| 162 | +| point | string | | |
| 163 | +| polygon | string | | |
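
The Connection Name format listed under Properties (`PROJECT_ID:REGION:INSTANCE_NAME`) can be sanity-checked before it is entered in the plugin. The following is a minimal shell sketch, not part of the plugin: the function name `is_valid_connection_name` is hypothetical, and it assumes a project ID that contains no colon (domain-scoped project IDs would need a looser check).

```shell
# Hypothetical helper, not part of the plugin or of gcloud.
# Succeeds only if the argument has exactly three non-empty,
# colon-separated parts: PROJECT_ID:REGION:INSTANCE_NAME.
# Assumption: the project ID itself contains no colon.
is_valid_connection_name() {
  case "$1" in
    *:*:*:*)     return 1 ;;  # more than three parts
    :*|*::*|*:)  return 1 ;;  # at least one empty part
    *:*:*)       return 0 ;;  # exactly three non-empty parts
    *)           return 1 ;;  # fewer than three parts
  esac
}

# Example usage with a placeholder value.
if is_valid_connection_name "my-project:us-central1:my-instance"; then
  echo "connection name looks well-formed"
fi
```

The check only validates the shape of the string; it does not confirm that the instance exists or is reachable.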