diff --git a/src/current/_data/redirects.yml b/src/current/_data/redirects.yml index 2f19e620a15..bb861ce5543 100644 --- a/src/current/_data/redirects.yml +++ b/src/current/_data/redirects.yml @@ -251,6 +251,10 @@ sources: ['grant-roles.md'] versions: ['v21.1'] +- destination: how-does-a-changefeed-work.md + sources: ['how-does-an-enterprise-changefeed-work.md'] + versions: ['v25.2'] + - destination: kubernetes-overview.md sources: ['operate-cockroachdb-kubernetes.md'] versions: ['v21.2'] diff --git a/src/current/_includes/v25.2/cdc/cdc-schema-locked-example.md b/src/current/_includes/v25.2/cdc/cdc-schema-locked-example.md index 0908749d4de..5af4d0a248c 100644 --- a/src/current/_includes/v25.2/cdc/cdc-schema-locked-example.md +++ b/src/current/_includes/v25.2/cdc/cdc-schema-locked-example.md @@ -1,4 +1,4 @@ -Use the `schema_locked` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}) to disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on a watched table, which allows the changefeed to take a fast path that avoids checking if there are schema changes that could require synchronization between [changefeed aggregators]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). This helps to decrease the latency between a write committing to a table and it emitting to the [changefeed's sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). Enabling `schema_locked` +Use the `schema_locked` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}) to disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on a watched table, which allows the changefeed to take a fast path that avoids checking if there are schema changes that could require synchronization between [changefeed aggregators]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). This helps decrease the latency between a write committing to a table and that write being emitted to the [changefeed's sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). Enable `schema_locked` on the watched table with the [`ALTER TABLE`]({% link {{ page.version.version }}/alter-table.md %}) statement: diff --git a/src/current/_includes/v25.2/cdc/core-csv.md b/src/current/_includes/v25.2/cdc/core-csv.md deleted file mode 100644 index 0901eed2def..00000000000 --- a/src/current/_includes/v25.2/cdc/core-csv.md +++ /dev/null @@ -1,3 +0,0 @@ -{{site.data.alerts.callout_info}} -To determine how wide the columns need to be, the default `table` display format in `cockroach sql` buffers the results it receives from the server before printing them to the console. When consuming basic changefeed data using `cockroach sql`, it's important to use a display format like `csv` that does not buffer its results. To set the display format, use the [`--format=csv` flag]({% link {{ page.version.version }}/cockroach-sql.md %}#sql-flag-format) when starting the [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}), or set the [`\set display_format=csv` option]({% link {{ page.version.version }}/cockroach-sql.md %}#client-side-options) once the SQL client is open. 
-{{site.data.alerts.end}} diff --git a/src/current/_includes/v25.2/cdc/core-url.md b/src/current/_includes/v25.2/cdc/core-url.md deleted file mode 100644 index 029e0ac40b7..00000000000 --- a/src/current/_includes/v25.2/cdc/core-url.md +++ /dev/null @@ -1,3 +0,0 @@ -{{site.data.alerts.callout_info}} -Because basic changefeeds return results differently than other SQL statements, they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for basic changefeeds. basic changefeeds also have different cancellation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a [`CANCEL QUERY`]({% link {{ page.version.version }}/cancel-query.md %}) statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default. -{{site.data.alerts.end}} diff --git a/src/current/_includes/v25.2/cdc/create-core-changefeed-avro.md b/src/current/_includes/v25.2/cdc/create-sinkless-changefeed-avro.md similarity index 78% rename from src/current/_includes/v25.2/cdc/create-core-changefeed-avro.md rename to src/current/_includes/v25.2/cdc/create-sinkless-changefeed-avro.md index 53dab65cff2..4579133f83e 100644 --- a/src/current/_includes/v25.2/cdc/create-core-changefeed-avro.md +++ b/src/current/_includes/v25.2/cdc/create-sinkless-changefeed-avro.md @@ -1,4 +1,4 @@ -In this example, you'll set up a basic changefeed for a single-node cluster that emits Avro records. CockroachDB's Avro binary encoding convention uses the [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html) to store Avro schemas. +In this example, you'll set up a sinkless changefeed for a single-node cluster that emits Avro records. CockroachDB's Avro binary encoding convention uses the [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html) to store Avro schemas. 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: @@ -28,36 +28,36 @@ In this example, you'll set up a basic changefeed for a single-node cluster that $ cockroach sql --url="postgresql://root@127.0.0.1:26257?sslmode=disable" --format=csv ~~~ - {% include {{ page.version.version }}/cdc/core-url.md %} + {% include {{ page.version.version }}/cdc/sinkless-url.md %} - {% include {{ page.version.version }}/cdc/core-csv.md %} + {% include {{ page.version.version }}/cdc/sinkless-csv.md %} 1. Enable the `kv.rangefeed.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}): {% include_cached copy-clipboard.html %} ~~~ sql - > SET CLUSTER SETTING kv.rangefeed.enabled = true; + SET CLUSTER SETTING kv.rangefeed.enabled = true; ~~~ 1. Create table `bar`: {% include_cached copy-clipboard.html %} ~~~ sql - > CREATE TABLE bar (a INT PRIMARY KEY); + CREATE TABLE bar (a INT PRIMARY KEY); ~~~ 1. Insert a row into the table: {% include_cached copy-clipboard.html %} ~~~ sql - > INSERT INTO bar VALUES (0); + INSERT INTO bar VALUES (0); ~~~ 1. 
Start the basic changefeed: {% include_cached copy-clipboard.html %} ~~~ sql - > EXPERIMENTAL CHANGEFEED FOR bar WITH format = avro, confluent_schema_registry = 'http://localhost:8081'; + CREATE CHANGEFEED FOR TABLE bar WITH format = avro, confluent_schema_registry = 'http://localhost:8081'; ~~~ ~~~ @@ -69,16 +69,16 @@ In this example, you'll set up a basic changefeed for a single-node cluster that {% include_cached copy-clipboard.html %} ~~~ shell - $ cockroach sql --insecure -e "INSERT INTO bar VALUES (1)" + cockroach sql --insecure -e "INSERT INTO bar VALUES (1)" ~~~ -1. Back in the terminal where the basic changefeed is streaming, the output will appear: +1. Back in the terminal where the changefeed is streaming, the output will appear: ~~~ bar,\000\000\000\000\001\002\002,\000\000\000\000\002\002\002\002 ~~~ - Note that records may take a couple of seconds to display in the basic changefeed. + Note that records may take a couple of seconds to display in the changefeed. 1. To stop streaming the changefeed, enter **CTRL+C** into the terminal where the changefeed is running. diff --git a/src/current/_includes/v25.2/cdc/create-core-changefeed.md b/src/current/_includes/v25.2/cdc/create-sinkless-changefeed.md similarity index 78% rename from src/current/_includes/v25.2/cdc/create-core-changefeed.md rename to src/current/_includes/v25.2/cdc/create-sinkless-changefeed.md index df2264501a0..c053f5bc643 100644 --- a/src/current/_includes/v25.2/cdc/create-core-changefeed.md +++ b/src/current/_includes/v25.2/cdc/create-sinkless-changefeed.md @@ -1,4 +1,4 @@ -In this example, you'll set up a basic changefeed for a single-node cluster. +In this example, you'll set up a sinkless changefeed for a single-node cluster. 1. In a terminal window, start `cockroach`: @@ -14,41 +14,41 @@ In this example, you'll set up a basic changefeed for a single-node cluster. {% include_cached copy-clipboard.html %} ~~~ shell - $ cockroach sql \ + cockroach sql \ --url="postgresql://root@127.0.0.1:26257?sslmode=disable" \ --format=csv ~~~ - {% include {{ page.version.version }}/cdc/core-url.md %} + {% include {{ page.version.version }}/cdc/sinkless-url.md %} - {% include {{ page.version.version }}/cdc/core-csv.md %} + {% include {{ page.version.version }}/cdc/sinkless-csv.md %} 1. Enable the `kv.rangefeed.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}): {% include_cached copy-clipboard.html %} ~~~ sql - > SET CLUSTER SETTING kv.rangefeed.enabled = true; + SET CLUSTER SETTING kv.rangefeed.enabled = true; ~~~ 1. Create table `foo`: {% include_cached copy-clipboard.html %} ~~~ sql - > CREATE TABLE foo (a INT PRIMARY KEY); + CREATE TABLE foo (a INT PRIMARY KEY); ~~~ 1. Insert a row into the table: {% include_cached copy-clipboard.html %} ~~~ sql - > INSERT INTO foo VALUES (0); + INSERT INTO foo VALUES (0); ~~~ -1. Start the basic changefeed: +1. Start the sinkless changefeed: {% include_cached copy-clipboard.html %} ~~~ sql - > EXPERIMENTAL CHANGEFEED FOR foo; + CREATE CHANGEFEED FOR TABLE foo; ~~~ ~~~ table,key,value @@ -62,13 +62,13 @@ In this example, you'll set up a basic changefeed for a single-node cluster. $ cockroach sql --insecure -e "INSERT INTO foo VALUES (1)" ~~~ -1. Back in the terminal where the basic changefeed is streaming, the following output has appeared: +1. 
Back in the terminal where the changefeed is streaming, the following output has appeared: ~~~ foo,[1],"{""after"": {""a"": 1}}" ~~~ - Note that records may take a couple of seconds to display in the basic changefeed. + Note that records may take a couple of seconds to display in the changefeed. 1. To stop streaming the changefeed, enter **CTRL+C** into the terminal where the changefeed is running. diff --git a/src/current/_includes/v25.2/cdc/examples-license-workload.md b/src/current/_includes/v25.2/cdc/examples-license-workload.md index 32d395aaed8..02bf7e7fe57 100644 --- a/src/current/_includes/v25.2/cdc/examples-license-workload.md +++ b/src/current/_includes/v25.2/cdc/examples-license-workload.md @@ -1,5 +1,3 @@ -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} diff --git a/src/current/_includes/v25.2/cdc/lagging-ranges.md b/src/current/_includes/v25.2/cdc/lagging-ranges.md index 8316c347dda..be1257d2c90 100644 --- a/src/current/_includes/v25.2/cdc/lagging-ranges.md +++ b/src/current/_includes/v25.2/cdc/lagging-ranges.md @@ -5,7 +5,7 @@ Use the `changefeed.lagging_ranges` metric to track the number of [ranges]({% li - `lagging_ranges_polling_interval` sets the interval rate for when lagging ranges are checked and the `lagging_ranges` metric is updated. Polling adds latency to the `lagging_ranges` metric being updated. For example, if a range falls behind by 3 minutes, the metric may not update until an additional minute afterward. - **Default:** `1m` -Use the `changefeed.total_ranges` metric to monitor the number of ranges that are watched by [aggregator processors]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) participating in the changefeed job. If you're experiencing lagging ranges, `changefeed.total_ranges` may indicate that the number of ranges watched by aggregator processors in the job is unbalanced. You may want to try [pausing]({% link {{ page.version.version }}/pause-job.md %}) the changefeed and then [resuming]({% link {{ page.version.version }}/resume-job.md %}) it, so that the changefeed replans the work in the cluster. `changefeed.total_ranges` shares the same polling interval as the `changefeed.lagging_ranges` metric, which is controlled by the `lagging_ranges_polling_interval` option. +Use the `changefeed.total_ranges` metric to monitor the number of ranges that are watched by [aggregator processors]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) participating in the changefeed job. If you're experiencing lagging ranges, `changefeed.total_ranges` may indicate that the number of ranges watched by aggregator processors in the job is unbalanced. You may want to try [pausing]({% link {{ page.version.version }}/pause-job.md %}) the changefeed and then [resuming]({% link {{ page.version.version }}/resume-job.md %}) it, so that the changefeed replans the work in the cluster. `changefeed.total_ranges` shares the same polling interval as the `changefeed.lagging_ranges` metric, which is controlled by the `lagging_ranges_polling_interval` option. 
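+As a minimal sketch (the `movr.rides` table and the Kafka URI here are illustrative assumptions, not part of this page), a changefeed that polls for lagging ranges more frequently than the default and labels its metrics might be created with:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Poll for lagging ranges every 30 seconds instead of the 1m default,
+-- and tag this changefeed's metrics so they can be tracked individually.
+CREATE CHANGEFEED FOR TABLE movr.rides
+  INTO 'kafka://localhost:9092'
+  WITH lagging_ranges_polling_interval = '30s', metrics_label = 'rides';
+~~~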
{{site.data.alerts.callout_success}} You can use the [`metrics_label`]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#using-changefeed-metrics-labels) option to track the `lagging_ranges` and `total_ranges` metric per changefeed. diff --git a/src/current/_includes/v25.2/cdc/modify-changefeed.md b/src/current/_includes/v25.2/cdc/modify-changefeed.md index fde29d8687e..caa4ff069e2 100644 --- a/src/current/_includes/v25.2/cdc/modify-changefeed.md +++ b/src/current/_includes/v25.2/cdc/modify-changefeed.md @@ -1,4 +1,4 @@ -To modify an {{ site.data.products.enterprise }} changefeed, [pause]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#pause) the job and then use: +To modify a changefeed, [pause]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#pause) the job and then use: ~~~ sql ALTER CHANGEFEED job_id {ADD table DROP table SET option UNSET option}; diff --git a/src/current/_includes/v25.2/cdc/msk-tutorial-crdb-setup.md b/src/current/_includes/v25.2/cdc/msk-tutorial-crdb-setup.md index 40de46f2af7..2d58e8a5543 100644 --- a/src/current/_includes/v25.2/cdc/msk-tutorial-crdb-setup.md +++ b/src/current/_includes/v25.2/cdc/msk-tutorial-crdb-setup.md @@ -21,10 +21,6 @@ cockroach sql --insecure ~~~ - {{site.data.alerts.callout_info}} - To set your {{ site.data.products.enterprise }} license, refer to the [Licensing FAQs]({% link {{ page.version.version }}/licensing-faqs.md %}#set-a-license) page. - {{site.data.alerts.end}} - 1. Enable the `kv.rangefeed.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}): {% include_cached copy-clipboard.html %} diff --git a/src/current/_includes/v25.2/cdc/show-changefeed-job.md b/src/current/_includes/v25.2/cdc/show-changefeed-job.md index 02893bc7766..2e31e862975 100644 --- a/src/current/_includes/v25.2/cdc/show-changefeed-job.md +++ b/src/current/_includes/v25.2/cdc/show-changefeed-job.md @@ -10,7 +10,7 @@ SHOW CHANGEFEED JOBS; (2 rows) ~~~ -To show an individual {{ site.data.products.enterprise }} changefeed: +To show an individual changefeed: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/_includes/v25.2/cdc/sinkless-csv.md b/src/current/_includes/v25.2/cdc/sinkless-csv.md new file mode 100644 index 00000000000..e55fe7d5193 --- /dev/null +++ b/src/current/_includes/v25.2/cdc/sinkless-csv.md @@ -0,0 +1,3 @@ +{{site.data.alerts.callout_info}} +To determine how wide the columns need to be, the default `table` display format in `cockroach sql` buffers the results it receives from the server before printing them to the console. When consuming sinkless changefeed data using `cockroach sql`, it's important to use a display format like `csv` that does not buffer its results. To set the display format, use the [`--format=csv` flag]({% link {{ page.version.version }}/cockroach-sql.md %}#sql-flag-format) when starting the [built-in SQL client]({% link {{ page.version.version }}/cockroach-sql.md %}), or set the [`\set display_format=csv` option]({% link {{ page.version.version }}/cockroach-sql.md %}#client-side-options) once the SQL client is open. 
+{{site.data.alerts.end}} diff --git a/src/current/_includes/v25.2/cdc/sinkless-url.md b/src/current/_includes/v25.2/cdc/sinkless-url.md new file mode 100644 index 00000000000..59d87edb640 --- /dev/null +++ b/src/current/_includes/v25.2/cdc/sinkless-url.md @@ -0,0 +1,3 @@ +{{site.data.alerts.callout_info}} +Sinkless changefeeds return results differently than other SQL statements, which means that they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for sinkless changefeeds. Also, sinkless changefeeds have different cancellation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a [`CANCEL QUERY`]({% link {{ page.version.version }}/cancel-query.md %}) statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default. +{{site.data.alerts.end}} diff --git a/src/current/_includes/v25.2/cdc/sql-cluster-settings-example.md b/src/current/_includes/v25.2/cdc/sql-cluster-settings-example.md index fa2887967a1..0bb48fb4c4b 100644 --- a/src/current/_includes/v25.2/cdc/sql-cluster-settings-example.md +++ b/src/current/_includes/v25.2/cdc/sql-cluster-settings-example.md @@ -2,26 +2,14 @@ {% include_cached copy-clipboard.html %} ~~~ shell - $ cockroach sql --insecure - ~~~ - -1. Set your organization name and [{{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#types-of-licenses) key: - - {% include_cached copy-clipboard.html %} - ~~~ sql - > SET CLUSTER SETTING cluster.organization = ''; - ~~~ - - {% include_cached copy-clipboard.html %} - ~~~ sql - > SET CLUSTER SETTING enterprise.license = ''; + cockroach sql --insecure ~~~ 1. Enable the `kv.rangefeed.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}): {% include_cached copy-clipboard.html %} ~~~ sql - > SET CLUSTER SETTING kv.rangefeed.enabled = true; + SET CLUSTER SETTING kv.rangefeed.enabled = true; ~~~ {% include {{ page.version.version }}/cdc/cdc-cloud-rangefeed.md %} diff --git a/src/current/_includes/v25.2/sidebar-data/stream-data.json b/src/current/_includes/v25.2/sidebar-data/stream-data.json index 020861b68be..489c127e88c 100644 --- a/src/current/_includes/v25.2/sidebar-data/stream-data.json +++ b/src/current/_includes/v25.2/sidebar-data/stream-data.json @@ -150,9 +150,9 @@ "title": "Technical Overview", "items": [ { - "title": "How Does an Enterprise Changefeed Work?", + "title": "How Does a Changefeed Work?", "urls": [ - "/${VERSION}/how-does-an-enterprise-changefeed-work.html" + "/${VERSION}/how-does-a-changefeed-work.html" ] } ] diff --git a/src/current/cockroachcloud/costs.md b/src/current/cockroachcloud/costs.md index 8792a223eeb..cd2688d39bf 100644 --- a/src/current/cockroachcloud/costs.md +++ b/src/current/cockroachcloud/costs.md @@ -284,7 +284,7 @@ This is the usage for any data leaving CockroachDB such as SQL data being sent t ### Change data capture (changefeeds) -For change data capture (CDC), all CockroachDB {{ site.data.products.cloud }} clusters can use [Enterprise changefeeds]({% link {{ site.current_cloud_version}}/how-does-an-enterprise-changefeed-work.md %}). 
+For change data capture (CDC), all CockroachDB {{ site.data.products.cloud }} clusters can use [changefeeds]({% link {{ site.current_cloud_version}}/how-does-a-changefeed-work.md %}).
diff --git a/src/current/cockroachcloud/stream-changefeed-to-snowflake-aws.md b/src/current/cockroachcloud/stream-changefeed-to-snowflake-aws.md index 7bad08e466e..2a8423f3969 100644 --- a/src/current/cockroachcloud/stream-changefeed-to-snowflake-aws.md +++ b/src/current/cockroachcloud/stream-changefeed-to-snowflake-aws.md @@ -7,7 +7,7 @@ docs_area: stream_data While CockroachDB is an excellent system of record, it also needs to coexist with other systems. For example, you might want to keep your data mirrored in full-text indexes, analytics engines, or big data pipelines. -This page demonstrates how to use an [{{ site.data.products.enterprise }} changefeed](../{{site.current_cloud_version}}/create-changefeed.html) to stream row-level changes to [Snowflake](https://www.snowflake.com/), an online analytical processing (OLAP) database. +This page demonstrates how to use a [changefeed](../{{site.current_cloud_version}}/create-changefeed.html) to stream row-level changes to [Snowflake](https://www.snowflake.com/), an online analytical processing (OLAP) database. {{site.data.alerts.callout_info}} Snowflake is optimized for inserts and batch rewrites over streaming updates. This tutorial sets up a changefeed to stream data to S3 with Snowpipe sending changes to Snowflake. Snowpipe imports previously unseen files and does not address uniqueness for primary keys, which means that target tables in Snowflake can contain multiple records per primary key. @@ -91,11 +91,11 @@ Every change to a watched row is emitted as a record in a configurable format (i 1. Create an S3 bucket where streaming updates from the watched tables will be collected. - You will need the name of the S3 bucket when you [create your changefeed](#step-7-create-an-enterprise-changefeed). Ensure you have a set of IAM credentials with write access on the S3 bucket that you will use during [changefeed setup](#step-7-create-an-enterprise-changefeed). + You will need the name of the S3 bucket when you [create your changefeed](#step-7-create-a-changefeed). Ensure you have a set of IAM credentials with write access on the S3 bucket that you will use during [changefeed setup](#step-7-create-a-changefeed). -## Step 7. Create an enterprise changefeed +## Step 7. Create a changefeed -Back in the built-in SQL shell, [create an enterprise changefeed](../{{site.current_cloud_version}}/create-changefeed.html). Replace the placeholders with your AWS access key ID and AWS secret access key: +Back in the built-in SQL shell, [create a changefeed](../{{site.current_cloud_version}}/create-changefeed.html). 
Replace the placeholders with your AWS access key ID and AWS secret access key: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/images/changefeed-structure.png b/src/current/images/changefeed-structure.png new file mode 100644 index 00000000000..b802cb45038 Binary files /dev/null and b/src/current/images/changefeed-structure.png differ diff --git a/src/current/images/v25.2/changefeed-structure.png b/src/current/images/v25.2/changefeed-structure.png deleted file mode 100644 index 3b09f8f15e9..00000000000 Binary files a/src/current/images/v25.2/changefeed-structure.png and /dev/null differ diff --git a/src/current/v25.2/advanced-changefeed-configuration.md b/src/current/v25.2/advanced-changefeed-configuration.md index 0a64c56e0f6..0004031b065 100644 --- a/src/current/v25.2/advanced-changefeed-configuration.md +++ b/src/current/v25.2/advanced-changefeed-configuration.md @@ -63,13 +63,13 @@ Adjusting `kv.closed_timestamp.target_duration` could have a detrimental impact `kv.closed_timestamp.target_duration` controls the target [closed timestamp]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) lag duration, which determines how far behind the current time CockroachDB will attempt to maintain the closed timestamp. For example, with the default value of `3s`, if the current time is `12:30:00` then CockroachDB will attempt to keep the closed timestamp at `12:29:57` by possibly retrying or aborting ongoing writes that are below this time. -A changefeed aggregates checkpoints across all ranges, and once the timestamp on all the ranges advances, the changefeed can then [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). In the context of changefeeds, `kv.closed_timestamp.target_duration` affects how old the checkpoints will be, which will determine the latency before changefeeds can consider the history of an event complete. +A changefeed aggregates checkpoints across all ranges, and once the timestamp on all the ranges advances, the changefeed can then [checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). In the context of changefeeds, `kv.closed_timestamp.target_duration` affects how old the checkpoints will be, which will determine the latency before changefeeds can consider the history of an event complete. #### `kv.rangefeed.closed_timestamp_refresh_interval` **Default:** `3s` -This setting controls the interval at which [closed timestamp]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) updates are delivered to [rangefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) and in turn emitted as a [changefeed checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). +This setting controls the interval at which [closed timestamp]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) updates are delivered to [rangefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) and in turn emitted as a [changefeed checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). 
Increasing the interval value will lengthen the delay between each checkpoint, which will increase the latency of changefeed checkpoints, but reduce the impact on SQL latency due to [overload]({% link {{ page.version.version }}/admission-control.md %}#use-cases-for-admission-control) on the cluster. This happens because every range with a rangefeed has to emit a checkpoint event with this `3s` interval. As an example, 1 million ranges would result in 330,000 events per second, which would use more CPU resources. @@ -117,7 +117,7 @@ Before tuning these settings, we recommend reading details on our [changefeed at ### Pausing changefeeds and garbage collection -By default, [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) will protect changefeed data from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) up to the time of the [_checkpoint_]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). Protected timestamps will protect changefeed data from garbage collection if the downstream [changefeed sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) is unavailable until you either [cancel]({% link {{ page.version.version }}/cancel-job.md %}) the changefeed or the sink becomes available once again. +By default, [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) will protect changefeed data from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) up to the time of the [_checkpoint_]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). Protected timestamps will protect changefeed data from garbage collection if the downstream [changefeed sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) is unavailable until you either [cancel]({% link {{ page.version.version }}/cancel-job.md %}) the changefeed or the sink becomes available once again. However, if the changefeed lags too far behind, the protected changes could lead to an accumulation of garbage. This could result in increased disk usage and degraded performance for some workloads. @@ -175,7 +175,7 @@ When designing a system that needs to emit a lot of changefeed messages, whether When a changefeed emits a [resolved]({% link {{ page.version.version }}/create-changefeed.md %}#resolved) message, it force flushes all outstanding messages that have buffered, which will diminish your changefeed's throughput while the flush completes. Therefore, if you are aiming for higher throughput, we suggest setting the duration higher (e.g., 10 minutes), or **not** using the `resolved` option. -If you are setting the `resolved` option when you are aiming for high throughput, you must also consider the [`min_checkpoint_frequency`]({% link {{ page.version.version }}/create-changefeed.md %}#min-checkpoint-frequency) option, which defaults to `30s`. This option controls how often nodes flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). As a result, `resolved` messages will not be emitted more frequently than the configured `min_checkpoint_frequency`. Set this option to at least as long as your `resolved` option duration. 
+If you are setting the `resolved` option when you are aiming for high throughput, you must also consider the [`min_checkpoint_frequency`]({% link {{ page.version.version }}/create-changefeed.md %}#min-checkpoint-frequency) option, which defaults to `30s`. This option controls how often nodes flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). As a result, `resolved` messages will not be emitted more frequently than the configured `min_checkpoint_frequency`. Set this option to at least as long as your `resolved` option duration. ### Batching and buffering messages diff --git a/src/current/v25.2/cdc-queries.md b/src/current/v25.2/cdc-queries.md index 043f89efd79..82dc3d0a119 100644 --- a/src/current/v25.2/cdc-queries.md +++ b/src/current/v25.2/cdc-queries.md @@ -63,7 +63,7 @@ Function | Description --------------------------+---------------------- `changefeed_creation_timestamp()` | Returns the decimal MVCC timestamp when the changefeed was created. Use this function to build CDC queries that restrict emitted events by time. `changefeed_creation_timestamp()` can serve a similar purpose to the [`now()` time function]({% link {{ page.version.version }}/functions-and-operators.md %}#date-and-time-functions), which is not supported with CDC queries. `event_op()` | Returns a string describing the type of event. If a changefeed is running with the [`diff`]({% link {{ page.version.version }}/create-changefeed.md %}#diff) option, then this function returns `'insert'`, `'update'`, or `'delete'`. If a changefeed is running without the `diff` option, it is not possible to determine an update from an insert, so `event_op()` returns [`'upsert'`](https://www.cockroachlabs.com/blog/sql-upsert/) or `'delete'`.

If you're using CDC queries to filter only for the type of change operation, we recommend specifying the [`envelope=enriched` option]({% link {{ page.version.version }}/changefeed-message-envelopes.md %}#route-events-based-on-operation-type) for this metadata instead. -`event_schema_timestamp()` | Returns the timestamp of [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) events that cause a [changefeed message]({% link {{ page.version.version }}/changefeed-messages.md %}) to emit. When the schema change event does not result in a table backfill or scan, `event_schema_timestamp()` will return the event's timestamp. When the schema change event does result in a table backfill or scan, `event_schema_timestamp()` will return the timestamp at which the backfill/scan is read — the [high-water mark time]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) of the changefeed. +`event_schema_timestamp()` | Returns the timestamp of [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) events that cause a [changefeed message]({% link {{ page.version.version }}/changefeed-messages.md %}) to emit. When the schema change event does not result in a table backfill or scan, `event_schema_timestamp()` will return the event's timestamp. When the schema change event does result in a table backfill or scan, `event_schema_timestamp()` will return the timestamp at which the backfill/scan is read — the [high-water mark time]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) of the changefeed. You can also use the following functions in CDC queries: diff --git a/src/current/v25.2/change-data-capture-overview.md b/src/current/v25.2/change-data-capture-overview.md index a8c44044f7f..f56e7784a32 100644 --- a/src/current/v25.2/change-data-capture-overview.md +++ b/src/current/v25.2/change-data-capture-overview.md @@ -6,29 +6,31 @@ docs_area: stream_data key: stream-data-out-of-cockroachdb-using-changefeeds.html --- -Change data capture (CDC) detects row-level data changes in CockroachDB and sends the change as a message to a configurable sink for downstream processing purposes. While CockroachDB is an excellent system of record, it also needs to coexist with other systems. +**Change data capture (CDC)** detects row-level data changes in CockroachDB and emits those changes as messages for downstream processing. While CockroachDB is an excellent system of record, CDC allows it to integrate with other systems in your data ecosystem. For example, you might want to: - Stream messages to Kafka to trigger notifications in an application. -- Keep your data mirrored in full-text indexes, analytics engines, or big data pipelines. -- Export a snaphot of tables to backfill new applications. -- Send updates to data stores for machine learning models. +- Mirror your data in full-text indexes, analytics engines, or big data pipelines. +- Export a snapshot of tables to backfill new applications. +- Feed updates to data stores powering machine learning models. {% include common/define-watched-cdc.md %} ## Stream row-level changes with changefeeds -Changefeeds are customizable _jobs_ that track row-level changes and send data in realtime in a preferred format to your specified destination, known as a _sink_. Every row change will be emitted at least once and the first emit of every event for the same key will be ordered by timestamp. 
+Changefeeds are customizable _jobs_ that monitor row-level changes in a table and emit updates in real time. These updates are delivered in your preferred format to a specified destination, known as a _sink_. -CockroachDB has two implementations of changefeeds: +In production, changefeeds are typically configured with an external sink such as Kafka or cloud storage. However, for development and testing purposes, _sinkless changefeeds_ allow you to stream change data directly to your SQL client. + +Each emitted row change is delivered at least once, and the first emit of every event for the same key is ordered by timestamp. - - + + @@ -44,7 +46,7 @@ CockroachDB has two implementations of changefeeds: Product availability - + @@ -52,15 +54,15 @@ CockroachDB has two implementations of changefeeds: Message delivery - + - - + + @@ -75,7 +77,7 @@ CockroachDB has two implementations of changefeeds: - + @@ -100,14 +102,14 @@ CockroachDB has two implementations of changefeeds: Message format - + - + @@ -125,10 +127,10 @@ CockroachDB has two implementations of changefeeds: To get started with changefeeds in CockroachDB, refer to: -- [Create and Configure Changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}): Learn about the fundamentals of using SQL statements to create and manage Enterprise and basic changefeeds. +- [Create and Configure Changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}): Learn about the fundamentals of using SQL statements to create and manage changefeeds. - [Changefeed Sinks]({% link {{ page.version.version }}/changefeed-sinks.md %}): The downstream system to which the changefeed emits changes. Learn about the supported sinks and configuration capabilities. -- [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}): The change events that emit from the changefeed to your sink. Learn about how messages are ordered at your sink and the options to configure and format messages. -- [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}): Step-by-step examples for connecting to each changefeed sink. +- [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}): The change events that emit from the changefeed. Learn about how messages are ordered and the options to configure and format messages. +- [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}): Step-by-step examples for connecting to changefeed sinks or running sinkless changefeeds. ### Authenticate to your changefeed sink @@ -161,7 +163,7 @@ For detail on how protected timestamps and garbage collection interact with chan ### Filter your change data with CDC queries -_Change data capture queries_ allow you to define and filter the change data emitted to your sink when you create an Enterprise changefeed. +_Change data capture queries_ allow you to define and filter the change data emitted to your sink when you create a changefeed. 
For example, you can use CDC queries to: diff --git a/src/current/v25.2/changefeed-best-practices.md b/src/current/v25.2/changefeed-best-practices.md index aefb6ec1dfb..880e133394e 100644 --- a/src/current/v25.2/changefeed-best-practices.md +++ b/src/current/v25.2/changefeed-best-practices.md @@ -31,7 +31,7 @@ When you are running more than 10 changefeeds on a cluster, it is important to m To maintain a high number of changefeeds in your cluster: -- Connect to different nodes to [create]({% link {{ page.version.version }}/create-changefeed.md %}) each changefeed. The node on which you start the changefeed will become the _coordinator_ node for the changefeed job. The coordinator node acts as an administrator: keeping track of all other nodes during job execution and the changefeed work as it completes. As a result, this node will use more resources for the changefeed job. For more detail, refer to [How does an Enterprise changefeed work?]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). +- Connect to different nodes to [create]({% link {{ page.version.version }}/create-changefeed.md %}) each changefeed. The node on which you start the changefeed will become the _coordinator_ node for the changefeed job. The coordinator node acts as an administrator: keeping track of all other nodes during job execution and the changefeed work as it completes. As a result, this node will use more resources for the changefeed job. For more detail, refer to [How does a changefeed work?]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). - Consider logically grouping the target tables into one changefeed. When a changefeed [pauses]({% link {{ page.version.version }}/pause-job.md %}), it will stop emitting messages for the target tables. Grouping tables of related data into a single changefeed may make sense for your workload. However, we do not recommend watching hundreds of tables in a single changefeed. For more detail on protecting data from garbage collection when a changefeed is paused, refer to [Garbage collection and changefeeds]({% link {{ page.version.version }}/protect-changefeed-data.md %}). ## Monitor changefeeds diff --git a/src/current/v25.2/changefeed-examples.md b/src/current/v25.2/changefeed-examples.md index d083e452785..852ca21e36a 100644 --- a/src/current/v25.2/changefeed-examples.md +++ b/src/current/v25.2/changefeed-examples.md @@ -5,11 +5,11 @@ toc: true docs_area: stream_data --- -This page provides step-by-step examples for using Core and {{ site.data.products.enterprise }} changefeeds. Creating {{ site.data.products.enterprise }} changefeeds is available on CockroachDB {{ site.data.products.basic }}, {{ site.data.products.standard }}, {{ site.data.products.advanced }}, and with an [{{ site.data.products.enterprise }} license](licensing-faqs.html#types-of-licenses) on CockroachDB {{ site.data.products.core }} clusters. Basic changefeeds are available in all products. +This page provides quick setup guides for connecting changefeeds to sinks and for using sinkless changefeeds. -For a summary of Core and {{ site.data.products.enterprise }} changefeed features, refer to the [Change Data Capture Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %}) page. +For a summary of changefeed features, refer to the [Change Data Capture Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %}) page. 
-{{ site.data.products.enterprise }} changefeeds can connect to the following sinks: +Changefeeds can emit messages to the following sinks: - [Kafka](#create-a-changefeed-connected-to-kafka) - [Google Cloud Pub/Sub](#create-a-changefeed-connected-to-a-google-cloud-pub-sub-sink) @@ -22,14 +22,12 @@ Refer to the [Changefeed Sinks]({% link {{ page.version.version }}/changefeed-si {% include {{ page.version.version }}/cdc/recommendation-monitoring-pts.md %} -Use the following filters to show usage examples for either **Enterprise** or **Core** changefeeds: -
- - + +
-
+
Before you run the examples, verify that you have the `CHANGEFEED` privilege in order to create and manage changefeed jobs. Refer to [Required privileges]({% link {{ page.version.version }}/create-changefeed.md %}#required-privileges) for more details. @@ -41,8 +39,6 @@ Before you run the examples, verify that you have the `CHANGEFEED` privilege in In this example, you'll set up a changefeed for a single-node cluster that is connected to a Kafka sink. The changefeed will watch two tables. -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -182,8 +178,6 @@ In this example, you'll set up a changefeed for a single-node cluster that is co In this example, you'll set up a changefeed for a single-node cluster that is connected to a Kafka sink and emits [Avro](https://avro.apache.org/docs/1.8.2/spec.html) records. The changefeed will watch two tables. -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -485,8 +479,6 @@ You'll need access to a [Google Cloud Project](https://cloud.google.com/resource In this example, you'll set up a changefeed for a single-node cluster that is connected to an AWS S3 sink. The changefeed watches two tables. Note that you can set up changefeeds for any of [these cloud storage providers]({% link {{ page.version.version }}/changefeed-sinks.md %}#cloud-storage-sink). -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -603,8 +595,6 @@ In this example, you'll set up a changefeed for a single-node cluster that is co In this example, you'll set up a changefeed for a single-node cluster that is connected to a local HTTP server via a webhook. For this example, you'll use an [example HTTP server](https://github.com/cockroachlabs/cdc-webhook-sink-test-server/tree/master/go-https-server) to test out the webhook sink. -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -773,24 +763,23 @@ In this example, you'll set up a changefeed for a single-node cluster that is co
-
+
-Basic changefeeds stream row-level changes to a client until the underlying SQL connection is closed. +Sinkless changefeeds stream row-level changes to a client until the underlying SQL connection is closed. -## Create a basic changefeed +## Create a sinkless changefeed -{% include {{ page.version.version }}/cdc/create-core-changefeed.md %} +{% include {{ page.version.version }}/cdc/create-sinkless-changefeed.md %} -## Create a basic changefeed using Avro +## Create a sinkless changefeed using Avro -{% include {{ page.version.version }}/cdc/create-core-changefeed-avro.md %} +{% include {{ page.version.version }}/cdc/create-sinkless-changefeed-avro.md %} -For further information on basic changefeeds, see [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}). +For further information on sinkless changefeeds, refer to the [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}#create-a-sinkless-changefeed) page.
## See also -- [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}) - [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}) - [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}) diff --git a/src/current/v25.2/changefeed-for.md b/src/current/v25.2/changefeed-for.md index 4a0829c7ebb..6481a7f33dc 100644 --- a/src/current/v25.2/changefeed-for.md +++ b/src/current/v25.2/changefeed-for.md @@ -5,7 +5,11 @@ toc: true docs_area: reference.sql --- -The `EXPERIMENTAL CHANGEFEED FOR` [statement]({% link {{ page.version.version }}/sql-statements.md %}) creates a new basic changefeed, which streams row-level changes to the client indefinitely until the underlying connection is closed or the changefeed is canceled. A basic changefeed can watch one table or multiple tables in a comma-separated list. +{{site.data.alerts.callout_danger}} +The `EXPERIMENTAL CHANGEFEED FOR` statement is **deprecated** as of v25.2 and will be removed in a future release. For the same functionality, use the [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}#create-a-sinkless-changefeed) statement to create a sinkless changefeed. +{{site.data.alerts.end}} + +The `EXPERIMENTAL CHANGEFEED FOR` [statement]({% link {{ page.version.version }}/sql-statements.md %}) creates a new sinkless changefeed, which streams row-level changes to the client indefinitely until the underlying connection is closed or the changefeed is canceled. A sinkless changefeed can watch one table or multiple tables in a comma-separated list. For more information, see [Change Data Capture Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %}). @@ -23,7 +27,7 @@ There is continued support for the [legacy privilege model](#legacy-privilege-mo To create a changefeed with `EXPERIMENTAL CHANGEFEED FOR`, a user must have the `SELECT` privilege on the changefeed's source tables. -You can [grant]({% link {{ page.version.version }}/grant.md %}#grant-privileges-on-specific-tables-in-a-database) a user the `SELECT` privilege to allow them to create basic changefeeds on a specific table: +You can [grant]({% link {{ page.version.version }}/grant.md %}#grant-privileges-on-specific-tables-in-a-database) a user the `SELECT` privilege to allow them to create sinkless changefeeds on a specific table: {% include_cached copy-clipboard.html %} ~~~sql @@ -36,9 +40,9 @@ Changefeeds can only be created by superusers, i.e., [members of the `admin` rol ## Considerations -- Because basic changefeeds return results differently than other SQL statements, they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for basic changefeeds. basic changefeeds also have different cancellation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a [`CANCEL QUERY`]({% link {{ page.version.version }}/cancel-query.md %}) statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default. 
+- Because sinkless changefeeds return results differently than other SQL statements, they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for sinkless changefeeds. Also, sinkless changefeeds have different cancellation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a [`CANCEL QUERY`]({% link {{ page.version.version }}/cancel-query.md %}) statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default. - This cancellation behavior (i.e., close the underlying connection to cancel the changefeed) also extends to client driver usage; in particular, when a client driver calls `Rows.Close()` after encountering errors for a stream of rows. The pgwire protocol requires that the rows be consumed before the connection is again usable, but in the case of a basic changefeed, the rows are never consumed. It is therefore critical that you close the connection, otherwise the application will be blocked forever on `Rows.Close()`. + This cancellation behavior (i.e., close the underlying connection to cancel the changefeed) also extends to client driver usage; in particular, when a client driver calls `Rows.Close()` after encountering errors for a stream of rows. The pgwire protocol requires that the rows be consumed before the connection is again usable, but in the case of a sinkless changefeed, the rows are never consumed. It is therefore critical that you close the connection, otherwise the application will be blocked forever on `Rows.Close()`. - In most cases, each version of a row will be emitted once. However, some infrequent conditions (e.g., node failures, network partitions) will cause them to be repeated. This gives our changefeeds an at-least-once delivery guarantee. For more information, see [Ordering Guarantees]({% link {{ page.version.version }}/changefeed-messages.md %}#ordering-and-delivery-guarantees). - As of v22.1, changefeeds filter out [`VIRTUAL` computed columns]({% link {{ page.version.version }}/computed-columns.md %}) from events by default. This is a [backward-incompatible change]({% link releases/v22.1.md %}#v22-1-0-backward-incompatible-changes). To maintain the changefeed behavior in previous versions where [`NULL`]({% link {{ page.version.version }}/null-handling.md %}) values are emitted for virtual computed columns, see the [`virtual_columns`]({% link {{ page.version.version }}/changefeed-for.md %}#virtual-columns) option for more detail. @@ -69,7 +73,7 @@ Option | Value | Description `envelope` | `wrapped` / `enriched` / `bare` / `key_only` / `row` | `wrapped` the default envelope structure for changefeed messages containing an array of the primary key, a top-level field for the type of message, and the current state of the row (or `null` for deleted rows).

Refer to [Changefeed Message Envelopes]({% link {{ page.version.version }}/changefeed-message-envelopes.md %}) page for more detail on each envelope.

Default: `envelope=wrapped`. `format` | `json` / `avro` / `csv` / `parquet` | Format of the emitted message.

`avro`: For mappings of CockroachDB types to Avro types, [refer to the table]({% link {{ page.version.version }}/changefeed-messages.md %}#avro-types) and detail on [Avro limitations](#avro-limitations). **Note:** [`confluent_schema_registry`](#confluent-registry) is required with `format=avro`.

`csv`: You cannot combine `format=csv` with the `diff` or [`resolved`](#resolved-option) options. Changefeeds use the same CSV format as the [`EXPORT`](export.html) statement. Refer to [Export data with changefeeds]({% link {{ page.version.version }}/export-data-with-changefeeds.md %}) for details using these options to create a changefeed as an alternative to `EXPORT`. **Note:** [`initial_scan = 'only'`](#initial-scan) is required with `format=csv`.

`parquet`: Cloud storage is the only supported sink. The `topic_in_value` option is not compatible with `parquet` format.

Default: `format=json`. `initial_scan` / `no_initial_scan` / `initial_scan_only` | N/A | Control whether or not an initial scan will occur at the start time of a changefeed. `initial_scan_only` will perform an initial scan and then the changefeed job will complete with a `successful` status. You cannot use [`end_time`](#end-time) and `initial_scan_only` simultaneously.

If none of these options are specified, an initial scan will occur if there is no [`cursor`](#cursor-option), and will not occur if there is one. This preserves the behavior from previous releases.

You cannot specify `initial_scan` and `no_initial_scan` or `no_initial_scan` and `initial_scan_only` simultaneously.

Default: `initial_scan`
If used in conjunction with `cursor`, an initial scan will be performed at the cursor timestamp. If no `cursor` is specified, the initial scan is performed at `now()`. -`min_checkpoint_frequency` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Controls how often nodes flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). Changefeeds will wait for at least the specified duration before a flushing. This can help you control the flush frequency to achieve better throughput. If this is set to `0s`, a node will flush as long as the high-water mark has increased for the ranges that particular node is processing. If a changefeed is resumed, then `min_checkpoint_frequency` is the amount of time that changefeed will need to catch up. That is, it could emit duplicate messages during this time.

**Note:** [`resolved`](#resolved-option) messages will not be emitted more frequently than the configured `min_checkpoint_frequency` (but may be emitted less frequently). Since `min_checkpoint_frequency` defaults to `30s`, you **must** configure `min_checkpoint_frequency` to at least the desired `resolved` message frequency if you require `resolved` messages more frequently than `30s`.

**Default:** `30s` +`min_checkpoint_frequency` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Controls how often nodes flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). Changefeeds will wait for at least the specified duration before flushing. This can help you control the flush frequency to achieve better throughput. If this is set to `0s`, a node will flush as long as the high-water mark has increased for the ranges that particular node is processing. If a changefeed is resumed, then `min_checkpoint_frequency` is the amount of time that the changefeed will need to catch up. That is, it could emit duplicate messages during this time.

**Note:** [`resolved`](#resolved-option) messages will not be emitted more frequently than the configured `min_checkpoint_frequency` (but may be emitted less frequently). Since `min_checkpoint_frequency` defaults to `30s`, you **must** configure `min_checkpoint_frequency` to at least the desired `resolved` message frequency if you require `resolved` messages more frequently than `30s`.

**Default:** `30s` `mvcc_timestamp` | N/A | Include the [MVCC]({% link {{ page.version.version }}/architecture/storage-layer.md %}#mvcc) timestamp for each emitted row in a changefeed. With the `mvcc_timestamp` option, each emitted row will always contain its MVCC timestamp, even during the changefeed's initial backfill. `resolved` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Emit [resolved timestamps]({% link {{ page.version.version }}/changefeed-messages.md %}#resolved-messages) for the changefeed. Resolved timestamps do not emit until all ranges in the changefeed have progressed to a specific point in time.

Set a minimum amount of time that the changefeed's high-water mark (overall resolved timestamp) must advance by before another resolved timestamp is emitted. Example: `resolved='10s'`. This option will **only** emit a resolved timestamp if the timestamp has advanced (and by at least the optional duration, if set). If a duration is unspecified, all resolved timestamps are emitted as the high-water mark advances.

**Note:** If you set `resolved` lower than `30s`, then you **must** also set the [`min_checkpoint_frequency`](#min-checkpoint-frequency) option to at minimum the same value as `resolved`, because `resolved` messages may be emitted less frequently than `min_checkpoint_frequency`, but cannot be emitted more frequently.

Refer to [Resolved messages]({% link {{ page.version.version }}/changefeed-messages.md %}#resolved-messages) for more detail. `split_column_families` | N/A | Target a table with multiple column families. Emit messages for each column family in the target table. Each message will include the label: `table.family`. @@ -95,14 +99,14 @@ To start a changefeed: EXPERIMENTAL CHANGEFEED FOR cdc_test; ~~~ -In the terminal where the basic changefeed is streaming, the output will appear: +In the terminal where the sinkless changefeed is streaming, the output will appear: ~~~ table,key,value cdc_test,[0],"{""after"": {""a"": 0}}" ~~~ -For step-by-step guidance on creating a basic changefeed, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. +For step-by-step guidance on creating a sinkless changefeed, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. ### Create a changefeed with Avro @@ -113,14 +117,14 @@ To start a changefeed in Avro format: EXPERIMENTAL CHANGEFEED FOR cdc_test WITH format = avro, confluent_schema_registry = 'http://localhost:8081'; ~~~ -In the terminal where the basic changefeed is streaming, the output will appear: +In the terminal where the sinkless changefeed is streaming, the output will appear: ~~~ table,key,value cdc_test,\000\000\000\000\001\002\000,\000\000\000\000\002\002\002\000 ~~~ -For step-by-step guidance on creating a basic changefeed with Avro, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. +For step-by-step guidance on creating a sinkless changefeed with Avro, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. ### Create a changefeed on a table with column families @@ -138,7 +142,7 @@ To create a changefeed on a table and output changes for each column family, use EXPERIMENTAL CHANGEFEED FOR TABLE cdc_test WITH split_column_families; ~~~ -For step-by-step guidance creating a basic changefeed on a table with multiple column families, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. +For step-by-step guidance on creating a sinkless changefeed on a table with multiple column families, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. ## See also diff --git a/src/current/v25.2/changefeed-messages.md b/src/current/v25.2/changefeed-messages.md index 9a755db4b31..616bf71ca7f 100644 --- a/src/current/v25.2/changefeed-messages.md +++ b/src/current/v25.2/changefeed-messages.md @@ -99,7 +99,7 @@ As an example, you run the following sequence of SQL statements to create a chan {"after": {"id": 4, "name": "Danny", "office": "los angeles"}, "key": [4], "updated": "1701102561022789676.0000000000"} ~~~ - The messages received at the sink are in order by timestamp **for each key**. Here, the update for key `[1]` is emitted before the insertion of key `[2]` even though the timestamp for the update to key `[1]` is higher. That is, if you follow the sequence of updates for a particular key at the sink, they will be in the correct timestamp order. However, if a changefeed starts to re-emit messages after the last [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}), it may not emit all duplicate messages between the first duplicate message and new updates to the table.
For details on when changefeeds might re-emit messages, refer to [Duplicate messages](#duplicate-messages). + The messages received at the sink are in order by timestamp **for each key**. Here, the update for key `[1]` is emitted before the insertion of key `[2]` even though the timestamp for the update to key `[1]` is higher. That is, if you follow the sequence of updates for a particular key at the sink, they will be in the correct timestamp order. However, if a changefeed starts to re-emit messages after the last [checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}), it may not emit all duplicate messages between the first duplicate message and new updates to the table. For details on when changefeeds might re-emit messages, refer to [Duplicate messages](#duplicate-messages). The `updated` option adds an `updated` timestamp to each emitted row. You can also use the [`resolved` option](#resolved-messages) to emit a `resolved` timestamp message to each Kafka partition, or to a separate file at a cloud storage sink. A `resolved` timestamp guarantees that no (previously unseen) rows with a lower update timestamp will be emitted on that partition. @@ -185,9 +185,9 @@ In some unusual situations you may receive a delete message for a row without fi ## Resolved messages -When you create a changefeed with the [`resolved` option]({% link {{ page.version.version }}/create-changefeed.md %}#resolved), the changefeed will emit resolved timestamp messages in a format dependent on the connected [sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). The resolved timestamp is the high-water mark that guarantees that no previously unseen rows with an [earlier update timestamp](#ordering-and-delivery-guarantees) will be emitted to the sink. That is, resolved timestamp messages do not emit until the changefeed job has reached a [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). +When you create a changefeed with the [`resolved` option]({% link {{ page.version.version }}/create-changefeed.md %}#resolved), the changefeed will emit resolved timestamp messages in a format dependent on the connected [sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). The resolved timestamp is the high-water mark that guarantees that no previously unseen rows with an [earlier update timestamp](#ordering-and-delivery-guarantees) will be emitted to the sink. That is, resolved timestamp messages do not emit until the changefeed job has reached a [checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). -When you specify the `resolved` option at changefeed creation, the [job's coordinating node]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) will send the resolved timestamp to each endpoint at the sink. For example, each [Kafka]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka) partition will receive a resolved timestamp message, or a [cloud storage sink]({% link {{ page.version.version }}/changefeed-sinks.md %}#cloud-storage-sink) will receive a resolved timestamp file. +When you specify the `resolved` option at changefeed creation, the [job's coordinating node]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) will send the resolved timestamp to each endpoint at the sink. 
For example, each [Kafka]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka) partition will receive a resolved timestamp message, or a [cloud storage sink]({% link {{ page.version.version }}/changefeed-sinks.md %}#cloud-storage-sink) will receive a resolved timestamp file. There are three different ways to configure resolved timestamp messages: @@ -452,7 +452,7 @@ The following sections outline the limitations and type mapping for relevant for ### Avro -The following sections provide information on Avro usage with CockroachDB changefeeds. Creating a changefeed using Avro is available in Core and {{ site.data.products.enterprise }} changefeeds with the [`confluent_schema_registry`](create-changefeed.html#confluent-schema-registry) option. +The following sections provide information on Avro usage with CockroachDB changefeeds. Creating a changefeed using Avro is available with the [`confluent_schema_registry`](create-changefeed.html#confluent-schema-registry) option. #### Avro limitations diff --git a/src/current/v25.2/changefeed-monitoring-guide.md b/src/current/v25.2/changefeed-monitoring-guide.md index b02c0c10d8b..28d0cec6af5 100644 --- a/src/current/v25.2/changefeed-monitoring-guide.md +++ b/src/current/v25.2/changefeed-monitoring-guide.md @@ -9,7 +9,7 @@ CockroachDB [changefeeds]({% link {{ page.version.version }}/change-data-capture This guide provides recommendations for monitoring and alerting on changefeeds throughout the pipeline to ensure reliable operation and quick problem detection. {{site.data.alerts.callout_success}} -For details on how changefeeds work as jobs in CockroachDB, refer to the [technical overview]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). +For details on how changefeeds work as jobs in CockroachDB, refer to the [technical overview]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). {{site.data.alerts.end}} ## Overview @@ -42,7 +42,7 @@ Metrics names in Prometheus replace the `.` with `_`. In Datadog, metrics names - Use with [metrics labels]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#using-changefeed-metrics-labels). - Investigation needed: If `changefeed.max_behind_nanos` is consistently increasing. - `(now() - changefeed.checkpoint_progress)` - - Description: The progress of changefeed [checkpointing]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). Indicates how recently the changefeed state was persisted durably. Critical for monitoring changefeed [recovery capability]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages). + - Description: The progress of changefeed [checkpointing]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). Indicates how recently the changefeed state was persisted durably. Critical for monitoring changefeed [recovery capability]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages). - Investigation needed: If checkpointing falls too far behind the current time. - Impact: - Slow processing of changes and updates to downstream sinks. diff --git a/src/current/v25.2/changefeed-sinks.md b/src/current/v25.2/changefeed-sinks.md index 6e64a755688..bc36b76a046 100644 --- a/src/current/v25.2/changefeed-sinks.md +++ b/src/current/v25.2/changefeed-sinks.md @@ -5,7 +5,7 @@ toc: true docs_area: stream_data --- -{{ site.data.products.enterprise }} changefeeds emit messages to configurable downstream sinks. 
This page details the URIs, parameters, and configurations available for each changefeed sink. +Changefeeds emit messages to configurable downstream sinks. This page details the URIs, parameters, and configurations available for each changefeed sink. CockroachDB supports the following sinks: diff --git a/src/current/v25.2/changefeeds-in-multi-region-deployments.md b/src/current/v25.2/changefeeds-in-multi-region-deployments.md index 13eb78fd8f6..af2aa349360 100644 --- a/src/current/v25.2/changefeeds-in-multi-region-deployments.md +++ b/src/current/v25.2/changefeeds-in-multi-region-deployments.md @@ -12,7 +12,7 @@ This page describes features that you can use for changefeeds running on multi-r ## Run a changefeed job by locality -Use the `execution_locality` option to set locality filter requirements that a node must meet to take part in executing a [changefeed]({% link {{ page.version.version }}/create-changefeed.md %}) job. This will pin the [coordination of the changefeed job]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) and the nodes that process the [changefeed messages]({% link {{ page.version.version }}/changefeed-messages.md %}) to the defined locality. +Use the `execution_locality` option to set locality filter requirements that a node must meet to take part in executing a [changefeed]({% link {{ page.version.version }}/create-changefeed.md %}) job. This will pin the [coordination of the changefeed job]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) and the nodes that process the [changefeed messages]({% link {{ page.version.version }}/changefeed-messages.md %}) to the defined locality. Defining an execution locality for a changefeed job could be useful in the following cases: @@ -51,7 +51,7 @@ Once the coordinating node is determined, nodes that match the locality requirem When a node matching the locality filter takes part in the changefeed job, that node will read from the closest [replica]({% link {{ page.version.version }}/architecture/reads-and-writes-overview.md %}#architecture-replica). If the node holds a replica, it can read from itself. In the scenario where no replicas are available in the region of the assigned node, it may then read from a replica in a different region. As a result, you may want to consider [placing replicas]({% link {{ page.version.version }}/configure-replication-zones.md %}), including potentially [non-voting replicas]({% link {{ page.version.version }}/architecture/replication-layer.md %}#non-voting-replicas) that will have less impact on read latency, in the locality or region that you plan on pinning for changefeed job execution. -For an overview of how a changefeed job works, refer to the [How does an Enterprise changefeed work?]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) page. +For an overview of how a changefeed job works, refer to the [How does a changefeed work?]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) page.
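For illustration, a minimal sketch of pinning changefeed execution by locality (the table name, external connection, and locality tier below are placeholders, not values from this page):

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE CHANGEFEED FOR TABLE movr.vehicles
  INTO 'external://kafka'
  WITH execution_locality = 'region=us-east-1';
~~~

Only nodes started with a matching `--locality` flag (here, `region=us-east-1`) are then eligible to coordinate this changefeed job or process its messages.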
## Run changefeeds on regional by row tables diff --git a/src/current/v25.2/changefeeds-on-tables-with-column-families.md b/src/current/v25.2/changefeeds-on-tables-with-column-families.md index ffc0a2e5a32..58117caeb13 100644 --- a/src/current/v25.2/changefeeds-on-tables-with-column-families.md +++ b/src/current/v25.2/changefeeds-on-tables-with-column-families.md @@ -28,7 +28,7 @@ CREATE CHANGEFEED FOR TABLE {table} FAMILY {family} INTO {sink}; ~~~ {{site.data.alerts.callout_info}} -You can also use [basic changefeeds]({% link {{ page.version.version }}/changefeeds-on-tables-with-column-families.md %}?filters=core#create-a-basic-changefeed-on-a-table-with-column-families) on tables with column families by using the [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}) statement with `split_column_families` or the `FAMILY` keyword. +You can also use [sinkless changefeeds]({% link {{ page.version.version }}/changefeeds-on-tables-with-column-families.md %}?filters=sinkless#create-a-sinkless-changefeed-on-a-table-with-column-families) on tables with column families by using the [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}) statement without a sink, along with `split_column_families` or the `FAMILY` keyword. {{site.data.alerts.end}} If a table has multiple column families, the `FAMILY` keyword will ensure the changefeed emits messages for **each** column family you define with `FAMILY` in the `CREATE CHANGEFEED` statement. If you do not specify `FAMILY`, then the changefeed will emit messages for **all** the table's column families. @@ -83,21 +83,17 @@ The output shows the `primary` column family with `4` in the value (`{"id":4,"na - Creating a changefeed with [CDC queries]({% link {{ page.version.version }}/cdc-queries.md %}) is not supported on tables with more than one column family. - When you create a changefeed on a table with more than one column family, the changefeed will emit messages per column family in separate streams. As a result, [changefeed messages]({% link {{ page.version.version }}/changefeed-messages.md %}) for different column families will arrive at the [sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) under separate topics. For more details, refer to [Message format](#message-format). -For examples of starting changefeeds on tables with column families, see the following examples for Enterprise and basic changefeeds. -
- - + +
-
+
## Create a changefeed on a table with column families In this example, you'll set up changefeeds on two tables that have [column families]({% link {{ page.version.version }}/column-families.md %}). You'll use a single-node cluster sending changes to a webhook sink for this example, but you can use any [changefeed sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) to work with tables that include column families. -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -112,18 +108,6 @@ In this example, you'll set up changefeeds on two tables that have [column famil cockroach sql --insecure ~~~ -1. Set your organization and license key: - - {% include_cached copy-clipboard.html %} - ~~~ sql - SET CLUSTER SETTING cluster.organization = ''; - ~~~ - - {% include_cached copy-clipboard.html %} - ~~~ sql - SET CLUSTER SETTING enterprise.license = ''; - ~~~ - 1. Enable the `kv.rangefeed.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}): {% include_cached copy-clipboard.html %} @@ -299,11 +283,11 @@ In this example, you'll set up changefeeds on two tables that have [column famil
-
+
-## Create a basic changefeed on a table with column families +## Create a sinkless changefeed on a table with column families -In this example, you'll set up basic changefeeds on two tables that have [column families]({% link {{ page.version.version }}/column-families.md %}). You'll use a single-node cluster with the basic changefeed sending changes to the client. +In this example, you'll set up a sinkless changefeed on two tables that have [column families]({% link {{ page.version.version }}/column-families.md %}). You'll use a single-node cluster with the changefeed sending changes to the client. 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: @@ -385,7 +369,7 @@ In this example, you'll set up basic changefeeds on two tables that have [column {% include_cached copy-clipboard.html %} ~~~ sql - EXPERIMENTAL CHANGEFEED FOR TABLE office_dogs FAMILY employee; + CREATE CHANGEFEED FOR TABLE office_dogs FAMILY employee; ~~~ You'll receive one message for each of the inserts that affects the specified column family: @@ -406,7 +390,7 @@ In this example, you'll set up basic changefeeds on two tables that have [column {% include_cached copy-clipboard.html %} ~~~ sql - EXPERIMENTAL CHANGEFEED FOR TABLE office_dogs FAMILY employee, TABLE office_plants FAMILY dog_friendly; + CREATE CHANGEFEED FOR TABLE office_dogs FAMILY employee, TABLE office_plants FAMILY dog_friendly; ~~~ You'll receive one message for each insert that affects the specified column families: @@ -428,14 +412,14 @@ In this example, you'll set up basic changefeeds on two tables that have [column {{site.data.alerts.callout_info}} To create a changefeed specifying two families on **one** table, ensure that you define the table and family in both instances: - `EXPERIMENTAL CHANGEFEED FOR TABLE office_dogs FAMILY employee, TABLE office_dogs FAMILY dogs;` + `CREATE CHANGEFEED FOR TABLE office_dogs FAMILY employee, TABLE office_dogs FAMILY dogs;` {{site.data.alerts.end}} 1. To create a changefeed that emits messages for all column families in a table, use the [`split_column_families`]({% link {{ page.version.version }}/changefeed-for.md %}#split-column-families) option: {% include_cached copy-clipboard.html %} ~~~ sql - EXPERIMENTAL CHANGEFEED FOR TABLE office_dogs WITH split_column_families; + CREATE CHANGEFEED FOR TABLE office_dogs WITH split_column_families; ~~~ In your other terminal window, insert some more values: diff --git a/src/current/v25.2/connect-to-a-changefeed-kafka-sink-with-oauth-using-okta.md b/src/current/v25.2/connect-to-a-changefeed-kafka-sink-with-oauth-using-okta.md index 8e2590bad9d..8a07f935ca6 100644 --- a/src/current/v25.2/connect-to-a-changefeed-kafka-sink-with-oauth-using-okta.md +++ b/src/current/v25.2/connect-to-a-changefeed-kafka-sink-with-oauth-using-okta.md @@ -5,7 +5,7 @@ toc: true docs_area: stream_data --- -CockroachDB {{ site.data.products.enterprise }} [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) can stream change data out to [Apache Kafka](https://kafka.apache.org/) using OAuth authentication. +CockroachDB [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) can stream change data out to [Apache Kafka](https://kafka.apache.org/) using OAuth authentication. 
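As a hedged sketch of where OAuth fits in a changefeed statement (the broker address and credential values below are placeholders; the SASL parameters follow the Kafka sink URI parameters documented for changefeeds, with `sasl_client_secret` base64-encoded and `sasl_token_url` URL-encoded):

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE CHANGEFEED FOR TABLE users
  INTO 'kafka://{broker host}:9092?tls_enabled=true&sasl_enabled=true&sasl_mechanism=OAUTHBEARER&sasl_client_id={client ID}&sasl_client_secret={base64-encoded secret}&sasl_token_url={URL-encoded token URL}';
~~~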
{% include {{ page.version.version }}/cdc/oauth-description.md %} diff --git a/src/current/v25.2/create-and-configure-changefeeds.md b/src/current/v25.2/create-and-configure-changefeeds.md index de4c4651ee1..ed508a64820 100644 --- a/src/current/v25.2/create-and-configure-changefeeds.md +++ b/src/current/v25.2/create-and-configure-changefeeds.md @@ -1,11 +1,11 @@ --- title: Create and Configure Changefeeds -summary: Create and configure a changefeed job for Core and Enterprise. +summary: Create and configure a changefeed emitting to a sink or a sinkless changefeed. toc: true docs_area: stream_data --- -Core and {{ site.data.products.enterprise }} changefeeds offer different levels of configurability. {{ site.data.products.enterprise }} changefeeds allow for active changefeed jobs to be [paused](#pause), [resumed](#resume), and [canceled](#cancel). +Changefeeds offer different levels of configurability. Changefeeds emitting to a sink allow for active changefeed jobs to be [paused](#pause), [resumed](#resume), and [canceled](#cancel). Sinkless changefeeds stream changes directly to the SQL session. This page describes: @@ -15,10 +15,10 @@ This page describes: ## Before you create a changefeed 1. Enable rangefeeds on CockroachDB {{ site.data.products.advanced }} and CockroachDB {{ site.data.products.core }}. Refer to [Enable rangefeeds](#enable-rangefeeds) for instructions. -1. Decide on whether you will run an {{ site.data.products.enterprise }} or basic changefeed. Refer to the [Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %}) page for a comparative capability table. +1. Decide whether you will run a changefeed that emits to a sink or a sinkless changefeed. Refer to the [Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %}) page for a comparative capability table. 1. Plan the number of changefeeds versus the number of tables to include in a single changefeed for your cluster. {% include {{ page.version.version }}/cdc/changefeed-number-limit.md %} Refer to [System resources and running changefeeds]({% link {{ page.version.version }}/changefeed-best-practices.md %}#maintain-system-resources-and-running-changefeeds) and [Recommendations for the number of target tables]({% link {{ page.version.version }}/changefeed-best-practices.md %}#plan-the-number-of-watched-tables-for-a-single-changefeed). - {% include common/cdc-cloud-costs-link.md %} -1. Consider whether your {{ site.data.products.enterprise }} [changefeed use case](#create) would be better served by [change data capture queries]({% link {{ page.version.version }}/cdc-queries.md %}) that can filter data on a single table. CDC queries can improve the efficiency of changefeeds because the job will not need to encode as much change data. +1. Consider whether your [changefeed use case](#create) would be better served by [change data capture queries]({% link {{ page.version.version }}/cdc-queries.md %}) that can filter data on a single table. CDC queries can improve the efficiency of changefeeds because the job will not need to encode as much change data (see the sketch after this list). 1. Read the following: - The [Changefeed Best Practices]({% link {{ page.version.version }}/changefeed-best-practices.md %}) reference for details on planning changefeeds, monitoring basics, and schema changes. - The [Considerations](#considerations) section that provides information on changefeed interactions that could affect how you configure or run your changefeed.
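To illustrate the CDC queries point in the list above, a minimal sketch (the table, columns, predicate, and sink are illustrative):

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE CHANGEFEED INTO 'external://sink'
  AS SELECT owner_id, status
  FROM vehicles
  WHERE status = 'lost';
~~~

Because only the selected columns and matching rows are emitted, the job encodes less change data than a changefeed watching the entire table.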
@@ -34,7 +34,7 @@ Changefeeds connect to a long-lived request called a _rangefeed_, which pushes c SET CLUSTER SETTING kv.rangefeed.enabled = true; ~~~ -Any created changefeeds will error until this setting is enabled. If you are working on a CockroachDB Serverless cluster, the `kv.rangefeed.enabled` cluster setting is enabled by default. +Any created changefeeds will error until this setting is enabled. If you are working on a CockroachDB {{ site.data.products.basic }} or {{ site.data.products.standard }} cluster, the `kv.rangefeed.enabled` cluster setting is enabled by default. Enabling rangefeeds has a small performance cost (about a 5–10% increase in write latencies), whether or not the rangefeed is being used in a changefeed. When `kv.rangefeed.enabled` is set to `true`, a small portion of the latency cost is caused by additional write event information that is sent to the [Raft log]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft-logs) and for [replication]({% link {{ page.version.version }}/architecture/replication-layer.md %}). The remainder of the latency cost is incurred once a changefeed is running; the write event information is reconstructed and sent to an active rangefeed, which will push the event to the changefeed. @@ -53,46 +53,39 @@ For further detail on performance-related configuration, refer to the [Advanced - After you [restore from a full-cluster backup]({% link {{ page.version.version }}/restore.md %}#full-cluster), changefeed jobs will **not** resume on the new cluster. It is necessary to manually create the changefeeds following the full-cluster restore. - {% include {{ page.version.version }}/cdc/virtual-computed-column-cdc.md %} -The following Enterprise and Core sections outline how to create and configure each type of changefeed: +The following sections outline how to create and configure each type of changefeed:
- - + +
-
+
## Configure a changefeed -An {{ site.data.products.enterprise }} changefeed streams row-level changes in a [configurable format]({% link {{ page.version.version }}/changefeed-messages.md %}) to one of the following sinks: +A changefeed streams row-level changes in a [configurable format]({% link {{ page.version.version }}/changefeed-messages.md %}) to one of the following sinks: {% include {{ page.version.version }}/cdc/sink-list.md %} -You can [create](#create), [pause](#pause), [resume](#resume), and [cancel](#cancel) an {{ site.data.products.enterprise }} changefeed. For a step-by-step example connecting to a specific sink, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. +You can [create](#create), [pause](#pause), [resume](#resume), and [cancel](#cancel) a changefeed emitting messages to a sink. For a step-by-step example connecting to a specific sink, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. ### Create -To create an {{ site.data.products.enterprise }} changefeed: +To create a changefeed: {% include_cached copy-clipboard.html %} ~~~ sql -CREATE CHANGEFEED FOR TABLE table_name, table_name2 INTO '{scheme}://{host}:{port}?{query_parameters}'; +CREATE CHANGEFEED FOR TABLE table_name, table_name2 INTO '{scheme}://{sink_host}:{port}?{query_parameters}'; ~~~ {% include {{ page.version.version }}/cdc/url-encoding.md %} -When you create a changefeed **without** specifying a sink, CockroachDB sends the changefeed events to the SQL client. Consider the following regarding the [display format]({% link {{ page.version.version }}/cockroach-sql.md %}#sql-flag-format) in your SQL client: - -- If you do not define a display format, the CockroachDB SQL client will automatically use `ndjson` format. -- If you specify a display format, the client will use that format (e.g., `--format=csv`). -- If you set the client display format to `ndjson` and set the changefeed [`format`]({% link {{ page.version.version }}/create-changefeed.md %}#format) to `csv`, you'll receive JSON format with CSV nested inside. -- If you set the client display format to `csv` and set the changefeed [`format`]({% link {{ page.version.version }}/create-changefeed.md %}#format) to `json`, you'll receive a comma-separated list of JSON values. - For more information, see [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}). ### Show -To show a list of {{ site.data.products.enterprise }} changefeed jobs: +To show a list of changefeed jobs: {% include {{ page.version.version }}/cdc/show-changefeed-job.md %} @@ -104,7 +97,7 @@ For more information, refer to [`SHOW CHANGEFEED JOB`]({% link {{ page.version.v ### Pause -To pause an {{ site.data.products.enterprise }} changefeed: +To pause a changefeed: {% include_cached copy-clipboard.html %} ~~~ sql @@ -115,7 +108,7 @@ For more information, refer to [`PAUSE JOB`]({% link {{ page.version.version }}/ ### Resume -To resume a paused {{ site.data.products.enterprise }} changefeed: +To resume a paused changefeed: {% include_cached copy-clipboard.html %} ~~~ sql @@ -126,7 +119,7 @@ For more information, refer to [`RESUME JOB`]({% link {{ page.version.version }} ### Cancel -To cancel an {{ site.data.products.enterprise }} changefeed: +To cancel a changefeed: {% include_cached copy-clipboard.html %} ~~~ sql @@ -145,20 +138,25 @@ For more information, refer to [`CANCEL JOB`]({% link {{ page.version.version }}
-
- -## Create a changefeed +
-A basic changefeed streams row-level changes to the client indefinitely until the underlying connection is closed or the changefeed is canceled. +## Create a sinkless changefeed -To create a basic changefeed: +When you create a changefeed **without** specifying a sink (a sinkless changefeed), CockroachDB sends the changefeed events to the SQL client indefinitely until the underlying connection is closed or the changefeed is canceled: {% include_cached copy-clipboard.html %} ~~~ sql -EXPERIMENTAL CHANGEFEED FOR table_name; +CREATE CHANGEFEED FOR TABLE table_name, table_name2; ~~~ -For more information, see [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}). +Consider the following regarding the [display format]({% link {{ page.version.version }}/cockroach-sql.md %}#sql-flag-format) in your SQL client: + +- If you do not define a display format, the CockroachDB SQL client will automatically use `ndjson` format. +- If you specify a display format, the client will use that format (e.g., `--format=csv`). +- If you set the client display format to `ndjson` and set the changefeed [`format`]({% link {{ page.version.version }}/create-changefeed.md %}#format) to `csv`, you'll receive JSON format with CSV nested inside. +- If you set the client display format to `csv` and set the changefeed [`format`]({% link {{ page.version.version }}/create-changefeed.md %}#format) to `json`, you'll receive a comma-separated list of JSON values. + +For more information, see [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}#create-a-sinkless-changefeed).
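For instance, a minimal sketch of the display-format interplay described above (assuming the SQL client was started with `--format=csv`; the table name is illustrative):

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE CHANGEFEED FOR TABLE office_dogs WITH format = json;
~~~

With the client display format set to `csv` and the changefeed `format` set to `json`, each change arrives as a comma-separated list of JSON values.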
@@ -172,5 +170,4 @@ For more information, see [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.versio ## See also - [`SHOW JOBS`]({% link {{ page.version.version }}/show-jobs.md %}) -- [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}) - [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}) diff --git a/src/current/v25.2/create-changefeed.md b/src/current/v25.2/create-changefeed.md index 210626f5be7..4da4f14c6df 100644 --- a/src/current/v25.2/create-changefeed.md +++ b/src/current/v25.2/create-changefeed.md @@ -5,9 +5,11 @@ toc: true docs_area: reference.sql --- -The `CREATE CHANGEFEED` [statement]({% link {{ page.version.version }}/sql-statements.md %}) creates a new {{ site.data.products.enterprise }} changefeed, which targets an allowlist of tables called "watched rows". Every change to a watched row is emitted as a record in a configurable format (`JSON` or Avro) to a configurable sink ([Kafka](https://kafka.apache.org/), [Google Cloud Pub/Sub](https://cloud.google.com/pubsub), a [cloud storage sink]({% link {{ page.version.version }}/changefeed-sinks.md %}#cloud-storage-sink), or a [webhook sink]({% link {{ page.version.version }}/changefeed-sinks.md %}#webhook-sink)). You can [create](#examples), [pause](#pause-a-changefeed), [resume](#resume-a-paused-changefeed), [alter]({% link {{ page.version.version }}/alter-changefeed.md %}), or [cancel](#cancel-a-changefeed) an {{ site.data.products.enterprise }} changefeed. +The `CREATE CHANGEFEED` [statement]({% link {{ page.version.version }}/sql-statements.md %}) creates a new changefeed, which targets an allowlist of tables called "watched rows". Every change to a watched row is emitted as a record in a configurable format (`JSON` or Avro) to a [configurable sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) or directly to the SQL session. -To get started with changefeeds, refer to the [Create and Configure Changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}) page for important usage considerations. For detail on how changefeeds emit messages, refer to the [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}) page. +When a changefeed emits messages to a sink, it works as a [job]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). You can [create](#examples), [pause](#pause-a-changefeed), [resume](#resume-a-paused-changefeed), [alter]({% link {{ page.version.version }}/alter-changefeed.md %}), or [cancel](#cancel-a-changefeed) a changefeed job. + +To get started with changefeeds, refer to the [Create and Configure Changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}) page for important usage considerations. For details on how changefeeds emit messages, refer to the [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}) page. The [examples](#examples) on this page provide the foundational syntax of the `CREATE CHANGEFEED` statement. For examples on more specific use cases with changefeeds, refer to the following pages: @@ -131,7 +133,7 @@ Option | Value | Description `lagging_ranges_threshold` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Set a duration from the present that determines the length of time a range is considered to be lagging behind, which will then track in the [`lagging_ranges`]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#lagging-ranges-metric) metric. 
Note that ranges undergoing an [initial scan](#initial-scan) for longer than the threshold duration are considered to be lagging. Starting a changefeed with an initial scan on a large table will likely increment the metric for each range in the table. As ranges complete the initial scan, the number of ranges lagging behind will decrease.

**Default:** `3m` `lagging_ranges_polling_interval` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Set the interval rate for when lagging ranges are checked and the `lagging_ranges` metric is updated. Polling adds latency to the `lagging_ranges` metric being updated. For example, if a range falls behind by 3 minutes, the metric may not update until an additional minute afterward.

**Default:** `1m` `metrics_label` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Define a metrics label to which the metrics for one or multiple changefeeds increment. All changefeeds also have their metrics aggregated.

The maximum length of a label is 128 bytes. There is a limit of 1024 unique labels.

`WITH metrics_label=label_name`

For more detail on usage and considerations, see [Using changefeed metrics labels]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#using-changefeed-metrics-labels). -`min_checkpoint_frequency` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Controls how often a node's changefeed [aggregator]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) will flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). A node's changefeed aggregator will wait at least the specified duration between sending progress updates for the ranges it is watching to the coordinator. This can help you control the flush frequency of higher latency sinks to achieve better throughput. However, more frequent checkpointing can increase CPU usage. If this is set to `0s`, a node will flush messages as long as the high-water mark has increased for the ranges that particular node is processing. If a changefeed is resumed, then `min_checkpoint_frequency` is the amount of time that changefeed will need to catch up. That is, it could emit [duplicate messages]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages) during this time.

**Note:** [`resolved`](#resolved) messages will not be emitted more frequently than the configured `min_checkpoint_frequency` (but may be emitted less frequently). If you require `resolved` messages more frequently than `30s`, you must configure `min_checkpoint_frequency` to at least the desired `resolved` message frequency. For more details, refer to [Resolved message frequency]({% link {{ page.version.version }}/changefeed-messages.md %}#resolved-timestamp-frequency).

**Default:** `30s` +`min_checkpoint_frequency` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Controls how often a node's changefeed [aggregator]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) will flush its progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). A node's changefeed aggregator will wait at least the specified duration between sending progress updates for the ranges it is watching to the coordinator. This can help you control the flush frequency of higher latency sinks to achieve better throughput. However, more frequent checkpointing can increase CPU usage. If this is set to `0s`, a node will flush messages as long as the high-water mark has increased for the ranges that particular node is processing. If a changefeed is resumed, then `min_checkpoint_frequency` is the amount of time that changefeed will need to catch up. That is, it could emit [duplicate messages]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages) during this time.

**Note:** [`resolved`](#resolved) messages will not be emitted more frequently than the configured `min_checkpoint_frequency` (but may be emitted less frequently). If you require `resolved` messages more frequently than `30s`, you must configure `min_checkpoint_frequency` to at least the desired `resolved` message frequency. For more details, refer to [Resolved message frequency]({% link {{ page.version.version }}/changefeed-messages.md %}#resolved-timestamp-frequency).

**Default:** `30s` `mvcc_timestamp` | N/A | Include the [MVCC]({% link {{ page.version.version }}/architecture/storage-layer.md %}#mvcc) timestamp for each emitted row in a changefeed. With the `mvcc_timestamp` option, each emitted row will always contain its MVCC timestamp, even during the changefeed's initial backfill. `on_error` | `pause` / `fail` | Use `on_error=pause` to pause the changefeed when encountering **non**-retryable errors. `on_error=pause` will pause the changefeed instead of sending it into a terminal failure state. **Note:** Retryable errors will continue to be retried with this option specified.

Use with [`protect_data_from_gc_on_pause`](#protect-data-from-gc-on-pause) to protect changes from [garbage collection]({% link {{ page.version.version }}/configure-replication-zones.md %}#gc-ttlseconds).

If a changefeed with `on_error=pause` is running when a watched table is [truncated]({% link {{ page.version.version }}/truncate.md %}), the changefeed will pause but will not be able to resume reads from that table. Using [`ALTER CHANGEFEED`]({% link {{ page.version.version }}/alter-changefeed.md %}) to drop the table from the changefeed and then [resuming the job]({% link {{ page.version.version }}/resume-job.md %}) will work, but you cannot add the same table to the changefeed again. Instead, you will need to [create a new changefeed](#start-a-new-changefeed-where-another-ended) for that table.

Default: `on_error=fail` `protect_data_from_gc_on_pause` | N/A | This option is deprecated as of v23.2 and will be removed in a future release.

When a [changefeed is paused]({% link {{ page.version.version }}/pause-job.md %}), ensure that the data needed to [resume the changefeed]({% link {{ page.version.version }}/resume-job.md %}) is not garbage collected. If `protect_data_from_gc_on_pause` is **unset**, pausing the changefeed will release the existing protected timestamp records. It is also important to note that pausing and adding `protect_data_from_gc_on_pause` to a changefeed will not protect data if the [garbage collection]({% link {{ page.version.version }}/configure-replication-zones.md %}#gc-ttlseconds) window has already passed.

Use with [`on_error=pause`](#on-error) to protect changes from garbage collection when encountering non-retryable errors.

Refer to [Protect Changefeed Data from Garbage Collection]({% link {{ page.version.version }}/protect-changefeed-data.md %}) for more detail on protecting changefeed data.

**Note:** If you use this option, changefeeds that are left paused for long periods of time can prevent garbage collection. Use with the [`gc_protect_expires_after`](#gc-protect-expires-after) option to set a limit for protected data and for how long a changefeed will remain paused. @@ -238,7 +240,7 @@ CREATE CHANGEFEED INTO 'scheme://host:port' WHERE status = 'lost'; ~~~ -CDC queries can only run on a single table per changefeed and require an {{ site.data.products.enterprise }} license. +CDC queries can only run on a single table per changefeed. ### Create a sinkless changefeed @@ -250,9 +252,7 @@ CREATE CHANGEFEED FOR TABLE table_name, table_name2, table_name3 WITH updated, resolved; ~~~ -Sinkless changefeeds do not require an {{ site.data.products.enterprise }} license; however, a sinkless changefeed with CDC queries **does** require an {{ site.data.products.enterprise }} license. - -To create a sinkless changefeed using CDC queries: +To create a sinkless changefeed using [CDC queries]({% link {{ page.version.version }}/cdc-queries.md %}): {% include_cached copy-clipboard.html %} ~~~ sql @@ -296,7 +296,7 @@ For guidance on how to filter changefeed messages to emit [row-level TTL]({% lin ### Manage a changefeed - For {{ site.data.products.enterprise }} changefeeds, use [`SHOW CHANGEFEED JOBS`]({% link {{ page.version.version }}/show-jobs.md %}) to check the status of your changefeed jobs: +For changefeed jobs, use [`SHOW CHANGEFEED JOBS`]({% link {{ page.version.version }}/show-jobs.md %}) to check the status: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v25.2/export-data-with-changefeeds.md b/src/current/v25.2/export-data-with-changefeeds.md index ae5af42ad1b..fd8923d2235 100644 --- a/src/current/v25.2/export-data-with-changefeeds.md +++ b/src/current/v25.2/export-data-with-changefeeds.md @@ -5,15 +5,15 @@ toc: true docs_area: stream_data --- -When you create an {{ site.data.products.enterprise }} changefeed, you can include the [`initial_scan = 'only'`]({% link {{ page.version.version }}/create-changefeed.md %}#initial-scan) option to specify that the changefeed should only complete a table scan. The changefeed emits messages for the table scan and then the job completes with a `succeeded` status. As a result, you can create a changefeed with `initial_scan = 'only'` to [`EXPORT`]({% link {{ page.version.version }}/export.md %}) data out of your database. +When you create a changefeed, you can include the [`initial_scan = 'only'`]({% link {{ page.version.version }}/create-changefeed.md %}#initial-scan) option to specify that the changefeed should only complete a table scan. The changefeed emits messages for the table scan and then the job completes with a `succeeded` status. As a result, you can create a changefeed with `initial_scan = 'only'` to [`EXPORT`]({% link {{ page.version.version }}/export.md %}) data out of your database. -You can also [schedule a changefeed](#create-a-scheduled-changefeed-to-export-filtered-data) to use a changefeed initial scan for exporting data on a regular cadence. +You can also [schedule a changefeed](#create-a-scheduled-changefeed-to-export-filtered-data) that is emitting messages to a downstream sink, which allows you to use a changefeed initial scan for exporting data on a regular cadence. 
The benefits of using changefeeds for this use case instead of [export]({% link {{ page.version.version }}/export.md %}) include: - Changefeeds are jobs, which can be [paused]({% link {{ page.version.version }}/pause-job.md %}), [resumed]({% link {{ page.version.version }}/resume-job.md %}), [cancelled]({% link {{ page.version.version }}/cancel-job.md %}), [scheduled]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}), and [altered]({% link {{ page.version.version }}/alter-changefeed.md %}). - There is observability into a changefeed job using [`SHOW CHANGEFEED JOBS`]({% link {{ page.version.version }}/show-jobs.md %}#show-changefeed-jobs) and the [Changefeeds Dashboard]({% link {{ page.version.version }}/ui-cdc-dashboard.md %}) in the DB Console. -- Changefeed jobs have built-in [checkpointing]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) and [retries]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#changefeed-retry-errors). +- Changefeed jobs have built-in [checkpointing]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) and [retries]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#changefeed-retry-errors). - [Changefeed sinks]({% link {{ page.version.version }}/changefeed-sinks.md %}) provide additional endpoints for your data. - You can use the [`format=csv`]({% link {{ page.version.version }}/create-changefeed.md %}#format) option with `initial_scan = 'only'` to emit messages in CSV format. diff --git a/src/current/v25.2/how-does-an-enterprise-changefeed-work.md b/src/current/v25.2/how-does-a-changefeed-work.md similarity index 77% rename from src/current/v25.2/how-does-an-enterprise-changefeed-work.md rename to src/current/v25.2/how-does-a-changefeed-work.md index 738c570902a..b58be3b1303 100644 --- a/src/current/v25.2/how-does-an-enterprise-changefeed-work.md +++ b/src/current/v25.2/how-does-a-changefeed-work.md @@ -1,11 +1,11 @@ --- -title: How Does an Enterprise Changefeed Work? +title: How Does a Changefeed Work? summary: Stream data out of CockroachDB with efficient, distributed, row-level change subscriptions (changefeeds). toc: true docs_area: stream_data --- -When an {{ site.data.products.enterprise }} changefeed is started on a node, that node becomes the _coordinator_ for the changefeed job (**Node 2** in the diagram). The coordinator node acts as an administrator: keeping track of all other nodes during job execution and the changefeed work as it completes. The changefeed job will run across nodes in the cluster to access changed data in the watched table. The job will evenly distribute changefeed work across the cluster by assigning it to any [replica]({% link {{ page.version.version }}/architecture/replication-layer.md %}) for a particular range, which determines the node that will emit the changefeed data. If a [locality filter]({% link {{ page.version.version }}/changefeeds-in-multi-region-deployments.md %}#run-a-changefeed-job-by-locality) is specified, work is distributed to a node from those that match the locality filter and has the most locality tiers in common with a node that has a replica. +When a changefeed that will emit changes to a sink is started on a node, that node becomes the _coordinator_ for the changefeed job (**Node 2** in the diagram). The coordinator node acts as an administrator: keeping track of all other nodes during job execution and the changefeed work as it completes.
The changefeed job will run across nodes in the cluster to access changed data in the watched table. The job will evenly distribute changefeed work across the cluster by assigning it to any [replica]({% link {{ page.version.version }}/architecture/replication-layer.md %}) for a particular range, which determines the node that will emit the changefeed data. If a [locality filter]({% link {{ page.version.version }}/changefeeds-in-multi-region-deployments.md %}#run-a-changefeed-job-by-locality) is specified, work is distributed to the node, among those that match the locality filter, that has the most locality tiers in common with a node that has a replica. Each node uses its _aggregator processors_ to send back checkpoint progress to the coordinator, which gathers this information to update the _high-water mark timestamp_. The high-water mark acts as a checkpoint for the changefeed’s job progress, and guarantees that all changes before (or at) the timestamp have been emitted. In the unlikely event that the changefeed’s coordinating node were to fail during the job, that role will move to a different node and the changefeed will restart from the last checkpoint. If restarted, the changefeed may [re-emit messages]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages) starting at the high-water mark time up to the current time. Refer to [Ordering Guarantees]({% link {{ page.version.version }}/changefeed-messages.md %}#ordering-and-delivery-guarantees) for detail on CockroachDB's at-least-once delivery guarantee and how per-key message ordering is applied. diff --git a/src/current/v25.2/monitor-and-debug-changefeeds.md b/src/current/v25.2/monitor-and-debug-changefeeds.md index e065a852baf..cbcbc098b5d 100644 --- a/src/current/v25.2/monitor-and-debug-changefeeds.md +++ b/src/current/v25.2/monitor-and-debug-changefeeds.md @@ -6,7 +6,7 @@ docs_area: stream_data --- {{site.data.alerts.callout_info}} -Monitoring is only available for [{{ site.data.products.enterprise }} changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}#stream-row-level-changes-with-changefeeds). +Monitoring is only available for [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}#stream-row-level-changes-with-changefeeds) that emit messages to a [sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). {{site.data.alerts.end}} Changefeeds work as jobs in CockroachDB, which allows for [monitoring](#monitor-a-changefeed) and [debugging](#debug-a-changefeed) through the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}) [**Jobs**]({% link {{ page.version.version }}/ui-jobs-page.md %}) page and [`SHOW JOBS`]({% link {{ page.version.version }}/show-jobs.md %}) SQL statements using the job ID. @@ -28,7 +28,7 @@ We recommend monitoring changefeeds with [Prometheus]({% link {{ page.version.ve ## Monitor a changefeed -Changefeed progress is exposed as a [high-water timestamp]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) that advances as the changefeed progresses. This is a guarantee that all changes before or at the timestamp have been emitted. You can monitor a changefeed: +Changefeed progress is exposed as a [high-water timestamp]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) that advances as the changefeed progresses. This is a guarantee that all changes before or at the timestamp have been emitted.
You can monitor a changefeed: - On the [**Changefeeds** dashboard]({% link {{ page.version.version }}/ui-cdc-dashboard.md %}) of the DB Console. - On the [**Jobs** page]({% link {{ page.version.version }}/ui-jobs-page.md %}) of the DB Console. Hover over the high-water timestamp column to view the [system time]({% link {{ page.version.version }}/as-of-system-time.md %}). @@ -76,10 +76,6 @@ If you are running a changefeed with the [`confluent_schema_registry`]({% link { ### Using changefeed metrics labels -{{site.data.alerts.callout_info}} -An {{ site.data.products.enterprise }} license is required to use metrics labels in changefeeds. -{{site.data.alerts.end}} - {% include {{ page.version.version }}/cdc/metrics-labels.md %} To start a changefeed with a metrics label, set the following cluster setting to `true`: @@ -136,7 +132,7 @@ changefeed_emitted_bytes{scope="vehicles"} 183557 | Metric | Description | Unit | Type -------------------+--------------+------+-------------------------------------------- `changefeed.admit_latency` | Difference between the event's MVCC timestamp and the time the event is put into the memory buffer. | Nanoseconds | Histogram -`changefeed.aggregator_progress` | The earliest timestamp up to which any [aggregator]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) is guaranteed to have emitted all values for which it is responsible. **Note:** This metric may regress when a changefeed restarts due to a transient error. Consider tracking the `changefeed.checkpoint_progress` metric, which will not regress. | Timestamp | Gauge +`changefeed.aggregator_progress` | The earliest timestamp up to which any [aggregator]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) is guaranteed to have emitted all values for which it is responsible. **Note:** This metric may regress when a changefeed restarts due to a transient error. Consider tracking the `changefeed.checkpoint_progress` metric, which will not regress. | Timestamp | Gauge `changefeed.backfill_count` | Number of changefeeds currently executing a backfill ([schema change]({% link {{ page.version.version }}/changefeed-messages.md %}#schema-changes) or initial scan). | Changefeeds | Gauge `changefeed.backfill_pending_ranges` | Number of [ranges]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-range) in an ongoing backfill that are yet to be fully emitted. | Ranges | Gauge `changefeed.checkpoint_hist_nanos` | Time spent checkpointing changefeed progress. | Nanoseconds | Histogram @@ -153,7 +149,7 @@ changefeed_emitted_bytes{scope="vehicles"} 183557 `changefeed.message_size_hist` | Distribution in the size of emitted messages. | Bytes | Histogram `changefeed.running` | Number of currently running changefeeds, including sinkless changefeeds. | Changefeeds | Gauge `changefeed.sink_batch_hist_nanos` | Time messages spend batched in the sink buffer before being flushed and acknowledged. | Nanoseconds | Histogram -`changefeed.total_ranges` | Total number of ranges that are watched by [aggregator processors]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) participating in the changefeed job. `changefeed.total_ranges` shares the same polling interval as the [`changefeed.lagging_ranges`](#lagging-ranges-metric) metric, which is controlled by the `lagging_ranges_polling_interval` option. For more details, refer to [Lagging ranges](#lagging-ranges). 
+`changefeed.total_ranges` | Total number of ranges that are watched by [aggregator processors]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) participating in the changefeed job. `changefeed.total_ranges` shares the same polling interval as the [`changefeed.lagging_ranges`](#lagging-ranges-metric) metric, which is controlled by the `lagging_ranges_polling_interval` option. For more details, refer to [Lagging ranges](#lagging-ranges). ### Monitoring and measuring changefeed latency @@ -196,7 +192,7 @@ If your changefeed is experiencing elevated latency, you can use these metrics t ### Using logs -For {{ site.data.products.enterprise }} changefeeds, [use log information]({% link {{ page.version.version }}/logging-overview.md %}) to debug connection issues (i.e., `kafka: client has run out of available brokers to talk to (Is your cluster reachable?)`). Debug by looking for lines in the logs with `[kafka-producer]` in them: +For changefeeds, [use log information]({% link {{ page.version.version }}/logging-overview.md %}) to debug connection issues (for example, `kafka: client has run out of available brokers to talk to (Is your cluster reachable?)`). Debug by looking for lines in the logs with `[kafka-producer]` in them: ~~~ I190312 18:56:53.535646 585 vendor/github.com/Shopify/sarama/client.go:123 [kafka-producer] Initializing new client I190312 18:56:53.537686 585 vendor/github.com/Shopify/sarama/client.go:170 [kaf ### Using `SHOW CHANGEFEED JOBS` -For {{ site.data.products.enterprise }} changefeeds, use `SHOW CHANGEFEED JOBS` to check the status of your changefeed jobs: +For changefeeds, use `SHOW CHANGEFEED JOBS` to check the status of your changefeed jobs: {% include {{ page.version.version }}/cdc/show-changefeed-job.md %} diff --git a/src/current/v25.2/protect-changefeed-data.md b/src/current/v25.2/protect-changefeed-data.md index 49afa1eb6c5..634078e8c14 100644 --- a/src/current/v25.2/protect-changefeed-data.md +++ b/src/current/v25.2/protect-changefeed-data.md @@ -5,7 +5,7 @@ toc: true docs_area: stream_data --- -By default, [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) will protect changefeed data from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) up to the time of the [_checkpoint_]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). +By default, [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) will protect changefeed data from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) up to the time of the [_checkpoint_]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}).
Protected timestamps will protect changefeed data from garbage collection in the following scenarios:

diff --git a/src/current/v25.2/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md b/src/current/v25.2/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md
index 014377bf164..73f460242ac 100644
--- a/src/current/v25.2/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md
+++ b/src/current/v25.2/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md
@@ -5,7 +5,7 @@ toc: true
docs_area: stream_data
---

-CockroachDB {{ site.data.products.enterprise }} changefeeds can stream change data out to [Apache Kafka](https://kafka.apache.org/) with different [configuration settings]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka-sink-configuration) and [options]({% link {{ page.version.version }}/create-changefeed.md %}). [Confluent Cloud](https://www.confluent.io/confluent-cloud/) provides a fully managed service for running Apache Kafka as well as the [Confluent Cloud Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html).
+CockroachDB changefeeds can stream change data out to [Apache Kafka](https://kafka.apache.org/) with different [configuration settings]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka-sink-configuration) and [options]({% link {{ page.version.version }}/create-changefeed.md %}). [Confluent Cloud](https://www.confluent.io/confluent-cloud/) provides a fully managed service for running Apache Kafka as well as the [Confluent Cloud Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html).

A schema registry is a repository for schemas, which allows you to share and manage schemas between different services. Confluent Cloud Schema Registries map to Kafka topics in your Confluent Cloud environment.

@@ -248,19 +248,7 @@ To create your changefeed, you'll prepare your CockroachDB cluster with the `mov
    cockroach sql --url {"CONNECTION STRING"}
    ~~~

-1. Set your organization name and [{{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#types-of-licenses) key:
-
-    {% include_cached copy-clipboard.html %}
-    ~~~sql
-    SET CLUSTER SETTING cluster.organization = '';
-    ~~~
-
-    {% include_cached copy-clipboard.html %}
-    ~~~sql
-    SET CLUSTER SETTING enterprise.license = '';
-    ~~~
-
-1. Before you can create an {{ site.data.products.enterprise }} changefeed, it is necessary to enable rangefeeds on your cluster:
+1. Before you can create a changefeed, you must enable rangefeeds on your cluster:

@@ -322,7 +310,7 @@ You can also [create external connections]({% link {{ page.version.version }}/cr
    CREATE CHANGEFEED FOR TABLE users INTO "external://kafka" WITH updated, format = avro, confluent_schema_registry = "external://confluent_registry";
    ~~~

-   See [Options]({% link {{ page.version.version }}/create-changefeed.md %}#options) for a list of all available Enterprise changefeed options.
+   See [Options]({% link {{ page.version.version }}/create-changefeed.md %}#options) for a list of all available changefeed options.
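The `external://kafka` and `external://confluent_registry` URIs in the statement above refer to external connections. As a sketch with placeholder addresses and credentials (substitute your own Confluent Cloud values), you might register them before creating the changefeed:

~~~ sql
-- All bracketed values are placeholders.
CREATE EXTERNAL CONNECTION kafka AS
  'kafka://{BOOTSTRAP SERVER}:9092?tls_enabled=true&sasl_enabled=true&sasl_mechanism=PLAIN&sasl_user={API KEY}&sasl_password={API SECRET}';

CREATE EXTERNAL CONNECTION confluent_registry AS
  'https://{REGISTRY API KEY}:{REGISTRY API SECRET}@{REGISTRY URL}';
~~~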
{{site.data.alerts.callout_success}}
{% include {{ page.version.version }}/cdc/schema-registry-metric.md %}

diff --git a/src/current/v25.2/ui-cdc-dashboard.md b/src/current/v25.2/ui-cdc-dashboard.md
index 08ca950ecf9..e2c0ee5131b 100644
--- a/src/current/v25.2/ui-cdc-dashboard.md
+++ b/src/current/v25.2/ui-cdc-dashboard.md
@@ -73,7 +73,7 @@ Metric | Description

## Max Checkpoint Latency

-This graph displays the most any changefeed's persisted [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) is behind the present time. Larger values indicate issues with successfully ingesting or emitting changes. If errors cause a changefeed to restart, or the changefeed is [paused]({% link {{ page.version.version }}/pause-job.md %}) and unpaused, emitted data up to the last checkpoint may be re-emitted.
+This graph displays the maximum amount of time by which any changefeed's persisted [checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) trails the present time. Larger values indicate issues with successfully ingesting or emitting changes. If errors cause a changefeed to restart, or the changefeed is [paused]({% link {{ page.version.version }}/pause-job.md %}) and unpaused, emitted data up to the last checkpoint may be re-emitted.

{{site.data.alerts.callout_info}}
In v23.1 and earlier, the **Max Checkpoint Latency** graph was named **Max Changefeed Latency**. If you want to customize charts, including how metrics are named, use the [**Custom Chart** debug page]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}).
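The re-emission behavior described above is straightforward to observe with job controls. A quick sketch, using a hypothetical job ID:

~~~ sql
-- Find the changefeed's job ID, then pause and resume it. On resume, messages
-- emitted after the last persisted checkpoint may be delivered again.
SHOW CHANGEFEED JOBS;
PAUSE JOB 1066;   -- 1066 is a hypothetical job ID
RESUME JOB 1066;
~~~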
-| | Basic changefeeds | Enterprise changefeeds |
-|---|---|---|
-| | All products | CockroachDB {{ site.data.products.basic }}, {{ site.data.products.standard }}, {{ site.data.products.advanced }}, or with an {{ site.data.products.enterprise }} license in CockroachDB {{ site.data.products.core }}. |
-| | Streams indefinitely until underlying SQL connection is closed. | Maintains connection to configured sink: Amazon S3, Azure Event Hubs, Azure Storage, Confluent Cloud, Google Cloud Pub/Sub, Google Cloud Storage, HTTP, Kafka, Webhook. |
-| SQL statement | Create with `EXPERIMENTAL CHANGEFEED FOR` | Create with `CREATE CHANGEFEED` |
-| Filter change data | Not supported | Use CDC queries to define the emitted change data. |
-| | Emits every change to a "watched" row as a record to the current SQL session. | Emits every change to a "watched" row as a record in a configurable format: JSON, CSV, Avro, Parquet. |
-| Management | Create the changefeed and cancel by closing the connection. | Manage changefeed with CREATE, PAUSE, RESUME, ALTER, and CANCEL. |
+| | Sinkless changefeeds | Changefeeds |
+|---|---|---|
+| | All products | All products |
+| | Streams indefinitely until underlying SQL connection is closed. | Maintains connection to configured sink. |
+| SQL statement | Create with `CREATE CHANGEFEED FOR TABLE table_name;` | Create with `CREATE CHANGEFEED FOR TABLE table_name INTO 'sink';` |
+| Filter change data | Use CDC queries to define the emitted change data. | Use CDC queries to define the emitted change data. |
+| | Emits every change to a "watched" row as a record to the current SQL session. | Emits every change to a "watched" row as a record in a configurable format. |
+| Management | Create the changefeed and cancel by closing the SQL connection. | Manage changefeed with CREATE, PAUSE, RESUME, ALTER, and CANCEL. |
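To make the **SQL statement** row concrete, a minimal sketch (assuming a `movr.users` table and a placeholder `external://kafka` connection):

~~~ sql
-- Sinkless changefeed: rows stream back over this SQL connection until it closes.
CREATE CHANGEFEED FOR TABLE movr.users;

-- Changefeed to a sink: runs as a job that can be paused, resumed, altered, or canceled.
CREATE CHANGEFEED FOR TABLE movr.users INTO 'external://kafka';
~~~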