[WIP] MOLT Replicator draft docs #20465
base: main
<section class="filter-content" markdown="1" data-scope="mysql">
For MySQL **8.0 and later** sources, enable [global transaction identifiers (GTID)](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html) consistency. Set the following values in `mysql.cnf`, in the SQL shell, or as flags in the `mysql` start command:
Enable [global transaction identifiers (GTID)](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html) and configure binary logging. Set `binlog-row-metadata` or `binlog-row-image` to `full` to provide complete metadata for replication.
Review comment:
I think it may be worth calling out that it's also important to tune binlog retention: https://dba.stackexchange.com/a/206602
This can affect whether the data from the GTID you specify is still available or has been purged/rotated. It's important to note that managed services such as AWS RDS or GCP CloudSQL handle this in provider-specific ways:
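As a hedged illustration, the following minimal Python sketch shows a pre-flight check for the MySQL source settings described above. It assumes you have already read the output of `SHOW GLOBAL VARIABLES` into a `dict`. The function name `check_mysql_source` is hypothetical. The `log_bin=ON` and `binlog_format=ROW` checks are assumptions added for this sketch and are not stated in the draft.

```python
# Settings from the draft (GTID mode and consistency), plus log_bin and
# binlog_format, which are ASSUMPTIONS added for this sketch.
REQUIRED = {
    "log_bin": "ON",
    "gtid_mode": "ON",
    "enforce_gtid_consistency": "ON",
    "binlog_format": "ROW",  # assumption: row-based binlog events
}


def check_mysql_source(variables: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    # Normalize names to lowercase and values to uppercase for comparison.
    vals = {k.lower(): str(v).upper() for k, v in variables.items()}
    problems = []
    for name, expected in REQUIRED.items():
        actual = vals.get(name)
        if actual != expected:
            problems.append(f"{name} is {actual}, expected {expected}")
    # The draft allows either of these settings to be FULL.
    if "FULL" not in (vals.get("binlog_row_metadata"), vals.get("binlog_row_image")):
        problems.append("neither binlog_row_metadata nor binlog_row_image is FULL")
    return problems
```

Pass in the name/value pairs returned by `SHOW GLOBAL VARIABLES` from any MySQL client library. The function only reports problems and does not change any settings.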
Review comment:
Excellent work @taroface. Not an easy doc to write, but you made it understandable and clean! Let's bottom out on some of these discussions and ensure the deprecation effort from @tuansydau reflects the reality of what we are documenting.
{{site.data.alerts.callout_info}}
GTID replication sends all database changes to Replicator. To limit replication to specific tables or schemas, use the `--table-filter` and `--schema-filter` flags in the `replicator` command.
Review comment:
Just a note that `schema-filter` and `table-filter` are not supported for `replicator`. This use case will actually require a userscript. Given we don't have userscripts documented right now, how do you want to proceed here? CC @Jeremyyang920 @rohan-joshi
Use the `Executed_Gtid_Set` value for the `--defaultGTIDSet` flag in MOLT Replicator.

To verify that a GTID set is valid and not purged, use the following queries:
Review comment:
Great, this section will be really helpful and would have helped some folks sanity check before raising an issue.
+---------------+----------+--------------+------------------+-------------------------------------------+
~~~
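As a hedged illustration of the validity check described above, the following minimal Python sketch mirrors MySQL's `GTID_SUBSET()` function. A starting GTID set is usable if two conditions hold:

- It is contained in `@@GLOBAL.gtid_executed`, so it is valid.
- It contains all of `@@GLOBAL.gtid_purged`, so nothing Replicator still needs has been purged from the binlogs.

The helper names are hypothetical. The parser assumes untagged GTIDs of the form `uuid:1-5:7`.

```python
def parse_gtid_set(text: str) -> dict[str, list[tuple[int, int]]]:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(1, 5), (7, 7)], ...}."""
    result: dict[str, list[tuple[int, int]]] = {}
    # MySQL may put newlines between UUID entries in GTID set output.
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.strip().lower(), [])
        for r in ranges:
            lo, _, hi = r.partition("-")  # "7" means the single GTID 7
            intervals.append((int(lo), int(hi or lo)))
    return result


def _merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort and coalesce overlapping or adjacent intervals."""
    merged: list[tuple[int, int]] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def gtid_subset(a: str, b: str) -> bool:
    """Like MySQL's GTID_SUBSET(a, b): True if every GTID in a is also in b."""
    set_a, set_b = parse_gtid_set(a), parse_gtid_set(b)
    for uuid, intervals in set_a.items():
        covered = _merge(set_b.get(uuid, []))
        for lo, hi in intervals:
            # After merging, a contiguous range is covered only if a single
            # merged interval contains it.
            if not any(s <= lo and hi <= e for s, e in covered):
                return False
    return True


def start_gtid_is_usable(start: str, executed: str, purged: str) -> bool:
    """Valid: start was executed on the source. Not purged: every purged
    transaction is already in start, so the binlogs still hold the rest."""
    return gtid_subset(start, executed) and gtid_subset(purged, start)
```

In MySQL itself, you can run the equivalent check directly by calling `GTID_SUBSET()` with `@@GLOBAL.gtid_executed` and `@@GLOBAL.gtid_purged`.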

Use the `Executed_Gtid_Set` value for the `--defaultGTIDSet` flag in MOLT Replicator.
Review comment:
Just a note that this value will only be used if there is no GTID in the `memo` table in the staging database (i.e., `_replicator`). Otherwise, Replicator uses the one in the `memo` table and keeps track of advancing GTID checkpoints in `memo`. Is this called out elsewhere, or can we add a line about this here?
To force the system to respect the `defaultGTIDSet` you pass in, you can clear the `memo` table, and it will be as if it's a fresh run.
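The following minimal Python sketch shows the precedence described in this comment and in the `replicator mylogical` text:

- A checkpoint saved in the staging schema's `memo` table wins.
- `--defaultGTIDSet` is used only when no checkpoint exists, which is why clearing `memo` forces a fresh start.

`resolve_start_gtid` is a hypothetical name. What Replicator does when neither value is present isn't described here, so the sketch simply raises an error in that case.

```python
from typing import Optional


def resolve_start_gtid(memo_gtid: Optional[str], default_gtid_set: Optional[str]) -> str:
    """Return the GTID set that replication should resume from."""
    # 1. A checkpoint already recorded in the staging schema's memo table wins;
    #    Replicator keeps advancing it as replication progresses.
    if memo_gtid:
        return memo_gtid
    # 2. Otherwise (a fresh run, or the memo table was cleared), use --defaultGTIDSet.
    if default_gtid_set:
        return default_gtid_set
    # 3. Hypothetical: the real behavior for this case is not described here.
    raise ValueError("no checkpoint in memo and no --defaultGTIDSet given")
```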
</section>

<section class="filter-content" markdown="1" data-scope="oracle">
##### Enable ARCHIVELOG and FORCE LOGGING
Review comment:
Deferring to @noelcrl to review the correctness here.
--source 'postgres://migration_user:password@localhost:5432/molt?sslmode=verify-full'
~~~

The source connection must point to the PostgreSQL primary instance, not a read replica.
Review comment:
Well, we do have a flag that can still skip replication setup for cases where folks just want a data load and have no need for replication setup or information. Should we clarify this? CC @Jeremyyang920
@@ -0,0 +1,27 @@
### Replicator metrics

By default, MOLT Replicator exports [Prometheus](https://prometheus.io/) metrics at the address specified by `--metricsAddr` (default `:30005`) at the path `/_/varz`. For example: `http://localhost:30005/_/varz`.
Review comment:
Did we decide on referring to it as MOLT Replicator generally from now on? CC @rohan-joshi @Jeremyyang920
Review comment:
Looking at the code, I actually see that Replicator doesn't default `metricsAddr`, which means that metrics are not enabled by default. This is stale information, since the MOLT wrapper used to set `metricsAddr` to `30005`. I think we should call out that the default behavior is to not spin up metrics, but that you can set it to a port (`:30005` recommended).
Here is the code snippet that made me realize this:
`cmd.Flags().StringVar(&metricsAddr, "metricsAddr", "", "start a metrics server")`
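A minimal Python sketch for scraping the metrics endpoint follows. It assumes Replicator was started with `--metricsAddr :30005`; per the comment above, metrics are off unless that flag is set. The sketch uses only the standard library, and the helper names are hypothetical. Its simplified parser handles the `name{labels} value [timestamp]` sample lines of the Prometheus text format and skips `# HELP` and `# TYPE` comments.

```python
import urllib.request


def metrics_url(metrics_addr: str, host: str = "localhost") -> str:
    """Build the /_/varz URL for a --metricsAddr value such as ':30005'."""
    # A bare ":port" listens on all interfaces, so scrape it through `host`.
    if metrics_addr.startswith(":"):
        metrics_addr = host + metrics_addr
    return f"http://{metrics_addr}/_/varz"


def parse_prometheus_text(body: str) -> dict[str, float]:
    """Map each sample (name plus labels) to its value."""
    samples: dict[str, float] = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            end = line.rindex("}")
            key, rest = line[: end + 1], line[end + 1 :].split()
        else:
            key, *rest = line.split()
        samples[key] = float(rest[0])  # an optional trailing timestamp is ignored
    return samples


def scrape(metrics_addr: str = ":30005") -> dict[str, float]:
    """Fetch and parse the metrics from a running Replicator process."""
    with urllib.request.urlopen(metrics_url(metrics_addr), timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

With Replicator running, `scrape()` returns a dictionary of every exported sample. For production monitoring, pointing a Prometheus server at the same URL is the more conventional choice.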

{% include molt/molt-setup.md %}

## Start Fetch
Review comment:
So an important note here is that, as part of the deprecation of the wrapper, we're mainly removing the invocations of Replicator from MOLT. However, there is some source database replication setup that we'll still need to perform for PostgreSQL specifically. The reason is that we need to create the slot at the time we actually do the snapshot export, so that we don't have gaps in data.
So that means we still need to document the behavior when we set certain `pg-*` flags for setting publications and slots, and the relevant drop/recreate behavior. I think we'll need to discuss this a bit more in the next team meeting to clearly lay out what the behavior still is. CC @tuansydau @Jeremyyang920
</section>

<section class="filter-content" markdown="1" data-scope="mysql">
Use the `replicator mylogical` command. Replicator will automatically use the saved GTID from the staging schema, or fall back to the specified `--defaultGTIDSet` if no saved state exists.
Review comment:
Super nit: the saved GTID from the staging schema's `memo` table, if they want to know where to look.

MOLT Replicator continuously replicates changes from source databases to CockroachDB as part of a [database migration]({% link molt/migration-overview.md %}). It supports live ongoing migrations to CockroachDB from a source database, and enables backfill from CockroachDB to your source database for failback scenarios to preserve a rollback option during a migration window.

MOLT Replicator consumes change data from CockroachDB changefeeds, PostgreSQL logical replication streams, MySQL GTID-based replication, and Oracle LogMiner. It applies changes to target databases while maintaining configurable consistency {% comment %}and transaction boundaries{% endcomment %}, and features an embedded TypeScript/JavaScript environment for configuration and live data transforms.
Review comment:
Super nit: MOLT Replicator also consumes
DOC-13338
DOC-14748
This PR is still WIP.
Notes for reviewers: