@taroface taroface commented Oct 1, 2025

DOC-13338
DOC-14748

This PR is still WIP.

Notes for reviewers:

| Page | Please review | Notes |
| --- | --- | --- |
| Load and Replicate | entire flow, but focus on Replicator setup, usage, troubleshooting | Fetch content is pre-existing |
| Migration Failback | entire flow | This was completely rewritten |
| Resume Replication | Replicator usage, any missing context/caveats about resuming | Structure is still rough |
| MOLT Replicator | whole page | Structure is WIP. Usage section is still barebones. I need to think about a good way to present the flags per dialect. |
| MOLT Fetch | check for content that should be removed/moved to Replicator | I think I caught everything, but may not understand something |

github-actions bot commented Oct 1, 2025

Files changed:

netlify bot commented Oct 1, 2025

Deploy Preview for cockroachdb-api-docs canceled.

Name Link
🔨 Latest commit c84c05c
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-api-docs/deploys/68dd385278559300084c9654

netlify bot commented Oct 1, 2025

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name Link
🔨 Latest commit c84c05c
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/68dd38522180d80007dc2610

netlify bot commented Oct 1, 2025

Netlify Preview

Name Link
🔨 Latest commit c84c05c
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/68dd38525814220008185119
😎 Deploy Preview https://deploy-preview-20465--cockroachdb-docs.netlify.app

@taroface taroface changed the title [wip] MOLT Replicator draft docs [WIP] MOLT Replicator draft docs Oct 1, 2025

<section class="filter-content" markdown="1" data-scope="mysql">
For MySQL **8.0 and later** sources, enable [global transaction identifiers (GTID)](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html) consistency. Set the following values in `mysql.cnf`, in the SQL shell, or as flags in the `mysql` start command:
Enable [global transaction identifiers (GTID)](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html) and configure binary logging. Set `binlog-row-metadata` or `binlog-row-image` to `full` to provide complete metadata for replication.

I think it may be worth calling out that it's also important to tune binlog retention: https://dba.stackexchange.com/a/206602

This affects whether the data for the GTID you specify is still available or has already been purged/rotated. It's important to note that managed services like AWS RDS or GCP CloudSQL handle this in provider-specific ways:

@ryanluu12345 ryanluu12345 left a comment

Excellent work, @taroface. Not an easy doc to write, but you made it understandable and clean! Let's bottom out on some of these discussions and ensure the deprecation effort from @tuansydau reflects the reality of what we are documenting.

Enable [global transaction identifiers (GTID)](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html) and configure binary logging. Set `binlog-row-metadata` or `binlog-row-image` to `full` to provide complete metadata for replication.

{{site.data.alerts.callout_info}}
GTID replication sends all database changes to Replicator. To limit replication to specific tables or schemas, use the `--table-filter` and `--schema-filter` flags in the `replicator` command.

Just a note that schema-filter and table-filter are not supported for replicator. This use case will actually require a userscript. Given we don't have userscripts documented right now, wondering how you want to proceed here? CC @Jeremyyang920 @rohan-joshi


Use the `Executed_Gtid_Set` value for the `--defaultGTIDSet` flag in MOLT Replicator.

To verify that a GTID set is valid and not purged, use the following queries:

Great, this section will be really helpful and would have helped some folks sanity check before raising an issue.

+---------------+----------+--------------+------------------+-------------------------------------------+
~~~

Use the `Executed_Gtid_Set` value for the `--defaultGTIDSet` flag in MOLT Replicator.

Just a note that this value is only used if there is no GTID in the memo table, which lives in the staging database (i.e., `_replicator`). Otherwise, Replicator uses the GTID in the memo table and keeps track of advancing GTID checkpoints there. Is this called out elsewhere, or can we add a line about this here?

To force the system to respect the defaultGTIDSet you pass in, you can just clear the memo table and it will be as if it's a fresh run.
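
The fallback order described in this comment can be sketched roughly as follows. This is only an illustration of the precedence, not Replicator's actual code: the function name `resolve_start_gtid`, the dict standing in for the memo table, its `"gtid"` key, and the error raised when nothing is available are all assumptions made for the example.

```python
from typing import Optional


def resolve_start_gtid(memo: dict, default_gtid_set: Optional[str]) -> str:
    """Pick the GTID set to start from (hypothetical helper, for illustration).

    `memo` stands in for the memo table in the staging database; the key
    "gtid" is an assumption of this sketch, not the real schema.
    """
    saved = memo.get("gtid")
    if saved:
        # A saved checkpoint takes precedence over the flag value.
        return saved
    if default_gtid_set:
        # No saved state: behave like a fresh run and use --defaultGTIDSet.
        return default_gtid_set
    # What happens here in practice is not covered by the thread; raising
    # an error is an assumption of this sketch.
    raise ValueError("no saved GTID and no --defaultGTIDSet provided")


# Placeholder GTID values, not taken from a real server.
saved_memo = {"gtid": "uuid-a:1-50"}
fresh_memo = {}

print(resolve_start_gtid(saved_memo, "uuid-a:1-10"))  # saved checkpoint is used
print(resolve_start_gtid(fresh_memo, "uuid-a:1-10"))  # flag value is used
```

Under this model, clearing the memo corresponds to passing an empty dict, which is why the flag value is respected again afterwards.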

</section>

<section class="filter-content" markdown="1" data-scope="oracle">
##### Enable ARCHIVELOG and FORCE LOGGING

Deferring to @noelcrl to review the correctness here.

--source 'postgres://migration_user:password@localhost:5432/molt?sslmode=verify-full'
~~~

The source connection must point to the PostgreSQL primary instance, not a read replica.

Well we do have a flag that can still ignore replication setup for cases where folks just want a data load and don't have any need for replication setup or information. Should we clarify this? CC @Jeremyyang920

@@ -0,0 +1,27 @@
### Replicator metrics

By default, MOLT Replicator exports [Prometheus](https://prometheus.io/) metrics at the address specified by `--metricsAddr` (default `:30005`) at the path `/_/varz`. For example: `http://localhost:30005/_/varz`.

Did we decide on referring to it as MOLT Replicator generally from now on? CC @rohan-joshi @Jeremyyang920

Looking at the code, I see that Replicator doesn't actually set a default for `metricsAddr`, which means that metrics are not enabled by default. This is stale information, since the MOLT wrapper used to set `metricsAddr` to 30005. I think we should call out that the default behavior is to not spin up metrics, but that you can set it to a port (`:30005` recommended).

Here is the code snippet that made me realize this:

cmd.Flags().StringVar(&metricsAddr, "metricsAddr", "", "start a metrics server")
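
To make the consequence of the empty default concrete, here is a minimal sketch of the behavior described in this comment. It is not Replicator's implementation: the parser, the `metrics_url` helper, and the `localhost` host fill-in are hypothetical. Only the flag name, its empty default, the suggested `:30005` port, and the `/_/varz` path come from the thread.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="replicator-sketch")
    # Mirrors the quoted Go flag: the default is an empty string.
    parser.add_argument("--metricsAddr", default="", help="start a metrics server")
    return parser


def metrics_url(metrics_addr: str, host: str = "localhost") -> Optional[str]:
    """Return the metrics URL, or None when no address was configured."""
    if not metrics_addr:
        # Empty default: no metrics server is started.
        return None
    if metrics_addr.startswith(":"):
        # A bare ":port" address; fill in a host for display purposes.
        metrics_addr = host + metrics_addr
    return f"http://{metrics_addr}/_/varz"


default_args = build_parser().parse_args([])
explicit_args = build_parser().parse_args(["--metricsAddr", ":30005"])

print(metrics_url(default_args.metricsAddr))   # None
print(metrics_url(explicit_args.metricsAddr))  # http://localhost:30005/_/varz
```

If the docs keep the `http://localhost:30005/_/varz` example, it would only apply to the second case, where the flag is set explicitly.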


{% include molt/molt-setup.md %}

## Start Fetch

So an important note here is that as part of the deprecation of the wrapper, we're mainly removing the invocations of Replicator from MOLT. However, there is some source database replication setup that we'll still need to perform, for PostgreSQL specifically. The reason is that we need to create the slot at the time we actually do the snapshot export, so that we don't have gaps in data.

So that means that we still need to document the behavior when we set certain pg-* flags for setting publication, slots and the relevant drop/recreate behavior. I think we'll need to discuss this a bit more in the next team meeting to clearly lay out what the behavior still is. CC @tuansydau @Jeremyyang920

</section>

<section class="filter-content" markdown="1" data-scope="mysql">
Use the `replicator mylogical` command. Replicator will automatically use the saved GTID from the staging schema, or fall back to the specified `--defaultGTIDSet` if no saved state exists.

Super nit: say "the saved GTID from the staging schema's memo table", in case they want to know where to look.


MOLT Replicator continuously replicates changes from source databases to CockroachDB as part of a [database migration]({% link molt/migration-overview.md %}). It supports live ongoing migrations to CockroachDB from a source database, and enables backfill from CockroachDB to your source database for failback scenarios to preserve a rollback option during a migration window.

MOLT Replicator consumes change data from CockroachDB changefeeds, PostgreSQL logical replication streams, MySQL GTID-based replication, and Oracle LogMiner. It applies changes to target databases while maintaining configurable consistency {% comment %}and transaction boundaries{% endcomment %}, and features an embedded TypeScript/JavaScript environment for configuration and live data transforms.

Super nit: MOLT Replicator also consumes
