Update docs for Leader Leases GA in v25.2 #19587

Merged · 6 commits · May 12, 2025
Changes from 1 commit
2 changes: 2 additions & 0 deletions src/current/_includes/v25.2/essential-alerts.md
@@ -224,6 +224,8 @@ During [rolling maintenance]({% link {{ page.version.version }}/upgrade-cockroac

### Heartbeat latency

[XXX](XXX): DO WE HAVE A METRIC UNDER `storeliveness.heartbeat.*` that is analogous to this? Or should folks just still use this metric?

Monitor cluster health for early signs of instability. A heartbeat latency above 1 second is a sign of instability.

**Metric**
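Review note: independent of how the `storeliveness.heartbeat.*` question above is resolved, the 1-second threshold can be spot-checked directly against a node's Prometheus endpoint. The following is a minimal sketch, assuming the DB Console HTTP port (8080), the `/_status/vars` path, and that the exported histogram is named `liveness_heartbeatlatency` and recorded in nanoseconds; verify those details against v25.2 before relying on it.

```python
# Minimal sketch: scrape a CockroachDB node's Prometheus endpoint and flag
# high node-liveness heartbeat latency. The endpoint path and exported metric
# names are assumptions; adjust for your deployment and CockroachDB version.
import urllib.request

NODE_VARS_URL = "http://localhost:8080/_status/vars"  # DB Console HTTP port
THRESHOLD_SECONDS = 1.0  # the "sign of instability" threshold from the alert above

def mean_heartbeat_latency(url: str = NODE_VARS_URL) -> float | None:
    """Return the mean heartbeat latency in seconds, or None if not found.

    Uses the histogram's _sum/_count series as a crude summary; a real alert
    should use a high percentile (e.g., p90) from the _bucket series instead.
    """
    total_ns = count = None
    with urllib.request.urlopen(url) as resp:
        for raw in resp.read().decode().splitlines():
            if raw.startswith("liveness_heartbeatlatency_sum"):
                total_ns = float(raw.split()[-1])
            elif raw.startswith("liveness_heartbeatlatency_count"):
                count = float(raw.split()[-1])
    if total_ns is None or not count:
        return None
    return (total_ns / count) / 1e9  # latency histograms are recorded in nanoseconds

if __name__ == "__main__":
    latency = mean_heartbeat_latency()
    if latency is None:
        print("liveness_heartbeatlatency not found; the metric name may have changed")
    elif latency > THRESHOLD_SECONDS:
        print(f"WARNING: mean heartbeat latency {latency:.2f}s exceeds 1s")
    else:
        print(f"OK: mean heartbeat latency {latency:.2f}s")
```

A production alert should use the monitoring stack's own p90 query rather than this mean-based approximation.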
3 changes: 2 additions & 1 deletion src/current/_includes/v25.2/essential-metrics.md
@@ -83,6 +83,7 @@ The **Usage** column explains why each metric is important to visualize in a cus

| <div style="width:225px">CockroachDB Metric Name</div> | {% if include.deployment == 'self-hosted' %}<div style="width:225px">[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb/?tab=host#metrics)<br>(add `cockroachdb.` prefix)</div> |{% elsif include.deployment == 'advanced' %}<div style="width:225px">[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb_dedicated/#metrics)<br>(add `crdb_dedicated.` prefix)</div> |{% endif %}<div style="width:150px">Description</div>| Usage |
| ----------------------------------------------------- | {% if include.deployment == 'self-hosted' %}------ |{% elsif include.deployment == 'advanced' %}---- |{% endif %} ------------------------------------------------------------ | ------------------------------------------------------------ |
| [xxx](xxx): What, if any, `storeliveness.*.*` metrics do we want to add here? |
| <a id="liveness-heartbeatlatency"></a>liveness.heartbeatlatency | {% if include.deployment == 'self-hosted' %}liveness.heartbeatlatency-p90 |{% elsif include.deployment == 'advanced' %}liveness.heartbeatlatency |{% endif %} Node liveness heartbeat latency | If this metric exceeds 1 second, it is a sign of cluster instability. |
| <a id="liveness-livenodes"></a>liveness.livenodes | liveness.livenodes | Number of live nodes in the cluster (will be 0 if this node is not itself live) | This is a critical metric that tracks the live nodes in the cluster. |
| distsender.rpc.sent.nextreplicaerror | distsender.rpc.sent.nextreplicaerror | Number of replica-addressed RPCs sent due to per-replica errors | [RPC](architecture/overview.html#overview) errors do not necessarily indicate a problem. This metric tracks remote procedure calls that return a status value other than "success". A non-success status of an RPC should not be misconstrued as a network transport issue. It is database code logic executed on another cluster node. The non-success status is a result of an orderly execution of an RPC that reports a specific logical condition. |
@@ -93,7 +94,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
| <div style="width:225px">CockroachDB Metric Name</div> | {% if include.deployment == 'self-hosted' %}<div style="width:225px">[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb/?tab=host#metrics)<br>(add `cockroachdb.` prefix)</div> |{% elsif include.deployment == 'advanced' %}<div style="width:225px">[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb_dedicated/#metrics)<br>(add `crdb_dedicated.` prefix)</div> |{% endif %}<div style="width:150px">Description</div>| Usage |
| ----------------------------------------------------- | {% if include.deployment == 'self-hosted' %}------ |{% elsif include.deployment == 'advanced' %}---- |{% endif %} ------------------------------------------------------------ | ------------------------------------------------------------ |
| leases.transfers.success | leases.transfers.success | Number of successful lease transfers | A high number of [lease](architecture/replication-layer.html#leases) transfers is neither a negative nor a positive signal; rather, it reflects elastic cluster activity. For example, this metric is high during cluster topology changes. A high value is often the reason for NotLeaseHolderErrors, which are normal and expected during rebalancing. Observing this metric may help confirm the cause of such errors. |
| rebalancing_lease_transfers | rebalancing.lease.transfers | Counter of the number of [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) that occur during replica rebalancing. These lease transfers are tracked by a component that looks for a [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. |
| rebalancing_lease_transfers | rebalancing.lease.transfers | Counter of the number of [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) that occur during replica rebalancing. These lease transfers are tracked by a component that looks for a [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. |
| rebalancing_range_rebalances | {% if include.deployment == 'self-hosted' %}rebalancing.range.rebalances | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the number of [load-based range rebalances]({% link {{ page.version.version }}/architecture/replication-layer.md %}#load-based-replica-rebalancing). This range movement is tracked by a component that looks for [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. |
| rebalancing_replicas_queriespersecond | {% if include.deployment == 'self-hosted' %}rebalancing.replicas.queriespersecond | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the KV-level requests received per second by a given [store]({% link {{ page.version.version }}/cockroach-start.md %}#store). The store aggregates all of the CPU and QPS stats across all its replicas and then creates a histogram that maintains buckets that can be queried for, e.g., the P95 replica's QPS or CPU. | A high value of this metric could indicate that one of the store's replicas is part of a [hot range]({% link {{ page.version.version }}/understand-hotspots.md %}#hot-range). See also: `rebalancing_replicas_cpunanospersecond`. |
| rebalancing_replicas_cpunanospersecond | {% if include.deployment == 'self-hosted' %}rebalancing.replicas.cpunanospersecond | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the CPU nanoseconds of execution time per second by a given [store]({% link {{ page.version.version }}/cockroach-start.md %}#store). The store aggregates all of the CPU and QPS stats across all its replicas and then creates a histogram that maintains buckets that can be queried for, e.g., the P95 replica's QPS or CPU. | A high value of this metric could indicate that one of the store's replicas is part of a [hot range]({% link {{ page.version.version }}/understand-hotspots.md %}#hot-range). See also the non-histogram variant: `rebalancing.cpunanospersecond`. |
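Review note: the `kv.allocator.load_based_rebalancing.objective` setting and the rebalancing counters in the rows above can be inspected on a running cluster. A minimal sketch, assuming a local insecure node on port 26257, the `psycopg2` driver, and the `crdb_internal.node_metrics` virtual table (schema should be verified for v25.2):

```python
# Minimal sketch: inspect the load-based rebalancing objective and the
# rebalancing counters described in the table above. Connection string and
# the crdb_internal.node_metrics schema are assumptions; verify against your
# CockroachDB version.
import psycopg2

conn = psycopg2.connect("postgresql://root@localhost:26257/defaultdb?sslmode=disable")
conn.autocommit = True

with conn.cursor() as cur:
    # Which signal drives load-based rebalancing: QPS or CPU?
    cur.execute("SHOW CLUSTER SETTING kv.allocator.load_based_rebalancing.objective")
    print("rebalancing objective:", cur.fetchone()[0])

    # Raw counters for lease transfers and range rebalances on this node's stores.
    cur.execute(
        """
        SELECT store_id, name, value
        FROM crdb_internal.node_metrics
        WHERE name IN ('rebalancing.lease.transfers', 'rebalancing.range.rebalances')
        ORDER BY store_id, name
        """
    )
    for store_id, name, value in cur.fetchall():
        print(f"store {store_id}: {name} = {value}")

conn.close()
```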
@@ -1,4 +1,3 @@
[Per-replica circuit breakers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#per-replica-circuit-breakers) have the following limitations:

- They cannot prevent requests from hanging when the node's [liveness range]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) is unavailable. For more information about troubleshooting a cluster that's having node liveness issues, see [Node liveness issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues).
- They are not tripped if _all_ replicas of a range [become unavailable]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#db-console-shows-under-replicated-unavailable-ranges), because the circuit breaker mechanism operates per-replica. This means at least one replica needs to be available to receive the request in order for the breaker to trip.
@@ -1,4 +1,4 @@
By default under `SERIALIZABLE` isolation, locks acquired using `SELECT ... FOR UPDATE` and `SELECT ... FOR SHARE` are implemented as fast, in-memory [unreplicated locks](architecture/transaction-layer.html#unreplicated-locks). If a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) or [range split/merge]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-merges) occurs on a range held by an unreplicated lock, the lock is dropped. The following behaviors can occur:
By default under `SERIALIZABLE` isolation, locks acquired using `SELECT ... FOR UPDATE` and `SELECT ... FOR SHARE` are implemented as fast, in-memory [unreplicated locks](architecture/transaction-layer.html#unreplicated-locks). If a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) or [range split/merge]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-merges) occurs on a range held by an unreplicated lock, the lock is dropped. The following behaviors can occur:

- The desired ordering of concurrent accesses to one or more rows of a table expressed by your use of `SELECT ... FOR UPDATE` may not be preserved (that is, a transaction _B_ against some table _T_ that was supposed to wait behind another transaction _A_ operating on _T_ may not wait for transaction _A_).
- The transaction that acquired the (now dropped) unreplicated lock may fail to commit, leading to [transaction retry errors with code `40001`]({% link {{ page.version.version }}/transaction-retry-error-reference.md %}) and the [`restart transaction` error message]({% link {{ page.version.version }}/common-errors.md %}#restart-transaction).
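Review note: since a dropped unreplicated lock surfaces to applications as a `40001` retry error, it may be worth pointing this include at a client-side retry example. A minimal sketch, assuming `psycopg2` and a hypothetical `accounts(id, balance)` table that is not part of the docs being changed:

```python
# Minimal sketch of client-side retry handling for the 40001 errors described
# above. Assumes psycopg2 and a hypothetical accounts(id, balance) table.
import psycopg2
from psycopg2 import errors

def transfer_with_retry(conn, src: int, dst: int, amount: int, max_retries: int = 5) -> None:
    for attempt in range(max_retries):
        try:
            with conn:  # commits on success, rolls back on exception
                with conn.cursor() as cur:
                    # Lock the source row; the unreplicated lock may be dropped on
                    # a lease transfer or range split/merge, which can surface as 40001.
                    cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (src,))
                    (balance,) = cur.fetchone()
                    if balance < amount:
                        raise ValueError("insufficient funds")
                    cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (amount, src))
                    cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (amount, dst))
            return  # committed
        except errors.SerializationFailure:  # SQLSTATE 40001: retry the whole transaction
            continue
    raise RuntimeError("transaction did not commit after retries")
```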
2 changes: 1 addition & 1 deletion src/current/_includes/v25.2/leader-leases-intro.md
@@ -1 +1 @@
CockroachDB offers an improved leasing system rebuilt atop a stronger form of [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) leadership that ensures that the Raft leader is **always** the range's leaseholder. This new type of lease is called a _Leader lease_, and supersedes [epoch-based leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) and [expiration-based leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#expiration-based-leases-meta-and-system-ranges) leases while combining the performance of the former with the resilience of the latter. **Leader leases are not enabled by default.**
{% include_cached new-in.html version="v25.2" %} CockroachDB offers an improved leasing system rebuilt atop a stronger form of [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) leadership that ensures that the Raft leader is **always** the range's leaseholder. This new type of lease is called a _Leader lease_. It supersedes the former system of separate epoch-based and expiration-based lease types, combining the performance of epoch-based leases with the resilience of expiration-based leases.
@@ -0,0 +1,6 @@
Node liveness is no longer determined by heartbeating a single "liveness range"; instead it is determined using [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases).

However, node heartbeats of the liveness range are still used to determine:

- Whether a node is still a member of a cluster (this is used by [`cockroach node decommission`]({% link {{ page.version.version }}/cockroach-node.md %}#node-decommission)).
- Whether a node is dead or not (in which case [its leases will be transferred away]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node)).
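Review note: to make the two remaining heartbeat-driven signals concrete, a sketch like the following could accompany this include. It shells out to `cockroach node status`; the `--format=csv` flag and the `is_live` column are assumptions based on the current CLI output and should be confirmed for v25.2.

```python
# Rough sketch: list the nodes the cluster still considers live members, using
# the CLI output described above. Flag names and columns are assumptions;
# verify against `cockroach node status --help` for your version.
import csv
import subprocess

def live_nodes(certs_dir: str = "certs", host: str = "localhost:26257") -> list[dict]:
    out = subprocess.run(
        ["cockroach", "node", "status", f"--certs-dir={certs_dir}", f"--host={host}", "--format=csv"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    rows = list(csv.DictReader(out.splitlines()))
    return [row for row in rows if row.get("is_live") == "true"]

if __name__ == "__main__":
    for node in live_nodes():
        print(node["id"], node["address"], "is_live=true")
```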
8 changes: 4 additions & 4 deletions src/current/_includes/v25.2/metric-names-serverless.md
@@ -45,9 +45,9 @@ Name | Description
`jobs.changefeed.resume_retry_error` | Number of changefeed jobs which failed with a retryable error
`keybytes` | Number of bytes taken up by keys
`keycount` | Count of all keys
`leases.epoch` | Number of replica leaseholders using epoch-based leases
`leases.epoch` | ([XXX](XXX): Should this go away with Leader leases? Still present in beta.3.) Number of replica leaseholders using epoch-based leases
`leases.error` | Number of failed lease requests
`leases.expiration` | Number of replica leaseholders using expiration-based leases
`leases.expiration` | ([XXX](XXX): Should this go away with Leader leases? Still present in beta.3.) Number of replica leaseholders using expiration-based leases
`leases.success` | Number of successful lease requests
`leases.transfers.error` | Number of failed lease transfers
`leases.transfers.success` | Number of successful lease transfers
@@ -148,10 +148,10 @@ Name | Description
`ranges.underreplicated` | Number of ranges with fewer live replicas than the replication target
`ranges` | Number of ranges
`rebalancing.writespersecond` | Number of keys written (i.e., applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions
`replicas.leaders_not_leaseholders` | Number of replicas that are Raft leaders whose range lease is held by another store
`replicas.leaders_not_leaseholders` | ([XXX](XXX): This seems like it should go away with the advent of leader leases?) Number of replicas that are Raft leaders whose range lease is held by another store
`replicas.leaders` | Number of Raft leaders
`replicas.leaseholders` | Number of lease holders
`replicas.quiescent` | Number of quiesced replicas
`replicas.quiescent` | Number of quiesced replicas ([XXX](XXX): Can this metric be removed from the docs in a Leader leases world, v25.2+?)
`replicas.reserved` | Number of replicas reserved for snapshots
`replicas` | Number of replicas
`requests.backpressure.split` | Number of backpressured writes waiting on a range split. A range will backpressure (roughly) non-system traffic when the range is above the configured size until the range splits. When the rate of this metric is nonzero over extended periods of time, it should be investigated why splits are not occurring.
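Review note: one way to settle the [XXX] questions in this file is to query the doubtful series on a v25.2 cluster and see whether they still report values. A minimal sketch, again assuming `psycopg2` and the `crdb_internal.node_metrics` virtual table (which reflects only the node you connect to):

```python
# Minimal sketch: check whether the lease metrics questioned above still report
# data on a v25.2 cluster with Leader leases enabled. Table name and connection
# string are assumptions; adjust for your deployment.
import psycopg2

QUESTIONED_METRICS = (
    "leases.epoch",
    "leases.expiration",
    "replicas.leaders_not_leaseholders",
    "replicas.quiescent",
)

conn = psycopg2.connect("postgresql://root@localhost:26257/defaultdb?sslmode=disable")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT name, sum(value) FROM crdb_internal.node_metrics WHERE name = ANY(%s) GROUP BY name",
        (list(QUESTIONED_METRICS),),
    )
    found = dict(cur.fetchall())
conn.close()

for name in QUESTIONED_METRICS:
    if name not in found:
        print(f"{name}: not reported (candidate for removal from these docs)")
    else:
        print(f"{name}: still reported, total value {found[name]}")
```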
4 changes: 3 additions & 1 deletion src/current/_includes/v25.2/misc/basic-terms.md
@@ -24,13 +24,15 @@ The replica that holds the "range lease." This replica receives and coordinates

For most types of tables and queries, the leaseholder is the only replica that can serve consistent reads (reads that return "the latest" data).

{% include_cached new-in.html version="v25.2" %} The leaseholder is always the same replica as the [Raft leader](#architecture-raft-leader). For more information, see [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases).

### Raft protocol
<a name="architecture-raft"></a>
The [consensus protocol]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) employed in CockroachDB that ensures that your data is safely stored on multiple nodes and that those nodes agree on the current state even if some of them are temporarily disconnected.

### Raft leader
<a name="architecture-raft-leader"></a>
For each range, the replica that is the "leader" for write requests. The leader uses the Raft protocol to ensure that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. The Raft leader is almost always the same replica as the leaseholder.
For each range, the replica that is the "leader" for write requests. The leader uses the Raft protocol to ensure that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. {% include_cached new-in.html version="v25.2" %} The Raft leader is always the same replica as the [leaseholder](#architecture-leaseholder). For more information, see [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases).

### Raft log
A time-ordered log of writes to a range that its replicas have agreed on. This log exists on-disk with each replica and is the range's source of truth for consistent replication.