
Commit e61c059

Update docs for Leader Leases GA in v25.2
Fixes DOC-11671, DOC-12861

Summary of changes:

- Update 'Architecture > Replication Layer > Leader Leases' to remove the statement that they are off by default.
- While we're in there, also update the 'How leases are transferred from a dead node' section to match the new reality. Also made various other edits on that page regarding leaseholder vs. Raft leader mismatches, which no longer exist.
- Revise 'Troubleshoot Self-Hosted Setup > Node liveness issues' so that the content there stops referring to the old single-point-of-failure liveness range.
- Update many, many other places where we said things like "sometimes the leaseholder and the Raft leader are different, and that can cause problems, but usually they aren't," which no longer apply.
- Rebrand the old "node liveness" content as "node heartbeats" in a few places and clarify that in v25.2+ it is only used for a couple of less central purposes, such as cluster membership.
- Remove many extraneous mentions of "node liveness heartbeats" from various places where they were sprinkled around.
- Remove various references to range quiescence, which is not a thing for Leader leases.
1 parent 6da99f1 commit e61c059

30 files changed: +107 −123 lines

src/current/_includes/v25.2/essential-alerts.md (+2)
@@ -224,6 +224,8 @@ During [rolling maintenance]({% link {{ page.version.version }}/upgrade-cockroac
 
 ### Heartbeat latency
 
+[XXX](XXX): DO WE HAVE A METRIC UNDER `storeliveness.heartbeat.*` that is analogous to this? Or should folks just still use this metric?
+
 Monitor the cluster health for early signs of instability. If this metric exceeds 1 second, it is a sign of instability.
 
 **Metric**
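For operators who want to watch this signal outside the DB Console, the metric can also be read straight from a node's Prometheus endpoint. The sketch below is a minimal illustration only: it assumes an insecure node on the default HTTP port, the `/_status/vars` path, that the histogram is exported as `liveness_heartbeatlatency_sum`/`_count` (dots become underscores in the Prometheus export), and that it is recorded in nanoseconds.

```python
# Minimal sketch: check mean node-liveness heartbeat latency from a node's
# Prometheus endpoint. Assumes the default HTTP port, the /_status/vars path,
# and that liveness.heartbeatlatency is exported (in nanoseconds) as
# liveness_heartbeatlatency_sum / liveness_heartbeatlatency_count.
import urllib.request

NODE = "http://localhost:8080"   # assumption: insecure local node
THRESHOLD_SECONDS = 1.0          # "exceeds 1 second" per the alert text above

def mean_heartbeat_latency_seconds(node_url: str) -> float:
    text = urllib.request.urlopen(f"{node_url}/_status/vars").read().decode()
    total_ns = count = 0.0
    for line in text.splitlines():
        if line.startswith("liveness_heartbeatlatency_sum"):
            total_ns = float(line.split()[-1])
        elif line.startswith("liveness_heartbeatlatency_count"):
            count = float(line.split()[-1])
    return (total_ns / count) / 1e9 if count else 0.0

if __name__ == "__main__":
    latency = mean_heartbeat_latency_seconds(NODE)
    status = "WARN" if latency > THRESHOLD_SECONDS else "ok"
    print(f"{status}: mean liveness heartbeat latency = {latency:.3f}s")
```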

src/current/_includes/v25.2/essential-metrics.md (+2 −1)
@@ -83,6 +83,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
 
 | <div style="width:225px">CockroachDB Metric Name</div> | {% if include.deployment == 'self-hosted' %}<div style="width:225px">[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb/?tab=host#metrics)<br>(add `cockroachdb.` prefix)</div> |{% elsif include.deployment == 'advanced' %}<div style="width:225px">[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb_dedicated/#metrics)<br>(add `crdb_dedicated.` prefix)</div> |{% endif %}<div style="width:150px">Description</div>| Usage |
 | ----------------------------------------------------- | {% if include.deployment == 'self-hosted' %}------ |{% elsif include.deployment == 'advanced' %}---- |{% endif %} ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [xxx](xxx): what if any `storeliveness.*.*` metrics do we wanna add here? |
 | <a id="liveness-heartbeatlatency"></a>liveness.heartbeatlatency | {% if include.deployment == 'self-hosted' %}liveness.heartbeatlatency-p90 |{% elsif include.deployment == 'advanced' %}liveness.heartbeatlatency |{% endif %} Node liveness heartbeat latency | If this metric exceeds 1 second, it is a sign of cluster instability. |
 | <a id="liveness-livenodes"></a>liveness.livenodes | liveness.livenodes | Number of live nodes in the cluster (will be 0 if this node is not itself live) | This is a critical metric that tracks the live nodes in the cluster. |
 | distsender.rpc.sent.nextreplicaerror | distsender.rpc.sent.nextreplicaerror | Number of replica-addressed RPCs sent due to per-replica errors | [RPC](architecture/overview.html#overview) errors do not necessarily indicate a problem. This metric tracks remote procedure calls that return a status value other than "success". A non-success status of an RPC should not be misconstrued as a network transport issue. It is database code logic executed on another cluster node. The non-success status is a result of an orderly execution of an RPC that reports a specific logical condition. |
@@ -93,7 +94,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
 | <div style="width:225px">CockroachDB Metric Name</div> | {% if include.deployment == 'self-hosted' %}<div style="width:225px">[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb/?tab=host#metrics)<br>(add `cockroachdb.` prefix)</div> |{% elsif include.deployment == 'advanced' %}<div style="width:225px">[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb_dedicated/#metrics)<br>(add `crdb_dedicated.` prefix)</div> |{% endif %}<div style="width:150px">Description</div>| Usage |
 | ----------------------------------------------------- | {% if include.deployment == 'self-hosted' %}------ |{% elsif include.deployment == 'advanced' %}---- |{% endif %} ------------------------------------------------------------ | ------------------------------------------------------------ |
 | leases.transfers.success | leases.transfers.success | Number of successful lease transfers | A high number of [lease](architecture/replication-layer.html#leases) transfers is not a negative or positive signal, rather it is a reflection of the elastic cluster activities. For example, this metric is high during cluster topology changes. A high value is often the reason for NotLeaseHolderErrors which are normal and expected during rebalancing. Observing this metric may provide a confirmation of the cause of such errors. |
-| rebalancing_lease_transfers | rebalancing.lease.transfers | Counter of the number of [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) that occur during replica rebalancing. These lease transfers are tracked by a component that looks for a [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. |
+| rebalancing_lease_transfers | rebalancing.lease.transfers | Counter of the number of [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) that occur during replica rebalancing. These lease transfers are tracked by a component that looks for a [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. |
 | rebalancing_range_rebalances | {% if include.deployment == 'self-hosted' %}rebalancing.range.rebalances | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the number of [load-based range rebalances]({% link {{ page.version.version }}/architecture/replication-layer.md %}#load-based-replica-rebalancing). This range movement is tracked by a component that looks for [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. |
 | rebalancing_replicas_queriespersecond | {% if include.deployment == 'self-hosted' %}rebalancing.replicas.queriespersecond | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the KV-level requests received per second by a given [store]({% link {{ page.version.version }}/cockroach-start.md %}#store). The store aggregates all of the CPU and QPS stats across all its replicas and then creates a histogram that maintains buckets that can be queried for, e.g., the P95 replica's QPS or CPU. | A high value of this metric could indicate that one of the store's replicas is part of a [hot range]({% link {{ page.version.version }}/understand-hotspots.md %}#hot-range). See also: `rebalancing_replicas_cpunanospersecond`. |
 | rebalancing_replicas_cpunanospersecond | {% if include.deployment == 'self-hosted' %}rebalancing.replicas.cpunanospersecond | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the CPU nanoseconds of execution time per second by a given [store]({% link {{ page.version.version }}/cockroach-start.md %}#store). The store aggregates all of the CPU and QPS stats across all its replicas and then creates a histogram that maintains buckets that can be queried for, e.g., the P95 replica's QPS or CPU. | A high value of this metric could indicate that one of the store's replicas is part of a [hot range]({% link {{ page.version.version }}/understand-hotspots.md %}#hot-range). See also the non-histogram variant: `rebalancing.cpunanospersecond`. |
@@ -1,4 +1,3 @@
 [Per-replica circuit breakers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#per-replica-circuit-breakers) have the following limitations:
 
-- They cannot prevent requests from hanging when the node's [liveness range]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) is unavailable. For more information about troubleshooting a cluster that's having node liveness issues, see [Node liveness issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues).
-- They are not tripped if _all_ replicas of a range [become unavailable]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#db-console-shows-under-replicated-unavailable-ranges), because the circuit breaker mechanism operates per-replica. This means at least one replica needs to be available to receive the request in order for the breaker to trip.
+- They are not tripped if _all_ replicas of a range [become unavailable]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#db-console-shows-under-replicated-unavailable-ranges), because the circuit breaker mechanism operates per-replica. This means at least one replica needs to be available to receive the request in order for the breaker to trip.

src/current/_includes/v25.2/known-limitations/select-for-update-limitations.md (+1 −1)
@@ -1,4 +1,4 @@
-By default under `SERIALIZABLE` isolation, locks acquired using `SELECT ... FOR UPDATE` and `SELECT ... FOR SHARE` are implemented as fast, in-memory [unreplicated locks](architecture/transaction-layer.html#unreplicated-locks). If a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) or [range split/merge]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-merges) occurs on a range held by an unreplicated lock, the lock is dropped. The following behaviors can occur:
+By default under `SERIALIZABLE` isolation, locks acquired using `SELECT ... FOR UPDATE` and `SELECT ... FOR SHARE` are implemented as fast, in-memory [unreplicated locks](architecture/transaction-layer.html#unreplicated-locks). If a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) or [range split/merge]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-merges) occurs on a range held by an unreplicated lock, the lock is dropped. The following behaviors can occur:
 
 - The desired ordering of concurrent accesses to one or more rows of a table expressed by your use of `SELECT ... FOR UPDATE` may not be preserved (that is, a transaction _B_ against some table _T_ that was supposed to wait behind another transaction _A_ operating on _T_ may not wait for transaction _A_).
 - The transaction that acquired the (now dropped) unreplicated lock may fail to commit, leading to [transaction retry errors with code `40001`]({% link {{ page.version.version }}/transaction-retry-error-reference.md %}) and the [`restart transaction` error message]({% link {{ page.version.version }}/common-errors.md %}#restart-transaction).
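Because the limitation above surfaces to clients as SQLSTATE `40001` retry errors, it pairs naturally with a client-side retry loop. Below is a minimal sketch, assuming the `psycopg2` driver and a hypothetical `accounts(id, balance)` table; any PostgreSQL-compatible driver that exposes SQLSTATE codes works the same way.

```python
# Minimal sketch: client-side retry for the 40001 errors described above.
# Assumes a reachable CockroachDB node, the psycopg2 driver, and a
# hypothetical accounts(id, balance) table.
import time
import psycopg2

def transfer_with_retry(dsn: str, account_id: int, delta: int, max_retries: int = 5) -> None:
    conn = psycopg2.connect(dsn)
    try:
        for attempt in range(max_retries):
            try:
                with conn:  # commits on success, rolls back on exception
                    with conn.cursor() as cur:
                        # Lock the row; under SERIALIZABLE this is an in-memory,
                        # unreplicated lock that a lease transfer or range
                        # split/merge can drop, per the limitation above.
                        cur.execute(
                            "SELECT balance FROM accounts WHERE id = %s FOR UPDATE",
                            (account_id,),
                        )
                        (balance,) = cur.fetchone()
                        cur.execute(
                            "UPDATE accounts SET balance = %s WHERE id = %s",
                            (balance + delta, account_id),
                        )
                return
            except psycopg2.Error as err:
                if err.pgcode == "40001" and attempt + 1 < max_retries:
                    time.sleep(0.1 * 2 ** attempt)  # simple backoff, then retry
                    continue
                raise
    finally:
        conn.close()
```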
@@ -1 +1 @@
-CockroachDB offers an improved leasing system rebuilt atop a stronger form of [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) leadership that ensures that the Raft leader is **always** the range's leaseholder. This new type of lease is called a _Leader lease_, and supersedes [epoch-based leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) and [expiration-based leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#expiration-based-leases-meta-and-system-ranges) leases while combining the performance of the former with the resilience of the latter. **Leader leases are not enabled by default.**
+{% include_cached new-in.html version="v25.2" %} CockroachDB offers an improved leasing system rebuilt atop a stronger form of [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) leadership that ensures that the Raft leader is **always** the range's leaseholder. This new type of lease is called a _Leader lease_, and supersedes the former system of having different epoch-based and expiration-based lease types, while combining the performance of the former with the resilience of the latter.
@@ -0,0 +1,6 @@
+Node liveness is no longer determined by heartbeating a single "liveness range"; instead it is determined using [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases).
+
+However, node heartbeats of a single range are still used to determine:
+
+- Whether a node is still a member of a cluster (this is used by [`cockroach node decommission`]({% link {{ page.version.version }}/cockroach-node.md %}#node-decommission)).
+- Whether a node is dead or not (in which case [its leases will be transferred away]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node)).
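Both of the heartbeat-backed signals listed above (cluster membership and liveness) are visible from the command line via `cockroach node status --decommission`. A minimal sketch follows, assuming an insecure local cluster and that the CSV output includes `id`, `is_live`, and `membership` columns (column names can vary by version).

```python
# Minimal sketch: surface the two heartbeat-backed signals mentioned above
# (liveness and cluster membership) via `cockroach node status --decommission`.
# Assumes an insecure local cluster and that the CSV output includes
# id, is_live, and membership columns (column names may vary by version).
import csv
import io
import subprocess

def node_liveness_and_membership(host: str = "localhost:26257") -> None:
    out = subprocess.run(
        ["cockroach", "node", "status", "--decommission",
         "--format=csv", "--insecure", f"--host={host}"],
        check=True, capture_output=True, text=True,
    ).stdout
    for row in csv.DictReader(io.StringIO(out)):
        print(f"node {row.get('id')}: is_live={row.get('is_live')}, "
              f"membership={row.get('membership')}")

if __name__ == "__main__":
    node_liveness_and_membership()
```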

src/current/_includes/v25.2/metric-names-serverless.md (+4 −4)
@@ -45,9 +45,9 @@ Name | Description
 `jobs.changefeed.resume_retry_error` | Number of changefeed jobs which failed with a retryable error
 `keybytes` | Number of bytes taken up by keys
 `keycount` | Count of all keys
-`leases.epoch` | Number of replica leaseholders using epoch-based leases
+`leases.epoch` | [XXX](XXX): Should go away with Leader leases? Still in beta.3. Number of replica leaseholders using epoch-based leases
 `leases.error` | Number of failed lease requests
-`leases.expiration` | Number of replica leaseholders using expiration-based leases
+`leases.expiration` | [XXX](XXX): Should go away with Leader leases? Still in beta.3. Number of replica leaseholders using expiration-based leases
 `leases.success` | Number of successful lease requests
 `leases.transfers.error` | Number of failed lease transfers
 `leases.transfers.success` | Number of successful lease transfers
@@ -148,10 +148,10 @@ Name | Description
 `ranges.underreplicated` | Number of ranges with fewer live replicas than the replication target
 `ranges` | Number of ranges
 `rebalancing.writespersecond` | Number of keys written (i.e., applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions
-`replicas.leaders_not_leaseholders` | Number of replicas that are Raft leaders whose range lease is held by another store
+`replicas.leaders_not_leaseholders` | ([XXX](XXX): This seems like it should go away with the advent of leader leases?) Number of replicas that are Raft leaders whose range lease is held by another store
 `replicas.leaders` | Number of Raft leaders
 `replicas.leaseholders` | Number of lease holders
-`replicas.quiescent` | Number of quiesced replicas
+`replicas.quiescent` | Number of quiesced replicas [XXX](XXX): Can this metric be removed from docs in a leader leases world v25.2+?
 `replicas.reserved` | Number of replicas reserved for snapshots
 `replicas` | Number of replicas
 `requests.backpressure.split` | Number of backpressured writes waiting on a range split. A range will backpressure (roughly) non-system traffic when the range is above the configured size until the range splits. When the rate of this metric is nonzero over extended periods of time, it should be investigated why splits are not occurring.

src/current/_includes/v25.2/misc/basic-terms.md (+3 −1)
@@ -24,13 +24,15 @@ The replica that holds the "range lease." This replica receives and coordinates
 
 For most types of tables and queries, the leaseholder is the only replica that can serve consistent reads (reads that return "the latest" data).
 
+{% include_cached new-in.html version="v25.2" %} The leaseholder is always the same replica as the [Raft leader](#architecture-raft-leader). For more information, see [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases).
+
 ### Raft protocol
 <a name="architecture-raft"></a>
 The [consensus protocol]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) employed in CockroachDB that ensures that your data is safely stored on multiple nodes and that those nodes agree on the current state even if some of them are temporarily disconnected.
 
 ### Raft leader
 <a name="architecture-raft-leader"></a>
-For each range, the replica that is the "leader" for write requests. The leader uses the Raft protocol to ensure that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. The Raft leader is almost always the same replica as the leaseholder.
+For each range, the replica that is the "leader" for write requests. The leader uses the Raft protocol to ensure that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. {% include_cached new-in.html version="v25.2" %} The Raft leader is always the same replica as the [leaseholder](#architecture-leaseholder). For more information, see [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases).
 
 ### Raft log
 A time-ordered log of writes to a range that its replicas have agreed on. This log exists on-disk with each replica and is the range's source of truth for consistent replication.
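Because the lease and Raft leadership are now colocated, the `replicas.leaders_not_leaseholders` counter mentioned earlier should normally sit at or near zero outside of brief lease or leadership transfers. The sketch below spot-checks this from a node's Prometheus endpoint; it assumes the `/_status/vars` path and that the exported name is `replicas_leaders_not_leaseholders` (dots become underscores in the Prometheus export).

```python
# Minimal sketch: spot-check that Raft leadership and the lease are colocated
# by reading replicas.leaders_not_leaseholders from a node's Prometheus
# endpoint. Assumes the /_status/vars path and that the exported metric name
# is replicas_leaders_not_leaseholders; brief nonzero blips during lease or
# leadership transfers are expected.
import urllib.request

def leaders_not_leaseholders(node_url: str = "http://localhost:8080") -> float:
    text = urllib.request.urlopen(f"{node_url}/_status/vars").read().decode()
    for line in text.splitlines():
        if line.startswith("replicas_leaders_not_leaseholders"):
            return float(line.split()[-1])
    raise RuntimeError("metric not found; the name may differ in this version")

if __name__ == "__main__":
    print("leaders without the lease on this store:", leaders_not_leaseholders())
```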
