Fixes DOC-11671, DOC-12861
Summary of changes:
- Update 'Architecture > Replication Layer > Leader Leases' to remove the statement that Leader leases are off by default
- While we're in there, also update the 'How leases are transferred off a dead node' section to match the new behavior. Also made various other edits on that page regarding leaseholder vs. Raft leader mismatches, which no longer exist
- Revise 'Troubleshoot Self-Hosted Setup > Node liveness issues' so that the content there stops referring to the old single-point-of-failure liveness range
- Update many other places where we said things like "sometimes the leaseholder and the Raft leader are different, and that can cause problems, but usually they aren't", which no longer applies
- Attempt to rebrand the old "node liveness" concept as "node heartbeats" in a few places and clarify that in v25.2+ it is only used for a couple of less central purposes, such as cluster membership
- Also remove many extraneous mentions of "node liveness heartbeats" from various places where they were sprinkled around
- Also remove various references to range quiescence, which is not a thing for Leader leases
|[xxx](xxx): what if any `storeliveness.*.*` metrics do we wanna add here? |
| <a id="liveness-heartbeatlatency"></a>liveness.heartbeatlatency | {% if include.deployment == 'self-hosted' %}liveness.heartbeatlatency-p90 |{% elsif include.deployment == 'advanced' %}liveness.heartbeatlatency |{% endif %} Node liveness heartbeat latency | If this metric exceeds 1 second, it is a sign of cluster instability. |
| <a id="liveness-livenodes"></a>liveness.livenodes | liveness.livenodes | Number of live nodes in the cluster (will be 0 if this node is not itself live) | This is a critical metric that tracks the live nodes in the cluster. |
| distsender.rpc.sent.nextreplicaerror | distsender.rpc.sent.nextreplicaerror | Number of replica-addressed RPCs sent due to per-replica errors | [RPC](architecture/overview.html#overview) errors do not necessarily indicate a problem. This metric tracks remote procedure calls that return a status value other than "success". A non-success status of an RPC should not be misconstrued as a network transport issue. It is database code logic executed on another cluster node. The non-success status is a result of an orderly execution of an RPC that reports a specific logical condition. |
@@ -93,7 +94,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
| leases.transfers.success | leases.transfers.success | Number of successful lease transfers | A high number of [lease](architecture/replication-layer.html#leases) transfers is not a negative or positive signal, rather it is a reflection of the elastic cluster activities. For example, this metric is high during cluster topology changes. A high value is often the reason for NotLeaseHolderErrors which are normal and expected during rebalancing. Observing this metric may provide a confirmation of the cause of such errors. |
- | rebalancing_lease_transfers | rebalancing.lease.transfers | Counter of the number of [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) that occur during replica rebalancing. These lease transfers are tracked by a component that looks for a [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. |
+ | rebalancing_lease_transfers | rebalancing.lease.transfers | Counter of the number of [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) that occur during replica rebalancing. These lease transfers are tracked by a component that looks for a [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. |
| rebalancing_range_rebalances | {% if include.deployment == 'self-hosted' %}rebalancing.range.rebalances | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the number of [load-based range rebalances]({% link {{ page.version.version }}/architecture/replication-layer.md %}#load-based-replica-rebalancing). This range movement is tracked by a component that looks for [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. |
| rebalancing_replicas_queriespersecond | {% if include.deployment == 'self-hosted' %}rebalancing.replicas.queriespersecond | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the KV-level requests received per second by a given [store]({% link {{ page.version.version }}/cockroach-start.md %}#store). The store aggregates all of the CPU and QPS stats across all its replicas and then creates a histogram that maintains buckets that can be queried for, e.g., the P95 replica's QPS or CPU. | A high value of this metric could indicate that one of the store's replicas is part of a [hot range]({% link {{ page.version.version }}/understand-hotspots.md %}#hot-range). See also: `rebalancing_replicas_cpunanospersecond`. |
| rebalancing_replicas_cpunanospersecond | {% if include.deployment == 'self-hosted' %}rebalancing.replicas.cpunanospersecond | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the CPU nanoseconds of execution time per second by a given [store]({% link {{ page.version.version }}/cockroach-start.md %}#store). The store aggregates all of the CPU and QPS stats across all its replicas and then creates a histogram that maintains buckets that can be queried for, e.g., the P95 replica's QPS or CPU. | A high value of this metric could indicate that one of the store's replicas is part of a [hot range]({% link {{ page.version.version }}/understand-hotspots.md %}#hot-range). See also the non-histogram variant: `rebalancing.cpunanospersecond`. |
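As a quick sanity check of a few of the metrics above on a running cluster, here is a minimal sketch. It assumes the `crdb_internal.node_metrics` virtual table is available and exposes these metric names, as in recent CockroachDB versions; it is not part of this PR's diff.

```sql
-- Hedged example: spot-check a few of the metrics discussed above.
-- Assumes the crdb_internal.node_metrics virtual table exists and
-- exposes these metric names, as in recent CockroachDB versions.
SELECT name, value
  FROM crdb_internal.node_metrics
 WHERE name IN (
         'liveness.heartbeatlatency',
         'liveness.livenodes',
         'leases.transfers.success',
         'rebalancing.lease.transfers'
       )
 ORDER BY name;
```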
[Per-replica circuit breakers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#per-replica-circuit-breakers) have the following limitations:

- - They cannot prevent requests from hanging when the node's [liveness range]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) is unavailable. For more information about troubleshooting a cluster that's having node liveness issues, see [Node liveness issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues).
- - They are not tripped if _all_ replicas of a range [become unavailable]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#db-console-shows-under-replicated-unavailable-ranges), because the circuit breaker mechanism operates per-replica. This means at least one replica needs to be available to receive the request in order for the breaker to trip.
+ - They are not tripped if _all_ replicas of a range [become unavailable]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#db-console-shows-under-replicated-unavailable-ranges), because the circuit breaker mechanism operates per-replica. This means at least one replica needs to be available to receive the request in order for the breaker to trip.
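As a rough way to check for the situation described in the remaining limitation (every replica of a range unavailable), a hedged sketch that assumes the `ranges.unavailable` and `ranges.underreplicated` counters are exposed through `crdb_internal.node_metrics`:

```sql
-- Nonzero ranges.unavailable suggests some ranges have lost quorum,
-- which per-replica circuit breakers cannot trip on by themselves.
-- Assumes these counters are exposed via crdb_internal.node_metrics.
SELECT name, value
  FROM crdb_internal.node_metrics
 WHERE name IN ('ranges.unavailable', 'ranges.underreplicated');
```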
src/current/_includes/v25.2/known-limitations/select-for-update-limitations.md (+1 -1)
@@ -1,4 +1,4 @@
- By default under `SERIALIZABLE` isolation, locks acquired using `SELECT ... FOR UPDATE` and `SELECT ... FOR SHARE` are implemented as fast, in-memory [unreplicated locks](architecture/transaction-layer.html#unreplicated-locks). If a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) or [range split/merge]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-merges) occurs on a range held by an unreplicated lock, the lock is dropped. The following behaviors can occur:
+ By default under `SERIALIZABLE` isolation, locks acquired using `SELECT ... FOR UPDATE` and `SELECT ... FOR SHARE` are implemented as fast, in-memory [unreplicated locks](architecture/transaction-layer.html#unreplicated-locks). If a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) or [range split/merge]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-merges) occurs on a range held by an unreplicated lock, the lock is dropped. The following behaviors can occur:

- The desired ordering of concurrent accesses to one or more rows of a table expressed by your use of `SELECT ... FOR UPDATE` may not be preserved (that is, a transaction _B_ against some table _T_ that was supposed to wait behind another transaction _A_ operating on _T_ may not wait for transaction _A_).
- The transaction that acquired the (now dropped) unreplicated lock may fail to commit, leading to [transaction retry errors with code `40001`]({% link {{ page.version.version }}/transaction-retry-error-reference.md %}) and the [`restart transaction` error message]({% link {{ page.version.version }}/common-errors.md %}#restart-transaction).
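For context, a minimal sketch of the locking pattern this limitation applies to; the `accounts` table and its columns are hypothetical and not part of this PR's diff. Either way, client code should be prepared to retry on `40001`, as the linked retry reference describes.

```sql
-- Hypothetical accounts table; illustrative only.
BEGIN;
  -- Under SERIALIZABLE isolation, this acquires a fast, in-memory
  -- unreplicated lock by default.
  SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
  -- If the range's lease is transferred, or the range splits or merges,
  -- before COMMIT, that unreplicated lock can be dropped, and the
  -- transaction may later fail with a retryable 40001 error.
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```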
- CockroachDB offers an improved leasing system rebuilt atop a stronger form of [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) leadership that ensures that the Raft leader is **always** the range's leaseholder. This new type of lease is called a _Leader lease_, and supersedes [epoch-based leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) and [expiration-based leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#expiration-based-leases-meta-and-system-ranges) leases while combining the performance of the former with the resilience of the latter. **Leader leases are not enabled by default.**
+ {% include_cached new-in.html version="v25.2" %} CockroachDB offers an improved leasing system rebuilt atop a stronger form of [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) leadership that ensures that the Raft leader is **always** the range's leaseholder. This new type of lease is called a _Leader lease_, and supersedes the former system of having different epoch-based and expiration-based lease types, while combining the performance of the former with the resilience of the latter.
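To observe the combined leaseholder/Raft leader placement in practice, a hedged sketch: it assumes `SHOW RANGES ... WITH DETAILS` reports a `lease_holder` column, as in recent CockroachDB versions, and uses a hypothetical `accounts` table.

```sql
-- Hypothetical accounts table; illustrative only.
-- Under Leader leases, the node reported as the lease holder for a range
-- is also that range's Raft leader.
SHOW RANGES FROM TABLE accounts WITH DETAILS;
```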
+ Node liveness is no longer determined by heartbeating a single "liveness range"; instead it is determined using [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases).
+
+ However, node heartbeats of a single range are still used to determine:
+
+ - Whether a node is still a member of a cluster (this is used by [`cockroach node decommission`]({% link {{ page.version.version }}/cockroach-node.md %}#node-decommission)).
+ - Whether a node is dead or not (in which case [its leases will be transferred away]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node)).
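As a hedged illustration of the remaining heartbeat-driven signals listed above, assuming the `crdb_internal.gossip_liveness` virtual table exposes `membership`, `decommissioning`, and `draining` columns as in recent CockroachDB versions:

```sql
-- Inspect per-node membership and decommissioning status, which still
-- relies on node heartbeats. Assumes crdb_internal.gossip_liveness
-- exposes these columns, as in recent CockroachDB versions.
SELECT node_id, membership, decommissioning, draining
  FROM crdb_internal.gossip_liveness
 ORDER BY node_id;
```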
src/current/_includes/v25.2/metric-names-serverless.md (+4 -4)
@@ -45,9 +45,9 @@ Name | Description
`jobs.changefeed.resume_retry_error` | Number of changefeed jobs which failed with a retryable error
`keybytes` | Number of bytes taken up by keys
`keycount` | Count of all keys
- `leases.epoch` | Number of replica leaseholders using epoch-based leases
+ `leases.epoch` | [XXX](XXX): Should go away with leader leases? Still in beta.3 Number of replica leaseholders using epoch-based leases
`leases.error` | Number of failed lease requests
- `leases.expiration` | Number of replica leaseholders using expiration-based leases
+ `leases.expiration` | [XXX](XXX): Should go away with leader leases? Still in beta.3 Number of replica leaseholders using expiration-based leases
`leases.success` | Number of successful lease requests
`leases.transfers.error` | Number of failed lease transfers
`leases.transfers.success` | Number of successful lease transfers
@@ -148,10 +148,10 @@ Name | Description
`ranges.underreplicated` | Number of ranges with fewer live replicas than the replication target
`ranges` | Number of ranges
`rebalancing.writespersecond` | Number of keys written (i.e., applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions
- `replicas.leaders_not_leaseholders` | Number of replicas that are Raft leaders whose range lease is held by another store
+ `replicas.leaders_not_leaseholders` | ([XXX](XXX): This seems like it should go away with the advent of leader leases?) Number of replicas that are Raft leaders whose range lease is held by another store
`replicas.leaders` | Number of Raft leaders
`replicas.leaseholders` | Number of lease holders
- `replicas.quiescent` | Number of quiesced replicas
+ `replicas.quiescent` | Number of quiesced replicas [XXX](XXX): Can this metric be removed from docs in a leader leases world v25.2+ ???
`replicas.reserved` | Number of replicas reserved for snapshots
`replicas` | Number of replicas
`requests.backpressure.split` | Number of backpressured writes waiting on a range split. A range will backpressure (roughly) non-system traffic when the range is above the configured size until the range splits. When the rate of this metric is nonzero over extended periods of time, it should be investigated why splits are not occurring.
src/current/_includes/v25.2/misc/basic-terms.md (+3 -1)
@@ -24,13 +24,15 @@ The replica that holds the "range lease." This replica receives and coordinates

For most types of tables and queries, the leaseholder is the only replica that can serve consistent reads (reads that return "the latest" data).

+ {% include_cached new-in.html version="v25.2" %} The leaseholder is always the same replica as the [Raft leader](#architecture-raft-leader). For more information, see [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases).
+
### Raft protocol
<a name="architecture-raft"></a>
The [consensus protocol]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) employed in CockroachDB that ensures that your data is safely stored on multiple nodes and that those nodes agree on the current state even if some of them are temporarily disconnected.

### Raft leader
<a name="architecture-raft-leader"></a>
- For each range, the replica that is the "leader" for write requests. The leader uses the Raft protocol to ensure that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. The Raft leader is almost always the same replica as the leaseholder.
+ For each range, the replica that is the "leader" for write requests. The leader uses the Raft protocol to ensure that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. {% include_cached new-in.html version="v25.2" %} The Raft leader is always the same replica as the [leaseholder](#architecture-raft-leader). For more information, see [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases).

### Raft log
A time-ordered log of writes to a range that its replicas have agreed on. This log exists on-disk with each replica and is the range's source of truth for consistent replication.