diff --git a/src/current/_includes/v25.2/essential-metrics.md b/src/current/_includes/v25.2/essential-metrics.md index d3acb380d44..7c958db3f50 100644 --- a/src/current/_includes/v25.2/essential-metrics.md +++ b/src/current/_includes/v25.2/essential-metrics.md @@ -93,7 +93,7 @@ The **Usage** column explains why each metric is important to visualize in a cus |
CockroachDB Metric Name
| {% if include.deployment == 'self-hosted' %}
[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb/?tab=host#metrics)
(add `cockroachdb.` prefix)
|{% elsif include.deployment == 'advanced' %}
[Datadog Integration Metric Name](https://docs.datadoghq.com/integrations/cockroachdb_dedicated/#metrics)
(add `crdb_dedicated.` prefix)
|{% endif %}
Description
| Usage | | ----------------------------------------------------- | {% if include.deployment == 'self-hosted' %}------ |{% elsif include.deployment == 'advanced' %}---- |{% endif %} ------------------------------------------------------------ | ------------------------------------------------------------ | | leases.transfers.success | leases.transfers.success | Number of successful lease transfers | A high number of [lease](architecture/replication-layer.html#leases) transfers is not a negative or positive signal, rather it is a reflection of the elastic cluster activities. For example, this metric is high during cluster topology changes. A high value is often the reason for NotLeaseHolderErrors which are normal and expected during rebalancing. Observing this metric may provide a confirmation of the cause of such errors. | -| rebalancing_lease_transfers | rebalancing.lease.transfers | Counter of the number of [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) that occur during replica rebalancing. These lease transfers are tracked by a component that looks for a [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. | +| rebalancing_lease_transfers | rebalancing.lease.transfers | Counter of the number of [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) that occur during replica rebalancing. These lease transfers are tracked by a component that looks for a [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. | | rebalancing_range_rebalances | {% if include.deployment == 'self-hosted' %}rebalancing.range.rebalances | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the number of [load-based range rebalances]({% link {{ page.version.version }}/architecture/replication-layer.md %}#load-based-replica-rebalancing). 
This range movement is tracked by a component that looks for [store-level]({% link {{ page.version.version }}/cockroach-start.md %}#store) load imbalance of either QPS (`rebalancing.queriespersecond`) or CPU usage (`rebalancing.cpunanospersecond`), depending on the value of the `kv.allocator.load_based_rebalancing.objective` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-allocator-load-based-rebalancing-objective). | Used to identify when there has been more rebalancing activity triggered by imbalance between stores (of QPS or CPU). If this is high (when the count is rated), it indicates that more rebalancing activity is taking place due to load imbalance between stores. | | rebalancing_replicas_queriespersecond | {% if include.deployment == 'self-hosted' %}rebalancing.replicas.queriespersecond | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the KV-level requests received per second by a given [store]({% link {{ page.version.version }}/cockroach-start.md %}#store). The store aggregates all of the CPU and QPS stats across all its replicas and then creates a histogram that maintains buckets that can be queried for, e.g., the P95 replica's QPS or CPU. | A high value of this metric could indicate that one of the store's replicas is part of a [hot range]({% link {{ page.version.version }}/understand-hotspots.md %}#hot-range). See also: `rebalancing_replicas_cpunanospersecond`. | | rebalancing_replicas_cpunanospersecond | {% if include.deployment == 'self-hosted' %}rebalancing.replicas.cpunanospersecond | {% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Counter of the CPU nanoseconds of execution time per second by a given [store]({% link {{ page.version.version }}/cockroach-start.md %}#store). The store aggregates all of the CPU and QPS stats across all its replicas and then creates a histogram that maintains buckets that can be queried for, e.g., the P95 replica's QPS or CPU. | A high value of this metric could indicate that one of the store's replicas is part of a [hot range]({% link {{ page.version.version }}/understand-hotspots.md %}#hot-range). See also the non-histogram variant: `rebalancing.cpunanospersecond`. | diff --git a/src/current/_includes/v25.2/known-limitations/per-replica-circuit-breaker-limitations.md b/src/current/_includes/v25.2/known-limitations/per-replica-circuit-breaker-limitations.md index 0abc2b55fec..18ceb4fefed 100644 --- a/src/current/_includes/v25.2/known-limitations/per-replica-circuit-breaker-limitations.md +++ b/src/current/_includes/v25.2/known-limitations/per-replica-circuit-breaker-limitations.md @@ -1,4 +1,3 @@ [Per-replica circuit breakers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#per-replica-circuit-breakers) have the following limitations: -- They cannot prevent requests from hanging when the node's [liveness range]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) is unavailable. For more information about troubleshooting a cluster that's having node liveness issues, see [Node liveness issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues). -- They are not tripped if _all_ replicas of a range [become unavailable]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#db-console-shows-under-replicated-unavailable-ranges), because the circuit breaker mechanism operates per-replica. 
This means at least one replica needs to be available to receive the request in order for the breaker to trip. \ No newline at end of file +- They are not tripped if _all_ replicas of a range [become unavailable]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#db-console-shows-under-replicated-unavailable-ranges), because the circuit breaker mechanism operates per-replica. This means at least one replica needs to be available to receive the request in order for the breaker to trip. diff --git a/src/current/_includes/v25.2/known-limitations/select-for-update-limitations.md b/src/current/_includes/v25.2/known-limitations/select-for-update-limitations.md index 3597c510ead..894f1f9441a 100644 --- a/src/current/_includes/v25.2/known-limitations/select-for-update-limitations.md +++ b/src/current/_includes/v25.2/known-limitations/select-for-update-limitations.md @@ -1,4 +1,4 @@ -By default under `SERIALIZABLE` isolation, locks acquired using `SELECT ... FOR UPDATE` and `SELECT ... FOR SHARE` are implemented as fast, in-memory [unreplicated locks](architecture/transaction-layer.html#unreplicated-locks). If a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) or [range split/merge]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-merges) occurs on a range held by an unreplicated lock, the lock is dropped. The following behaviors can occur: +By default under `SERIALIZABLE` isolation, locks acquired using `SELECT ... FOR UPDATE` and `SELECT ... FOR SHARE` are implemented as fast, in-memory [unreplicated locks](architecture/transaction-layer.html#unreplicated-locks). If a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) or [range split/merge]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-merges) occurs on a range held by an unreplicated lock, the lock is dropped. The following behaviors can occur: - The desired ordering of concurrent accesses to one or more rows of a table expressed by your use of `SELECT ... FOR UPDATE` may not be preserved (that is, a transaction _B_ against some table _T_ that was supposed to wait behind another transaction _A_ operating on _T_ may not wait for transaction _A_). - The transaction that acquired the (now dropped) unreplicated lock may fail to commit, leading to [transaction retry errors with code `40001`]({% link {{ page.version.version }}/transaction-retry-error-reference.md %}) and the [`restart transaction` error message]({% link {{ page.version.version }}/common-errors.md %}#restart-transaction). diff --git a/src/current/_includes/v25.2/leader-leases-intro.md b/src/current/_includes/v25.2/leader-leases-intro.md index e2e62cc0ccb..e11070ebf62 100644 --- a/src/current/_includes/v25.2/leader-leases-intro.md +++ b/src/current/_includes/v25.2/leader-leases-intro.md @@ -1 +1 @@ -CockroachDB offers an improved leasing system rebuilt atop a stronger form of [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) leadership that ensures that the Raft leader is **always** the range's leaseholder. 
This new type of lease is called a _Leader lease_, and supersedes [epoch-based leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) and [expiration-based leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#expiration-based-leases-meta-and-system-ranges) leases while combining the performance of the former with the resilience of the latter. **Leader leases are not enabled by default.** +CockroachDB offers an improved leasing system rebuilt atop a stronger form of [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) leadership that ensures that the Raft leader is always the range's leaseholder, except briefly during [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node). This type of lease is called a _Leader lease_, and supersedes the former system of having different epoch-based and expiration-based lease types, while combining the performance of the former with the resilience of the latter. diff --git a/src/current/_includes/v25.2/leader-leases-node-heartbeat-use-cases.md b/src/current/_includes/v25.2/leader-leases-node-heartbeat-use-cases.md new file mode 100644 index 00000000000..481c9220a35 --- /dev/null +++ b/src/current/_includes/v25.2/leader-leases-node-heartbeat-use-cases.md @@ -0,0 +1,7 @@ +{% include_cached new-in.html version="v25.2" %} For the purposes of [Raft replication]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) and determining the [leaseholder]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-leaseholder) of a [range]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-range), node health is no longer determined by heartbeating a single "liveness range"; instead it is determined using [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases). + +However, node heartbeats of a single range are still used to determine: + +- Whether a node is still a member of a cluster (this is used by [`cockroach node decommission`]({% link {{ page.version.version }}/cockroach-node.md %}#node-decommission)). +- Whether a node is dead (in which case [its leases will be transferred away]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node)). +- How to avoid placing replicas on dead, decommissioning or unhealthy nodes, and to make decisions about lease transfers. diff --git a/src/current/_includes/v25.2/misc/basic-terms.md b/src/current/_includes/v25.2/misc/basic-terms.md index f168f878c63..caf572a056b 100644 --- a/src/current/_includes/v25.2/misc/basic-terms.md +++ b/src/current/_includes/v25.2/misc/basic-terms.md @@ -24,13 +24,15 @@ The replica that holds the "range lease." This replica receives and coordinates For most types of tables and queries, the leaseholder is the only replica that can serve consistent reads (reads that return "the latest" data). +{% include_cached new-in.html version="v25.2" %} The leaseholder is always the same replica as the [Raft leader](#architecture-raft-leader), except briefly during [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node). For more information, refer to [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases). 
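As a concrete illustration of leaseholder placement, the following statement can be run against any node to list a table's ranges along with the node that currently holds each range lease. This is a sketch, not a prescribed procedure: it assumes a table named `users` exists, and the `lease_holder` column is included when `WITH DETAILS` is specified.

~~~ sql
-- List each range of the table, its replicas, and the node that
-- currently holds the range lease.
SHOW RANGES FROM TABLE users WITH DETAILS;
~~~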
+ ### Raft protocol The [consensus protocol]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) employed in CockroachDB that ensures that your data is safely stored on multiple nodes and that those nodes agree on the current state even if some of them are temporarily disconnected. ### Raft leader -For each range, the replica that is the "leader" for write requests. The leader uses the Raft protocol to ensure that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. The Raft leader is almost always the same replica as the leaseholder. +For each range, the replica that is the "leader" for write requests. The leader uses the Raft protocol to ensure that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. {% include_cached new-in.html version="v25.2" %} The Raft leader is always the same replica as the [leaseholder](#architecture-leaseholder). For more information, refer to [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases). ### Raft log A time-ordered log of writes to a range that its replicas have agreed on. This log exists on-disk with each replica and is the range's source of truth for consistent replication. diff --git a/src/current/v25.2/admission-control.md index 1218b6391b1..bc44a083f93 100644 --- a/src/current/v25.2/admission-control.md +++ b/src/current/v25.2/admission-control.md @@ -5,7 +5,7 @@ toc: true docs_area: develop --- -CockroachDB supports an admission control system to maintain cluster performance and availability when some nodes experience high load. When admission control is enabled, CockroachDB sorts request and response operations into work queues by priority, giving preference to higher priority operations. Internal operations critical to node health, like [node liveness heartbeats]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues), are high priority. The admission control system also prioritizes transactions that hold [locks]({% link {{ page.version.version }}/crdb-internal.md %}#cluster_locks), to reduce [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) and release locks earlier. +CockroachDB supports an admission control system to maintain cluster performance and availability when some nodes experience high load. When admission control is enabled, CockroachDB sorts request and response operations into work queues by priority, giving preference to higher priority operations. Internal operations critical to node health are high priority. The admission control system also prioritizes transactions that hold [locks]({% link {{ page.version.version }}/crdb-internal.md %}#cluster_locks), to reduce [contention]({% link {{ page.version.version }}/performance-recipes.md %}#transaction-contention) and release locks earlier. ## How admission control works @@ -95,7 +95,7 @@ When you enable or disable admission control settings for one layer, Cockroach L When admission control is enabled, request and response operations are sorted into work queues where the operations are organized by priority and transaction start time. -Higher priority operations are processed first.
The criteria for determining higher and lower priority operations is different at each processing layer, and is determined by the CPU and storage I/O of the operation. Write operations in the [KV storage layer]({% link {{ page.version.version }}/architecture/storage-layer.md %}) in particular are often the cause of performance bottlenecks, and admission control prevents [the Pebble storage engine]({% link {{ page.version.version }}/architecture/storage-layer.md %}#pebble) from experiencing high [read amplification]({% link {{ page.version.version }}/architecture/storage-layer.md %}#read-amplification). Critical cluster operations like node heartbeats are processed as high priority, as are transactions that hold [locks]({% link {{ page.version.version }}/crdb-internal.md %}#cluster_locks) in order to avoid [contention]({% link {{ page.version.version }}/performance-recipes.md %}#transaction-contention) and release locks earlier. +Higher priority operations are processed first. The criteria for determining higher and lower priority operations is different at each processing layer, and is determined by the CPU and storage I/O of the operation. Write operations in the [KV storage layer]({% link {{ page.version.version }}/architecture/storage-layer.md %}) in particular are often the cause of performance bottlenecks, and admission control prevents [the Pebble storage engine]({% link {{ page.version.version }}/architecture/storage-layer.md %}#pebble) from experiencing high [read amplification]({% link {{ page.version.version }}/architecture/storage-layer.md %}#read-amplification). Critical cluster operations are processed as high priority, as are transactions that hold [locks]({% link {{ page.version.version }}/crdb-internal.md %}#cluster_locks) in order to avoid [contention]({% link {{ page.version.version }}/performance-recipes.md %}#transaction-contention) and release locks earlier. The transaction start time is used within the priority queue and gives preference to operations with earlier transaction start times. For example, within the high priority queue operations with an earlier transaction start time are processed first. diff --git a/src/current/v25.2/architecture/distribution-layer.md b/src/current/v25.2/architecture/distribution-layer.md index 1d19c2cec62..c4dbda88450 100644 --- a/src/current/v25.2/architecture/distribution-layer.md +++ b/src/current/v25.2/architecture/distribution-layer.md @@ -239,7 +239,7 @@ The distribution layer's `DistSender` receives `BatchRequests` from its own node ### Distribution and replication layer -The distribution layer routes `BatchRequests` to nodes containing ranges of data, which is ultimately routed to the Raft group leader or leaseholder, which are handled in the replication layer. +The distribution layer routes `BatchRequests` to nodes containing ranges of data, which is ultimately routed to the Raft group leader and leaseholder, which are handled in the replication layer. ## What's next? diff --git a/src/current/v25.2/architecture/life-of-a-distributed-transaction.md b/src/current/v25.2/architecture/life-of-a-distributed-transaction.md index 872e3add35d..dc1d41a6ed0 100644 --- a/src/current/v25.2/architecture/life-of-a-distributed-transaction.md +++ b/src/current/v25.2/architecture/life-of-a-distributed-transaction.md @@ -20,8 +20,8 @@ Here's a brief overview of the physical actors, in the sequence with which they' 1. [**SQL Client**](#sql-client-postgresql-wire-protocol) sends a query to your cluster. 1. 
[**Load Balancing**](#load-balancing-routing) routes the request to CockroachDB nodes in your cluster, which will act as a gateway. 1. [**Gateway**](#gateway) is a CockroachDB node that processes the SQL request and responds to the client. -1. [**Leaseholder**](#leaseholder-node) is a CockroachDB node responsible for serving reads and coordinating writes of a specific range of keys in your query. -1. [**Raft leader**](#raft-leader) is a CockroachDB node responsible for maintaining consensus among your CockroachDB replicas. +1. [**Leaseholder**](#leaseholder-node) is a CockroachDB [replica]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-replica) responsible for serving reads and coordinating writes of a specific range of keys in your query. +1. [**Raft leader**](#raft-leader) is a CockroachDB [replica]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-replica) responsible for maintaining consensus among all of the replicas in a [range]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-range). {% include_cached new-in.html version="v25.2" %} The [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases) system ensures that the Raft leader is always the range's leaseholder, except briefly during [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node). Once the transaction completes, queries traverse these actors in approximately reverse order. We say "approximately" because there might be many leaseholders and Raft leaders involved in a single query, and there is little-to-no interaction with the load balancer during the response. @@ -153,13 +153,11 @@ As we mentioned before, each read operation also updates the timestamp cache. After guaranteeing that there are no existing write intents for the keys, `BatchRequest`'s key-value operations are converted to [Raft operations]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) and have their values converted into write intents. -The leaseholder then proposes these Raft operations to the Raft group leader. The leaseholder and the Raft leader are almost always the same node, but there are situations where the roles might drift to different nodes. However, when the two roles are not collocated on the same physical machine, CockroachDB will attempt to relocate them on the same node at the next opportunity. - ## Raft Leader CockroachDB leverages Raft as its consensus protocol. If you aren't familiar with it, we recommend checking out the details about [how CockroachDB leverages Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), as well as [learning more about how the protocol works at large](http://thesecretlivesofdata.com/raft/). -In terms of executing transactions, the Raft leader receives proposed Raft commands from the leaseholder. Each Raft command is a write that is used to represent an atomic state change of the underlying key-value pairs stored in the storage engine. +In terms of executing transactions, the Raft leader receives proposed Raft commands from the leaseholder. Each Raft command is a write that is used to represent an atomic state change of the underlying key-value pairs stored in the storage engine. 
{% include_cached new-in.html version="v25.2" %} The [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases) system ensures that the Raft leader is always the range's leaseholder. ### Consensus diff --git a/src/current/v25.2/architecture/reads-and-writes-overview.md b/src/current/v25.2/architecture/reads-and-writes-overview.md index dee87704ae3..e1c0d5ee0fa 100644 --- a/src/current/v25.2/architecture/reads-and-writes-overview.md +++ b/src/current/v25.2/architecture/reads-and-writes-overview.md @@ -53,7 +53,7 @@ In this case: 1. Node 3 (the gateway node) receives the request to write to table 1. 1. The leaseholder for table 1 is on node 1, so the request is routed there. -1. The leaseholder is the same replica as the Raft leader (as is typical), so it simultaneously appends the write to its own Raft log and notifies its follower replicas on nodes 2 and 3. +1. The leaseholder is the same replica as the Raft leader, so it simultaneously appends the write to its own Raft log and notifies its follower replicas on nodes 2 and 3. 1. As soon as one follower has appended the write to its Raft log (and thus a majority of replicas agree based on identical Raft logs), it notifies the leader and the write is committed to the key-values on the agreeing replicas. In this diagram, the follower on node 2 acknowledged the write, but it could just as well have been the follower on node 3. Also note that the follower not involved in the consensus agreement usually commits the write very soon after the others. 1. Node 1 returns acknowledgement of the commit to node 3. 1. Node 3 responds to the client. diff --git a/src/current/v25.2/architecture/replication-layer.md b/src/current/v25.2/architecture/replication-layer.md index 45730a152cc..911f20d56c5 100644 --- a/src/current/v25.2/architecture/replication-layer.md +++ b/src/current/v25.2/architecture/replication-layer.md @@ -40,7 +40,7 @@ A third replica type, the "non-voting" replica, does not participate in Raft ele For the current values of the Raft election timeout, the Raft proposal timeout, and other important intervals, see [Important values and timeouts](#important-values-and-timeouts). -Once a node receives a `BatchRequest` for a range it contains, it converts those KV operations into Raft commands. Those commands are proposed to the Raft group leader––which is what makes it ideal for the [leaseholder](#leases) and the Raft leader to be one in the same––and written to the Raft log. +Once a node receives a `BatchRequest` for a range it contains, it converts those KV operations into Raft commands. Those commands are proposed to the Raft group leader––which is also the [leaseholder](#leases)––and written to the Raft log. For a great overview of Raft, we recommend [The Secret Lives of Data](http://thesecretlivesofdata.com/raft/). @@ -122,9 +122,15 @@ To limit the impact of snapshot ingestion on a node with a [provisioned rate]({% A single node in the Raft group acts as the leaseholder, which is the only node that can serve reads or propose writes to the Raft group leader (both actions are received as `BatchRequests` from [`DistSender`]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#distsender)). -CockroachDB attempts to elect a leaseholder who is also the Raft group leader, which can also optimize the speed of writes. When the leaseholder is sent a write request, a majority of the replica nodes must be able to communicate with each other to coordinate the write. 
This ensures that the most recent write is always available to subsequent reads. +CockroachDB ensures that the leaseholder is also the Raft group leader via the [Leader leases](#leader-leases) mechanism. This optimizes the speed of writes, and makes the cluster more robust against network partitions and node liveness failures. -If there is no leaseholder, any node receiving a request will attempt to become the leaseholder for the range. To prevent two nodes from acquiring the lease, the requester includes a copy of the last valid lease it had; if another node became the leaseholder, its request is ignored. +When the leaseholder is sent a write request, a majority of the replica nodes must be able to communicate with each other to coordinate the write. This ensures that the most recent write is always available to subsequent reads. + +If there is no leaseholder, any node receiving a request will attempt to become the leaseholder for the range. + +To extend its leases, each node must also remain the Raft leader, as described in [Leader leases](#leader-leases). When a node disconnects, it stops updating its _store liveness_, causing the node to [lose all of its leases](#how-leases-are-transferred-from-a-dead-node). + +A table's meta and system ranges (detailed in [Distribution Layer]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#meta-ranges)) are treated as normal key-value data, and therefore have leases just like table data. When serving [strongly-consistent (aka "non-stale") reads]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#reading), leaseholders bypass Raft; for the leaseholder's writes to have been committed in the first place, they must have already achieved consensus, so a second consensus on the same data is unnecessary. This has the benefit of not incurring latency from networking round trips required by Raft and greatly increases the speed of reads (without sacrificing consistency). @@ -132,11 +138,15 @@ CockroachDB is considered a CAP-Consistent (CP) system under the [CAP theorem](h #### Co-location with Raft leadership -The range lease is completely separate from Raft leadership, and so without further efforts, Raft leadership and the range lease might not be held by the same replica. However, we can optimize query performance by making the same node both Raft leader and the leaseholder; it reduces network round trips if the leaseholder receiving the requests can simply propose the Raft commands to itself, rather than communicating them to another node. +The range lease is always colocated with Raft leadership via the [Leader leases](#leader-leases) mechanism, except briefly during [lease transfers](#how-leases-are-transferred-from-a-dead-node). This reduces network round trips since the leaseholder receiving the requests can simply propose the Raft commands to itself, rather than communicating them to another node. + +It also increases robustness against network partitions and outages due to liveness failures. + +For more information, refer to [Leader leases](#leader-leases). -To achieve this, each lease renewal or transfer also attempts to collocate them. In practice, that means that the mismatch is rare and self-corrects quickly. +#### Epoch-based leases -#### Epoch-based leases (table data) +{% include_cached new-in.html version="v25.2" %} Epoch-based leases are disabled by default in favor of [Leader leases](#leader-leases). 
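To check whether a cluster is using Leader leases rather than epoch-based leases, one option is to inspect the `kv.raft.leader_fortification.fraction_enabled` cluster setting, which controls the fraction of ranges for which Raft leader fortification (and therefore Leader leases) is enabled. A minimal sketch:

~~~ sql
-- 1.0: Leader leases for all ranges that do not require expiration-based leases.
-- 0.0: leader fortification disabled; ranges fall back to epoch-based leases.
SHOW CLUSTER SETTING kv.raft.leader_fortification.fraction_enabled;
~~~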
To manage leases for table data, CockroachDB implements a notion of "epochs," which are defined as the period between a node joining a cluster and a node disconnecting from a cluster. To extend its leases, each node must periodically update its liveness record, which is stored on a system range key. When a node disconnects, it stops updating the liveness record, and the epoch is considered changed. This causes the node to [lose all of its leases](#how-leases-are-transferred-from-a-dead-node) a few seconds later when the liveness record expires. @@ -146,21 +156,19 @@ Because leases do not expire until a node disconnects from a cluster, leaseholde A table's meta and system ranges (detailed in the [distribution layer]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#meta-ranges)) are treated as normal key-value data, and therefore have leases just like table data. -However, unlike table data, system ranges cannot use epoch-based leases because that would create a circular dependency: system ranges are already being used to implement epoch-based leases for table data. Therefore, system ranges use expiration-based leases instead. Expiration-based leases expire at a particular timestamp (typically after a few seconds). However, as long as a node continues proposing Raft commands, it continues to extend the expiration of its leases. If it doesn't, the next node containing a replica of the range that tries to read from or write to the range will become the leaseholder. - -#### Leader leases - +Unlike table data, system ranges use expiration-based leases; expiration-based leases expire at a particular timestamp (typically after a few seconds). However, as long as a node continues proposing Raft commands, it continues to extend the expiration of its leases. If it doesn't, the next node containing a replica of the range that tries to read from or write to the range will become the leaseholder. +Expiration-based leases are also used temporarily during operations like lease transfers, until the new Raft leader can be fortified based on store liveness, as described in [Leader leases](#leader-leases). -{% include feature-phases/preview.md %} +#### Leader leases -{% include {{ page.version.version }}/leader-leases-intro.md %} +{% include_cached new-in.html version="v25.2" %} {% include {{ page.version.version }}/leader-leases-intro.md %} Leader leases rely on a shared, store-wide failure detection mechanism for triggering new Raft elections. [Stores]({% link {{ page.version.version }}/cockroach-start.md %}#store) participate in Raft leader elections by "fortifying" a candidate replica based on that replica's _store liveness_, as determined among a quorum of all the node's stores. A replica can **only** become the Raft leader if it is so fortified. After the fortified Raft leader is chosen, it is then also established as the leaseholder. Support for the lease is provided as long as the Raft leader's store liveness remains supported by a quorum of stores in the Raft group. This provides the fortified Raft leader with a guarantee that it will not lose leadership until **after** it has lost store liveness support. This guarantee enables a number of improvements to the performance and resiliency of CockroachDB's Raft implementation that were prevented by the need to handle cases where Raft leadership and range leases were not colocated. 
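To observe lease activity on a particular node during testing, lease-related metrics can also be queried over SQL. This is a sketch: `crdb_internal.node_metrics` reports the metrics of the node the client is connected to, and the exact set of lease metrics (for example, `leases.transfers.success`) can vary by version.

~~~ sql
-- List lease-related metrics reported by the node you are connected to.
SELECT name, value
  FROM crdb_internal.node_metrics
 WHERE name LIKE 'leases%'
 ORDER BY name;
~~~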
-Importantly, since Leader leases rely on a quorum of stores in the Raft group, they remove the need for the single point of failure (SPOF) that was the [node liveness range]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues), As a result, Leader leases are not vulnerable to the scenario possible under the previous leasing regime where a leaseholder was [partitioned]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#network-partition) from its followers (including a follower that was the Raft leader) but still heartbeating the node liveness range. Before Leader leases, this scenario would result in an indefinite outage that lasted as long as the lease was held by the partitioned node. +Importantly, since Leader leases rely on a quorum of stores in the Raft group, they remove the need for the single point of failure (SPOF) that was the node liveness range. As a result, Leader leases are not vulnerable to the scenario possible under the previous leasing regime (prior to CockroachDB v25.2) where a leaseholder was [partitioned]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#network-partition) from its followers (including a follower that was the Raft leader) but still heartbeating the node liveness range. Before Leader leases, this scenario would result in an indefinite outage that lasted as long as the lease was held by the partitioned node. Based on Cockroach Labs' internal testing, leader leases provide the following user-facing benefits: @@ -168,29 +176,19 @@ Based on Cockroach Labs' internal testing, leader leases provide the following u - Outages caused by liveness failures last less than 1 second, since liveness is now determined by a store-level detection mechanism, not a single node liveness range. - Performance is equivalent (within less than 1%) to epoch-based leases on a 100 node cluster of 32 vCPU machines with 8 stores each. -To enable Leader leases for testing with your workload, use the [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) `kv.raft.leader_fortification.fraction_enabled`, which controls the fraction of ranges for which the Raft leader fortification protocol is enabled. Leader fortification is needed for a range to use a Leader lease. It can be set to `0.0` to disable leader fortification and, by extension, Leader leases. It can be set to `1.0` to enable leader fortification for all ranges and, by extension, use Leader leases for all ranges that do not require expiration-based leases. It can be set to a value between `0.0` and `1.0` to gradually roll out Leader leases across the ranges in a cluster. - -{% include_cached copy-clipboard.html %} -~~~ sql -SET CLUSTER SETTING kv.raft.leader_fortification.fraction_enabled = 1.0 -~~~ +{% include {{ page.version.version }}/leader-leases-node-heartbeat-use-cases.md %} -#### How leases are transferred from a dead node +### How leases are transferred from a dead node -When the cluster needs to access a range on a leaseholder node that is dead, that range's lease must be transferred to a healthy node. This process is as follows: +When a cluster needs to access a range on a leaseholder node that is dead, the lease must be transferred to a healthy node. The process is as follows: -1. 
The dead node's liveness record, which is stored in a system range, has an expiration time of `{{site.data.constants.cockroach_range_lease_duration}}`, and is heartbeated half as often (`{{site.data.constants.cockroach_range_lease_duration}} / 2`). When the node dies, the amount of time the cluster has to wait for the record to expire varies, but should be no more than a few seconds. -1. A healthy node attempts to acquire the lease. This is rejected because lease acquisition can only happen on the Raft leader, which the healthy node is not (yet). Therefore, a Raft election must be held. -1. The rejected attempt at lease acquisition [unquiesces]({% link {{ page.version.version }}/ui-replication-dashboard.md %}#replica-quiescence) ("wakes up") the range associated with the lease. -1. What happens next depends on whether the lease is on [table data](#epoch-based-leases-table-data) or [meta ranges or system ranges](#expiration-based-leases-meta-and-system-ranges): - - If the lease is on [meta or system ranges](#expiration-based-leases-meta-and-system-ranges), the node that unquiesced the range checks if the Raft leader is alive according to the liveness record. If the leader is not alive, it kicks off a campaign to try and win Raft leadership so it can become the leaseholder. - - If the lease is on [table data](#epoch-based-leases-table-data), the "is the leader alive?" check described above is skipped and an election is called immediately. The check is skipped since it would introduce a circular dependency on the liveness record used for table data, which is itself stored in a system range. -1. The Raft election is held and a new leader is chosen from among the healthy nodes. -1. The lease acquisition can now be processed by the newly elected Raft leader. +1. Detection of Node Failure: The _store liveness_ mechanism described in [Leader leases](#leader-leases) detects node failures through its store-wide heartbeating process. If a node becomes unresponsive, its store liveness support is withdrawn, marking it as unavailable. +1. Raft Leadership Election: A Raft election is initiated to establish a new leader for the range. This step is necessary because lease acquisition can only occur on the Raft leader. The election process includes a store liveness component to fortify the new leader, as described in [Leader leases](#leader-leases). +1. Lease Acquisition: Once a new Raft leader is elected, the lease acquisition process can proceed. The new leader acquires the lease. -This process should take no more than a few seconds for liveness expiration plus the cost of 2 network roundtrips: 1 for Raft leader election, and 1 for lease acquisition. +The entire process, from detecting the node failure to acquiring the lease on a new node, should complete within a few seconds. -Finally, note that the process described above is lazily initiated: it only occurs when a new request comes in for the range associated with the lease. +This process is lazily initiated and only occurs when a new request is made that requires access to the range associated with the lease on the dead node. #### Leaseholder rebalancing @@ -240,7 +238,7 @@ Whenever there are changes to a cluster's number of nodes, the members of Raft g - **Nodes added**: The new node communicates information about itself to other nodes, indicating that it has space available. The cluster then rebalances some replicas onto the new node. 
-- **Nodes going offline**: If a member of a Raft group ceases to respond, after 5 minutes, the cluster begins to rebalance by replicating the data the downed node held onto other nodes. +- **Nodes going offline**: If a member of a Raft group ceases to respond, the cluster begins to rebalance by replicating the data the downed node held onto other nodes. Rebalancing is achieved by using a snapshot of a replica from the leaseholder, and then sending the data to another node over [gRPC]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#grpc). After the transfer has been completed, the node with the new replica joins that range's Raft group; it then detects that its latest timestamp is behind the most recent entries in the Raft log and it replays all of the actions in the Raft log on itself. @@ -272,7 +270,6 @@ Constant | Default value | Notes [Raft](#raft) proposal timeout | {{site.data.constants.cockroach_raft_reproposal_timeout_ticks}} * {{site.data.constants.cockroach_raft_tick_interval}} | Controlled by `COCKROACH_RAFT_REPROPOSAL_TIMEOUT_TICKS`, which is then multiplied by the default tick interval to determine the value. [Lease interval](#how-leases-are-transferred-from-a-dead-node) | {{site.data.constants.cockroach_range_lease_duration}} | Controlled by `COCKROACH_RANGE_LEASE_DURATION`. [Lease acquisition timeout](#how-leases-are-transferred-from-a-dead-node) | {{site.data.constants.cockroach_range_lease_acquisition_timeout}} | -[Node heartbeat interval](#how-leases-are-transferred-from-a-dead-node) | {{site.data.constants.cockroach_range_lease_duration}} / 2 | Used to determine if you're having [node liveness issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues). This is calculated as one half of the lease interval. Raft tick interval | {{site.data.constants.cockroach_raft_tick_interval}} | Controlled by `COCKROACH_RAFT_TICK_INTERVAL`. Used to calculate various replication-related timeouts. ## Interactions with other layers diff --git a/src/current/v25.2/architecture/transaction-layer.md b/src/current/v25.2/architecture/transaction-layer.md index 66085f86f58..4bf7eff4426 100644 --- a/src/current/v25.2/architecture/transaction-layer.md +++ b/src/current/v25.2/architecture/transaction-layer.md @@ -129,7 +129,7 @@ The closed timestamps subsystem works by propagating information from leaseholde Once the follower replica has applied the abovementioned Raft commands, it has all the data necessary to serve reads with timestamps less than or equal to the closed timestamp. -Note that closed timestamps are valid even if the leaseholder changes, since they are preserved across [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data). Once a lease transfer occurs, the new leaseholder will not break the closed timestamp promise made by the old leaseholder. +Note that closed timestamps are valid even if the leaseholder changes, since they are preserved across [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node). Once a lease transfer occurs, the new leaseholder will not break the closed timestamp promise made by the old leaseholder. Closed timestamps provide the guarantees that are used to provide support for low-latency historical (stale) reads, also known as [Follower Reads]({% link {{ page.version.version }}/follower-reads.md %}). 
Follower reads can be particularly useful in [multi-region deployments]({% link {{ page.version.version }}/multiregion-overview.md %}). diff --git a/src/current/v25.2/cluster-setup-troubleshooting.md b/src/current/v25.2/cluster-setup-troubleshooting.md index 35dfd522e9f..68e1a47844a 100644 --- a/src/current/v25.2/cluster-setup-troubleshooting.md +++ b/src/current/v25.2/cluster-setup-troubleshooting.md @@ -236,6 +236,8 @@ then you might have a network partition. {% include common/network-partitions.md %} +{% include_cached new-in.html version="v25.2" %} With the introduction of [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases), most network partitions between a leaseholder and its followers should heal in a few seconds. + **Solution:** To identify a network partition: @@ -286,23 +288,6 @@ Failed running "sql" **Solution:** To successfully connect to the cluster, you must first either generate a client certificate or create a password for the user. -#### Cannot create new connections to cluster for up to 40 seconds after a node dies - -When a node [dies abruptly and/or loses its network connection to the cluster](#node-liveness-issues), the following behavior can occur: - -1. For a period of up to 40 seconds, clients trying to connect with [username and password authentication]({% link {{ page.version.version }}/authentication.md %}#client-authentication) cannot create new connections to any of the remaining nodes in the cluster. -1. Applications start timing out when trying to connect to the cluster during this window. - -The reason this happens is as follows: - -- Username and password information is stored in a system range. -- Since all system ranges are located [near the beginning of the keyspace]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#monolithic-sorted-map-structure), the system range containing the username/password info can sometimes be colocated with another system range that is used to determine [node liveness](#node-liveness-issues). -- If the username/password info and the node liveness record are stored together as described above, it can take extra time for the lease on this range to be transferred to another node. Normally, [lease transfers take a few seconds]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node), but in this case it may require multiple rounds of consensus to determine that the node in question is actually dead (the node liveness record check may be retried several times before failing). - -For more information about how lease transfers work when a node dies, see [How leases are transferred from a dead node]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node). - -The solution is to [use connection pooling]({% link {{ page.version.version }}/connection-pooling.md %}). - ## Clock sync issues #### A node's timezone data has changed @@ -428,7 +413,6 @@ Symptoms of disk stalls include: - Bad cluster write performance, usually in the form of a substantial drop in QPS for a given workload. - [Node liveness issues](#node-liveness-issues). -- Writes on one node come to a halt. This can happen because in rare cases, a node may be able to perform liveness checks (which involve writing to disk) even though it cannot write other data to disk due to one or more slow/stalled calls to `fsync`. 
Because the node is passing its liveness checks, it is able to hang onto its leases even though it cannot make progress on the ranges for which it is the leaseholder. This wedged node has a ripple effect on the rest of the cluster such that all processing of the ranges whose leaseholders are on that node basically grinds to a halt. As mentioned above, CockroachDB's disk stall detection will attempt to shut down the node when it detects this state. Causes of disk stalls include: @@ -444,7 +428,9 @@ CockroachDB's built-in disk stall detection works as follows: - `file write stall detected: %s` -- During [node liveness heartbeats](#node-liveness-issues), the [storage engine]({% link {{ page.version.version }}/architecture/storage-layer.md %}) writes to disk as part of the node liveness heartbeat process. +- During [store liveness]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases) heartbeats, the [storage engine]({% link {{ page.version.version }}/architecture/storage-layer.md %}) writes to disk. + +{% include_cached new-in.html version="v25.2" %} {% include {{ page.version.version }}/leader-leases-node-heartbeat-use-cases.md %} #### Disk utilization is different across nodes in the cluster @@ -473,7 +459,7 @@ Because [compaction]({% link {{ page.version.version }}/architecture/storage-lay {% include {{page.version.version}}/storage/compaction-concurrency.md %} {{site.data.alerts.end}} -If these issues remain unresolved, affected nodes will miss their liveness heartbeats, causing the cluster to lose nodes and eventually become unresponsive. +If these issues remain unresolved, affected nodes will eventually become unresponsive. **Solution:** To diagnose and resolve an excessive workload concurrency issue: @@ -601,13 +587,13 @@ To see which of your [localities]({% link {{ page.version.version }}/cockroach-s ## Node liveness issues -"Node liveness" refers to whether a node in your cluster has been determined to be "dead" or "alive" by the rest of the cluster. This is achieved using checks that ensure that each node connected to the cluster is updating its liveness record. This information is shared with the rest of the cluster using an internal gossip protocol. +"Node liveness" refers to whether a node in your cluster has been determined to be "dead" or "alive" by the rest of the cluster. {% include_cached new-in.html version="v25.2" %} With [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases), node liveness is managed based on [store]({% link {{ page.version.version }}/cockroach-start.md %}#store) liveness rather than through a centralized node liveness range (as in previous versions less than v25.2); this reduces single points of failure and improves resilience to [network partitions](#network-partition) and other faults. Common reasons for node liveness issues include: -- Heavy I/O load on the node. Because each node needs to update a liveness record on disk, maxing out disk bandwidth can cause liveness heartbeats to be missed. See also: [Capacity planning issues](#capacity-planning-issues). -- A [disk stall](#disk-stalls). This will cause node liveness issues for the same reasons as listed above. -- [Insufficient CPU for the workload](#cpu-is-insufficient-for-the-workload). This can eventually cause nodes to miss their liveness heartbeats and become unresponsive. +- Heavy I/O load on the node. 
Nodes running CockroachDB v25.2 and later no longer need to update a centralized node liveness record, but heavy I/O can still impact [store liveness]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases). See also: [Capacity planning issues](#capacity-planning-issues). +- A [disk stall](#disk-stalls). This can cause node liveness issues for the same reasons as listed above. +- [Insufficient CPU for the workload](#cpu-is-insufficient-for-the-workload): This can cause nodes to be marked as unresponsive. - [Networking issues](#networking-issues) with the node. The [DB Console][db_console] provides several ways to check for node liveness issues in your cluster: @@ -616,22 +602,20 @@ The [DB Console][db_console] provides several ways to check for node liveness is - [Check command commit latency]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#command-commit-latency) {{site.data.alerts.callout_info}} -For more information about how node liveness works, see [Replication Layer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data). +For more information about how node liveness works, see [Replication Layer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases). {{site.data.alerts.end}} #### Impact of node failure is greater than 10 seconds -When the cluster needs to access a range on a leaseholder node that is dead, that range's [lease must be transferred to a healthy node]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node). In theory, this process should take no more than a few seconds for liveness expiration plus the cost of several network roundtrips. - -In production, lease transfer upon node failure can take longer than expected. In {{ page.version.version }}, this is observed in the following scenarios: +{% include_cached new-in.html version="v25.2" %} With [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases), the impact of a node failure is significantly reduced. The dependency on a single node liveness range that existed in versions less than v25.2 has been eliminated. Leader leases use a store-wide failure detection mechanism that ensures that lease transfers occur more efficiently. -- **The leaseholder node for the liveness range fails.** The liveness range is a system range that [stores the liveness record]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) for each node on the cluster. If a node fails and is also the leaseholder for the liveness range, operations cannot proceed until the liveness range is [transferred to a new leaseholder]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node) and the liveness record is made available to other nodes. This can cause momentary cluster unavailability. +- **Lease Transfer Efficiency:** When a node fails, the [lease transfer process]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node) is expedited due to the decentralized store liveness checks. This process should complete within a few seconds, as there is no longer a dependency on a single node liveness range. 
- **Network or DNS issues cause connection issues between nodes.** If there is no live server for the IP address or DNS lookup, connection attempts to a node will not return an immediate error, but will hang [until timing out]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#grpc). This can cause unavailability and prevent a speedy movement of leases and recovery. CockroachDB avoids contacting unresponsive nodes or DNS during certain performance-critical operations, and the connection issue should generally resolve in 10-30 seconds. However, an attempt to contact an unresponsive node could still occur in other scenarios that are not yet addressed. -- **A node's disk stalls.** A [disk stall](#disk-stalls) on a node can cause write operations to stall indefinitely, also causes the node's heartbeats to fail since the storage engine cannot write to disk as part of the heartbeat, and may cause read requests to fail if they are waiting for a conflicting write to complete. Lease acquisition from this node can stall indefinitely until the node is shut down or recovered. Pebble detects most stalls and will terminate the `cockroach` process after 20 seconds, but there are gaps in its detection. In v22.1.2+ and v22.2+, each lease acquisition attempt on an unresponsive node [times out after a few seconds]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node). However, CockroachDB can still appear to stall as these timeouts are occurring. +- **A node's disk stalls.** A [disk stall](#disk-stalls) on a node can cause write operations to stall indefinitely. Pebble detects most stalls and will terminate the `cockroach` process after 20 seconds, but there are gaps in its detection. Each lease acquisition attempt on an unresponsive node [times out after a few seconds]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node). However, CockroachDB can still appear to stall as these timeouts are occurring. -- **Otherwise unresponsive nodes.** Internal deadlock due to faulty code, resource exhaustion, OS/hardware issues, and other arbitrary failures can make a node unresponsive. This can cause leases to become stuck in certain cases, such as when a response from the previous leaseholder is needed in order to move the lease. +- **Otherwise unresponsive nodes.** Internal deadlock due to faulty code, resource exhaustion, OS/hardware issues, and other arbitrary failures can make a node unresponsive. **Solution:** If you are experiencing intermittent network or connectivity issues, first [shut down the affected nodes]({% link {{ page.version.version }}/node-shutdown.md %}) temporarily so that nodes phasing in and out do not cause disruption. diff --git a/src/current/v25.2/cockroach-node.md b/src/current/v25.2/cockroach-node.md index 26a680b93e6..485768d4bd8 100644 --- a/src/current/v25.2/cockroach-node.md +++ b/src/current/v25.2/cockroach-node.md @@ -178,7 +178,7 @@ Field | Description `updated_at` | The date and time when the node last recorded the information displayed in this command's output. When healthy, a new status should be recorded every 10 seconds or so, but when unhealthy this command's stats may be much older.

**Required flag:** None `started_at` | The date and time when the node was started.

**Required flag:** None `replicas_leaders` | The number of range replicas on the node that are the Raft leader for their range. See `replicas_leaseholders` below for more details.

**Required flag:** `--ranges` or `--all` -`replicas_leaseholders` | The number of range replicas on the node that are the leaseholder for their range. A "leaseholder" replica handles all read requests for a range and directs write requests to the range's Raft leader (usually the same replica as the leaseholder).

**Required flag:** `--ranges` or `--all` +`replicas_leaseholders` | The number of range replicas on the node that are the leaseholder for their range. A "leaseholder" replica handles all read requests for a range and directs write requests to the range's Raft leader (usually the same replica as the leaseholder).

**Required flag:** `--ranges` or `--all` `ranges` | The number of ranges that have replicas on the node.

**Required flag:** `--ranges` or `--all` `ranges_unavailable` | The number of unavailable ranges that have replicas on the node.

**Required flag:** `--ranges` or `--all` `ranges_underreplicated` | The number of underreplicated ranges that have replicas on the node.

**Required flag:** `--ranges` or `--all` diff --git a/src/current/v25.2/cockroachdb-feature-availability.md b/src/current/v25.2/cockroachdb-feature-availability.md index c1aae75248e..d14d14cca52 100644 --- a/src/current/v25.2/cockroachdb-feature-availability.md +++ b/src/current/v25.2/cockroachdb-feature-availability.md @@ -275,7 +275,7 @@ Command | Description ### Leader leases -{% include {{ page.version.version }}/leader-leases-intro.md %} +{% include_cached new-in.html version="v25.2" %} {% include {{ page.version.version }}/leader-leases-intro.md %} For more information, see [Architecture > Replication Layer > Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases). diff --git a/src/current/v25.2/common-issues-to-monitor.md b/src/current/v25.2/common-issues-to-monitor.md index 60595f5c993..0071b4f7d7e 100644 --- a/src/current/v25.2/common-issues-to-monitor.md +++ b/src/current/v25.2/common-issues-to-monitor.md @@ -112,7 +112,7 @@ Issues at the storage layer, including an [inverted LSM]({% link {{ page.version #### Node health -If [issues at the storage layer](#lsm-health) remain unresolved, affected nodes will miss their liveness heartbeats, causing the cluster to lose nodes and eventually become unresponsive. +If [issues at the storage layer](#lsm-health) remain unresolved, affected nodes will eventually become unresponsive. - The [**Node status**]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}#node-status) on the Cluster Overview page indicates whether nodes are online (`LIVE`) or have crashed (`SUSPECT` or `DEAD`). @@ -273,7 +273,7 @@ Monitor storage capacity and disk performance: #### Storage capacity -CockroachDB requires disk space in order to accept writes and report node liveness. When a node runs out of disk space, it [shuts down](#node-health) and cannot be restarted until space is freed up. +CockroachDB requires disk space in order to accept writes. When a node runs out of disk space, it [shuts down](#node-health) and cannot be restarted until space is freed up. - The [**Capacity**]({% link {{ page.version.version }}/ui-storage-dashboard.md %}#capacity) graph on the Overview and Storage dashboards shows the available and used disk capacity in the CockroachDB [store]({% link {{ page.version.version }}/cockroach-start.md %}#store). @@ -304,12 +304,12 @@ With insufficient disk I/O, you may also see: #### Node heartbeat latency -Because each node needs to update a liveness record on disk, maxing out disk bandwidth can cause liveness heartbeats to be missed. - -- The [**Node Heartbeat Latency: 99th percentile**]({% link {{ page.version.version }}/ui-distributed-dashboard.md %}#node-heartbeat-latency-99th-percentile) and [**Node Heartbeat Latency: 90th percentile**]({% link {{ page.version.version }}/ui-distributed-dashboard.md %}#node-heartbeat-latency-90th-percentile) graphs on the [Distributed Dashboard]({% link {{ page.version.version }}/ui-distributed-dashboard.md %}) show the time elapsed between [node liveness]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues) heartbeats. 
+- The [**Node Heartbeat Latency: 99th percentile**]({% link {{ page.version.version }}/ui-distributed-dashboard.md %}#node-heartbeat-latency-99th-percentile) and [**Node Heartbeat Latency: 90th percentile**]({% link {{ page.version.version }}/ui-distributed-dashboard.md %}#node-heartbeat-latency-90th-percentile) graphs on the [Distributed Dashboard]({% link {{ page.version.version }}/ui-distributed-dashboard.md %}) show the time elapsed between node heartbeats. {% include {{ page.version.version }}/prod-deployment/healthy-node-heartbeat-latency.md %} +{% include_cached new-in.html version="v25.2" %} {% include {{ page.version.version }}/leader-leases-node-heartbeat-use-cases.md %} + #### Command commit latency - The **Command Commit Latency: 50th percentile** and **Command Commit Latency: 99th percentile** graphs on the [Storage dashboard]({% link {{ page.version.version }}/ui-storage-dashboard.md %}) show how quickly [Raft commands]({% link {{ page.version.version }}/architecture/replication-layer.md %}) are being committed by nodes in the cluster. This is a good signal of I/O load. diff --git a/src/current/v25.2/crdb-internal.md index cab2a7e7424..248d37d0931 100644 --- a/src/current/v25.2/crdb-internal.md +++ b/src/current/v25.2/crdb-internal.md @@ -54,7 +54,6 @@ Table name | Description| Use in production [`index_usage_statistics`](#index_usage_statistics) | Contains statistics about the primary and secondary indexes used in statements.| ✓ `invalid_objects` | Contains information about invalid objects in your cluster.| ✗ `jobs` | Contains information about [jobs]({% link {{ page.version.version }}/show-jobs.md %}) running on your cluster.| ✗ -`kv_node_liveness` | Contains information about [node liveness]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues).| ✗ `kv_node_status` | Contains information about node status at the [key-value layer]({% link {{ page.version.version }}/architecture/storage-layer.md %}).| ✗ `kv_store_status` | Contains information about the key-value store for your cluster.| ✗ `leases` | Contains information about [leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) in your cluster.| ✗ diff --git a/src/current/v25.2/critical-log-messages.md index e9af068730e..82a12a6fac1 100644 --- a/src/current/v25.2/critical-log-messages.md +++ b/src/current/v25.2/critical-log-messages.md @@ -61,7 +61,7 @@ toc: true - **Severity**: Medium - **Description**: At the time of failure, there was a network-related issue that occurred in the environment that affected the listed node. - - **Impact**: Any leaseholders that are on the affected node will be unavailable and other nodes will need to re-elect a new leaseholder. As leaseholder election can take up to 9 seconds, the SQL service latency can increase significantly during this time, if records are accessed from a leaseholder on the impacted node. + - **Impact**: Any leaseholders that are on the affected node will be unavailable and other nodes will need to elect a new leaseholder. As leaseholder election can take multiple seconds, SQL service latency can increase significantly during this time if records are accessed from a leaseholder on the impacted node. - **Action**: Check if the node has experienced one of the following: - The user has purposefully removed the node from the cluster. - Asymmetrical network partitioning.
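As a quick spot check outside the DB Console, the `liveness.heartbeatlatency` metric referenced above can also be read from a node's Prometheus endpoint. This is a minimal sketch, assuming a locally reachable node on the default HTTP port (`8080`) and that the exported metric name replaces dots with underscores; adjust the host, port, and any TLS options for your deployment.

~~~ shell
# Pull raw Prometheus metrics from one node and filter for the heartbeat
# latency histogram; elevated tail buckets correspond to the high
# percentiles shown on the Distributed dashboard.
curl --silent http://localhost:8080/_status/vars | grep liveness_heartbeatlatency
~~~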
diff --git a/src/current/v25.2/data-resilience.md b/src/current/v25.2/data-resilience.md index eee468bdd18..27b25203458 100644 --- a/src/current/v25.2/data-resilience.md +++ b/src/current/v25.2/data-resilience.md @@ -23,7 +23,7 @@ For a practical guide on how CockroachDB uses Raft to replicate, distribute, and - [**Multi-active availability**]({% link {{ page.version.version }}/multi-active-availability.md %}): CockroachDB's built-in [Raft replication]({% link {{ page.version.version }}/architecture/replication-layer.md %}) stores data safely and consistently on multiple nodes to ensure no downtime even during a temporary node outage. [Replication controls]({% link {{ page.version.version }}/configure-replication-zones.md %}) allow you to configure the number and location of [replicas]({% link {{ page.version.version }}/architecture/glossary.md %}#replica) to suit a deployment. - For more detail on planning for single-region or multi-region recovery, refer to [Single-region survivability planning]({% link {{ page.version.version }}/disaster-recovery-planning.md %}#single-region-survivability-planning) or [Multi-region survivability planning]({% link {{ page.version.version }}/disaster-recovery-planning.md %}#multi-region-survivability-planning). -- [**Advanced fault tolerance**]({% link {{ page.version.version }}/demo-cockroachdb-resilience.md %}): Capabilities built in to CockroachDB to perform routine maintenance operations with minimal impact to foreground performance. For example, [online schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}), [write-ahead log failover]({% link {{ page.version.version }}/cockroach-start.md %}#write-ahead-log-wal-failover). +- [**Advanced fault tolerance**]({% link {{ page.version.version }}/demo-cockroachdb-resilience.md %}): Capabilities built in to CockroachDB to perform routine maintenance operations with minimal impact to foreground performance. For example, [online schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}), [write-ahead log failover]({% link {{ page.version.version }}/cockroach-start.md %}#write-ahead-log-wal-failover), and [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases). - [**Logical data replication (LDR)**]({% link {{ page.version.version }}/logical-data-replication-overview.md %}) (Preview): A cross-cluster replication tool between active CockroachDB clusters, which supports a range of topologies. LDR provides eventually consistent, table-level replication between the clusters. Individually, each active cluster uses CockroachDB multi-active availability to achieve low, single-region write latency with transactionally consistent writes using Raft replication. 
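The replication controls mentioned above are expressed as zone configurations. The following is a minimal sketch, assuming an insecure local cluster, a database named `movr`, and a locality tier named `region=us-east1` (all placeholders); substitute your own connection flags, database, replication factor, and constraints.

~~~ shell
# Raise the replication factor for one database and require a replica in a
# specific region. Database name and region are placeholders.
cockroach sql --host=localhost:26257 --insecure \
  --execute="ALTER DATABASE movr CONFIGURE ZONE USING num_replicas = 5, constraints = '[+region=us-east1]';"
~~~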
### Choose an HA strategy @@ -113,4 +113,4 @@ CockroachDB is designed to recover automatically; however, building backups or P - [Physical Cluster Replication Technical Overview]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) - [Backup Architecture]({% link {{ page.version.version }}/backup-architecture.md %}) - [Backup and Restore Overview]({% link {{ page.version.version }}/backup-and-restore-overview.md %}) -- [Managed Backups]({% link cockroachcloud/managed-backups.md %}) \ No newline at end of file +- [Managed Backups]({% link cockroachcloud/managed-backups.md %}) diff --git a/src/current/v25.2/monitoring-and-alerting.md b/src/current/v25.2/monitoring-and-alerting.md index 1ff631afab2..6fee772d1a7 100644 --- a/src/current/v25.2/monitoring-and-alerting.md +++ b/src/current/v25.2/monitoring-and-alerting.md @@ -177,9 +177,6 @@ sys_cgocalls 3501 # HELP sys_cpu_sys_percent Current system cpu percentage # TYPE sys_cpu_sys_percent gauge sys_cpu_sys_percent 1.098855319644276e-10 -# HELP replicas_quiescent Number of quiesced replicas -# TYPE replicas_quiescent gauge -replicas_quiescent{store="1"} 20 ... ~~~ diff --git a/src/current/v25.2/recommended-production-settings.md b/src/current/v25.2/recommended-production-settings.md index b1e94495749..5b8d844bd89 100644 --- a/src/current/v25.2/recommended-production-settings.md +++ b/src/current/v25.2/recommended-production-settings.md @@ -173,7 +173,7 @@ Disks must be able to achieve {% include {{ page.version.version }}/prod-deploym - The optimal configuration for striping more than one device is [RAID 10](https://wikipedia.org/wiki/Nested_RAID_levels#RAID_10_(RAID_1+0)). RAID 0 and 1 are also acceptable from a performance perspective. {{site.data.alerts.callout_info}} -Disk I/O especially affects [performance on write-heavy workloads]({% link {{ page.version.version }}/architecture/reads-and-writes-overview.md %}#network-and-i-o-bottlenecks). For more information, see [capacity planning issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#capacity-planning-issues) and [node liveness issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues). +Disk I/O especially affects [performance on write-heavy workloads]({% link {{ page.version.version }}/architecture/reads-and-writes-overview.md %}#network-and-i-o-bottlenecks). For more information, see [capacity planning issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#capacity-planning-issues). {{site.data.alerts.end}} ##### Node density testing configuration diff --git a/src/current/v25.2/transaction-retry-error-reference.md b/src/current/v25.2/transaction-retry-error-reference.md index fdb3f33e63f..66c40369f7b 100644 --- a/src/current/v25.2/transaction-retry-error-reference.md +++ b/src/current/v25.2/transaction-retry-error-reference.md @@ -178,7 +178,7 @@ TransactionRetryWithProtoRefreshError: ... RETRY_ASYNC_WRITE_FAILURE ... **Description:** -The `RETRY_ASYNC_WRITE_FAILURE` error occurs when some kind of problem with your cluster's operation occurs at the moment of a previous write in the transaction, causing CockroachDB to fail to replicate one of the transaction's writes. 
This can happen if a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#epoch-based-leases-table-data) occurs while the transaction is executing, or less commonly if you have a [network partition]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#network-partition) that cuts off access to some nodes in your cluster. +The `RETRY_ASYNC_WRITE_FAILURE` error occurs when a problem with your cluster's operation arises at the moment of a previous write in the transaction, causing CockroachDB to fail to replicate one of the transaction's writes. This can happen if a [lease transfer]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) occurs while the transaction is executing, or less commonly if you have a [network partition]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#network-partition) that cuts off access to some nodes in your cluster. **Action:** @@ -284,7 +284,7 @@ If you are encountering deadlocks: If you are using only default [transaction priorities]({% link {{ page.version.version }}/transactions.md %}#transaction-priorities): -- This error means your cluster has problems. You are likely overloading it. Investigate the source of the overload, and do something about it. For more information, see [Node liveness issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues). +- This error means your cluster has problems. You are likely overloading it. Investigate and address the source of the overload. The best place to start investigating is the [**Overload Dashboard**]({% link {{ page.version.version }}/ui-overload-dashboard.md %}). If you are using [high- or low-priority transactions]({% link {{ page.version.version }}/transactions.md %}#transaction-priorities): diff --git a/src/current/v25.2/ui-distributed-dashboard.md index a92e1d606c4..50119cf7c6a 100644 --- a/src/current/v25.2/ui-distributed-dashboard.md +++ b/src/current/v25.2/ui-distributed-dashboard.md @@ -100,7 +100,7 @@ Metric | Description DB Console node heartbeat latency: 99th percentile graph -The **Node Heartbeat Latency: 99th percentile** graph displays the 99th percentile of time elapsed between [node liveness]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues) heartbeats on the cluster over a one-minute period. +The **Node Heartbeat Latency: 99th percentile** graph displays the 99th percentile of time elapsed between node heartbeats on the cluster over a one-minute period. Hovering over the graph displays values for the following metrics: Metric | Description --------|---- `` | The 99th percentile of time elapsed between [node liveness]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues) heartbeats on the cluster over a one-minute period for that node, as calculated from the `liveness.heartbeatlatency` metric.
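Client-side handling for the retry errors described above generally amounts to re-running the failed transaction with backoff. The following is an illustrative sketch only: it assumes an insecure local cluster and a hypothetical `accounts` table, and it retries on any failure, whereas a production client should inspect the error and retry only on retryable errors such as SQLSTATE `40001`.

~~~ shell
# Re-run a single-statement transaction a few times with linear backoff.
# The statement, table, and connection flags are placeholders.
for attempt in 1 2 3 4 5; do
  if cockroach sql --host=localhost:26257 --insecure \
      --execute="UPDATE accounts SET balance = balance - 10 WHERE id = 1;"; then
    echo "committed on attempt ${attempt}"
    break
  fi
  echo "attempt ${attempt} failed; retrying" >&2
  sleep "${attempt}"
done
~~~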
+{% include_cached new-in.html version="v25.2" %} {% include {{ page.version.version }}/leader-leases-node-heartbeat-use-cases.md %} + ## Node Heartbeat Latency: 90th percentile DB Console node heartbeat latency: 90th percentile graph -The **Node Heartbeat Latency: 90th percentile** graph displays the 90th percentile of time elapsed between [node liveness]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues) heartbeats on the cluster over a one-minute period. +The **Node Heartbeat Latency: 90th percentile** graph displays the 90th percentile of time elapsed between node heartbeats on the cluster over a one-minute period. Hovering over the graph displays values for the following metrics: Metric | Description --------|---- -`` | The 90th percentile of time elapsed between [node liveness]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues) heartbeats on the cluster over a one-minute period for that node, as calculated from the `liveness.heartbeatlatency` metric. +`` | The 90th percentile of time elapsed between node heartbeats on the cluster over a one-minute period for that node, as calculated from the `liveness.heartbeatlatency` metric. + +{% include_cached new-in.html version="v25.2" %} {% include {{ page.version.version }}/leader-leases-node-heartbeat-use-cases.md %} {% include {{ page.version.version }}/ui/ui-summary-events.md %} diff --git a/src/current/v25.2/ui-logical-data-replication-dashboard.md index 40ec394da3e..d18b1e18034 100644 --- a/src/current/v25.2/ui-logical-data-replication-dashboard.md +++ b/src/current/v25.2/ui-logical-data-replication-dashboard.md @@ -10,7 +10,7 @@ The **Logical Data Replication** dashboard in the DB Console lets you monitor me To view this dashboard, [access the DB Console]({% link {{ page.version.version }}/ui-overview.md %}#db-console-access) for the destination cluster, click **Metrics** on the left-hand navigation bar, and select **Logical Data Replication** from the **Dashboard** dropdown. {{site.data.alerts.callout_info}} -The **Logical Data Replication** dashboard is distinct from the [**Replication** dashboard]({% link {{ page.version.version }}/ui-replication-dashboard.md %}), which tracks metrics related to how data is replicated across the cluster, e.g., range status, replicas per store, and replica quiescence. +The **Logical Data Replication** dashboard is distinct from the [**Replication** dashboard]({% link {{ page.version.version }}/ui-replication-dashboard.md %}), which tracks metrics related to how data is replicated across the cluster, such as range status and replicas per store.
{{site.data.alerts.end}} ## Dashboard navigation @@ -104,4 +104,4 @@ retry queue bytes | `logical_replication.retry_queue_bytes` | The size of the re - [Logical Data Replication Overview]({% link {{ page.version.version }}/logical-data-replication-overview.md %}) - [Logical Data Replication Monitoring]({% link {{ page.version.version }}/logical-data-replication-monitoring.md %}) - [Troubleshooting Overview]({% link {{ page.version.version }}/troubleshooting-overview.md %}) -- [Support Resources]({% link {{ page.version.version }}/support-resources.md %}) \ No newline at end of file +- [Support Resources]({% link {{ page.version.version }}/support-resources.md %}) diff --git a/src/current/v25.2/ui-network-latency-page.md index 479052ad6c9..afc78fd280a 100644 --- a/src/current/v25.2/ui-network-latency-page.md +++ b/src/current/v25.2/ui-network-latency-page.md @@ -58,11 +58,13 @@ This specific information can help you understand the root cause of the connecti {{site.data.alerts.callout_info}} {% include common/network-partitions.md %} + +{% include_cached new-in.html version="v25.2" %} With the introduction of [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases), the effects of most network partitions between a leaseholder and its followers should resolve within a few seconds. {{site.data.alerts.end}} ### Node liveness status -Hover over a node's ID in the row and column headers to show the node's liveness status, such as `healthy` or `suspect`. Node liveness status is also indicated by the colored circle next to the Node ID: green for `healthy` or red for `suspect`. +Hover over a node's ID in the row and column headers to show the node's [liveness]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues) status, such as `healthy` or `suspect`. Node liveness status is also indicated by the colored circle next to the Node ID: green for `healthy` or red for `suspect`. If a `suspect` node stays offline for the duration set by [`server.time_until_store_dead`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-server-time-until-store-dead) (5 minutes by default), the [cluster considers the node "dead"]({% link {{ page.version.version }}/node-shutdown.md %}#process-termination) and the node is removed from the matrix. diff --git a/src/current/v25.2/ui-overview.md index 2d8f9bf0bd9..c669165fa45 100644 --- a/src/current/v25.2/ui-overview.md +++ b/src/current/v25.2/ui-overview.md @@ -44,8 +44,8 @@ The Metrics page provides dashboards for all types of CockroachDB metrics. - [Runtime dashboard]({% link {{ page.version.version }}/ui-runtime-dashboard.md %}) has metrics about node count, CPU time, and memory usage. - [SQL dashboard]({% link {{ page.version.version }}/ui-sql-dashboard.md %}) has metrics about SQL connections, byte traffic, queries, transactions, and service latency. - [Storage dashboard]({% link {{ page.version.version }}/ui-storage-dashboard.md %}) has metrics about storage capacity and file descriptors. -- [Replication dashboard]({% link {{ page.version.version }}/ui-replication-dashboard.md %}) has metrics about how data is replicated across the cluster, e.g., range status, replicas per store, and replica quiescence. -- [Distributed dashboard]({% link {{ page.version.version }}/ui-distributed-dashboard.md %}) has metrics about distribution tasks across the cluster, including RPCs, transactions, and node heartbeats.
+- [Replication dashboard]({% link {{ page.version.version }}/ui-replication-dashboard.md %}) has metrics about how data is replicated across the cluster, such as range status and replicas per store. +- [Distributed dashboard]({% link {{ page.version.version }}/ui-distributed-dashboard.md %}) has metrics about distribution tasks across the cluster, including RPCs and transactions. - [Queues dashboard]({% link {{ page.version.version }}/ui-queues-dashboard.md %}) has metrics about the health and performance of various queueing systems in CockroachDB, including the garbage collection and Raft log queues. - [Slow requests dashboard]({% link {{ page.version.version }}/ui-slow-requests-dashboard.md %}) has metrics about important cluster tasks that take longer than expected to complete, including Raft proposals and lease acquisitions. - [Changefeeds dashboard]({% link {{ page.version.version }}/ui-cdc-dashboard.md %}) has metrics about the [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) created across your cluster. diff --git a/src/current/v25.2/ui-physical-cluster-replication-dashboard.md index 6c7378c17f0..e3f1fdb1cbf 100644 --- a/src/current/v25.2/ui-physical-cluster-replication-dashboard.md +++ b/src/current/v25.2/ui-physical-cluster-replication-dashboard.md @@ -10,7 +10,7 @@ The **Physical Cluster Replication** dashboard in the DB Console lets you monito To view this dashboard, [access the DB Console]({% link {{ page.version.version }}/ui-overview.md %}#db-console-access) for your standby cluster, click **Metrics** on the left-hand navigation bar, and select **Physical Cluster Replication** from the **Dashboard** dropdown. {{site.data.alerts.callout_info}} -The **Physical Cluster Replication** dashboard is distinct from the [**Replication** dashboard]({% link {{ page.version.version }}/ui-replication-dashboard.md %}), which tracks metrics related to how data is replicated across the cluster, e.g., range status, replicas per store, and replica quiescence. +The **Physical Cluster Replication** dashboard is distinct from the [**Replication** dashboard]({% link {{ page.version.version }}/ui-replication-dashboard.md %}), which tracks metrics related to how data is replicated across the cluster, such as range status and replicas per store. {{site.data.alerts.end}} ## Dashboard navigation diff --git a/src/current/v25.2/ui-replication-dashboard.md index bb52e7b2721..d5c0ab4edaa 100644 --- a/src/current/v25.2/ui-replication-dashboard.md +++ b/src/current/v25.2/ui-replication-dashboard.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.db_console --- -The **Replication** dashboard in the DB Console lets you monitor the replication metrics for your cluster, such as range status, replicas per store, and replica quiescence. +The **Replication** dashboard in the DB Console lets you monitor the replication metrics for your cluster, such as range status and replicas per store. To view this dashboard, [access the DB Console]({% link {{ page.version.version }}/ui-overview.md %}#db-console-access), click **Metrics** in the left-hand navigation, and select **Dashboard** > **Replication**. @@ -17,7 +17,7 @@ The **Replication** dashboard is distinct from the [**Physical Cluster Replicati - **Range**: CockroachDB stores all user data and almost all system data in a giant sorted map of key-value pairs.
This keyspace is divided into "ranges", contiguous chunks of the keyspace, so that every key can always be found in a single range. - **Range Replica:** CockroachDB replicates each range (3 times by default) and stores each replica on a different node. -- **Range Lease:** For each range, one of the replicas holds the "range lease". This replica, referred to as the "leaseholder", is the one that receives and coordinates all read and write requests for the range. +- **Range Lease:** For each range, one of the replicas holds the "range lease". This replica, referred to as the "leaseholder", is the one that receives and coordinates all read and write requests for the range. {% include_cached new-in.html version="v25.2" %} The [Leader leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases) system ensures that the leaseholder is always the [Raft leader]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), except briefly during [lease transfers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#how-leases-are-transferred-from-a-dead-node). - **Under-replicated Ranges:** When a cluster is first initialized, the few default starting ranges have a single replica. As more nodes become available, the cluster replicates these ranges to other nodes until the number of replicas for each range reaches the desired [replication factor]({% link {{ page.version.version }}/configure-replication-zones.md %}#num_replicas) (3 by default). If a range has fewer replicas than the replication factor, the range is said to be "under-replicated". [Non-voting replicas]({% link {{ page.version.version }}/architecture/replication-layer.md %}#non-voting-replicas), if configured, are not counted when calculating replication status. - **Unavailable Ranges:** If a majority of a range's replicas are on nodes that are unavailable, then the entire range is unavailable and will be unable to process queries.
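The under-replicated and unavailable range counts defined above can also be checked per node from the command line, using the `ranges_unavailable` and `ranges_underreplicated` fields documented earlier for `cockroach node status`. A minimal sketch, assuming an insecure cluster reachable on the default port:

~~~ shell
# Include per-node range details (ranges, ranges_unavailable,
# ranges_underreplicated) in the node status output.
cockroach node status --ranges --host=localhost:26257 --insecure
~~~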