Is your feature request related to a problem? Please describe
OpenSearch is a distributed search and analytics engine designed to handle large volumes of data with high performance and availability. A critical aspect of its architecture is data replication, which ensures data durability and enables fast query performance across the cluster.
Traditionally, OpenSearch has used document-level replication. In this model:
When a document is indexed, it's first processed by the primary shard.
The document is then sent to each replica shard.
Each replica independently processes the document, creating its own segment.
While effective, this approach has some drawbacks:
CPU Intensity: Each replica performs the same compute-intensive tasks as the primary, including text analysis, inverted index creation, and codec operations.
Network Overhead: Individual documents are sent over the network, which can be inefficient for bulk operations.
Replication Lag: Under high indexing loads, replicas can fall behind the primary, potentially affecting search consistency.
To address these challenges, segment replication has been introduced:
Instead of replicating individual documents, entire Lucene segments are copied from the primary to replica shards.
Replicas receive fully-formed segments, eliminating the need to repeat CPU-intensive indexing operations.
This approach significantly reduces CPU usage on replica nodes, as they no longer need to analyze and index each document individually.
Network efficiency is improved, especially for bulk indexing operations, as segments can be transferred in larger, optimized chunks.
Replication lag is reduced, helping to maintain better consistency between primary and replica shards.
For remote-store-backed clusters, segment replication is the replication strategy. With segment replication, segments are created only on the primary shard and then copied to the replica shards. Because segment creation is CPU intensive, we have observed CPU skew between nodes of the same cluster when primary shards are not balanced.
Earlier attempts to rebalance primary shards across nodes (per index and across all indices) do help reduce the skew, but they work on a best-effort basis and do not enforce any hard constraint.
Describe the solution you'd like
Implement two new settings in OpenSearch:
index.routing.allocation.total_primary_shards_per_node: An index-level setting to limit primary shards per node for a specific index. Store this limit (indexTotalPrimaryShardsPerNodeLimit) in index metadata, similar to indexTotalShardsPerNodeLimit.
cluster.routing.allocation.total_primary_shards_per_node: A cluster-level setting to limit total primary shards per node across all indices.
These settings will enhance control over primary shard distribution, improving cluster balance and performance management.
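As a rough sketch, the two settings might be declared with OpenSearch's Setting API as shown below. The field names, the -1 "unlimited" default/minimum, and the property flags are assumptions modeled on the existing total_shards_per_node settings, not a confirmed design:

```java
// Hypothetical declarations, mirroring the existing
// index.routing.allocation.total_shards_per_node /
// cluster.routing.allocation.total_shards_per_node pair.
// -1 is assumed to mean "no limit", as with the existing settings.
public static final Setting<Integer> INDEX_TOTAL_PRIMARY_SHARDS_PER_NODE_SETTING =
    Setting.intSetting(
        "index.routing.allocation.total_primary_shards_per_node",
        -1, -1,
        Setting.Property.Dynamic, Setting.Property.IndexScope);

public static final Setting<Integer> CLUSTER_TOTAL_PRIMARY_SHARDS_PER_NODE_SETTING =
    Setting.intSetting(
        "cluster.routing.allocation.total_primary_shards_per_node",
        -1, -1,
        Setting.Property.Dynamic, Setting.Property.NodeScope);
```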
The existing ShardsLimitAllocationDecider class already contains the necessary infrastructure to evaluate shard allocation constraints: it has access to the current cluster state, routing information, and methods to check shard counts per node. Given this, we propose implementing the new primary shard limit settings within this class. Extending ShardsLimitAllocationDecider keeps the new checks consistent with existing allocation rules, avoids code duplication, and integrates the primary shard limits directly into the existing allocation decision process.
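A minimal sketch of what the cluster-level check inside the decider could look like. The method shape follows AllocationDecider#canAllocate, but the clusterPrimaryShardLimit field is hypothetical, and a real implementation would also merge this with the existing total-shards check, handle the index-level limit, and account for relocating shards:

```java
// Hypothetical addition inside ShardsLimitAllocationDecider; assumes a
// clusterPrimaryShardLimit field backed by the new cluster setting (-1 = unlimited).
@Override
public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
    if (shardRouting.primary() && clusterPrimaryShardLimit > 0) {
        // Count primaries already assigned to this node, the same way the
        // existing total-shards check counts all shards on the node.
        int primariesOnNode = 0;
        for (ShardRouting assigned : node) {
            if (assigned.primary() && assigned.relocating() == false) {
                primariesOnNode++;
            }
        }
        if (primariesOnNode >= clusterPrimaryShardLimit) {
            return allocation.decision(Decision.NO, NAME,
                "too many primary shards [%d] allocated to this node, cluster setting "
                    + "[cluster.routing.allocation.total_primary_shards_per_node=%d]",
                primariesOnNode, clusterPrimaryShardLimit);
        }
    }
    return allocation.decision(Decision.YES, NAME,
        "primary shard count on this node is under the per-node limit");
}
```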
Related component
No response
Describe alternatives you've considered
No response
Additional context
No response