
[Feature Request] Primary Shard Count Constraint #17293

Open
pandeydivyansh1803 opened this issue Feb 7, 2025 · 0 comments · May be fixed by #17295
Labels: enhancement (Enhancement or improvement to existing feature or request), untriaged

pandeydivyansh1803 commented Feb 7, 2025

Is your feature request related to a problem? Please describe

OpenSearch is a distributed search and analytics engine designed to handle large volumes of data with high performance and availability. A critical aspect of its architecture is data replication, which ensures data durability and enables fast query performance across the cluster.
Traditionally, OpenSearch has used document-level replication. In this model:

  1. When a document is indexed, it's first processed by the primary shard.
  2. The document is then sent to each replica shard.
  3. Each replica independently processes the document, creating its own segment.

While effective, this approach has some drawbacks:

  1. CPU Intensity: Each replica performs the same compute-intensive tasks as the primary, including text analysis, inverted index creation, and codec operations.
  2. Network Overhead: Individual documents are sent over the network, which can be inefficient for bulk operations.
  3. Replication Lag: Under high indexing loads, replicas can fall behind the primary, potentially affecting search consistency.

To address these challenges, segment replication has been introduced:

  1. Instead of replicating individual documents, entire Lucene segments are copied from the primary to replica shards.
  2. Replicas receive fully-formed segments, eliminating the need to repeat CPU-intensive indexing operations.
  3. This approach significantly reduces CPU usage on replica nodes, as they no longer need to analyze and index each document individually.
  4. Network efficiency is improved, especially for bulk indexing operations, as segments can be transferred in larger, optimized chunks.
  5. Replication lag is reduced, helping to maintain better consistency between primary and replica shards.
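The contrast between the two strategies above can be illustrated with a toy model (this is illustrative Python, not OpenSearch code; `ToyShard`, `index_ops`, and the function names are invented for the sketch):

```python
class ToyShard:
    """Toy stand-in for a shard; counts CPU-intensive indexing operations."""

    def __init__(self):
        self.segments = []
        self.index_ops = 0  # number of analyze/index passes performed locally

    def index(self, doc):
        self.index_ops += 1          # text analysis, inverted index, codec work
        self.segments.append({doc})  # each indexing op produces a tiny segment

    def copy_segment(self, segment):
        self.segments.append(segment)  # byte copy only, no analysis


def document_replication(primary, replicas, doc):
    primary.index(doc)
    for r in replicas:
        r.index(doc)  # every replica repeats the CPU-heavy work


def segment_replication(primary, replicas, doc):
    primary.index(doc)
    for r in replicas:
        r.copy_segment(primary.segments[-1])  # replicas receive the finished segment


p, r = ToyShard(), ToyShard()
document_replication(p, [r], "doc-1")
print(p.index_ops, r.index_ops)  # 1 1 -> replica paid the same CPU cost

p2, r2 = ToyShard(), ToyShard()
segment_replication(p2, [r2], "doc-1")
print(p2.index_ops, r2.index_ops)  # 1 0 -> replica only copied bytes
```

In the segment-replication case the replica ends up with the same segments but performs zero indexing operations, which is exactly why unbalanced primaries concentrate CPU load on some nodes.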

For remote-store-backed clusters, segment replication is the replication strategy. With segment replication, segments are created only on the primary shard and then copied to the replica shards. Because segment creation is CPU-intensive, we have observed CPU skew between nodes of the same cluster when primary shards are not balanced.

Earlier attempts to rebalance primary shards across nodes (per index, and across all nodes) do help reduce the skew, but they work on a best-effort basis and do not enforce any hard constraint.

Describe the solution you'd like

Implement two new settings in OpenSearch:

  1. index.routing.allocation.total_primary_shards_per_node: An index-level setting to limit primary shards per node for a specific index. Store this limit (indexTotalPrimaryShardsPerNodeLimit) in index metadata, similar to indexTotalShardsPerNodeLimit.
  2. cluster.routing.allocation.total_primary_shards_per_node: A cluster-level setting to limit total primary shards per node across all indices.

These settings will enhance control over primary shard distribution, improving cluster balance and performance management.
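If implemented, the settings could be applied like other dynamic allocation settings. The following calls are a hypothetical sketch only (the setting names come from this proposal and do not exist yet; the endpoint shapes mirror the existing cluster and index settings APIs, and the index name and limit values are illustrative):

```shell
# Cluster-level limit: at most 5 primaries per node across all indices
curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{ "persistent": { "cluster.routing.allocation.total_primary_shards_per_node": 5 } }'

# Index-level limit: at most 2 primaries of my-index per node
curl -XPUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{ "index.routing.allocation.total_primary_shards_per_node": 2 }'
```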
The existing ShardsLimitAllocationDecider class already contains the infrastructure needed to evaluate shard allocation constraints: it has access to the current cluster state, routing information, and per-node shard counts. We therefore propose implementing the new primary shard limit settings within this class. Extending ShardsLimitAllocationDecider keeps the new checks consistent with existing allocation rules, avoids code duplication, and integrates the primary shard limits directly into the existing allocation decision process.
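The check the decider would perform can be sketched as follows. This is a self-contained Python model of the proposed logic, not the actual Java implementation; `Shard`, `Node`, and `can_allocate_primary` are invented names for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shard:
    index: str
    primary: bool


@dataclass
class Node:
    shards: List[Shard] = field(default_factory=list)


def can_allocate_primary(node: Node, index: str,
                         index_limit: Optional[int],
                         cluster_limit: Optional[int]) -> bool:
    """Return True if one more primary of `index` would fit on `node`.

    index_limit   models index.routing.allocation.total_primary_shards_per_node
    cluster_limit models cluster.routing.allocation.total_primary_shards_per_node
    """
    primaries_for_index = sum(1 for s in node.shards if s.primary and s.index == index)
    primaries_total = sum(1 for s in node.shards if s.primary)
    if index_limit is not None and primaries_for_index + 1 > index_limit:
        return False  # index-level limit would be exceeded
    if cluster_limit is not None and primaries_total + 1 > cluster_limit:
        return False  # cluster-level limit would be exceeded
    return True


# Node holding 2 primaries total, 1 of them for "logs"
node = Node(shards=[Shard("logs", True), Shard("logs", False), Shard("metrics", True)])
print(can_allocate_primary(node, "logs", index_limit=1, cluster_limit=None))  # False
print(can_allocate_primary(node, "logs", index_limit=2, cluster_limit=2))     # False
print(can_allocate_primary(node, "logs", index_limit=2, cluster_limit=3))     # True
```

Note that, like the existing total_shards_per_node checks, both limits are evaluated per candidate node at allocation time, so the decider needs no global state beyond the node's current routing table.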

Related component

No response

Describe alternatives you've considered

No response

Additional context

No response

@pandeydivyansh1803 pandeydivyansh1803 added enhancement Enhancement or improvement to existing feature or request untriaged labels Feb 7, 2025
@pandeydivyansh1803 pandeydivyansh1803 changed the title [Feature Request] <Primary Shard Count Constraint> [Feature Request] Primary Shard Count Constraint Feb 7, 2025