Added a new index level setting to limit the total primary shards per node per index #17295

Open
wants to merge 2 commits into
base: main
from


pandeydivyansh1803

Description

In remote-store-backed clusters, segment replication is used as the replication strategy. With segment replication, segments are created only on the primary shard and then copied to the replica shards. Because segment creation is CPU intensive, we have observed CPU skew between nodes of the same cluster when primary shards are not evenly balanced.

Earlier attempts to rebalance primary shards across nodes (#6422, #12250) help reduce the skew, but they work on a best-effort basis and do not enforce any constraint.

This PR implements a new setting in OpenSearch:
index.routing.allocation.total_primary_shards_per_node: An index-level setting that limits the number of primary shards per node for a specific index. The limit (indexTotalPrimaryShardsPerNodeLimit) is stored in index metadata, similar to indexTotalShardsPerNodeLimit.
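
As a rough illustration of how such a limit could be read from index settings, here is a minimal, self-contained sketch. It does not use the OpenSearch `Setting` API; the class name `PrimaryShardLimitSetting`, the plain `Map` of settings, and the `-1` "no limit" default are assumptions made for the example (the default is chosen to mirror `index.routing.allocation.total_shards_per_node`), not code from this PR.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch: reads a per-index primary shard limit from a plain settings map. */
final class PrimaryShardLimitSetting {
    // Setting key as named in the PR description.
    static final String KEY = "index.routing.allocation.total_primary_shards_per_node";
    // Assumption: -1 means "no limit", mirroring total_shards_per_node.
    static final int UNLIMITED = -1;

    /** Returns the configured limit, or UNLIMITED when the key is absent. */
    static int parseLimit(Map<String, String> indexSettings) {
        String raw = indexSettings.get(KEY);
        if (raw == null) {
            return UNLIMITED;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < UNLIMITED) {
            // Reject values below the "no limit" sentinel.
            throw new IllegalArgumentException(KEY + " must be >= " + UNLIMITED + ", got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseLimit(Collections.singletonMap(KEY, "2"))); // prints 2
        System.out.println(parseLimit(Collections.<String, String>emptyMap())); // prints -1
    }
}
```

Keeping the sentinel and the validation in one place means the allocation check only ever sees a validated value or the "no limit" default.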

This setting gives operators a hard cap on how many primary shards of an index can be placed on a single node, which complements the best-effort rebalancing and helps even out the CPU load caused by segment creation.
The existing ShardsLimitAllocationDecider class already has the infrastructure to evaluate shard allocation constraints: it has access to the current cluster state and routing information, and it has methods to check shard counts per node. We therefore propose implementing the new primary shard limit check in this class. This reuses the current decision-making framework, keeps the new check consistent with existing allocation rules, and minimizes code duplication.
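
The kind of check described above could look roughly like the following sketch. It is deliberately standalone: `Shard`, `Decision`, and `canAllocate` are hypothetical stand-ins rather than the OpenSearch `ShardRouting`, `Decision`, or `ShardsLimitAllocationDecider` types. It also assumes that a limit of zero or less disables the check and that only primary shards are counted and constrained.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, standalone sketch of a per-index primary shard limit check. */
final class PrimaryShardLimitDeciderSketch {

    /** Minimal stand-in for a shard copy assigned to a node (not OpenSearch's ShardRouting). */
    static final class Shard {
        final String index;
        final boolean primary;

        Shard(String index, boolean primary) {
            this.index = index;
            this.primary = primary;
        }
    }

    enum Decision { YES, NO }

    /**
     * Decides whether the candidate shard may be placed on a node that already
     * holds shardsOnNode. Assumption: a limit <= 0 means the check is disabled.
     */
    static Decision canAllocate(Shard candidate, List<Shard> shardsOnNode, int indexPrimaryLimit) {
        // Replicas are not constrained by the primary limit in this sketch.
        if (!candidate.primary || indexPrimaryLimit <= 0) {
            return Decision.YES;
        }
        // Count primaries of the same index that are already on the node.
        long primariesOfIndexOnNode = shardsOnNode.stream()
            .filter(s -> s.primary && s.index.equals(candidate.index))
            .count();
        return primariesOfIndexOnNode >= indexPrimaryLimit ? Decision.NO : Decision.YES;
    }

    public static void main(String[] args) {
        List<Shard> node = Arrays.asList(
            new Shard("logs", true), new Shard("logs", false), new Shard("metrics", true));
        System.out.println(canAllocate(new Shard("logs", true), node, 1)); // prints NO
        System.out.println(canAllocate(new Shard("logs", true), node, 2)); // prints YES
    }
}
```

In this example the node already holds one primary of `logs`, so a limit of 1 rejects a second `logs` primary, while a limit of 2 allows it; the replica of `logs` and the primary of `metrics` do not count toward the `logs` limit.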

Related Issues

Resolves #17293

Check List

  • [✔️] Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • [✔️] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

… index per node. Added relevant files for unit test and integration test.
Contributor

github-actions bot commented Feb 7, 2025

❌ Gradle check result for 721865e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Feb 7, 2025

❌ Gradle check result for 920f71a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -0,0 +1,29 @@
### IntelliJ IDEA ###
Member

@cwperks cwperks Feb 7, 2025


Looks like the files under the untitled/ folder got added to this PR inadvertently. Can you remove this file and the one below?

Labels
enhancement Enhancement or improvement to existing feature or request _No response_
Development

Successfully merging this pull request may close these issues.

[Feature Request] Primary Shard Count Constraint
2 participants