
HBASE-28963 Updating Quota Factors is too expensive #6451

Open

wants to merge 1 commit into base: master

Conversation

@rmdmattingly (Contributor) commented Nov 8, 2024

https://issues.apache.org/jira/browse/HBASE-28963

My company is running Quotas across a few hundred clusters of varied size. One cluster has hundreds of servers, tens of thousands of regions, and tens of thousands of unique users — for all of whom we build default user quotas to manage resource usage out of the box.

We noticed that the HMaster was quite busy for this cluster, and after some investigation we realized that RegionServers were hammering the HMaster's ClusterMetrics endpoint to refresh table machine quota factors. We were also hotspotting the RegionServer hosting the quotas system table.

```
2024-11-05T21:22:21,024 [regionserver:60020.Chore.1 {}] INFO org.apache.hadoop.hbase.client.HBaseAdmin: getClusterMetrics call stack:
java.base/java.lang.Thread.getStackTrace(Thread.java:2450)
org.apache.hadoop.hbase.client.HBaseAdmin.getClusterMetrics(HBaseAdmin.java:2307)
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:402)
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:267)
org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:161)
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
java.base/java.lang.Thread.run(Thread.java:1583)
```

After some digging, we realized there were three meaningful changes we could make to the quota refresh process that substantially improve its scalability as RegionServer count, region count, and distinct user count grow.

  1. Each quota cache miss should not trigger a full refresh. With tens of thousands of distinct users on our cluster, and routine evictions every 5×refreshPeriod, cache misses caused constant refreshing of quotas on every RegionServer. This is the most meaningful change, because our RegionServers were truly refreshing the quota cache continuously.
  2. We should only query for every region state if table-scoped quotas exist. The expensive ClusterMetrics call is only necessary when table-scoped quotas exist, so we should be more thoughtful about when we execute it.
  3. ClusterMetrics should be cached. As is, each quota refresh triggered an expensive ClusterMetrics request that required the HMaster to iterate over a map of every region state. We only need this to determine the number of open regions per table — a number that doesn't change significantly at a moment's notice. We should cache this, and the cheaper ClusterMetrics alternative that optimization #2 introduced (see the sketch just below). The cache TTL defaults to the configured quota refresh period, but can be customized.
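To make #3 concrete, here is a minimal sketch of the caching idea using a Guava memoizing supplier. The helper class, option set, and TTL plumbing are illustrative assumptions, not code from the patch; the patch itself introduces a purpose-built RefreshableExpiringValueCache, discussed further down.

```java
import java.io.IOException;
import java.util.EnumSet;
import java.util.concurrent.TimeUnit;

import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;
import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.ClusterMetrics.Option;
import org.apache.hadoop.hbase.client.Admin;

// Illustrative helper (not from the patch): serve a recent ClusterMetrics
// snapshot instead of making the HMaster walk every region state on each
// chore run. The TTL mirrors the quota refresh period by default.
class CachedClusterMetrics {
  private final Supplier<ClusterMetrics> cached;

  CachedClusterMetrics(Admin admin, long ttlMs) {
    this.cached = Suppliers.memoizeWithExpiration(() -> {
      try {
        // Request only what machine quota factors need, not the full payload.
        return admin.getClusterMetrics(EnumSet.of(Option.SERVERS_NAME));
      } catch (IOException e) {
        throw new RuntimeException("Failed to fetch cluster metrics", e);
      }
    }, ttlMs, TimeUnit.MILLISECONDS);
  }

  ClusterMetrics get() {
    return cached.get(); // refetches at most once per TTL window
  }
}
```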

Testing

I've updated some tests to align with the expectation that quotas will only refresh on the normally scheduled refresh period. Otherwise, I think our quotas test suite provides pretty good coverage to ensure that nothing is broken by this changeset.

I've also deployed this on a test cluster; here is how refresh rates changed in a given 30-second window pre- and post-restart.

Pre-Restart

```
2024-11-08T12:30:02,555 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 5 ms.
2024-11-08T12:30:06,508 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 12 ms.
2024-11-08T12:30:10,437 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 5 ms.
2024-11-08T12:30:13,619 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 4 ms.
2024-11-08T12:30:13,805 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 4 ms.
2024-11-08T12:30:14,308 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 5 ms.
2024-11-08T12:30:14,335 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 4 ms.
2024-11-08T12:30:14,769 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 5 ms.
2024-11-08T12:30:16,781 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 4 ms.
2024-11-08T12:30:17,994 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 6 ms.
2024-11-08T12:30:18,745 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 4 ms.
2024-11-08T12:30:21,214 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 5 ms.
2024-11-08T12:30:24,034 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 29 ms.
2024-11-08T12:30:26,600 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 9 ms.
2024-11-08T12:30:30,976 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 36 ms.
```

Post-Restart

```
2024-11-08T12:35:56,886 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 45 ms.
2024-11-08T12:36:27,240 [regionserver.Chore.1 {}] DEBUG org.apache.hadoop.hbase.ScheduledChore: QuotaRefresherChore execution time: 89 ms.
```

On said test cluster I've also confirmed that throttling continues to work as intended. I've also validated that I can apply newly defined manual throttles in line with the refresh period, and that removing a throttle returns users to the default behavior.

cc @ndimiduk @hgromer


```java
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }
  boolean hasTableQuotas = !tableQuotaCache.entrySet().isEmpty()
```
Contributor:

Not sure if this is relevant here, but this check won't be atomic: the contents of tableQuotaCache can change while userQuotaCache is being checked.

Contributor Author:

I think it's okay — the two conditions are independent of each other, and the implication of that case would just be that a tableQuotaCache addition missed the boat for this refresh and will be reflected in subsequent refreshes.

@rmdmattingly marked this pull request as ready for review November 8, 2024 15:52
Comment on lines -143 to +153
```diff
-      () -> QuotaUtil.buildDefaultUserQuotaState(rsServices.getConfiguration(), 0L),
-      this::triggerCacheRefresh);
+      () -> QuotaUtil.buildDefaultUserQuotaState(rsServices.getConfiguration(), 0L));
```
Contributor Author:

The changes here, and in getQuotaState, are critical. They ensure that a cache miss does not trigger an immediate refresh — particularly given that cache entries are evicted only after 5 refresh periods, the old approach was too heavy-handed.
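Roughly, the behavioral change looks like the following sketch (names and shapes approximate the patch rather than quote it): a miss seeds the cache with the default state and leaves reconciliation to the scheduled chore.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

class QuotaCacheMissSketch<Q> {
  private final ConcurrentMap<String, Q> userQuotaCache = new ConcurrentHashMap<>();

  // Before: a miss called triggerCacheRefresh(), so tens of thousands of
  // distinct users meant the cache was effectively always refreshing.
  // After: a miss just inserts the default state; the QuotaRefresherChore
  // replaces it with the real quota on its next scheduled run.
  Q getUserQuotaState(String user, Supplier<Q> defaultState) {
    return userQuotaCache.computeIfAbsent(user, u -> defaultState.get());
  }
}
```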

```java
  }
}

static class RefreshableExpiringValueCache<T> {
```
Contributor Author:

I made this class because I don't think there's a good keyless equivalent to a LoadingCache, and a memoized supplier does not offer all of the functionality I'd like (on-demand refresh and invalidation).
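For the curious, a keyless wrapper like that can be sketched on top of a single-entry Guava LoadingCache; this is an approximation of the idea, not the patch's actual implementation:

```java
import java.time.Duration;
import java.util.function.Supplier;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Keyless "LoadingCache of one": load-on-demand with a TTL, plus the
// on-demand refresh and invalidation that a memoized supplier lacks.
class RefreshableExpiringValueCacheSketch<T> {
  private static final Object KEY = new Object();
  private final LoadingCache<Object, T> cache;

  RefreshableExpiringValueCacheSketch(Duration ttl, Supplier<T> loader) {
    this.cache = CacheBuilder.newBuilder().expireAfterWrite(ttl)
      .build(new CacheLoader<Object, T>() {
        @Override
        public T load(Object ignored) {
          return loader.get();
        }
      });
  }

  T get() { return cache.getUnchecked(KEY); }   // lazy load + TTL expiry
  void refresh() { cache.refresh(KEY); }        // on-demand reload
  void invalidate() { cache.invalidate(KEY); }  // drop the cached value
}
```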

Comment on lines +434 to +437
```java
if (hasTableQuotas) {
  updateTableMachineQuotaFactors();
} else {
  updateOnlyMachineQuotaFactors();
```
Contributor Author:

This check ensures that we only pull down every region state if we actually need it. Without table machine quotas, there is no point.
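In ClusterMetrics terms, the difference is roughly the following; the exact Option sets are my assumption rather than a quote from the patch. Machine factors need only the server list, while table factors need per-region detail.

```java
import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.ClusterMetrics.Option;
import org.apache.hadoop.hbase.client.Admin;

class QuotaFactorFetchSketch {
  // Machine quota factors only need the cluster size, so a servers-only
  // request spares the HMaster from iterating every region state.
  static double machineQuotaFactor(Admin admin) throws IOException {
    ClusterMetrics metrics = admin.getClusterMetrics(EnumSet.of(Option.SERVERS_NAME));
    return 1.0 / Math.max(1, metrics.getServersName().size());
  }

  // Table machine quota factors additionally need open-region counts per
  // table, which ride along with the much heavier LIVE_SERVERS payload.
  static ClusterMetrics regionDetailMetrics(Admin admin) throws IOException {
    return admin.getClusterMetrics(EnumSet.of(Option.LIVE_SERVERS));
  }
}
```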

```diff
- final String userOverrideWithQuota = User.getCurrent().getShortName() + "123";
+ final String userOverrideWithQuota = User.getCurrent().getShortName();
```
Contributor Author:

I made a few small test changes to ensure that refreshes were called appropriately. Because the logic in ThrottleQuotaTestUtil.triggerUserCacheRefresh assumes that we're waiting on throttling for the current user, I also had to rework the procedure of this test a bit so that the current user is who we're throttling (rather than the current user plus a suffix, as before). But the logic is still sound.

```diff
@@ -61,6 +67,10 @@ public class QuotaCache implements Stoppable {
   private static final Logger LOG = LoggerFactory.getLogger(QuotaCache.class);
 
   public static final String REFRESH_CONF_KEY = "hbase.quota.refresh.period";
+  public static final String TABLE_REGION_STATES_CACHE_TTL_MS =
+    "hbase.quota.cache.ttl.region.states.ms";
```
Member:

Thank you for including a unit indicator in the config key! 🙇

```java
Duration tableRegionStatesCacheTtl =
  Duration.ofMillis(conf.getLong(TABLE_REGION_STATES_CACHE_TTL_MS, period));
this.tableRegionStatesClusterMetrics =
  new RefreshableExpiringValueCache<>("tableRegionStatesClusterMetrics",
```
Member:

As a follow-up JIRA, should these configuration values be hot-reloadable?

Contributor Author:

That would definitely be nice, and it would probably be a larger refactor, so it makes sense as a separate issue. The quota refresh period is also static, and should probably be made dynamic in that same push.
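Until then, the TTL is read once at startup like any other Configuration value. A minimal illustration of overriding it (the five-minute figure is purely hypothetical):

```java
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

class QuotaCacheTtlExample {
  static Configuration withCustomTtl() {
    Configuration conf = HBaseConfiguration.create();
    // Defaults to the quota refresh period ("hbase.quota.refresh.period");
    // raising it trades snapshot staleness for fewer ClusterMetrics calls.
    conf.setLong("hbase.quota.cache.ttl.region.states.ms", TimeUnit.MINUTES.toMillis(5));
    return conf;
  }
}
```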

@ndimiduk (Member) left a comment:

Looks good to me, just some questions and nits. Thanks for the manual testing.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 27s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 0s | | master passed |
| +1 💚 | compile | 2m 59s | | master passed |
| +1 💚 | checkstyle | 0m 35s | | master passed |
| +1 💚 | spotbugs | 1m 29s | | master passed |
| +1 💚 | spotless | 0m 43s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 2m 51s | | the patch passed |
| +1 💚 | compile | 3m 0s | | the patch passed |
| +1 💚 | javac | 3m 0s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 35s | | the patch passed |
| +1 💚 | spotbugs | 1m 37s | | the patch passed |
| +1 💚 | hadoopcheck | 10m 48s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 42s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 10s | | The patch does not generate ASF License warnings. |
| | | 35m 44s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6451/4/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6451 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 2ecc22b9b206 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 8bed3d3 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 85 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6451/4/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 34s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 4m 15s | | master passed |
| +1 💚 | compile | 1m 16s | | master passed |
| +1 💚 | javadoc | 0m 36s | | master passed |
| +1 💚 | shadedjars | 7m 0s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 49s | | the patch passed |
| +1 💚 | compile | 1m 16s | | the patch passed |
| +1 💚 | javac | 1m 16s | | the patch passed |
| +1 💚 | javadoc | 0m 34s | | the patch passed |
| +1 💚 | shadedjars | 7m 13s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 233m 54s | | hbase-server in the patch passed. |
| | | 264m 36s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6451/4/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6451 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux e48f92ae88ae 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 8bed3d3 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6451/4/testReport/ |
| Max. process+thread count | 4964 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6451/4/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.
