-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move TSC took-time policy to guard both heap and disk tier #17190
base: main
Are you sure you want to change the base?
Move TSC took-time policy to guard both heap and disk tier #17190
Conversation
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
❌ Gradle check result for 36c7ba2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Peter Alfonsi <[email protected]>
❌ Gradle check result for 0359de4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
...ache-common/src/main/java/org/opensearch/cache/common/tier/TieredSpilloverCacheSettings.java
Show resolved
Hide resolved
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
❌ Gradle check result for 294ebae: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Peter Alfonsi <[email protected]>
❕ Gradle check result for 2b3b0f0: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17190 +/- ##
============================================
+ Coverage 72.40% 72.48% +0.08%
- Complexity 65554 65619 +65
============================================
Files 5292 5292
Lines 304493 304519 +26
Branches 44218 44222 +4
============================================
+ Hits 220463 220733 +270
+ Misses 65975 65714 -261
- Partials 18055 18072 +17 ☔ View full report in Codecov by Sentry. |
...ache-common/src/main/java/org/opensearch/cache/common/tier/TieredSpilloverCacheSettings.java
Outdated
Show resolved
Hide resolved
...ache-common/src/main/java/org/opensearch/cache/common/tier/TieredSpilloverCacheSettings.java
Outdated
Show resolved
Hide resolved
...ache-common/src/main/java/org/opensearch/cache/common/tier/TieredSpilloverCacheSettings.java
Outdated
Show resolved
Hide resolved
// The old indices stats API runs onCached() whenever loader.load() runs. We can't change this as it's PublicApi. | ||
// But now it may not be the case that the key actually enters the cache on loader.load(). | ||
// To cancel this out, send the removalListener (the IRC) a removal notification with evicted = false, | ||
// so that it subtracts the stats off again without incrementing evictions. | ||
// These old stats will be removed anyway in 3.0. | ||
removalListener.onRemoval(new RemovalNotification<>(key, value, RemovalReason.EXPLICIT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this should be the case with any caller and not just IRC? Mentioning this as TSC in general is designed to be used anywhere and not just IRC. So writing logic/comments specific to IRC doesn't seem right to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition we can refactor this to something below. This looks much simpler to me and won't require any changes in handler.
else {
if (evaluatePoliciesList(value, policiesList)) {
future.complete(new Tuple<>(key, value));
} else {
future.complete(null); // Passing null would skip the logic to put this into onHeap cache.
removalListener.onRemoval(new RemovalNotification<>(key, value, RemovalReason.EXPLICIT));
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this shouldn't be coding around the IRC specifically. I'll remove the comments.
Actually I don't like that we have to do this removal notification at all, since it's pretty specific to how the IRC implements its logic for the old cache stats API. For example we don't have to do it on TSC.put()
as I described in the other comment. But I'm not sure there's a way around it. By design the TSC shouldn't be aware of this external stats tracking system, but we do want the old stats API to be accurate until we remove it (probably) in 3.0...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I don't think so. Even outside IRC, if someone calls put()
or computeIfAbsent()
on TSC, they are expecting that items should be added to the cache but it didn't, so we should notify them. In this case the caller can handle what they need to do with the removal notification whether to calculate stats or not.
With IRC particularly, it was calculating stats after calling computeIfAbsent()
, so in this case removal notification is handled by IRC explicitly to deduct stats. For other cases like mentioned above, one can handle it with their own logic or simply ignore it.
@@ -257,19 +263,23 @@ public void put(ICacheKey<K> key, V value) { | |||
Tuple<V, String> cacheValueTuple = getValueFromTieredCache(true).apply(key); | |||
if (cacheValueTuple == null) { | |||
// In case it is not present in any tier, put it inside onHeap cache by default. | |||
try (ReleasableLock ignore = writeLock.acquire()) { | |||
onHeapCache.put(key, value); | |||
if (evaluatePoliciesList(value, policies)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to remove listener logic in computeIfAbsent, should we also trigger a removal listener from here as well?
As otherwise there is no way to communicate to the caller that the put call actually got skipped in case policy returned false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good catch, I only noticed the removal listener issue later on and forgot to add it back into put()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it isn't necessary, since the old indices stats API onCached
listener only gets called on loader.load()
, and for put()
there's no loader involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consider tiered cache in generic terms, as the framework is agnostic of its users. If someone is using TSC and calls put(), but the item isn't added to the on-heap cache, we should notify them via a removal listener?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in a fully generic setup, we shouldn't call onRemoval()
in either put()
or computeIfAbsent()
as nothing was ever actually removed from the cache. Adding the call in computeIfAbsent()
is kind of a gross hack we do because we know the IRC's external stats tracking incorrectly assumes loader.load()
always results in a value entering the cache.
I'm wondering if it might be better to keep this generic and not have either method call the removal listener? This would mean the old indices stats API can be incorrect, but we're planning on removing that in 3.0 anyway.
Or, we can put it in both and define "cache removal" to include "rejected by policy". If the IRC ever called put() directly it would make the old stats API incorrect, but it never actually does call put(), so it should be fine?
modules/cache-common/src/main/java/org/opensearch/cache/common/tier/TieredSpilloverCache.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
❌ Gradle check result for a317b9f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
Changes the TieredSpilloverCache's minimum took-time policy so that queries must take >10 ms to enter either tier, instead of just to enter the disk tier. This is desirable because when we allow caching size > 0 queries in the request cache, the number of cacheable queries may go way up, and we should avoid flooding the heap tier with overly cheap queries.
Note the setting key
tiered_spillover.disk.store.policies.took_time.threshold
is unchanged, for backwards compatibility.Related Issues
Resolves #16162
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.