Commit 7180fab
Background data movement for the tiers

Part 1.
-------
This adds the following:
1. tryPromoteToNextTier. This could go with multi-tier part 2.
2. Promotion iterators. This could go with the MM2Q promotion iterators patch.

It also enables background workers in the cache config. Future changes to the
background workers can be merged with this patch.

Background evictors multi-tier, Part 2.
---------------------------------------
This should be rolled into background evictors part 1. Improved the background
stats structure and cachebench output. Adds the following:
- approximate usage stat
- evictions / attempts per class

Background evictors multi-tier, Part 3.
---------------------------------------
Use the approximate usage fraction.

1 parent 6e0fc13  commit 7180fab

22 files changed: +1028 −268 lines changed

MultiTierDataMovement.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Background Data Movement

In order to reduce the number of online evictions and support asynchronous
promotion, we have added two periodic workers to handle eviction and promotion.

The diagram below shows a simplified version of how the background evictor
thread (green) is integrated into the CacheLib architecture.

<p align="center">
  <img width="640" height="360" alt="BackgroundEvictor" src="cachelib-background-evictor.png">
</p>
## Background Evictors

The background evictors scan each class to see if there are objects to move to the
next (lower) tier using a given strategy. Here we document the general parameters
and the parameters for the different strategies.

- `backgroundEvictorIntervalMilSec`: The interval at which this thread runs. By
default the background evictor threads wake up every 10 ms to scan the
AllocationClasses. A background evictor thread is also woken up every time an
allocation fails (in a request handling thread) and the current percentage of free
memory for the AllocationClass is lower than `lowEvictionAcWatermark`. This can make
the interval parameter less important when many allocations occur from request
handling threads.
25+
26+
- `evictorThreads`: The number of background evictors to run - each thread is a assigned
27+
a set of AllocationClasses to scan and evict objects from. Currently, each thread gets
28+
an equal number of classes to scan - but as object size distribution may be unequal - future
29+
versions will attempt to balance the classes among threads. The range is 1 to number of AllocationClasses.
30+
The default is 1.
31+
32+
- `maxEvictionBatch`: The number of objects to remove in a given eviction call. The
33+
default is 40. Lower range is 10 and the upper range is 1000. Too low and we might not
34+
remove objects at a reasonable rate, too high and it might increase contention with user threads.
35+
36+
- `minEvictionBatch`: Minimum number of items to evict at any time (if there are any
37+
candidates)
38+
39+
- `maxEvictionPromotionHotness`: Maximum candidates to consider for eviction. This is similar to `maxEvictionBatch`
40+
but it specifies how many candidates will be taken into consideration, not the actual number of items to evict.
41+
This option can be used to configure duration of critical section on LRU lock.
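The interplay of the batch limits above can be sketched as follows. The helper name
and the exact clamping rule are ours, inferred from the parameter descriptions; this
is not CacheLib code:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of CacheLib): bound a strategy-computed
// batch by the configured minEvictionBatch / maxEvictionBatch limits.
// A batch of 0 means "no work for this class", so the minimum only
// applies when there are candidates to evict.
uint64_t clampEvictionBatch(uint64_t computed,
                            uint64_t minEvictionBatch,
                            uint64_t maxEvictionBatch) {
  if (computed == 0) {
    return 0; // nothing to evict in this class
  }
  return std::clamp(computed, minEvictionBatch, maxEvictionBatch);
}
```

With the defaults described above, a computed batch of 3 would be raised to the
minimum, and a computed batch of 5000 would be capped at `maxEvictionBatch`.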
### FreeThresholdStrategy (default)

- `lowEvictionAcWatermark`: Triggers the background eviction thread to run when less
than this percentage of the AllocationClass is free. The default is `2.0`; to avoid
wasting capacity, we don't set this above `10.0`.

- `highEvictionAcWatermark`: Stops evictions from an AllocationClass when at least
this percentage of the AllocationClass is free. The default is `5.0`; to avoid
wasting capacity, we don't set this above `10.0`.
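The two watermarks form a start/stop band: eviction starts below the low watermark
and continues until the high watermark is reached. A minimal sketch of that
hysteresis, with struct and member names of our own invention (not the actual
strategy code):

```cpp
#include <cstdint>

// Sketch of the watermark logic described above: eviction begins when the
// free percentage of an AllocationClass drops below lowEvictionAcWatermark
// and keeps going until at least highEvictionAcWatermark percent is free.
struct FreeThresholdSketch {
  double lowEvictionAcWatermark{2.0};
  double highEvictionAcWatermark{5.0};
  bool evicting{false};

  // Feed the current free percentage; returns whether to evict now.
  bool shouldEvict(double freePercent) {
    if (!evicting && freePercent < lowEvictionAcWatermark) {
      evicting = true; // too little free memory, start evicting
    } else if (evicting && freePercent >= highEvictionAcWatermark) {
      evicting = false; // enough headroom reclaimed, stop
    }
    return evicting;
  }
};
```

Starting at 3% free, nothing happens; dropping to 1% starts eviction, which then
continues through 4% and only stops once 5% is free again.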
## Background Promoters

The background promoters scan each class to see if there are objects to move to the
next (upper) tier using a given strategy. Here we document the general parameters
and the parameters for the different strategies.
- `backgroundPromoterIntervalMilSec`: The interval at which this thread runs. By
default the background promoter threads wake up every 10 ms to scan the
AllocationClasses for objects to promote.

- `promoterThreads`: The number of background promoter threads to run. Each thread is
assigned a set of AllocationClasses to scan and promote objects from. Currently, each
thread gets an equal number of classes to scan, but since the object size
distribution may be unequal, future versions will attempt to balance the classes
among threads. The range is `1` to the number of AllocationClasses. The default is `1`.

- `maxPromotionBatch`: The number of objects to promote in a given promotion call.
The default is `40`; the lower bound is `10` and the upper bound is `1000`. Too low
and we might not promote objects at a reasonable rate; too high and it may increase
contention with user threads.

- `minPromotionBatch`: The minimum number of items to promote at any time (if there
are any candidates).
- `numDuplicateElements`: This allows us to promote items that have existing
read-only handles, since we won't need to modify the data while a user still holds
it. Therefore, for a short time the data can reside in both tiers until it is
evicted from its current tier. The default is `0` (duplicates not allowed); setting
the value to `100` enables duplicate elements across tiers.
### Background Promotion Strategy (only one currently)

- `promotionAcWatermark`: Promote items if at least this percentage of the
AllocationClass is free. The promotion thread will attempt to move up to
`maxPromotionBatch` objects to that tier. The objects are chosen from the head of
the LRU. The default is `4.0`. This value should correlate with
`lowEvictionAcWatermark`, `highEvictionAcWatermark`, `minAcAllocationWatermark`, and
`maxAcAllocationWatermark`.

- `maxPromotionBatch`: The number of objects to promote in a batch during background
promotion. Analogous to `maxEvictionBatch`; its value should be lower to decrease
contention on hot items.

cachelib/allocator/BackgroundMover.h

Lines changed: 81 additions & 34 deletions
```diff
@@ -18,7 +18,6 @@
 
 #include "cachelib/allocator/BackgroundMoverStrategy.h"
 #include "cachelib/allocator/CacheStats.h"
-#include "cachelib/common/AtomicCounter.h"
 #include "cachelib/common/PeriodicWorker.h"
 
 namespace facebook::cachelib {
@@ -51,6 +50,7 @@ enum class MoverDir { Evict = 0, Promote };
 template <typename CacheT>
 class BackgroundMover : public PeriodicWorker {
  public:
+  using ClassBgStatsType = std::map<MemoryDescriptorType, uint64_t>;
   using Cache = CacheT;
   // @param cache      the cache interface
   // @param strategy   the strategy class that defines how objects are
@@ -62,8 +62,9 @@ class BackgroundMover : public PeriodicWorker {
   ~BackgroundMover() override;
 
   BackgroundMoverStats getStats() const noexcept;
-  std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
-  getClassStats() const noexcept;
+  ClassBgStatsType getClassStats() const noexcept {
+    return movesPerClass_;
+  }
 
   void setAssignedMemory(std::vector<MemoryDescriptorType>&& assignedMemory);
 
@@ -72,8 +73,27 @@ class BackgroundMover : public PeriodicWorker {
   static size_t workerId(TierId tid, PoolId pid, ClassId cid, size_t numWorkers);
 
  private:
-  std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
-      movesPerClass_;
+  ClassBgStatsType movesPerClass_;
+
+  struct TraversalStats {
+    // record a traversal and its time taken
+    void recordTraversalTime(uint64_t nsTaken);
+
+    uint64_t getAvgTraversalTimeNs(uint64_t numTraversals) const;
+    uint64_t getMinTraversalTimeNs() const { return minTraversalTimeNs_; }
+    uint64_t getMaxTraversalTimeNs() const { return maxTraversalTimeNs_; }
+    uint64_t getLastTraversalTimeNs() const { return lastTraversalTimeNs_; }
+
+   private:
+    // time it took us the last time to traverse the cache.
+    uint64_t lastTraversalTimeNs_{0};
+    uint64_t minTraversalTimeNs_{std::numeric_limits<uint64_t>::max()};
+    uint64_t maxTraversalTimeNs_{0};
+    uint64_t totalTraversalTimeNs_{0};
+  };
+
+  TraversalStats traversalStats_;
   // cache allocator's interface for evicting
   using Item = typename Cache::Item;
 
@@ -89,9 +109,10 @@ class BackgroundMover : public PeriodicWorker {
   void work() override final;
   void checkAndRun();
 
-  AtomicCounter numMovedItems_{0};
-  AtomicCounter numTraversals_{0};
-  AtomicCounter totalBytesMoved_{0};
+  uint64_t numMovedItems{0};
+  uint64_t numTraversals{0};
+  uint64_t totalClasses{0};
+  uint64_t totalBytesMoved{0};
 
   std::vector<MemoryDescriptorType> assignedMemory_;
   folly::DistributedMutex mutex_;
@@ -111,6 +132,20 @@ BackgroundMover<CacheT>::BackgroundMover(
   }
 }
 
+template <typename CacheT>
+void BackgroundMover<CacheT>::TraversalStats::recordTraversalTime(uint64_t nsTaken) {
+  lastTraversalTimeNs_ = nsTaken;
+  minTraversalTimeNs_ = std::min(minTraversalTimeNs_, nsTaken);
+  maxTraversalTimeNs_ = std::max(maxTraversalTimeNs_, nsTaken);
+  totalTraversalTimeNs_ += nsTaken;
+}
+
+template <typename CacheT>
+uint64_t BackgroundMover<CacheT>::TraversalStats::getAvgTraversalTimeNs(
+    uint64_t numTraversals) const {
+  return numTraversals ? totalTraversalTimeNs_ / numTraversals : 0;
+}
+
 template <typename CacheT>
 BackgroundMover<CacheT>::~BackgroundMover() {
   stop(std::chrono::seconds(0));
@@ -144,44 +179,56 @@ template <typename CacheT>
 void BackgroundMover<CacheT>::checkAndRun() {
   auto assignedMemory = mutex_.lock_combine([this] { return assignedMemory_; });
 
-  unsigned int moves = 0;
-  auto batches = strategy_->calculateBatchSizes(cache_, assignedMemory);
-
-  for (size_t i = 0; i < batches.size(); i++) {
-    const auto [tid, pid, cid] = assignedMemory[i];
-    const auto batch = batches[i];
+  while (true) {
+    unsigned int moves = 0;
+    std::set<ClassId> classes{};
+    auto batches = strategy_->calculateBatchSizes(cache_, assignedMemory);
+
+    const auto begin = util::getCurrentTimeNs();
+    for (size_t i = 0; i < batches.size(); i++) {
+      const auto [tid, pid, cid] = assignedMemory[i];
+      const auto batch = batches[i];
+      if (!batch) {
+        continue;
+      }
+
+      // try moving BATCH items from the class in order to reach free target
+      auto moved = moverFunc(cache_, tid, pid, cid, batch);
+      moves += moved;
+      movesPerClass_[assignedMemory[i]] += moved;
+    }
+    auto end = util::getCurrentTimeNs();
+    if (moves > 0) {
+      traversalStats_.recordTraversalTime(end > begin ? end - begin : 0);
+      numMovedItems += moves;
+      numTraversals++;
+    }
 
-    if (batch == 0) {
-      continue;
+    // we didn't move any objects, so we are done with this run
+    if (moves == 0 || shouldStopWork()) {
+      break;
     }
-    const auto& mpStats = cache_.getPoolByTid(pid, tid).getStats();
-    // try moving BATCH items from the class in order to reach free target
-    auto moved = moverFunc(cache_, tid, pid, cid, batch);
-    moves += moved;
-    movesPerClass_[tid][pid][cid] += moved;
-    totalBytesMoved_.add(moved * mpStats.acStats.at(cid).allocSize);
   }
 }
 
 template <typename CacheT>
 BackgroundMoverStats BackgroundMover<CacheT>::getStats() const noexcept {
   BackgroundMoverStats stats;
-  stats.numMovedItems = numMovedItems_.get();
-  stats.runCount = numTraversals_.get();
-  stats.totalBytesMoved = totalBytesMoved_.get();
+  stats.numMovedItems = numMovedItems;
+  stats.totalBytesMoved = totalBytesMoved;
+  stats.totalClasses = totalClasses;
+  auto runCount = getRunCount();
+  stats.runCount = runCount;
+  stats.numTraversals = numTraversals;
+  stats.avgItemsMoved = (double)stats.numMovedItems / (double)runCount;
+  stats.lastTraversalTimeNs = traversalStats_.getLastTraversalTimeNs();
+  stats.avgTraversalTimeNs = traversalStats_.getAvgTraversalTimeNs(numTraversals);
+  stats.minTraversalTimeNs = traversalStats_.getMinTraversalTimeNs();
+  stats.maxTraversalTimeNs = traversalStats_.getMaxTraversalTimeNs();
 
   return stats;
 }
 
-template <typename CacheT>
-std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
-BackgroundMover<CacheT>::getClassStats() const noexcept {
-  return movesPerClass_;
-}
-
 template <typename CacheT>
 size_t BackgroundMover<CacheT>::workerId(TierId tid,
                                          PoolId pid,
```
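The TraversalStats bookkeeping added in this diff can be exercised in isolation.
The following standalone copy mirrors the struct from the patch (inlined, without
the enclosing template) so the min/max/avg arithmetic is easy to sanity-check:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Standalone copy of the TraversalStats struct from the diff above.
struct TraversalStats {
  // record a traversal and its time taken
  void recordTraversalTime(uint64_t nsTaken) {
    lastTraversalTimeNs_ = nsTaken;
    minTraversalTimeNs_ = std::min(minTraversalTimeNs_, nsTaken);
    maxTraversalTimeNs_ = std::max(maxTraversalTimeNs_, nsTaken);
    totalTraversalTimeNs_ += nsTaken;
  }

  // average over the recorded traversals; 0 if none have happened yet
  uint64_t getAvgTraversalTimeNs(uint64_t numTraversals) const {
    return numTraversals ? totalTraversalTimeNs_ / numTraversals : 0;
  }
  uint64_t getMinTraversalTimeNs() const { return minTraversalTimeNs_; }
  uint64_t getMaxTraversalTimeNs() const { return maxTraversalTimeNs_; }
  uint64_t getLastTraversalTimeNs() const { return lastTraversalTimeNs_; }

 private:
  uint64_t lastTraversalTimeNs_{0};
  uint64_t minTraversalTimeNs_{std::numeric_limits<uint64_t>::max()};
  uint64_t maxTraversalTimeNs_{0};
  uint64_t totalTraversalTimeNs_{0};
};
```

Note that the min starts at `uint64_t` max so the first recorded traversal always
becomes the minimum, and the average is guarded against division by zero.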

cachelib/allocator/BackgroundMoverStrategy.h

Lines changed: 29 additions & 8 deletions
```diff
@@ -21,14 +21,6 @@
 namespace facebook {
 namespace cachelib {
 
-struct MemoryDescriptorType {
-  MemoryDescriptorType(TierId tid, PoolId pid, ClassId cid) :
-    tid_(tid), pid_(pid), cid_(cid) {}
-  TierId tid_;
-  PoolId pid_;
-  ClassId cid_;
-};
-
 // Base class for background eviction strategy.
 class BackgroundMoverStrategy {
  public:
@@ -46,5 +38,34 @@ class BackgroundMoverStrategy {
   virtual ~BackgroundMoverStrategy() = default;
 };
 
+class DefaultBackgroundMoverStrategy : public BackgroundMoverStrategy {
+ public:
+  DefaultBackgroundMoverStrategy(uint64_t batchSize, double targetFree)
+      : batchSize_(batchSize), targetFree_((double)targetFree / 100.0) {}
+  ~DefaultBackgroundMoverStrategy() {}
+
+  std::vector<size_t> calculateBatchSizes(
+      const CacheBase& cache,
+      std::vector<MemoryDescriptorType> acVec) {
+    std::vector<size_t> batches{};
+    for (auto [tid, pid, cid] : acVec) {
+      double usage = cache.getPoolByTid(pid, tid).getApproxUsage(cid);
+      uint32_t perSlab = cache.getPoolByTid(pid, tid).getPerSlab(cid);
+      if (usage >= (1.0 - targetFree_)) {
+        uint32_t batch = batchSize_ > perSlab ? perSlab : batchSize_;
+        batches.push_back(batch);
+      } else {
+        // no work to be done since there is already
+        // at least targetFree remaining in the class
+        batches.push_back(0);
+      }
+    }
+    return batches;
+  }
+
+ private:
+  uint64_t batchSize_{100};
+  double targetFree_{0.05};
+};
+
 } // namespace cachelib
 } // namespace facebook
```
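The core thresholding inside `DefaultBackgroundMoverStrategy::calculateBatchSizes`
can be isolated as a free function. This sketch is our own framing: the cache
lookups (`getApproxUsage`, `getPerSlab`) are replaced by plain arguments so the
decision rule is testable on its own:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstddef>

// Per-class batch decision, extracted from the strategy in the diff above:
// if the class is fuller than (1 - targetFree), evict up to batchSize items,
// capped at one slab's worth; otherwise there is nothing to do.
size_t batchForClass(double usage,        // approximate usage fraction [0.0, 1.0]
                     uint32_t perSlab,    // allocations that fit in one slab
                     uint64_t batchSize,  // configured batch size
                     double targetFree) { // e.g. 0.05 for 5% free
  if (usage >= 1.0 - targetFree) {
    return std::min<uint64_t>(batchSize, perSlab);
  }
  // already at least targetFree free in this class
  return 0;
}
```

With `targetFree = 0.05`, a class at 98% usage gets a full batch (capped by the
per-slab count), while a class at 90% usage yields a batch of zero.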

cachelib/allocator/Cache.h

Lines changed: 16 additions & 0 deletions
```diff
@@ -73,6 +73,22 @@ enum class DestructorContext {
   kRemovedFromNVM
 };
 
+struct MemoryDescriptorType {
+  MemoryDescriptorType(TierId tid, PoolId pid, ClassId cid)
+      : tid_(tid), pid_(pid), cid_(cid) {}
+  TierId tid_;
+  PoolId pid_;
+  ClassId cid_;
+
+  bool operator<(const MemoryDescriptorType& rhs) const {
+    return std::make_tuple(tid_, pid_, cid_) <
+           std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_);
+  }
+
+  bool operator==(const MemoryDescriptorType& rhs) const {
+    return std::make_tuple(tid_, pid_, cid_) ==
+           std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_);
+  }
+};
+
 // A base class of cache exposing members and status agnostic of template type.
 class CacheBase {
  public:
```
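The new `operator<` is what lets `MemoryDescriptorType` serve as the key of the
flattened `movesPerClass_` map. A self-contained sketch, with the
`TierId`/`PoolId`/`ClassId` aliases stubbed as plain integers (an assumption for
illustration; the real typedefs live elsewhere in CacheLib):

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Stand-in aliases so the struct compiles on its own.
using TierId = int32_t;
using PoolId = int8_t;
using ClassId = int16_t;

struct MemoryDescriptorType {
  MemoryDescriptorType(TierId tid, PoolId pid, ClassId cid)
      : tid_(tid), pid_(pid), cid_(cid) {}
  TierId tid_;
  PoolId pid_;
  ClassId cid_;

  // tuple comparison gives the strict weak ordering std::map requires
  bool operator<(const MemoryDescriptorType& rhs) const {
    return std::make_tuple(tid_, pid_, cid_) <
           std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_);
  }
  bool operator==(const MemoryDescriptorType& rhs) const {
    return std::make_tuple(tid_, pid_, cid_) ==
           std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_);
  }
};

// movesPerClass_[descriptor] += moved, as done in checkAndRun()
std::map<MemoryDescriptorType, uint64_t> movesPerClass;
```

Two descriptors with the same (tier, pool, class) triple hit the same map entry, so
per-class move counts accumulate correctly.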
