-
Notifications
You must be signed in to change notification settings - Fork 0
[Store] feat: introduce tired backend #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Xingrui Yi <[email protected]>
WalkthroughIntroduces a tiered storage abstraction layer for mooncake consisting of cache tier interfaces, a registry for memory-type copy operations, a data copier with direct and fallback paths, and a backend managing multi-tier allocation, replication, and metadata across diverse storage media. Changes
Sequence DiagramssequenceDiagram
participant Client
participant TieredBackend
participant CacheTier
participant DataCopier
participant MemoryPool
Client->>TieredBackend: Allocate(size, preferred_tier)
activate TieredBackend
TieredBackend->>TieredBackend: GetSortedTiers()
TieredBackend->>CacheTier: Allocate(size)
activate CacheTier
CacheTier->>MemoryPool: allocate
MemoryPool-->>CacheTier: ptr
CacheTier-->>TieredBackend: AllocationHandle(tier_id, DataSource)
deactivate CacheTier
TieredBackend-->>Client: AllocationHandle
deactivate TieredBackend
Client->>TieredBackend: Write(source_data, handle)
activate TieredBackend
TieredBackend->>DataCopier: Copy(source, dest_from_handle)
activate DataCopier
DataCopier->>DataCopier: FindCopier(src_type, dest_type)
alt Direct Path Exists
DataCopier->>DataCopier: Execute direct copy
else No Direct Path
note over DataCopier: Both non-DRAM?
DataCopier->>TieredBackend: Allocate temp DRAM buffer
DataCopier->>DataCopier: Copy src → DRAM
DataCopier->>DataCopier: Copy DRAM → dest
end
DataCopier-->>TieredBackend: success/failure
deactivate DataCopier
TieredBackend-->>Client: bool
deactivate TieredBackend
Client->>TieredBackend: Commit(key, handle)
activate TieredBackend
TieredBackend->>TieredBackend: Update metadata_index<br/>(per-key replicas)
TieredBackend-->>Client: bool
deactivate TieredBackend
sequenceDiagram
participant Client
participant TieredBackend
participant MetadataIndex
participant CacheTier1
participant CacheTier2
Client->>TieredBackend: Get(key, optional_tier_id)
activate TieredBackend
TieredBackend->>MetadataIndex: Lookup replicas for key
activate MetadataIndex
MetadataIndex-->>TieredBackend: replica list sorted by priority
deactivate MetadataIndex
alt tier_id specified
TieredBackend->>CacheTier1: Retrieve from specific tier
else best available
TieredBackend->>CacheTier1: Retrieve from highest-priority tier
end
activate CacheTier1
CacheTier1-->>TieredBackend: AllocationHandle
deactivate CacheTier1
TieredBackend-->>Client: AllocationHandle
deactivate TieredBackend
Client->>TieredBackend: Delete(key, optional_tier_id)
activate TieredBackend
TieredBackend->>MetadataIndex: Remove replica from key
activate MetadataIndex
MetadataIndex->>CacheTier2: Free allocation
activate CacheTier2
CacheTier2-->>MetadataIndex: success
deactivate CacheTier2
MetadataIndex-->>TieredBackend: bool
deactivate MetadataIndex
TieredBackend-->>Client: bool
deactivate TieredBackend
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (3 warnings)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (9)
mooncake-store/src/CMakeLists.txt (1)
15-20: Pre-existing issue: Duplicate source file entries.
etcd_helper.cppandha_helper.cppappear twice inMOONCAKE_STORE_SOURCES(lines 15-16 and 19-20). While CMake typically handles this gracefully, it's worth cleaning up to avoid confusion.mooncake-store/include/tiered_cache/cache_tier.h (1)
6-6: Remove unused include.
<optional>is included but not used in this header.-#include <optional>mooncake-store/include/tiered_cache/copier_registry.h (1)
12-13: Remove redundant forward declaration.
DataCopierBuilderis forward-declared here, butdata_copier.his already included on line 4, which definesDataCopierBuilder. The forward declaration is unnecessary.namespace mooncake { -// Forward declaration from data_copier.h to avoid circular dependency -class DataCopierBuilder; - // Holds the registration information for a memory type.mooncake-store/src/tiered_cache/data_copier.cpp (1)
3-3: Remove unused include.
<fstream>is included but not used in this file.-#include <fstream>mooncake-store/include/tiered_cache/data_copier.h (2)
1-9: Minor include cleanup opportunity.
<stdexcept>and<vector>don't appear to be directly used in this header. Consider removing unused includes to reduce compilation overhead and keep dependencies minimal.
60-88: Consider adding move semantics.While copy operations are deleted (appropriate for this class), move operations are implicitly deleted due to the deleted copy operations. If
DataCopierneeds to be moved (e.g., stored in containers or returned from functions), consider explicitly defaulting the move constructor and move assignment operator.~DataCopier() = default; DataCopier(const DataCopier&) = delete; DataCopier& operator=(const DataCopier&) = delete; + DataCopier(DataCopier&&) = default; + DataCopier& operator=(DataCopier&&) = default;mooncake-store/src/tiered_cache/tiered_backend.cpp (2)
172-180: Consider sorted insertion instead of sort-after-insert.When a new replica is added, the entire vector is sorted. For small replica counts this is fine, but consider using
std::lower_boundfor O(log n) insertion point lookup followed by insert, which maintains sorted order without full re-sort.
241-248: Variableitshadows outer scope variable.The inner loop variable
iton line 242 shadows the outeritfrom line 235. While this compiles correctly, it's confusing and error-prone. Consider renaming one of them.std::unique_lock<std::shared_mutex> entry_write_lock( entry->mutex); auto tier_it = entry->replicas.end(); - for (auto it = entry->replicas.begin(); - it != entry->replicas.end(); ++it) { - if (it->first == *tier_id) { - tier_it = it; + for (auto replica_it = entry->replicas.begin(); + replica_it != entry->replicas.end(); ++replica_it) { + if (replica_it->first == *tier_id) { + tier_it = replica_it; break; } }mooncake-store/include/tiered_cache/tiered_backend.h (1)
24-27: Remove redundantstructkeyword.The
structkeyword beforeDataSourceis unnecessary in C++ and is a C-style declaration.struct TieredLocation { uint64_t tier_id; - struct DataSource data; + DataSource data; };
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
mooncake-store/include/tiered_cache/cache_tier.h(1 hunks)mooncake-store/include/tiered_cache/copier_registry.h(1 hunks)mooncake-store/include/tiered_cache/data_copier.h(1 hunks)mooncake-store/include/tiered_cache/tiered_backend.h(1 hunks)mooncake-store/src/CMakeLists.txt(1 hunks)mooncake-store/src/tiered_cache/copier_registry.cpp(1 hunks)mooncake-store/src/tiered_cache/data_copier.cpp(1 hunks)mooncake-store/src/tiered_cache/tiered_backend.cpp(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
mooncake-store/src/tiered_cache/data_copier.cpp (2)
mooncake-store/include/tiered_cache/data_copier.h (2)
DataCopierBuilder(25-53)DataCopier(60-88)mooncake-store/src/tiered_cache/copier_registry.cpp (2)
GetInstance(7-10)GetInstance(7-7)
mooncake-store/src/tiered_cache/tiered_backend.cpp (1)
mooncake-store/include/tiered_cache/tiered_backend.h (1)
TieredBackend(80-193)
mooncake-store/include/tiered_cache/copier_registry.h (3)
mooncake-store/include/tiered_cache/cache_tier.h (1)
mooncake(10-88)mooncake-store/include/tiered_cache/data_copier.h (2)
mooncake(11-90)DataCopierBuilder(25-53)mooncake-store/src/tiered_cache/copier_registry.cpp (11)
GetInstance(7-10)GetInstance(7-7)RegisterMemoryType(12-16)RegisterMemoryType(12-13)RegisterDirectPath(18-21)RegisterDirectPath(18-19)GetMemoryTypeRegistrations(23-26)GetMemoryTypeRegistrations(24-24)GetDirectPathRegistrations(28-31)GetDirectPathRegistrations(29-29)CopierRegistrar(33-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build (3.12)
- GitHub Check: build-musa
- GitHub Check: build-flags (3.10)
- GitHub Check: build (3.10)
- GitHub Check: build-flags (3.12)
- GitHub Check: Build Docker Image
🔇 Additional comments (19)
mooncake-store/src/CMakeLists.txt (1)
29-31: LGTM!The new tiered cache source files are correctly added to the build.
mooncake-store/include/tiered_cache/cache_tier.h (1)
50-86: LGTM!The
CacheTierabstract interface is well-designed with clear separation of concerns for allocation, deallocation, and tier metadata accessors. The use of pure virtual methods ensures concrete implementations must provide all required functionality.mooncake-store/include/tiered_cache/copier_registry.h (2)
35-70: Consider thread-safety for concurrent registration.The singleton instance initialization is thread-safe (Meyers singleton), but
RegisterMemoryTypeandRegisterDirectPathmutate internal vectors without synchronization. If staticCopierRegistrarobjects are initialized concurrently across threads (e.g., in dynamically loaded libraries), this could cause data races.If all registrations occur during single-threaded static initialization (the typical use case), this is fine. Otherwise, consider adding a mutex or documenting the single-threaded initialization requirement.
80-84: LGTM!The
CopierRegistrarhelper provides a clean pattern for self-registration of memory types during static initialization.mooncake-store/src/tiered_cache/data_copier.cpp (3)
9-20: LGTM!The constructor correctly populates the copy matrix from the global registry, establishing both to/from DRAM paths and direct paths.
29-51: LGTM!The validation in
Build()ensures all registered non-DRAM types have their required DRAM fallback paths before constructing the immutableDataCopier. Usingstd::unique_ptrwith private constructor vianewis an appropriate pattern here.
63-114: LGTM!The
Copyimplementation correctly prioritizes direct paths and falls back to a two-step DRAM-based copy when needed. Error handling is thorough with appropriate logging at each failure point. The temporary buffer usesstd::nothrowand checks for allocation failure.mooncake-store/src/tiered_cache/copier_registry.cpp (4)
7-10: LGTM!The Meyers singleton pattern provides thread-safe initialization in C++11 and later.
12-31: LGTM!Registration and accessor methods are implemented correctly. Using
std::movefor the copy functions avoids unnecessary copies.
33-39: LGTM!The
CopierRegistrarconstructor correctly delegates to the singleton'sRegisterMemoryType, enabling static self-registration.
2-2: The#include "tiered_cache/data_copier.h"include is necessary.CopyFunctionis defined indata_copier.hand is used directly in this file in multiple function signatures:RegisterMemoryType()(line 12),RegisterDirectPath()(line 18), and theCopierRegistrarconstructor (line 33).Likely an incorrect or invalid review comment.
mooncake-store/include/tiered_cache/data_copier.h (1)
25-53: LGTM!The builder pattern is well-implemented with proper documentation. The
AddDirectPathmethod returns a reference for chaining, andBuild()correctly returns aunique_ptrfor ownership transfer.mooncake-store/src/tiered_cache/tiered_backend.cpp (5)
63-72: Potentialstd::out_of_rangeiftiers_andtier_info_are out of sync.
GetSortedTiersiterates overtiers_keys but accessestier_info_.at(). If a tier ID exists intiers_but not intier_info_, this will throw. Consider iterating overtier_info_instead (since that's what's actually populated), or add defensive checks.std::vector<uint64_t> TieredBackend::GetSortedTiers() const { std::vector<uint64_t> ids; - for (const auto& [id, _] : tiers_) ids.push_back(id); + for (const auto& [id, _] : tier_info_) ids.push_back(id); // Sort by priority descending (higher priority first) std::sort(ids.begin(), ids.end(), [this](uint64_t a, uint64_t b) {
74-102: LGTM with assumption.The allocation logic correctly tries the preferred tier first, then falls back to priority-sorted tiers. This assumes
tiers_is immutable afterInit()completes, which is a reasonable design for a tiered cache system.
123-129: LGTM!The
Writefunction correctly validates the handle and tier before delegating toDataCopier. The null check prevents crashes on invalid handles.
186-216: LGTM!The
Getfunction correctly uses read locks for concurrency. Returning ashared_ptrcopy while holding the lock ensures the reference count is incremented atomically before the lock is released.
315-337: LGTM!The
CopyDatafunction correctly leverages RAII for cleanup on failure paths. IfWriteorCommitfails, the uncommitteddest_handlewill be destroyed, triggeringFreeInternalto release the allocated space.mooncake-store/include/tiered_cache/tiered_backend.h (2)
80-193: LGTM!The
TieredBackendclass is well-designed with clear separation between client-centric operations, composite operations, and introspection APIs. The documentation is thorough and the use ofshared_mutexfor concurrent access is appropriate.
165-169: Design note: vector of pairs is appropriate here.Using
std::vector<std::pair<uint64_t, AllocationHandle>>instead of a map allows maintaining priority-sorted order, which aligns with theGet()fallback behavior returning the highest-priority replica.
| struct DataSource; | ||
| enum class MemoryType; | ||
| class TieredBackend; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Remove redundant forward declarations.
DataSource and MemoryType are forward-declared but then fully defined in the same header (lines 20 and 37). These forward declarations are unnecessary. Additionally, forward-declaring an enum class before its definition in the same translation unit is non-standard.
namespace mooncake {
-struct DataSource;
-enum class MemoryType;
class TieredBackend;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| struct DataSource; | |
| enum class MemoryType; | |
| class TieredBackend; | |
| class TieredBackend; |
🤖 Prompt for AI Agents
In mooncake-store/include/tiered_cache/cache_tier.h around lines 12-14, remove
the redundant forward declarations of DataSource and MemoryType (and the forward
declaration of TieredBackend if it duplicates a later definition) because
DataSource and MemoryType are fully defined later in the same header (around
lines 20 and 37) and forward-declaring an enum class in the same translation
unit is non-standard; delete those forward-declaration lines and ensure any
references still compile (no other headers rely on them here) so the file uses
the later full definitions.
| struct AllocationEntry { | ||
| TieredBackend* backend; | ||
| TieredLocation loc; | ||
|
|
||
| AllocationEntry(TieredBackend* b, TieredLocation l) : backend(b), loc(l) {} | ||
|
|
||
| // Destructor: Automatically releases the resource if valid | ||
| ~AllocationEntry(); | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllocationEntry should be non-copyable to prevent double-free.
AllocationEntry manages a resource that's freed in its destructor. Without deleted copy operations, an implicit copy could lead to double-free when both copies are destroyed.
struct AllocationEntry {
TieredBackend* backend;
TieredLocation loc;
AllocationEntry(TieredBackend* b, TieredLocation l) : backend(b), loc(l) {}
+ AllocationEntry(const AllocationEntry&) = delete;
+ AllocationEntry& operator=(const AllocationEntry&) = delete;
// Destructor: Automatically releases the resource if valid
~AllocationEntry();
};🤖 Prompt for AI Agents
In mooncake-store/include/tiered_cache/tiered_backend.h around lines 51 to 59,
AllocationEntry currently allows implicit copies which can cause double-free;
make the class non-copyable by explicitly deleting the copy constructor and copy
assignment operator (AllocationEntry(const AllocationEntry&) = delete;
AllocationEntry& operator=(const AllocationEntry&) = delete) and if ownership
transfer is required, add move constructor and move assignment (or default them)
to allow safe moves. Ensure the rest of the code uses moves or references
instead of copying.
| } catch (const std::logic_error& e) { | ||
| LOG(FATAL) << "Failed to build DataCopier: " << e.what(); | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG(FATAL) makes the return false unreachable.
LOG(FATAL) in glog terminates the program, so line 28 is dead code. Either downgrade to LOG(ERROR) if you want to return gracefully, or remove the return statement.
} catch (const std::logic_error& e) {
- LOG(FATAL) << "Failed to build DataCopier: " << e.what();
+ LOG(ERROR) << "Failed to build DataCopier: " << e.what();
return false;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } catch (const std::logic_error& e) { | |
| LOG(FATAL) << "Failed to build DataCopier: " << e.what(); | |
| return false; | |
| } | |
| } catch (const std::logic_error& e) { | |
| LOG(ERROR) << "Failed to build DataCopier: " << e.what(); | |
| return false; | |
| } |
🤖 Prompt for AI Agents
In mooncake-store/src/tiered_cache/tiered_backend.cpp around lines 26 to 29, the
catch uses LOG(FATAL) which terminates the process and makes the subsequent
"return false;" unreachable; change the log level to LOG(ERROR) (or LOG(WARNING)
as appropriate) so the function can return false gracefully, and keep the return
statement; alternatively, if the intention is to abort the process, remove the
return statement and keep LOG(FATAL) — choose one consistent behavior and update
the code accordingly.
| LOG(INFO) << "TieredBackend initialized successfully with " << tiers_.size() | ||
| << " tiers."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log message is misleading; tiers_ is never populated.
The log reports tiers_.size() but the tiers_ map is not populated anywhere (only tier_info_ is). This will always log "0 tiers" and GetSortedTiers() will return an empty vector, causing allocations to fail. Consider logging tier_info_.size() until the tier factory is implemented, or add a TODO comment clarifying this.
- LOG(INFO) << "TieredBackend initialized successfully with " << tiers_.size()
+ LOG(INFO) << "TieredBackend initialized successfully with " << tier_info_.size()
<< " tiers.";📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| LOG(INFO) << "TieredBackend initialized successfully with " << tiers_.size() | |
| << " tiers."; | |
| LOG(INFO) << "TieredBackend initialized successfully with " << tier_info_.size() | |
| << " tiers."; |
🤖 Prompt for AI Agents
mooncake-store/src/tiered_cache/tiered_backend.cpp lines 58-59: the INFO log
prints tiers_.size() but tiers_ is never populated (only tier_info_ is), so
change the log to report tier_info_.size() instead (or add a clear TODO noting
the tier factory is unimplemented and that tiers_ will be populated later); also
update/GetSortedTiers usage to rely on tier_info_ for now to avoid returning an
empty vector that breaks allocations — ensure the comment/TODO references the
factory implementation and expected future change.
| return (it != tiers_.end()) ? it->second.get() : nullptr; | ||
| } | ||
|
|
||
| const DataCopier& TieredBackend::GetDataCopier() const { return *data_copier_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential null dereference if Init() was not called or failed.
GetDataCopier() dereferences data_copier_ without a null check. If Init() was not called or threw during DataCopierBuilder::Build(), this will cause undefined behavior.
Consider adding an assertion or returning a pointer:
-const DataCopier& TieredBackend::GetDataCopier() const { return *data_copier_; }
+const DataCopier& TieredBackend::GetDataCopier() const {
+ CHECK(data_copier_) << "TieredBackend not initialized";
+ return *data_copier_;
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const DataCopier& TieredBackend::GetDataCopier() const { return *data_copier_; } | |
| const DataCopier& TieredBackend::GetDataCopier() const { | |
| CHECK(data_copier_) << "TieredBackend not initialized"; | |
| return *data_copier_; | |
| } |
🤖 Prompt for AI Agents
In mooncake-store/src/tiered_cache/tiered_backend.cpp around line 356,
GetDataCopier() dereferences data_copier_ without verifying it was initialized;
add a defensive check to avoid undefined behavior by either (preferred) keeping
the current signature but asserting/validating and throwing a clear exception
when data_copier_ is null (e.g., DCHECK/ASSERT or throw std::runtime_error with
a descriptive message) or (alternative) change the API to return a pointer
(DataCopier*) or optional so callers handle absence; implement one of these
fixes and update callers/tests if you change the return type.
|
@coderabbitai generate docstrings |
✅ Actions performedInitiated docstring generation; will generate only if new commits exist. |
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @YiXR. * #4 (comment) The following files were modified: * `mooncake-store/include/tiered_cache/cache_tier.h` * `mooncake-store/include/tiered_cache/copier_registry.h` * `mooncake-store/include/tiered_cache/data_copier.h` * `mooncake-store/include/tiered_cache/tiered_backend.h` * `mooncake-store/src/tiered_cache/copier_registry.cpp` * `mooncake-store/src/tiered_cache/data_copier.cpp` * `mooncake-store/src/tiered_cache/tiered_backend.cpp`
Description
Type of Change
How Has This Been Tested?
Checklist
Summary by CodeRabbit
Release Notes
New Features
✏️ Tip: You can customize this high-level summary in your review settings.