-
Notifications
You must be signed in to change notification settings - Fork 500
[WIP]client rpc接口定义和data_manager初步实现 #1329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP]client rpc接口定义和data_manager初步实现 #1329
Conversation
Summary of ChangesHello @swallowCXY, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! 此拉取请求为 Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
这个PR引入了客户端RPC服务和用于处理远程数据操作的DataManager。整体结构是一个不错的起点。但是,有几个关键问题需要解决。其中包括由于std::expected和tl::expected使用不一致导致的编译错误,以及rpc_client_pool缺少定义。此外,还存在严重的性能问题,例如批量操作的顺序实现以及使用低效的轮询循环来等待传输完成。数据传输逻辑中也存在大量重复代码,应该进行重构。我提供了具体的评论和建议来解决这些问题。
| std::expected<void, ErrorCode> ReadRemoteData(const RemoteReadRequest& request); | ||
|
|
||
| /** | ||
| * @brief Write remote data: Client A requests Client B to write data from A | ||
| * @param request RemoteWriteRequest containing key, source buffers, and target_tier_id | ||
| * @return ErrorCode indicating success or failure | ||
| */ | ||
| std::expected<void, ErrorCode> WriteRemoteData(const RemoteWriteRequest& request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReadRemoteData和WriteRemoteData的声明使用了std::expected,但它们在client_rpc_service.cpp中的定义使用了tl::expected。这将导致编译错误。请统一使用tl::expected以保持一致性。
| std::expected<void, ErrorCode> ReadRemoteData(const RemoteReadRequest& request); | |
| /** | |
| * @brief Write remote data: Client A requests Client B to write data from A | |
| * @param request RemoteWriteRequest containing key, source buffers, and target_tier_id | |
| * @return ErrorCode indicating success or failure | |
| */ | |
| std::expected<void, ErrorCode> WriteRemoteData(const RemoteWriteRequest& request); | |
| tl::expected<void, ErrorCode> ReadRemoteData(const RemoteReadRequest& request); | |
| /** | |
| * @brief Write remote data: Client A requests Client B to write data from A | |
| * @param request RemoteWriteRequest containing key, source buffers, and target_tier_id | |
| * @return ErrorCode indicating success or failure | |
| */ | |
| tl::expected<void, ErrorCode> WriteRemoteData(const RemoteWriteRequest& request); |
| // Wait for transfer completion | ||
| bool all_completed = false; | ||
| const int max_wait_iterations = 1000; // Timeout protection | ||
| int iteration = 0; | ||
|
|
||
| while (!all_completed && iteration < max_wait_iterations) { | ||
| all_completed = true; | ||
| for (size_t i = 0; i < transfer_requests.size(); ++i) { | ||
| TransferStatus status; | ||
| Status status_result = transfer_engine_->getTransferStatus(batch_id, i, status); | ||
|
|
||
| if (!status_result.ok()) { | ||
| LOG(ERROR) << "TransferDataFromRemote: Failed to get transfer status for task " << i; | ||
| transfer_engine_->freeBatchID(batch_id); | ||
| transfer_engine_->closeSegment(remote_segment_id); | ||
| return std::unexpected(ErrorCode::TRANSFER_FAIL); | ||
| } | ||
|
|
||
| if (status.s == TransferStatusEnum::FAILED || | ||
| status.s == TransferStatusEnum::CANCELED || | ||
| status.s == TransferStatusEnum::TIMEOUT) { | ||
| LOG(ERROR) << "TransferDataFromRemote: Transfer failed for task " << i | ||
| << ", status: " << static_cast<int>(status.s); | ||
| transfer_engine_->freeBatchID(batch_id); | ||
| transfer_engine_->closeSegment(remote_segment_id); | ||
| return std::unexpected(ErrorCode::TRANSFER_FAIL); | ||
| } | ||
|
|
||
| if (status.s != TransferStatusEnum::COMPLETED) { | ||
| all_completed = false; | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| if (!all_completed) { | ||
| std::this_thread::sleep_for(std::chrono::microseconds(100)); | ||
| ++iteration; | ||
| } | ||
| } | ||
|
|
||
| if (!all_completed) { | ||
| LOG(ERROR) << "TransferDataFromRemote: Transfer timeout"; | ||
| transfer_engine_->freeBatchID(batch_id); | ||
| transfer_engine_->closeSegment(remote_segment_id); | ||
| return std::unexpected(ErrorCode::TRANSFER_FAIL); | ||
| } | ||
|
|
||
| // Cleanup | ||
| transfer_engine_->freeBatchID(batch_id); | ||
| transfer_engine_->closeSegment(remote_segment_id); | ||
|
|
||
| return {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| std::expected<void, ErrorCode> ReadRemoteData(const RemoteReadRequest& request); | ||
| std::expected<void, ErrorCode> WriteRemoteData(const RemoteWriteRequest& request); | ||
| std::vector<tl::expected<void, ErrorCode>> BatchReadRemoteData(const BatchRemoteReadRequest& request); | ||
| std::vector<tl::expected<void, ErrorCode>> BatchWriteRemoteData(const BatchRemoteWriteRequest& request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::expected和tl::expected的使用不一致。为了与代码库的其他部分(如client_rpc_service.cpp)保持一致,最好统一使用tl::expected。
| std::expected<void, ErrorCode> ReadRemoteData(const RemoteReadRequest& request); | |
| std::expected<void, ErrorCode> WriteRemoteData(const RemoteWriteRequest& request); | |
| std::vector<tl::expected<void, ErrorCode>> BatchReadRemoteData(const BatchRemoteReadRequest& request); | |
| std::vector<tl::expected<void, ErrorCode>> BatchWriteRemoteData(const BatchRemoteWriteRequest& request); | |
| tl::expected<void, ErrorCode> ReadRemoteData(const RemoteReadRequest& request); | |
| tl::expected<void, ErrorCode> WriteRemoteData(const RemoteWriteRequest& request); | |
| std::vector<tl::expected<void, ErrorCode>> BatchReadRemoteData(const BatchRemoteReadRequest& request); | |
| std::vector<tl::expected<void, ErrorCode>> BatchWriteRemoteData(const BatchRemoteWriteRequest& request); |
| std::vector<tl::expected<void, ErrorCode>> BatchReadRemoteData(const BatchRemoteReadRequest& request); | ||
| std::vector<tl::expected<void, ErrorCode>> BatchWriteRemoteData(const BatchRemoteWriteRequest& request); | ||
| private: | ||
| rpc_client_pool client_pool_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Wait for transfer completion | ||
| bool all_completed = false; | ||
| const int max_wait_iterations = 1000; // Timeout protection | ||
| int iteration = 0; | ||
|
|
||
| while (!all_completed && iteration < max_wait_iterations) { | ||
| all_completed = true; | ||
| for (size_t i = 0; i < transfer_requests.size(); ++i) { | ||
| TransferStatus status; | ||
| Status status_result = transfer_engine_->getTransferStatus(batch_id, i, status); | ||
|
|
||
| if (!status_result.ok()) { | ||
| LOG(ERROR) << "TransferDataToRemote: Failed to get transfer status for task " << i; | ||
| transfer_engine_->freeBatchID(batch_id); | ||
| transfer_engine_->closeSegment(remote_segment_id); | ||
| return std::unexpected(ErrorCode::TRANSFER_FAIL); | ||
| } | ||
|
|
||
| if (status.s == TransferStatusEnum::FAILED || | ||
| status.s == TransferStatusEnum::CANCELED || | ||
| status.s == TransferStatusEnum::TIMEOUT) { | ||
| LOG(ERROR) << "TransferDataToRemote: Transfer failed for task " << i | ||
| << ", status: " << static_cast<int>(status.s); | ||
| transfer_engine_->freeBatchID(batch_id); | ||
| transfer_engine_->closeSegment(remote_segment_id); | ||
| return std::unexpected(ErrorCode::TRANSFER_FAIL); | ||
| } | ||
|
|
||
| if (status.s != TransferStatusEnum::COMPLETED) { | ||
| all_completed = false; | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| if (!all_completed) { | ||
| std::this_thread::sleep_for(std::chrono::microseconds(100)); | ||
| ++iteration; | ||
| } | ||
| } | ||
|
|
||
| if (!all_completed) { | ||
| LOG(ERROR) << "TransferDataToRemote: Transfer timeout after " << max_wait_iterations << " iterations"; | ||
| transfer_engine_->freeBatchID(batch_id); | ||
| transfer_engine_->closeSegment(remote_segment_id); | ||
| return std::unexpected(ErrorCode::TRANSFER_FAIL); | ||
| } | ||
|
|
||
| // Cleanup | ||
| transfer_engine_->freeBatchID(batch_id); | ||
| transfer_engine_->closeSegment(remote_segment_id); | ||
|
|
||
| return {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| auto handle_result = tiered_backend_->Get(key); | ||
| if (!handle_result.has_value()) { | ||
| LOG(ERROR) << "ReadData: Failed to get data for key: " << key | ||
| << ", error: " << toString(handle_result.error()); | ||
| timer.LogResponse("error_code=", handle_result.error()); | ||
| return std::unexpected(ErrorCode::OBJECT_NOT_FOUND); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
在这里,来自tiered_backend_->Get()的特定错误被捕获、记录,然后被替换为更通用的ErrorCode::OBJECT_NOT_FOUND。虽然记录原始错误是好的,但将特定的错误代码(handle_result.error())传播给调用者将为上游的错误处理和调试提供更详细的信息。
if (!handle_result.has_value()) {
LOG(ERROR) << "ReadData: Failed to get data for key: " << key
<< ", error: " << toString(handle_result.error());
timer.LogResponse("error_code=", handle_result.error());
return std::unexpected(handle_result.error());
}| auto handle_result = tiered_backend_->Allocate(total_size, tier_id); | ||
| if (!handle_result.has_value()) { | ||
| LOG(ERROR) << "WriteData: Failed to allocate space for key: " << key; | ||
| timer.LogResponse("error_code=", handle_result.error()); | ||
| return std::unexpected(ErrorCode::NO_AVAILABLE_HANDLE); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
与ReadData中的情况类似,这里将来自tiered_backend_->Allocate的特定错误替换为通用的ErrorCode::NO_AVAILABLE_HANDLE。为了更好的错误追溯,建议向上层传播原始的错误码handle_result.error()。
if (!handle_result.has_value()) {
LOG(ERROR) << "WriteData: Failed to allocate space for key: " << key;
timer.LogResponse("error_code=", handle_result.error());
return std::unexpected(handle_result.error());
}| } | ||
|
|
||
| // Open remote segment | ||
| SegmentHandle remote_segment_id = transfer_engine_->openSegment(remote_segment_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| // Convert uint64_t tier_id to UUID | ||
| // UUID is pair<uint64_t, uint64_t>, so we use the first part for tier_id | ||
| // TODO: This is a temporary solution. Should use proper UUID mapping or change RPC protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| coro_rpc_server rpc_server_; | ||
| ClientRpcService rpc_service_; | ||
| DataManager data_manager_; | ||
| std::map<std::string, std::unique_ptr<PeerClient>> peer_clients_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rpc_service_依赖于data_manager_,因此data_manager_应该在rpc_service_之前声明,以确保正确的初始化顺序。此外,DataManager和ClientRpcService没有默认构造函数,因此它们必须在Client构造函数的成员初始化列表中进行初始化。按依赖顺序声明可以提高代码的清晰度和安全性。
| coro_rpc_server rpc_server_; | |
| ClientRpcService rpc_service_; | |
| DataManager data_manager_; | |
| std::map<std::string, std::unique_ptr<PeerClient>> peer_clients_; | |
| coro_rpc_server rpc_server_; | |
| DataManager data_manager_; | |
| ClientRpcService rpc_service_; | |
| std::map<std::string, std::unique_ptr<PeerClient>> peer_clients_; |
Description
Type of Change
How Has This Been Tested?
Checklist