@swallowCXY
Owner

Description

Implemented the PeerClient class, which supports peer-to-peer RPC communication between clients for remote data read and write operations.

Main Features:

  • PeerClient: Client-side RPC client that sends RPC requests to remote peer clients
  • Connection Management: Uses coro_io::client_pools to manage connection pools, supporting multiple endpoints
  • Protocol Support: Supports TCP and RDMA protocols (configured via the MC_RPC_PROTOCOL environment variable)
  • RPC Template Methods: Implements invoke_rpc and invoke_batch_rpc template methods, providing a unified RPC call interface and error handling
  • Logging and Monitoring: Uses RpcNameTraits to provide RPC method names for easier logging and performance monitoring
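
The pairing of RpcNameTraits with invoke_rpc described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `ErrorCode` enum, the two local handler functions, and the `RpcLog()` helper are hypothetical, and a plain function call stands in for the coro_rpc network round trip.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical error codes; the PR's real error type is not shown in this description.
enum class ErrorCode { OK, INVALID_PARAMS };

// Hypothetical stand-ins for two remote handlers. Real handlers would run on the peer.
inline ErrorCode ReadRemoteData(const std::string& key) {
    return key.empty() ? ErrorCode::INVALID_PARAMS : ErrorCode::OK;
}
inline ErrorCode WriteRemoteData(const std::string& key) {
    return key.empty() ? ErrorCode::INVALID_PARAMS : ErrorCode::OK;
}

// Primary template is left undefined, so invoking an RPC that has no
// registered name fails at compile time.
template <auto Method>
struct RpcNameTraits;

template <>
struct RpcNameTraits<&ReadRemoteData> {
    static constexpr std::string_view value = "ReadRemoteData";
};
template <>
struct RpcNameTraits<&WriteRemoteData> {
    static constexpr std::string_view value = "WriteRemoteData";
};

// In-memory log used here instead of a real logging or metrics backend.
inline std::vector<std::string>& RpcLog() {
    static std::vector<std::string> log;
    return log;
}

// One wrapper for every RPC: look up the name, make the call, record the outcome.
template <auto Method, typename... Args>
ErrorCode invoke_rpc(Args&&... args) {
    constexpr std::string_view name = RpcNameTraits<Method>::value;
    const ErrorCode rc = Method(std::forward<Args>(args)...);
    RpcLog().push_back(std::string(name) + (rc == ErrorCode::OK ? ": ok" : ": failed"));
    return rc;
}
```

With this shape, a call such as `invoke_rpc<&ReadRemoteData>(std::string("k1"))` returns the handler's result and appends a named entry to the log, so each call site avoids repeating the logging and error-handling code.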

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

RPC call path is tested locally with the following test cases:

  • ConnectSuccess: Tests successful connection scenarios
  • ConnectInvalidEndpoint: Tests handling of invalid endpoints
  • ConnectTwice: Tests handling of duplicate connections
  • ReadRemoteDataSuccess: Tests remote data reading
  • ReadRemoteDataKeyNotFound: Tests error handling when a key does not exist
  • ReadRemoteDataEmptyKey: Tests validation for empty key parameters
  • ReadRemoteDataEmptyBuffers: Tests validation for empty buffer parameters
  • WriteRemoteDataSuccess: Tests remote data writing
  • WriteRemoteDataEmptyKey: Tests validation for writing with an empty key
  • WriteRemoteDataEmptyBuffers: Tests validation for writing with empty buffers
  • BatchReadRemoteDataSuccess: Tests batch reading
  • BatchReadRemoteDataKeyCountMismatch: Tests batch reading with mismatched parameters
  • BatchWriteRemoteDataSuccess: Tests batch writing
  • BatchWriteRemoteDataKeyCountMismatch: Tests batch writing with mismatched parameters
  • OperationsWithoutConnect: Tests error handling when operations are performed without a connection

Test Environment:

  • Use real TieredBackend and DataManager for integration testing
  • Start coro_rpc_server in a separate thread to simulate the server
  • Verify RPC call paths, parameter validation, and error handling
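
The parameter checks exercised by the EmptyKey, EmptyBuffers, and KeyCountMismatch cases could look something like the sketch below. The `BufferDesc` struct, the `ErrorCode` enum, and both function names are assumptions made for illustration; the PR's actual types and signatures are not shown in this description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical error codes; the PR's real error type may differ.
enum class ErrorCode { OK, INVALID_PARAMS };

// Hypothetical buffer descriptor: address and length of one memory region.
struct BufferDesc {
    void* addr;
    std::size_t size;
};

// Single-key check: both the key and the buffer list must be non-empty.
inline ErrorCode ValidateRequest(const std::string& key,
                                 const std::vector<BufferDesc>& buffers) {
    if (key.empty() || buffers.empty()) return ErrorCode::INVALID_PARAMS;
    return ErrorCode::OK;
}

// Batch check: there must be exactly one buffer list per key, and every
// (key, buffers) pair must pass the single-key check.
inline ErrorCode ValidateBatchRequest(
    const std::vector<std::string>& keys,
    const std::vector<std::vector<BufferDesc>>& buffers) {
    if (keys.size() != buffers.size()) return ErrorCode::INVALID_PARAMS;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (ValidateRequest(keys[i], buffers[i]) != ErrorCode::OK) {
            return ErrorCode::INVALID_PARAMS;
        }
    }
    return ErrorCode::OK;
}
```

Rejecting malformed requests on the client before any RPC is issued keeps these failure cases cheap and lets the tests assert on them without depending on server behavior.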

[----------] 15 tests from PeerClientTest (1732 ms total)
[----------] Global test environment tear-down
[==========] 15 tests from 1 test suite ran. (1732 ms total)
[ PASSED ] 15 tests.

Checklist

  • I have performed a self-review of my own code.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

ccccccxy and others added 24 commits January 5, 2026 20:33
[WIP] Define client RPC interfaces and initial data_manager implementation
Sync upstream tiered backend updates + update data_manager UT
stub implementation and format change
Adhere to the clang-format coding style
Added two test cases to verify that removing read locks from
DataManager::Get and ReadRemoteData is safe:
- ConcurrentGetAndDelete: Tests concurrent Get/Delete operations
- HandleKeepsDataAliveAfterDelete: Verifies handle keeps data alive after deletion

These tests validate that TieredBackend's internal locking and shared_ptr
reference counting provide sufficient thread safety.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add tests for read lock removal safety verification
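
The property checked by HandleKeepsDataAliveAfterDelete can be illustrated with a toy store. This is only a sketch of the general shared_ptr technique: `ToyStore` and its methods are hypothetical and do not reflect TieredBackend's real interface or internals.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Toy store (hypothetical, not TieredBackend): values are owned by shared_ptr,
// so a handle returned by Get() stays valid after the key is deleted.
class ToyStore {
public:
    using Handle = std::shared_ptr<const std::string>;

    void Put(const std::string& key, std::string value) {
        std::lock_guard<std::mutex> guard(mutex_);
        entries_[key] = std::make_shared<std::string>(std::move(value));
    }

    // The lock covers only the map lookup; the returned handle is used
    // by the caller without holding any lock.
    Handle Get(const std::string& key) const {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = entries_.find(key);
        if (it == entries_.end()) return nullptr;
        return it->second;
    }

    // Removes the map entry; the value is freed once the last handle is dropped.
    bool Delete(const std::string& key) {
        std::lock_guard<std::mutex> guard(mutex_);
        return entries_.erase(key) > 0;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, Handle> entries_;
};
```

A reader that obtained a handle before a concurrent Delete keeps reading valid memory, because the store only drops its own reference. That is the reason an outer read lock around Get can be redundant in a design like this.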