Pr1192 test #6

Conversation
Signed-off-by: Cruz Zhao <[email protected]>
Co-authored-by: Copilot <[email protected]>
Walkthrough

Introduces zero-copy tensor operations for PyTorch models via new APIs on MooncakeStore.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant Store as MooncakeStore
    participant Buf as Pre-allocated Buffer
    participant PyT as PyTorch Tensor
    App->>App: Allocate buffer (via register_buffer)
    App->>Store: get_tensor_into(key, buffer_ptr, size)
    Note over Store: Parse metadata, resolve dtype
    alt TP Mode (tp_size > 1)
        Store->>Store: Transform key → key_tp_rank
        Note over Store: Fetch tensor shard for rank
    else Standard Mode
        Note over Store: Fetch full tensor
    end
    Store->>Buf: Copy/stream tensor data into buffer
    Store->>Store: Apply dtype view (if BFLOAT16/FLOAT8)
    Store->>PyT: Reconstruct PyTorch tensor from buffer view
    PyT-->>App: Return tensor (zero-copy reference)
    Note over App,PyT: Batch variant: repeat for each key/buffer pair<br/>TP batch: shard keys per rank, fetch in parallel
```
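To make the flow concrete, here is a minimal usage sketch in the spirit of the diagram. The method names (`register_buffer`, `get_tensor_into`, `unregister_buffer`) come from this PR; the import path, constructor, and exact signatures are assumptions for illustration only.

```python
import ctypes

from mooncake.store import MooncakeStore  # assumed import path

store = MooncakeStore()
# Store setup/connection details omitted; see the project docs.

# The caller allocates a host buffer once, registers it, and reuses it.
BUF_SIZE = 64 * 1024 * 1024  # 64 MiB; arbitrary for illustration
buf = ctypes.create_string_buffer(BUF_SIZE)
buffer_ptr = ctypes.addressof(buf)
store.register_buffer(buffer_ptr, BUF_SIZE)  # assumed signature

# Zero-copy get: data lands directly in the registered buffer, and the
# returned tensor is a view over it, so `buf` must outlive `tensor`.
tensor = store.get_tensor_into("my_key", buffer_ptr, BUF_SIZE)
if tensor is not None:
    print(tensor.shape, tensor.dtype)

store.unregister_buffer(buffer_ptr)  # assumed cleanup call
```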
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings, 1 inconclusive)
Actionable comments posted: 10
🧹 Nitpick comments (4)
scripts/test_tensor_api.py (2)
45-92: Unused `rtol` and `atol` parameters.

These parameters are declared but never used. The function uses `np.array_equal` for exact comparison. If approximate comparison was intended, consider using `np.allclose(orig_np, recv_np, rtol=rtol, atol=atol)` or remove the unused parameters.

🔎 Option 1: Remove unused parameters

```diff
-def verify_tensor_equality(original, received, rtol=0, atol=0, verbose=True):
+def verify_tensor_equality(original, received, verbose=True):
```

🔎 Option 2: Use parameters for approximate comparison

```diff
-    if np.array_equal(orig_np, recv_np):
-        return True
+    if rtol == 0 and atol == 0:
+        if np.array_equal(orig_np, recv_np):
+            return True
+    else:
+        if np.allclose(orig_np, recv_np, rtol=rtol, atol=atol):
+            return True
```
29-43: `DTYPE_MAP` appears unused in the changed code.

If this is for future use, consider adding a comment explaining its purpose. Otherwise, it can be removed to avoid dead code.
mooncake-integration/store/store_py.cpp (2)
451-454: Redundant GIL acquire.

At line 453, `py::gil_scoped_acquire acquire_gil;` is called, but the GIL is already held at this point; the `py::gil_scoped_release` scope ended at line 449. This is harmless but adds confusion.

🔎 Suggested fix:

```diff
 py::list results_list;
 try {
-    py::gil_scoped_acquire acquire_gil;
     auto torch = torch_module();
```
549-573: Unused `split_dim` parameter.

The `split_dim` parameter is accepted but never used in this method. The existing `get_tensor_with_tp` (line 267) also has `split_dim` unused, so this may be intentional for API symmetry, but it could mislead users into thinking it affects behavior. Consider either:
- Adding a comment explaining it's reserved for future use / API consistency
- Removing it if not needed
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- docs/source/python-api-reference/mooncake-store.md (1 hunks)
- mooncake-integration/integration_utils.h (1 hunks)
- mooncake-integration/store/store_py.cpp (2 hunks)
- mooncake-store/include/dummy_client.h (1 hunks)
- mooncake-store/src/dummy_client.cpp (1 hunks)
- scripts/test_tensor_api.py (13 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/source/python-api-reference/mooncake-store.md
1023-1025, 1029, 1041-1043, 1047, 1059-1064, 1068, 1080-1084, 1088: Unordered list indentation. Expected: 0; Actual: 2 (MD007, ul-indent)
🪛 Ruff (0.14.8)
scripts/test_tensor_api.py
45-45: Unused function argument: rtol
(ARG001)
45-45: Unused function argument: atol
(ARG001)
57-57: Avoid specifying long messages outside the exception class
(TRY003)
62-62: Do not catch blind exception: Exception
(BLE001)
444-444: f-string without any placeholders
Remove extraneous f prefix
(F541)
448-448: Loop control variable large_buffer not used within loop body
(B007)
448-448: Loop control variable size not used within loop body
Rename unused size to _size
(B007)
527-527: Loop control variable large_buffer not used within loop body
(B007)
527-527: Loop control variable size not used within loop body
Rename unused size to _size
(B007)
639-639: Loop control variable i not used within loop body
(B007)
709-709: Loop control variable i not used within loop body
(B007)
950-950: Do not catch blind exception: Exception
(BLE001)
959-959: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (10)
mooncake-store/include/dummy_client.h (1)

231-231: LGTM — formatting-only change. EOF newline adjustment; no functional impact.

mooncake-store/src/dummy_client.cpp (1)

758-758: LGTM — formatting-only change. EOF newline adjustment; no functional impact.

mooncake-integration/integration_utils.h (1)

65-70: LGTM — non-owning array view implementation. Using `py::none()` as base correctly creates a view without ownership. The caller is responsible for ensuring the buffer outlives the returned array, which aligns with the zero-copy contract.

scripts/test_tensor_api.py (7)
301-313: LGTM — proper buffer lifecycle management in TP consistency test. Buffers are registered before use and unregistered after, following the correct zero-copy workflow.

315-343: LGTM — comprehensive zero-copy get_into test. Proper buffer registration, operation, verification, and cleanup.

345-385: LGTM — efficient batch zero-copy test with single buffer allocation. Good pattern of allocating one large contiguous buffer and using offset-based addressing for individual tensors.

387-451: LGTM — good multi-rank buffer management pattern. Correctly keeps buffer objects alive until unregistration. Minor: line 444 has an f-string without placeholders (can remove the `f` prefix).

611-751: LGTM — well-structured zero-copy benchmarks. Proper buffer lifecycle management and result verification integrated into performance tests.

929-958: LGTM — extended dtype tests to cover zero-copy path. Good addition to verify that zero-copy operations preserve dtype correctly.
24-25: Large default buffer sizes (16 GiB / 8 GB). These are substantial memory requirements. Ensure the test environment has sufficient resources, or consider a smaller default that can be overridden in high-memory environments.
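If the large defaults stay, one possible mitigation (a sketch, not an existing knob) is an environment-variable override so constrained hosts can shrink the allocation:

```python
import os

# Hypothetical override: default to 256 MiB unless the environment
# explicitly requests a larger test buffer.
TEST_BUFFER_SIZE = int(
    os.environ.get("MOONCAKE_TEST_BUFFER_SIZE", 256 * 1024 * 1024)
)
```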
#### get_tensor_into()

Get a PyTorch tensor from the store directly into a pre-allocated buffer.

```python
def get_tensor_with_tp(self, key: str, buffer_ptr: int, size: int) -> torch.Tensor
```

**Parameters:**

- `key` (str): Base identifier of the tensor.
- `buffer_ptr` (int): The buffer pointer pre-allocated for tensor, and the buffer should be registered.
- `size` (int): The size of buffer.

**Returns:**

- `torch.Tensor`: The retrieved tensor (or shard). Returns `None` if not found.
**Documentation has incorrect function signatures.**

The section title is `get_tensor_into()` but the code block shows `get_tensor_with_tp`. Similarly, the next section `batch_get_tensor()` shows `batch_get_tensor_with_tp` in its signature. These appear to be copy-paste errors.
🔎 Apply this diff to fix the signatures:

````diff
 #### get_tensor_into()
 Get a PyTorch tensor from the store directly into a pre-allocated buffer.
 ```python
-def get_tensor_with_tp(self, key: str, buffer_ptr: int, size: int) -> torch.Tensor
+def get_tensor_into(self, key: str, buffer_ptr: int, size: int) -> torch.Tensor
 ```
````

````diff
-#### batch_get_tensor()
+#### batch_get_tensor_into()
 Get a batch of PyTorch tensor from the store directly into a pre-allocated buffer.
 ```python
-def batch_get_tensor_with_tp(self, base_keys: List[str], buffer_ptrs: List[int], sizes: List[int]) -> List[torch.Tensor]
+def batch_get_tensor_into(self, keys: List[str], buffer_ptrs: List[int], sizes: List[int]) -> List[torch.Tensor]
 ```
````
> Committable suggestion skipped: line range outside the PR's diff.
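For reference, a minimal call sketch of the two renamed entry points as they would read after the fix. It assumes an initialized `store` and pre-registered buffers behind the hypothetical `ptr0`/`ptr1` pointers:

```python
# Single tensor fetched directly into a pre-registered buffer.
t = store.get_tensor_into("weights/layer0", ptr0, size0)

# Batch variant: one pointer and size per key; per the docs, results
# align with `keys` and missing keys yield None entries.
keys = ["weights/layer0", "weights/layer1"]
tensors = store.batch_get_tensor_into(keys, [ptr0, ptr1], [size0, size1])
```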
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

1023-1025, 1029: Unordered list indentation. Expected: 0; Actual: 2 (MD007, ul-indent)
🤖 Prompt for AI Agents
In docs/source/python-api-reference/mooncake-store.md around lines 1013 to 1030,
the function signatures are incorrect: the section titled get_tensor_into()
shows def get_tensor_with_tp(...) and batch_get_tensor() shows def
batch_get_tensor_with_tp(...). Update the signatures to match the section titles
and intended parameter names: rename get_tensor_with_tp to get_tensor_into(self,
key: str, buffer_ptr: int, size: int) -> torch.Tensor, and rename
batch_get_tensor_with_tp to batch_get_tensor_into with the correct parameter
names and types (e.g. keys: List[str], buffer_ptrs: List[int], sizes: List[int])
returning List[torch.Tensor]; ensure surrounding code blocks and descriptions
remain consistent.
#### get_tensor_into_with_tp()

Get a PyTorch tensor from the store, specifically retrieving the shard corresponding to the given Tensor Parallel rank, directly into the pre-allocated buffer.

```python
def get_tensor_with_tp(self, key: str, buffer_ptr: int, size: int, tp_rank: int = 0, tp_size: int = 1, split_dim: int = 0) -> torch.Tensor
```

**Parameters:**

- `key` (str): Base identifier of the tensor.
- `buffer_ptr` (int): The buffer pointer pre-allocated for tensor, and the buffer should be registered.
- `size` (int): The size of buffer.
- `tp_rank` (int): The tensor parallel rank to retrieve (default: 0). Fetches key `key_tp_{rank}` if `tp_size > 1`.
- `tp_size` (int): Total tensor parallel size (default: 1).
- `split_dim` (int): The dimension used during splitting (default: 0).

**Returns:**

- `torch.Tensor`: The retrieved tensor (or shard). Returns `None` if not found.
**Function signature mismatch for `get_tensor_into_with_tp`.**

The code block shows `get_tensor_with_tp` but should be `get_tensor_into_with_tp` to match the section title.
🔎 Apply this diff:

```diff
-def get_tensor_with_tp(self, key: str, buffer_ptr: int, size: int, tp_rank: int = 0, tp_size: int = 1, split_dim: int = 0) -> torch.Tensor
+def get_tensor_into_with_tp(self, key: str, buffer_ptr: int, size: int, tp_rank: int = 0, tp_size: int = 1, split_dim: int = 0) -> torch.Tensor
```
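A hedged sketch of how the TP variant would be called under the corrected name, reusing the buffer set up earlier; the rank/size values are illustrative:

```python
# Fetches the shard stored under "my_key_tp_2" when tp_size > 1,
# directly into the registered buffer.
shard = store.get_tensor_into_with_tp(
    "my_key", buffer_ptr, buf_size, tp_rank=2, tp_size=4, split_dim=0
)
```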
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

1059-1064, 1068: Unordered list indentation. Expected: 0; Actual: 2 (MD007, ul-indent)
🤖 Prompt for AI Agents
In docs/source/python-api-reference/mooncake-store.md around lines 1049 to 1069
the function signature in the code block is incorrectly named get_tensor_with_tp
but the section is about get_tensor_into_with_tp; update the signature in the
code block to use get_tensor_into_with_tp with the same parameters and return
type so the docs header and function name match.
#### batch_get_tensor_with_tp()

Get a batch of PyTorch tensor shards from the store for a given Tensor Parallel rank, directly into the pre-allocated buffer.

```python
def batch_get_tensor_with_tp(self, base_keys: List[str], buffer_ptrs: List[int], sizes: List[int], tp_rank: int = 0, tp_size: int = 1) -> List[torch.Tensor]
```

**Parameters:**

- `base_keys` (List[str]): List of base identifiers.
- `buffer_ptrs` (List[int]): List of the buffers pointer pre-allocated for tensor, and the buffers should be registered.
- `sizes` (List[int]): List of the size of buffers.
- `tp_rank` (int): The tensor parallel rank to retrieve (default: 0).
- `tp_size` (int): Total tensor parallel size (default: 1).

**Returns:**

- `List[torch.Tensor]`: List of retrieved tensors (or shards). Contains `None` for missing keys.
**Section title should be `batch_get_tensor_into_with_tp` for the zero-copy API.**

This section is under "PyTorch Tensor Operations (Zero Copy)" but uses `batch_get_tensor_with_tp`, which is the non-zero-copy variant. Based on the test file patterns, the zero-copy variant should be `batch_get_tensor_into_with_tp`.
🔎 Apply this diff:

````diff
-#### batch_get_tensor_with_tp()
+#### batch_get_tensor_into_with_tp()
 Get a batch of PyTorch tensor shards from the store for a given Tensor Parallel rank, directly into the pre-allocated buffer.
 ```python
-def batch_get_tensor_with_tp(self, base_keys: List[str], buffer_ptrs: List[int], sizes: List[int], tp_rank: int = 0, tp_size: int = 1) -> List[torch.Tensor]
+def batch_get_tensor_into_with_tp(self, base_keys: List[str], buffer_ptrs: List[int], sizes: List[int], tp_rank: int = 0, tp_size: int = 1) -> List[torch.Tensor]
 ```
````
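And a matching sketch for the batch zero-copy TP call under the corrected name (assumed per the diff above; buffers pre-registered):

```python
base_keys = ["w/q_proj", "w/k_proj"]
shards = store.batch_get_tensor_into_with_tp(
    base_keys, [ptr0, ptr1], [size0, size1], tp_rank=0, tp_size=2
)
# Results align with base_keys; None marks keys missing from the store.
for key, shard in zip(base_keys, shards):
    print(key, None if shard is None else shard.shape)
```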
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

1080-1084, 1088: Unordered list indentation. Expected: 0; Actual: 2 (MD007, ul-indent)
🤖 Prompt for AI Agents
In docs/source/python-api-reference/mooncake-store.md around lines 1070 to 1089,
the function name documented for the zero-copy PyTorch API is incorrect (uses
batch_get_tensor_with_tp). Update the section to use the zero-copy API name
batch_get_tensor_into_with_tp: change the function signature and any occurrences
of batch_get_tensor_with_tp in this block to batch_get_tensor_into_with_tp so it
matches the zero-copy tests and section heading, keeping parameters and return
description the same.
```cpp
static const std::array<ArrayCreatorFunc, 16> array_creators_view = {{
    create_typed_array_view<float>,     // FLOAT32 = 0
    create_typed_array_view<double>,    // FLOAT64 = 1
    create_typed_array_view<int8_t>,    // INT8 = 2
    create_typed_array_view<uint8_t>,   // UINT8 = 3
    create_typed_array_view<int16_t>,   // INT16 = 4
    create_typed_array_view<uint16_t>,  // UINT16 = 5
    create_typed_array_view<int32_t>,   // INT32 = 6
    create_typed_array_view<uint32_t>,  // UINT32 = 7
    create_typed_array_view<int64_t>,   // INT64 = 8
    create_typed_array_view<uint64_t>,  // UINT64 = 9
    create_typed_array_view<bool>,      // BOOL = 10
    create_typed_array_view<uint16_t>,  // FLOAT16 = 11 (using uint16_t as storage)
    create_typed_array_view<uint16_t>,  // BFLOAT16 = 12 (using uint16_t as storage)
    create_typed_array_view<uint8_t>,   // FLOAT8_E4M3 = 13 (using uint8_t as storage)
    create_typed_array_view<uint8_t>,   // FLOAT8_E5M2 = 14 (using uint8_t as storage)
}};
```
**Size mismatch: `array_creators_view` has 16 entries but `array_creators` has 15.**

Both arrays should map the same `TensorDtype` enum values (0–14), yet `array_creators_view` is declared with size 16 while `array_creators` uses size 15. This inconsistency could cause out-of-bounds access or confusion when indexing by dtype.
🔎 Apply this diff to fix the size:

```diff
-static const std::array<ArrayCreatorFunc, 16> array_creators_view = {{
+static const std::array<ArrayCreatorFunc, 15> array_creators_view = {{
     create_typed_array_view<float>,     // FLOAT32 = 0
     create_typed_array_view<double>,    // FLOAT64 = 1
     create_typed_array_view<int8_t>,    // INT8 = 2
     create_typed_array_view<uint8_t>,   // UINT8 = 3
     create_typed_array_view<int16_t>,   // INT16 = 4
     create_typed_array_view<uint16_t>,  // UINT16 = 5
     create_typed_array_view<int32_t>,   // INT32 = 6
     create_typed_array_view<uint32_t>,  // UINT32 = 7
     create_typed_array_view<int64_t>,   // INT64 = 8
     create_typed_array_view<uint64_t>,  // UINT64 = 9
     create_typed_array_view<bool>,      // BOOL = 10
     create_typed_array_view<uint16_t>,  // FLOAT16 = 11 (using uint16_t as storage)
     create_typed_array_view<uint16_t>,  // BFLOAT16 = 12 (using uint16_t as storage)
     create_typed_array_view<uint8_t>,   // FLOAT8_E4M3 = 13 (using uint8_t as storage)
     create_typed_array_view<uint8_t>,   // FLOAT8_E5M2 = 14 (using uint8_t as storage)
 }};
```
🤖 Prompt for AI Agents
In mooncake-integration/integration_utils.h around lines 72 to 92,
array_creators_view is declared with size 16 while the corresponding
array_creators has size 15; make the sizes consistent by changing the
array_creators_view declaration to std::array<ArrayCreatorFunc, 15> (or add the
missing 16th dtype entry if intended), and ensure the initializer list length
matches the array size so dtype indexing cannot go out-of-bounds.
```cpp
auto total_length = store_->get_into(key, buffer, size);
if (total_length <= 0) {
    py::gil_scoped_acquire acquire_gil;
    return pybind11::none();
}

TensorMetadata metadata;
// Copy data from buffer to contiguous memory
memcpy(&metadata, static_cast<char *>(buffer),
       sizeof(TensorMetadata));
```
**Missing buffer size validation before metadata read.**

The code reads `sizeof(TensorMetadata)` bytes from the buffer at line 347, but there's no validation that the provided `size` parameter is at least `sizeof(TensorMetadata)`. If the user provides a smaller buffer, this could read beyond the allocated memory.

Additionally, `total_length` should be checked against `sizeof(TensorMetadata)` (similar to `batch_get_tensor_into` at line 470) before the `memcpy`.
🔎 Suggested fix:

```diff
     auto total_length = store_->get_into(key, buffer, size);
     if (total_length <= 0) {
         py::gil_scoped_acquire acquire_gil;
         return pybind11::none();
     }
+    if (total_length <= static_cast<long>(sizeof(TensorMetadata))) {
+        py::gil_scoped_acquire acquire_gil;
+        LOG(ERROR) << "Invalid data format: insufficient data for metadata";
+        return pybind11::none();
+    }
     TensorMetadata metadata;
     // Copy data from buffer to contiguous memory
     memcpy(&metadata, static_cast<char *>(buffer),
            sizeof(TensorMetadata));
```

🤖 Prompt for AI Agents
In mooncake-integration/store/store_py.cpp around lines 339 to 348, the code
copies sizeof(TensorMetadata) bytes from the buffer without validating that the
provided size and the returned total_length are large enough; add checks to
ensure both size >= sizeof(TensorMetadata) and total_length >=
sizeof(TensorMetadata) before performing the memcpy (mirroring the pattern used
in batch_get_tensor_into at line 470), and if the checks fail acquire the GIL
and return pybind11::none() to avoid reading past the buffer.
```cpp
pybind11::list batch_get_tensor_into(
    const std::vector<std::string> &keys,
    const std::vector<uintptr_t> &buffer_ptrs,
    const std::vector<size_t> &sizes) {
    std::vector<void *> buffers;
    buffers.reserve(buffer_ptrs.size());
    for (uintptr_t ptr : buffer_ptrs) {
        buffers.push_back(reinterpret_cast<void *>(ptr));
    }

    if (!is_client_initialized()) {
        LOG(ERROR) << "Client is not initialized";
        py::list empty_list;
        for (size_t i = 0; i < keys.size(); ++i) {
            empty_list.append(to_py_ret(ErrorCode::INVALID_PARAMS));
        }
        return empty_list;
    }
```
**Missing input size validation.**

The method doesn't validate that `keys.size()`, `buffer_ptrs.size()`, and `sizes.size()` are equal. If they differ, iterating through `total_lengths` (which corresponds to `keys`) while accessing `buffers[i]` could cause out-of-bounds access if `buffer_ptrs` is shorter than `keys`.
🔎 Suggested fix:

```diff
 pybind11::list batch_get_tensor_into(
     const std::vector<std::string> &keys,
     const std::vector<uintptr_t> &buffer_ptrs,
     const std::vector<size_t> &sizes) {
     std::vector<void *> buffers;
     buffers.reserve(buffer_ptrs.size());
     for (uintptr_t ptr : buffer_ptrs) {
         buffers.push_back(reinterpret_cast<void *>(ptr));
     }
+    if (keys.size() != buffer_ptrs.size() || keys.size() != sizes.size()) {
+        LOG(ERROR) << "Size mismatch: keys, buffer_ptrs, and sizes must have the same length";
+        py::list empty_list;
+        for (size_t i = 0; i < keys.size(); ++i) {
+            empty_list.append(to_py_ret(ErrorCode::INVALID_PARAMS));
+        }
+        return empty_list;
+    }
+
     if (!is_client_initialized()) {
```
🤖 Prompt for AI Agents
In mooncake-integration/store/store_py.cpp around lines 414 to 431, the function
lacks validation that keys.size(), buffer_ptrs.size(), and sizes.size() are
equal which can lead to out-of-bounds access; move or add a precondition check
before converting buffer_ptrs to void* and if the sizes differ, log an error and
return a py::list sized to keys.size() populated with
to_py_ret(ErrorCode::INVALID_PARAMS) for each entry (or otherwise consistently
handle the mismatch), ensuring no buffer access occurs when counts are
inconsistent.
```cpp
} catch (const pybind11::error_already_set &e) {
    LOG(ERROR) << "Failed during batch tensor deserialization: "
               << e.what();
}
return results_list;
```
**Incomplete results on exception.**

If an exception occurs mid-iteration, the method returns a partial `results_list` without populating entries for unprocessed keys. Callers may not be able to correlate results with input keys.

Consider appending error codes for remaining entries before returning, or document this behavior clearly in the API.
🤖 Prompt for AI Agents
In mooncake-integration/store/store_py.cpp around lines 542 to 546, the catch
block logs the pybind11 exception and returns a partially populated results_list
which breaks correlation with input keys; modify the catch handling to append an
error result for each remaining unprocessed input (e.g., push back consistent
error-code/result objects for each missing key or index) before returning so the
returned list length and ordering match the input keys, and ensure the error
entries contain enough context (error code/message) for callers to detect
failures.
```cpp
if (!is_client_initialized()) {
    LOG(ERROR) << "Client is not initialized";
    py::list empty_list;
    for (size_t i = 0; i < base_keys.size(); ++i) {
        empty_list.append(py::none());
    }
    return empty_list;
}

if (use_dummy_client_) {
    LOG(ERROR) << "batch_get_tensor_into_with_tp is not supported for "
                  "dummy client";
    py::list empty_list;
    for (size_t i = 0; i < base_keys.size(); ++i) {
        empty_list.append(py::none());
    }
    return empty_list;
}
```
**Inconsistent error return type.**

When the client is not initialized or the dummy client is used, this method returns a list of `py::none()` values, but `batch_get_tensor_into` returns a list of `ErrorCode::INVALID_PARAMS` integers. This inconsistency could confuse callers who need to handle errors uniformly.

Consider aligning the error handling with `batch_get_tensor_into`:
🔎 Suggested fix:

```diff
 if (!is_client_initialized()) {
     LOG(ERROR) << "Client is not initialized";
     py::list empty_list;
     for (size_t i = 0; i < base_keys.size(); ++i) {
-        empty_list.append(py::none());
+        empty_list.append(to_py_ret(ErrorCode::INVALID_PARAMS));
     }
     return empty_list;
 }

 if (use_dummy_client_) {
     LOG(ERROR) << "batch_get_tensor_into_with_tp is not supported for "
                   "dummy client";
     py::list empty_list;
     for (size_t i = 0; i < base_keys.size(); ++i) {
-        empty_list.append(py::none());
+        empty_list.append(to_py_ret(ErrorCode::INVALID_PARAMS));
     }
     return empty_list;
 }
```
🤖 Prompt for AI Agents
In mooncake-integration/store/store_py.cpp around lines 579 to 596, the function
returns a py::list of py::none() when the client is uninitialized or a dummy
client is used, but batch_get_tensor_into returns a list of
ErrorCode::INVALID_PARAMS integers; change these early-return branches to build
and return a py::list of integers with ErrorCode::INVALID_PARAMS (cast/convert
to Python int) for each entry in base_keys so the error return type matches
batch_get_tensor_into and callers can handle errors uniformly.
| "Get a PyTorch tensor from the store directly into a pre-allocated" | ||
| "buffer, optionally sliced for Tensor Parallelism.\n" |
**Minor typo in docstrings.**

Missing space in documentation strings:

- Lines 1233-1234: `"pre-allocated" "buffer"` → should be `"pre-allocated buffer"`
- Lines 1246-1247: `"directly into" "pre-allocated"` → should be `"directly into pre-allocated"`
🔎 Suggested fix:

```diff
 .def(
     "get_tensor_into_with_tp",
     &MooncakeStorePyWrapper::get_tensor_into_with_tp, py::arg("key"),
     py::arg("buffer_ptr"), py::arg("size"), py::arg("tp_rank") = 0,
     py::arg("tp_size") = 1, py::arg("split_dim") = 0,
-    "Get a PyTorch tensor from the store directly into a pre-allocated"
-    "buffer, optionally sliced for Tensor Parallelism.\n"
+    "Get a PyTorch tensor from the store directly into a pre-allocated "
+    "buffer, optionally sliced for Tensor Parallelism.\n"
     "Args:\n"
     "    key: The key of the tensor.\n"
     "    buffer_ptr: The buffer pointer pre-allocated for tensor.\n"
     "    size: The size of buffer.\n"
     "    tp_rank: The current tensor parallel rank (default 0).\n"
     "    tp_size: The total tensor parallel size (default 1).\n"
     "    split_dim: The dimension to split the tensor along (default 0).")
 .def("batch_get_tensor_into_with_tp",
      &MooncakeStorePyWrapper::batch_get_tensor_into_with_tp,
      py::arg("base_keys"), py::arg("buffer_ptrs"), py::arg("sizes"),
      py::arg("tp_rank") = 0, py::arg("tp_size") = 1,
-     "Get a batch of PyTorch tensor shards from the store directly into"
-     "pre-allocated buffers for a given Tensor Parallel rank.")
+     "Get a batch of PyTorch tensor shards from the store directly into "
+     "pre-allocated buffers for a given Tensor Parallel rank.")
```

Also applies to: 1246-1247
🤖 Prompt for AI Agents
In mooncake-integration/store/store_py.cpp around lines 1233-1234 and 1246-1247,
fix the docstring concatenation typo where two adjacent string literals lack a
separating space: replace occurrences like `"pre-allocated" "buffer"` and
`"directly into" "pre-allocated"` with a single combined literal containing the
space (e.g., `"pre-allocated buffer"` and `"directly into pre-allocated"`),
ensuring the docstrings read correctly.
```python
recon = torch.cat(reconstruction_parts, dim=split_dim)
self.assertTrue(torch.equal(recon, original), f"Tensor {i} final reconstruction mismatch")
self.assertTrue(
    torch.equal(recon, original),
    f"Tensor {i} final reconstruction mismatch"
)
```
**Duplicate assertion — remove redundant check.**

The same `torch.equal(recon, original)` assertion appears twice consecutively.
🔎 Apply this diff:

```diff
 recon = torch.cat(reconstruction_parts, dim=split_dim)
 self.assertTrue(torch.equal(recon, original), f"Tensor {i} final reconstruction mismatch")
-self.assertTrue(
-    torch.equal(recon, original),
-    f"Tensor {i} final reconstruction mismatch"
-)
```
🤖 Prompt for AI Agents
In scripts/test_tensor_api.py around lines 519 to 524, there is a duplicated
assertion calling torch.equal(recon, original) twice in a row; remove the
redundant second assertion so the reconstruction check appears only once (keep
one assertTrue with the existing message and delete the duplicate block).