docs/source/python-api-reference/mooncake-store.md (+85 −0)

@@ -1004,6 +1004,91 @@ def batch_get_tensor_with_tp(self, base_keys: List[str], tp_rank: int = 0, tp_si

---

### PyTorch Tensor Operations (Zero Copy)

These methods provide direct support for storing and retrieving PyTorch tensors. They handle serialization and metadata automatically, and include built-in support for **Tensor Parallelism (TP)** by splitting and reconstructing tensor shards.

⚠️ **Note**: These methods require `torch` to be installed and available in the environment.

#### get_tensor_into()

Get a PyTorch tensor from the store directly into a pre-allocated buffer.

```python
def get_tensor_with_tp(self, key: str, buffer_ptr: int, size: int) -> torch.Tensor
```

**Parameters:**

- `key` (str): Base identifier of the tensor.
- `buffer_ptr` (int): Pointer to the pre-allocated destination buffer; the buffer must be registered.
- `size` (int): The size of the buffer, in bytes.

**Returns:**

- `torch.Tensor`: The retrieved tensor (or shard). Returns `None` if not found.
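
Below is a minimal usage sketch. It assumes the method is exposed as `get_tensor_into` (matching this section title), that `store` is an already-initialized store client, and that buffers are registered via a `register_buffer(ptr, size)` call; these setup names are assumptions for illustration, not confirmed API.

```python
import torch

# Assumed setup: `store` is an already-initialized store client (construction
# elided), and register_buffer is an assumed name for the registration call.
buf = torch.empty(1 << 20, dtype=torch.uint8)  # 1 MiB destination buffer
size = buf.numel() * buf.element_size()
store.register_buffer(buf.data_ptr(), size)

# Retrieve the stored tensor directly into the registered buffer (zero copy).
t = store.get_tensor_into("my_tensor", buf.data_ptr(), size)
if t is None:
    print("key 'my_tensor' not found")
```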

Comment on lines +1013 to +1030

⚠️ Potential issue | 🟡 Minor

Documentation has incorrect function signatures.

The section title is `get_tensor_into()` but the code block shows `get_tensor_with_tp`. Similarly, the next section `batch_get_tensor()` shows `batch_get_tensor_with_tp` in its signature. These appear to be copy-paste errors.

🔎 Apply this diff to fix the signatures:
```diff
 #### get_tensor_into()
 
 Get a PyTorch tensor from the store directly into a pre-allocated buffer.
 
 ```python
-def get_tensor_with_tp(self, key: str, buffer_ptr: int, size: int) -> torch.Tensor
+def get_tensor_into(self, key: str, buffer_ptr: int, size: int) -> torch.Tensor
```

```diff
-#### batch_get_tensor()
+#### batch_get_tensor_into()
 
 Get a batch of PyTorch tensors from the store directly into pre-allocated buffers.
 
 ```python
-def batch_get_tensor_with_tp(self, base_keys: List[str], buffer_ptrs: List[int], sizes: List[int]) -> List[torch.Tensor]
+def batch_get_tensor_into(self, keys: List[str], buffer_ptrs: List[int], sizes: List[int]) -> List[torch.Tensor]
```

> Committable suggestion skipped: line range outside the PR's diff.


#### batch_get_tensor()

Get a batch of PyTorch tensors from the store directly into pre-allocated buffers.

```python
def batch_get_tensor_with_tp(self, base_keys: List[str], buffer_ptrs: List[int], sizes: List[int]) -> List[torch.Tensor]
```

**Parameters:**

- `base_keys` (List[str]): List of base identifiers.
- `buffer_ptrs` (List[int]): List of pointers to the pre-allocated destination buffers; the buffers must be registered.
- `sizes` (List[int]): List of buffer sizes, in bytes.

**Returns:**

- `List[torch.Tensor]`: List of retrieved tensors (or shards). Contains `None` for missing keys.
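
A batch sketch under the same assumptions (`store` and `register_buffer` as above); the method name follows the zero-copy rename suggested in the review comment above, so adjust it if the actual binding differs.

```python
import torch

keys = ["w1", "w2", "w3"]
bufs = [torch.empty(1 << 20, dtype=torch.uint8) for _ in keys]
sizes = [b.numel() * b.element_size() for b in bufs]
for b, s in zip(bufs, sizes):
    store.register_buffer(b.data_ptr(), s)  # assumed registration API

# One pre-registered destination buffer per key; missing keys come back None.
tensors = store.batch_get_tensor_into(
    keys, [b.data_ptr() for b in bufs], sizes
)
missing = [k for k, t in zip(keys, tensors) if t is None]
```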

#### get_tensor_into_with_tp()

Get a PyTorch tensor from the store, specifically retrieving the shard corresponding to the given Tensor Parallel rank, directly into the pre-allocated buffer.

```python
def get_tensor_with_tp(self, key: str, buffer_ptr: int, size: int, tp_rank: int = 0, tp_size: int = 1, split_dim: int = 0) -> torch.Tensor
```

**Parameters:**

- `key` (str): Base identifier of the tensor.
- `buffer_ptr` (int): Pointer to the pre-allocated destination buffer; the buffer must be registered.
- `size` (int): The size of the buffer, in bytes.
- `tp_rank` (int): The tensor parallel rank to retrieve (default: 0). Fetches key `key_tp_{rank}` if `tp_size > 1`.
- `tp_size` (int): Total tensor parallel size (default: 1).
- `split_dim` (int): The dimension used during splitting (default: 0).

**Returns:**

- `torch.Tensor`: The retrieved tensor (or shard). Returns `None` if not found.
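
A sketch of rank-local shard loading under the same assumed setup (`store`, `register_buffer`); it presumes the shards were previously stored with the matching TP put method so that keys like `model.w_tp_1` exist.

```python
import torch

tp_rank, tp_size = 1, 4  # this process loads shard 1 of a 4-way split
shard_buf = torch.empty(1 << 20, dtype=torch.uint8)
shard_size = shard_buf.numel() * shard_buf.element_size()
store.register_buffer(shard_buf.data_ptr(), shard_size)  # assumed API

# With tp_size > 1 this fetches the shard stored under "model.w_tp_1"
# directly into the registered buffer.
shard = store.get_tensor_into_with_tp(
    "model.w",
    shard_buf.data_ptr(),
    shard_size,
    tp_rank=tp_rank,
    tp_size=tp_size,
    split_dim=0,
)
```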

Comment on lines +1049 to +1069

⚠️ Potential issue | 🟡 Minor

Function signature mismatch for `get_tensor_into_with_tp`.

The code block shows `get_tensor_with_tp` but should be `get_tensor_into_with_tp` to match the section title.

🔎 Apply this diff:

```diff
 ```python
-def get_tensor_with_tp(self, key: str, buffer_ptr: int, size: int, tp_rank: int = 0, tp_size: int = 1, split_dim: int = 0) -> torch.Tensor
+def get_tensor_into_with_tp(self, key: str, buffer_ptr: int, size: int, tp_rank: int = 0, tp_size: int = 1, split_dim: int = 0) -> torch.Tensor
```

#### batch_get_tensor_with_tp()

Get a batch of PyTorch tensor shards from the store for a given Tensor Parallel rank, directly into pre-allocated buffers.

```python
def batch_get_tensor_with_tp(self, base_keys: List[str], buffer_ptrs: List[int], sizes: List[int], tp_rank: int = 0, tp_size: int = 1) -> List[torch.Tensor]
```

**Parameters:**

- `base_keys` (List[str]): List of base identifiers.
- `buffer_ptrs` (List[int]): List of pointers to the pre-allocated destination buffers; the buffers must be registered.
- `sizes` (List[int]): List of buffer sizes, in bytes.
- `tp_rank` (int): The tensor parallel rank to retrieve (default: 0).
- `tp_size` (int): Total tensor parallel size (default: 1).

**Returns:**

- `List[torch.Tensor]`: List of retrieved tensors (or shards). Contains `None` for missing keys.
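
A compact batch variant under the same assumptions; the call below uses this section's signature as written, though the review comment below suggests `batch_get_tensor_into_with_tp` as the zero-copy name.

```python
import torch

base_keys = [f"layer{i}.weight" for i in range(4)]
bufs = [torch.empty(1 << 20, dtype=torch.uint8) for _ in base_keys]
sizes = [b.numel() * b.element_size() for b in bufs]
for b, s in zip(bufs, sizes):
    store.register_buffer(b.data_ptr(), s)  # assumed registration API

# Fetch this rank's shard of every base key in one call; entries are None
# for keys that are missing from the store.
shards = store.batch_get_tensor_with_tp(
    base_keys,
    [b.data_ptr() for b in bufs],
    sizes,
    tp_rank=0,
    tp_size=2,
)
```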

Comment on lines +1070 to +1089

⚠️ Potential issue | 🟡 Minor

Section title should be `batch_get_tensor_into_with_tp` for the zero-copy API.

This section is under "PyTorch Tensor Operations (Zero Copy)" but uses `batch_get_tensor_with_tp`, which is the non-zero-copy variant. Based on the test file patterns, the zero-copy variant should be `batch_get_tensor_into_with_tp`.

🔎 Apply this diff:

```diff
-#### batch_get_tensor_with_tp()
+#### batch_get_tensor_into_with_tp()
 
 Get a batch of PyTorch tensor shards from the store for a given Tensor Parallel rank, directly into pre-allocated buffers.
 
 ```python
-def batch_get_tensor_with_tp(self, base_keys: List[str], buffer_ptrs: List[int], sizes: List[int], tp_rank: int = 0, tp_size: int = 1) -> List[torch.Tensor]
+def batch_get_tensor_into_with_tp(self, base_keys: List[str], buffer_ptrs: List[int], sizes: List[int], tp_rank: int = 0, tp_size: int = 1) -> List[torch.Tensor]
```

---

### Batch Zero-Copy Operations

#### batch_put_from()
mooncake-integration/integration_utils.h (+29 −0)

@@ -62,6 +62,35 @@ static const std::array<ArrayCreatorFunc, 15> array_creators = {{
create_typed_array<uint8_t>, // FLOAT8_E5M2 = 14 (using uint8_t as storage)
}};

// Wraps existing memory as a NumPy array without copying: passing py::none()
// as the base handle makes pybind11 reference the buffer instead of cloning
// it, so the caller must keep the underlying allocation alive.
template <typename T>
py::array create_typed_array_view(char *data_ptr, size_t offset,
                                  size_t total_length) {
    return py::array_t<T>({static_cast<ssize_t>(total_length / sizeof(T))},
                          (T *)(data_ptr + offset), py::none());
}

static const std::array<ArrayCreatorFunc, 16> array_creators_view = {{
create_typed_array_view<float>, // FLOAT32 = 0
create_typed_array_view<double>, // FLOAT64 = 1
create_typed_array_view<int8_t>, // INT8 = 2
create_typed_array_view<uint8_t>, // UINT8 = 3
create_typed_array_view<int16_t>, // INT16 = 4
create_typed_array_view<uint16_t>, // UINT16 = 5
create_typed_array_view<int32_t>, // INT32 = 6
create_typed_array_view<uint32_t>, // UINT32 = 7
create_typed_array_view<int64_t>, // INT64 = 8
create_typed_array_view<uint64_t>, // UINT64 = 9
create_typed_array_view<bool>, // BOOL = 10
create_typed_array_view<uint16_t>, // FLOAT16 = 11 (using uint16_t as
// storage)
create_typed_array_view<uint16_t>, // BFLOAT16 = 12 (using uint16_t as
// storage)
create_typed_array_view<uint8_t>, // FLOAT8_E4M3 = 13 (using uint8_t as
// storage)
create_typed_array_view<uint8_t>, // FLOAT8_E5M2 = 14 (using uint8_t as
// storage)
}};
Comment on lines +72 to +92

⚠️ Potential issue | 🟡 Minor

Size mismatch: `array_creators_view` is declared with size 16 but `array_creators` has size 15.

Both arrays should map the same `TensorDtype` enum values (0–14), yet `array_creators_view` is declared with size 16 while `array_creators` uses size 15. This inconsistency could cause out-of-bounds access or confusion when indexing by dtype.

🔎 Apply this diff to fix the size:

```diff
-static const std::array<ArrayCreatorFunc, 16> array_creators_view = {{
+static const std::array<ArrayCreatorFunc, 15> array_creators_view = {{
     create_typed_array_view<float>,     // FLOAT32 = 0
     create_typed_array_view<double>,    // FLOAT64 = 1
     create_typed_array_view<int8_t>,    // INT8 = 2
     create_typed_array_view<uint8_t>,   // UINT8 = 3
     create_typed_array_view<int16_t>,   // INT16 = 4
     create_typed_array_view<uint16_t>,  // UINT16 = 5
     create_typed_array_view<int32_t>,   // INT32 = 6
     create_typed_array_view<uint32_t>,  // UINT32 = 7
     create_typed_array_view<int64_t>,   // INT64 = 8
     create_typed_array_view<uint64_t>,  // UINT64 = 9
     create_typed_array_view<bool>,      // BOOL = 10
     create_typed_array_view<uint16_t>,  // FLOAT16 = 11 (using uint16_t as
                                         // storage)
     create_typed_array_view<uint16_t>,  // BFLOAT16 = 12 (using uint16_t as
                                         // storage)
     create_typed_array_view<uint8_t>,   // FLOAT8_E4M3 = 13 (using uint8_t as
                                         // storage)
     create_typed_array_view<uint8_t>,   // FLOAT8_E5M2 = 14 (using uint8_t as
                                         // storage)
 }};
```


inline TensorDtype get_tensor_dtype(py::object dtype_obj) {
if (dtype_obj.is_none()) {
return TensorDtype::UNKNOWN;