run_memorize.py is a group chat memory storage script. It reads JSON files that conform to GroupChatFormat and stores the messages one by one in the memory system through an HTTP API.
- ✅ Read and validate JSON files in GroupChatFormat format
- ✅ Support both assistant and companion scenarios
- ✅ Automatically save conversation metadata (conversation-meta)
- ✅ Call memorize interface item by item to process messages
- ✅ Provide format validation mode
- ✅ Detailed logging output
Store memories via HTTP API (must specify scene):
python src/bootstrap.py src/run_memorize.py \
--input data/group_chat.json \
--api-url http://localhost:1995/api/v1/memories \
--scene assistant

python src/bootstrap.py src/run_memorize.py \
--input data/group_chat.json \
--api-url http://localhost:1995/api/v1/memories \
--scene companion

Validate whether the input file format is correct without performing storage (no API address needed):
python src/bootstrap.py src/run_memorize.py \
--input data/group_chat.json \
--scene assistant \
--validate-only

| Argument | Required | Description |
|---|---|---|
| --input | Yes | Input group chat JSON file path (GroupChatFormat format) |
| --scene | Yes | Memory extraction scenario, only supports assistant or companion |
| --api-url | No* | memorize API address (required for non-validation mode) |
| --validate-only | No | Only validate input file format, do not perform storage |
*Note: --api-url can be omitted only when --validate-only is used; in every other case it is required.
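
The argument rules above can be expressed with argparse. The following is a minimal sketch, not the script's actual parser: the function name parse_args and the error wording are assumptions, and only the four flags and their requirements come from the table.

```python
import argparse


def parse_args(argv=None):
    """Parse the four documented flags (hypothetical re-implementation)."""
    parser = argparse.ArgumentParser(description="Group chat memory storage")
    parser.add_argument("--input", required=True,
                        help="Path to the GroupChatFormat JSON file")
    parser.add_argument("--scene", required=True,
                        choices=["assistant", "companion"],
                        help="Memory extraction scenario")
    parser.add_argument("--api-url", default=None,
                        help="memorize API address")
    parser.add_argument("--validate-only", action="store_true",
                        help="Validate the input file without storing it")
    args = parser.parse_args(argv)

    # --api-url is conditionally required: only validation mode may omit it.
    if not args.validate_only and not args.api_url:
        parser.error("--api-url is required unless --validate-only is set")
    return args
```

Checking the condition after parsing keeps --api-url optional at the parser level while still rejecting a storage run that has no target address.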
The input file must conform to the GroupChatFormat specification; see data_format/group_chat/group_chat_format.py.
{
"version": "1.0.0",
"conversation_meta": {
"name": "Smart Sales Assistant Project Team",
"description": "Development discussion group for Smart Sales Assistant project",
"group_id": "group_sales_ai_2025",
"created_at": "2025-02-01T01:00:00Z",
"default_timezone": "UTC",
"user_details": {
"user_101": {
"full_name": "Alex",
"role": "Tech Lead"
},
"user_102": {
"full_name": "Betty",
"role": "Product Manager"
}
},
"tags": ["AI", "Sales", "Project Development"]
},
"conversation_list": [
{
"message_id": "msg_001",
"create_time": "2025-02-01T02:00:00Z",
"sender": "user_101",
"sender_name": "Alex",
"type": "text",
"content": "Good morning everyone, let's discuss project progress today",
"refer_list": []
}
]
}

The script executes the following steps:
- Format Validation
  - Read input JSON file
  - Validate whether it conforms to GroupChatFormat specification
  - Output data statistics
- Save Conversation Metadata
  - Call the conversation-meta interface
  - Save metadata such as scene, group information, user details
  - API address: {base_url}/api/v1/conversation-meta
- Process Messages Item by Item
  - Call the memorize interface sequentially for each message
  - Each message includes: message_id, create_time, sender, content, etc.
  - Automatically add group_id, group_name, scene information
  - API address: {api_url} (specified by the --api-url argument)
- Output Results
  - Display number of successfully processed messages
  - Display total number of saved memories
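
The message-processing step can be sketched as follows. This is an illustration under assumptions, not the script's code: the helper name build_message_payloads and the flat payload layout are hypothetical. The field names come from the sample input file and from the fields the step says are added to each message.

```python
import json


def build_message_payloads(data, scene):
    """Return one request body per message, in the original order.

    Hypothetical helper: copies each entry of conversation_list and adds
    group_id, group_name and scene taken from conversation_meta.
    """
    meta = data["conversation_meta"]
    payloads = []
    for message in data["conversation_list"]:
        payload = dict(message)  # shallow copy, the input stays unchanged
        payload["group_id"] = meta["group_id"]
        payload["group_name"] = meta["name"]
        payload["scene"] = scene
        payloads.append(payload)
    return payloads


# A trimmed version of the sample input file.
sample = json.loads("""
{
  "version": "1.0.0",
  "conversation_meta": {"name": "Smart Sales Assistant Project Team",
                        "group_id": "group_sales_ai_2025"},
  "conversation_list": [
    {"message_id": "msg_001", "create_time": "2025-02-01T02:00:00Z",
     "sender": "user_101", "sender_name": "Alex", "type": "text",
     "content": "Good morning everyone", "refer_list": []}
  ]
}
""")

payloads = build_message_payloads(sample, "assistant")
print(payloads[0]["group_id"], payloads[0]["scene"])
```

Each payload would then be sent to the --api-url address with one POST per message, in list order.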
🚀 Group Chat Memory Storage Script
======================================================================
📄 Input File: /path/to/group_chat.json
🔍 Validation Mode: No
🌐 API Address: http://localhost:1995/api/v1/memories
======================================================================
======================================================================
Validating Input File Format
======================================================================
Reading file: /path/to/group_chat.json
Validating GroupChatFormat format...
✓ Format validation passed!
=== Data Statistics ===
Format Version: 1.0.0
Group Chat Name: Smart Sales Assistant Project Team
Group Chat ID: group_sales_ai_2025
Number of Users: 5
Number of Messages: 8
Time Range: 2025-02-01T02:00:00Z ~ 2025-02-01T02:05:00Z
======================================================================
Reading Group Chat Data
======================================================================
Reading file: /path/to/group_chat.json
Using simple direct single message format, processing item by item
======================================================================
Starting to Call memorize API Item by Item
======================================================================
Group Name: Smart Sales Assistant Project Team
Group ID: group_sales_ai_2025
Number of Messages: 8
API Address: http://localhost:1995/api/v1/memories
--- Saving Conversation Metadata (conversation-meta) ---
Saving conversation metadata to: http://localhost:1995/api/v1/conversation-meta
Scene: assistant, Group ID: group_sales_ai_2025
✓ Conversation metadata saved successfully
Scene: assistant
--- Processing Message 1/8 ---
✓ Successfully saved 1 memory
--- Processing Message 2/8 ---
⏳ Waiting for episode boundary
--- Processing Message 3/8 ---
✓ Successfully saved 2 memories
--- Processing Message 4/8 ---
⏳ Waiting for episode boundary
--- Processing Message 5/8 ---
⏳ Waiting for episode boundary
--- Processing Message 6/8 ---
✓ Successfully saved 1 memory
--- Processing Message 7/8 ---
⏳ Waiting for episode boundary
--- Processing Message 8/8 ---
✓ Successfully saved 2 memories
======================================================================
Processing Complete
======================================================================
✓ Successfully Processed: 8/8 messages
✓ Total Saved: 6 memories
======================================================================
✓ Processing Complete!
======================================================================
Error: Input file does not exist: /path/to/file.json
✗ Format validation failed!
Please ensure input file conforms to GroupChatFormat specification
✗ JSON parsing failed: Expecting value: line 1 column 1 (char 0)
- infra_layer.adapters.input.api.mapper.group_chat_converter: Format validation
- httpx: HTTP client (async requests)
- core.observation.logger: Logging utilities
The script calls two API endpoints:
- conversation-meta: Save conversation metadata
  - Path: {base_url}/api/v1/conversation-meta
  - Method: POST
  - Data: Contains metadata such as scene, group_id, user_details
- memorize: Store single message memory
  - Path: {api_url} (specified by the --api-url argument)
  - Method: POST
  - Data: Contains message_id, sender, content, scene, etc.
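
In the sample log both endpoints share a host, so the conversation-meta address can be derived from --api-url. The sketch below assumes that {base_url} is the scheme and host part of --api-url. The source does not say how the script computes it, and the function name conversation_meta_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def conversation_meta_url(api_url):
    """Derive {base_url}/api/v1/conversation-meta from the memorize URL.

    Assumption: base_url is the scheme plus host[:port] of api_url.
    """
    parts = urlsplit(api_url)
    return urlunsplit((parts.scheme, parts.netloc,
                       "/api/v1/conversation-meta", "", ""))


print(conversation_meta_url("http://localhost:1995/api/v1/memories"))
```

For the address used in the examples this yields http://localhost:1995/api/v1/conversation-meta, the same address that appears in the sample log.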
- Batch Processing: Support processing multiple files in a directory
- Progress Display: Add progress bar to show processing status
- Error Retry: Add failure retry mechanism
- Concurrent Processing: Support batch concurrent API calls (note: maintain message order)
- Result Export: Export storage results as JSON file
A: bootstrap.py automatically handles:
- Python path setup
- Environment variable loading
- Dependency injection container initialization
- Mock mode support
This ensures the script runs in a complete application context.
A:
- assistant: Assistant scenario, suitable for AI assistant and user conversations
- companion: Companion scenario, suitable for AI companion interactive conversations
Different scenarios affect the memory extraction strategy and the storage method. Choose the one that matches your application.
A: The memory system uses "Episode Boundary" to determine when to form complete memory fragments.
- Not every message immediately generates a memory
- The system waits for a complete conversation episode to end before extracting memories
- This is normal processing behavior, not a failure
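
Because some calls return no memory yet, the final totals count processed messages and saved memories separately. Below is a minimal sketch of that tally. It assumes each call is reduced to the number of memories it saved, with 0 while waiting for an episode boundary. The function name summarize and this representation are hypothetical, and the counts are taken from the sample log.

```python
def summarize(saved_per_message):
    """Return (messages processed, total memories saved).

    saved_per_message holds one integer per memorize call; 0 means the
    service was still waiting for an episode boundary.
    """
    return len(saved_per_message), sum(saved_per_message)


# Per-message results from the sample log: 1, wait, 2, wait, wait, 1, wait, 2
processed, total = summarize([1, 0, 2, 0, 0, 1, 0, 2])
print(f"Successfully Processed: {processed}/8 messages")
print(f"Total Saved: {total} memories")
```

All 8 messages count as processed even though only 4 of them produced memories, and the total of 6 matches the sample log.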
A: No. The current version only supports calls through the HTTP API, so you must provide the --api-url argument (unless you use --validate-only for format validation only).
A: Check the following:
- Ensure memory service is running
- Confirm API address is correct (including port number)
- View server logs to understand detailed error information
- Confirm input data format is correct