feat: multimodal support #1431
base: master
2. Bug in the string-concatenation block
Co-authored-by: Dark Litss <[email protected]>
1. Add an id attribute to Workflow and WorkflowBuilder to uniquely identify a workflow 2. Enhance the openai and gemini adapters to support image understanding in single-turn conversations
# Conflicts:
#   kirara_ai/workflow/implementations/blocks/system/basic.py
#   pyproject.toml
- Implement MediaManager class for managing media files, including registration, metadata handling, and lifecycle management.
- Add support for various media types (image, audio, video, file) with automatic MIME type detection.
- Enhance MediaMessage class to integrate with MediaManager for seamless media registration and retrieval.
- Introduce lazy loading for media data, allowing on-demand fetching of media attributes.
- Create unit tests for MediaManager and MediaMessage to ensure functionality and reliability.
## Summary by Sourcery

This pull request enhances the LLM chat functionality by adding support for image understanding with the Gemini adapter and including workflow IDs. It also improves text processing and adds the user ID to system prompts.

New Features:
- Adds support for image understanding in the Gemini adapter by processing image URLs and base64-encoded images within chat messages.
- Adds a workflow ID to the workflow class.
- Introduce MediaManager for efficient media file registration and lifecycle management, supporting various media types (image, audio, video, file).
- Update MediaMessage class to integrate with MediaManager, allowing seamless media registration and retrieval.
- Implement lazy loading for media data and enhance message handling to support image content in chat messages.
- Add new utility functions for MIME type detection and media metadata management.
- Create unit tests to ensure the reliability of the MediaManager and related functionalities.
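The MIME type detection mentioned in these commits could look roughly like the sketch below. It is not the PR's code: the enum values and the `detect_media_type` helper are assumptions for illustration; only the `MediaType` member names and the `from_mime` method name appear in the reviewer's guide.

```python
import mimetypes
from enum import Enum


class MediaType(Enum):
    # Member names follow the class diagram; the string values are assumed.
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"
    FILE = "file"

    @classmethod
    def from_mime(cls, mime_type: str) -> "MediaType":
        """Map a MIME type such as 'image/png' to a coarse media type."""
        top_level = mime_type.split("/", 1)[0].lower()
        for member in (cls.IMAGE, cls.AUDIO, cls.VIDEO):
            if member.value == top_level:
                return member
        # Anything else (application/pdf, text/plain, ...) is a generic file.
        return cls.FILE


def detect_media_type(filename: str) -> MediaType:
    """Hypothetical helper: guess the media type from a file name."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return MediaType.from_mime(mime_type or "application/octet-stream")
```

Guessing from the file name is the cheapest option; a real implementation may also inspect the file contents, which this sketch does not attempt.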
Reviewer's Guide by Sourcery
This pull request introduces multi-modal support by adding a MediaManager class to handle media files and modifying LLM adapters to support image content. It also includes changes to the workflow core to support id and updates dispatch rules.
Sequence diagram for registering media from URL
sequenceDiagram
participant Client
participant MediaManager
participant MediaMetadata
participant FileSystem
Client->>MediaManager: register_from_url(url, source, description, tags, reference_id)
activate MediaManager
MediaManager->>MediaManager: register_media(url, source, description, tags, reference_id)
activate MediaManager
MediaManager->>MediaManager: Generate media_id
MediaManager->>MediaMetadata: Create MediaMetadata(media_id, url, source, description, tags, reference_id)
activate MediaMetadata
deactivate MediaMetadata
MediaManager->>MediaManager: _save_metadata(metadata)
MediaManager->>FileSystem: Save metadata to file
FileSystem-->>MediaManager: OK
deactivate MediaManager
MediaManager-->>Client: media_id
deactivate MediaManager
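The registration flow in the diagram above (generate an ID, build the metadata, persist it, return the ID) can be sketched as follows. This is a simplified stand-in, not the PR's implementation: `TinyMediaManager`, the UUID-based ID, and the one-JSON-file-per-media layout are assumptions, and the metadata carries only a subset of the fields in the class diagram.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class MediaMetadata:
    # Subset of the fields listed in the class diagram.
    media_id: str
    url: Optional[str] = None
    source: Optional[str] = None
    description: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)


class TinyMediaManager:
    """Hypothetical, stripped-down manager that only handles URL registration."""

    def __init__(self, media_dir: str = "data/media") -> None:
        self.metadata_dir = Path(media_dir) / "metadata"
        self.metadata_dir.mkdir(parents=True, exist_ok=True)
        self.metadata_cache: Dict[str, MediaMetadata] = {}

    def register_from_url(
        self,
        url: str,
        source: Optional[str] = None,
        description: Optional[str] = None,
        tags: Optional[List[str]] = None,
        reference_id: Optional[str] = None,
    ) -> str:
        media_id = uuid.uuid4().hex  # assumed ID scheme
        metadata = MediaMetadata(
            media_id=media_id,
            url=url,
            source=source,
            description=description,
            tags=list(tags or []),
            references=[reference_id] if reference_id else [],
        )
        self._save_metadata(metadata)
        return media_id

    def _save_metadata(self, metadata: MediaMetadata) -> None:
        # Keep an in-memory copy and write one JSON file per media item.
        self.metadata_cache[metadata.media_id] = metadata
        path = self.metadata_dir / f"{metadata.media_id}.json"
        path.write_text(json.dumps(asdict(metadata)), encoding="utf-8")
```

Note that nothing is downloaded at registration time in this sketch; only the URL is recorded, which matches the diagram, where the only file system write is the metadata.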
Sequence diagram for OpenAIAdapter chat with image
sequenceDiagram
participant OpenAIAdapter
participant LLMChatMessage
participant LLMChatImageContent
participant MediaManager
participant OpenAI
OpenAIAdapter->>LLMChatMessage: convert_llm_chat_message_to_openai_message(msg, media_manager)
activate LLMChatMessage
loop for each element in msg.content
LLMChatMessage->>LLMChatImageContent: isinstance(element, LLMChatImageContent)
alt is ImageContent
LLMChatMessage->>MediaManager: media = media_manager.get_media(element.media_id)
activate MediaManager
MediaManager-->>LLMChatMessage: media
deactivate MediaManager
LLMChatMessage->>media: get_url()
activate media
media->>MediaManager: get_url(media_id)
activate MediaManager
MediaManager-->>media: url
deactivate MediaManager
media-->>LLMChatMessage: url
deactivate media
LLMChatMessage->>OpenAI: {"type": "image_url", "image_url": {"url": url}}
else is TextContent
LLMChatMessage->>OpenAI: element.model_dump(mode="json")
end
end
LLMChatMessage-->>OpenAIAdapter: contents
deactivate LLMChatMessage
OpenAIAdapter->>OpenAI: POST /chat
activate OpenAI
OpenAI-->>OpenAIAdapter: response
deactivate OpenAI
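The per-element conversion loop in this diagram might look like the sketch below. The `TextContent` and `ImageContent` dataclasses and the `url_for_media` lookup table are stand-ins invented for this example (the PR resolves URLs through `MediaManager`); the `image_url` part shape is the one shown in the diagram.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class TextContent:
    text: str


@dataclass
class ImageContent:
    media_id: str


def to_openai_content(
    content: List[Union[TextContent, ImageContent]],
    url_for_media: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Convert mixed text/image content into OpenAI-style content parts."""
    parts: List[Dict[str, Any]] = []
    for element in content:
        if isinstance(element, ImageContent):
            # Stand-in for media_manager.get_media(...).get_url().
            url = url_for_media[element.media_id]
            parts.append({"type": "image_url", "image_url": {"url": url}})
        else:
            parts.append({"type": "text", "text": element.text})
    return parts
```

The resulting list would be used as the `content` of a single chat message, so text and images keep their original order.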
Updated class diagram for MediaManager
classDiagram
class MediaManager {
- media_dir: str
- metadata_dir: str
- files_dir: str
- metadata_cache: Dict[str, MediaMetadata]
- logger: Logger
- _pending_tasks: set
+ __init__(media_dir: str = "data/media")
+ register_media(
url: Optional[str] = None,
path: Optional[str] = None,
data: Optional[bytes] = None,
format: Optional[str] = None,
media_type: Optional[MediaType] = None,
size: Optional[int] = None,
source: Optional[str] = None,
description: Optional[str] = None,
tags: Optional[List[str]] = None,
reference_id: Optional[str] = None
) : str
+ register_from_path(
path: str,
source: Optional[str] = None,
description: Optional[str] = None,
tags: Optional[List[str]] = None,
reference_id: Optional[str] = None
) : str
+ register_from_url(
url: str,
source: Optional[str] = None,
description: Optional[str] = None,
tags: Optional[List[str]] = None,
reference_id: Optional[str] = None
) : str
+ register_from_data(
data: bytes,
format: Optional[str] = None,
source: Optional[str] = None,
description: Optional[str] = None,
tags: Optional[List[str]] = None,
reference_id: Optional[str] = None
) : str
+ add_reference(media_id: str, reference_id: str) : None
+ remove_reference(media_id: str, reference_id: str) : None
- _delete_media(media_id: str) : None
+ update_metadata(
media_id: str,
source: Optional[str] = None,
description: Optional[str] = None,
tags: Optional[List[str]] = None,
url: Optional[str] = None,
path: Optional[str] = None
) : None
+ add_tags(media_id: str, tags: List[str]) : None
+ remove_tags(media_id: str, tags: List[str]) : None
+ get_metadata(media_id: str) : Optional[MediaMetadata]
+ ensure_file_exists(media_id: str) : Optional[Path]
+ get_file_path(media_id: str) : Optional[Path]
+ get_data(media_id: str) : Optional[bytes]
+ get_url(media_id: str) : Optional[str]
+ search_by_tags(tags: List[str], match_all: bool = False) : List[str]
+ search_by_description(query: str) : List[str]
+ search_by_source(source: str) : List[str]
+ search_by_type(media_type: MediaType) : List[str]
+ get_all_media_ids() : List[str]
+ cleanup_unreferenced() : int
+ create_media_message(media_id: str) : Optional[MediaMessage]
+ get_media(media_id: str) : Optional[Media]
}
class MediaMetadata {
- media_id: str
- media_type: Optional[MediaType]
- format: Optional[str]
- size: Optional[int]
- created_at: Optional[datetime]
- source: Optional[str]
- description: Optional[str]
- tags: Optional[List[str]]
- references: Optional[Set[str]]
- url: Optional[str]
- path: Optional[str]
+ to_dict() : Dict[str, Any]
+ mime_type() : str
+ from_dict(data: Dict[str, Any]) : MediaMetadata
}
class MediaType {
IMAGE
AUDIO
VIDEO
FILE
+ from_mime(mime_type: str) : MediaType
}
MediaManager -- MediaMetadata : manages
MediaMetadata -- MediaType : type
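The `add_reference`, `remove_reference`, and `cleanup_unreferenced` methods suggest a reference-tracking lifecycle. A minimal in-memory sketch of that idea follows; the `ReferenceTracker` class is hypothetical, and whether the real manager deletes media immediately when the last reference is removed or only during cleanup is not stated in the guide.

```python
from typing import Dict, List, Set


class ReferenceTracker:
    """Hypothetical in-memory model of per-media reference sets."""

    def __init__(self) -> None:
        self._references: Dict[str, Set[str]] = {}

    def add_reference(self, media_id: str, reference_id: str) -> None:
        self._references.setdefault(media_id, set()).add(reference_id)

    def remove_reference(self, media_id: str, reference_id: str) -> None:
        # discard() is a no-op when the reference is already gone.
        self._references.get(media_id, set()).discard(reference_id)

    def cleanup_unreferenced(self) -> int:
        """Drop media that nothing references; return how many were dropped."""
        orphaned = [mid for mid, refs in self._references.items() if not refs]
        for media_id in orphaned:
            del self._references[media_id]
        return len(orphaned)

    def media_ids(self) -> List[str]:
        return sorted(self._references)
```

Deferring deletion to an explicit cleanup pass, as assumed here, keeps `remove_reference` cheap and lets a caller re-add a reference before the file is lost.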
Updated class diagram for LLMChatMessage
classDiagram
class LLMChatMessage {
- role: Literal["system", "user", "assistant"]
- content: List[Union[LLMChatTextContent, LLMChatImageContent]]
}
class LLMChatTextContent {
- type: Literal["text"]
- text: str
}
class LLMChatImageContent {
- type: Literal["image"]
- media_id: str
}
LLMChatMessage -- LLMChatTextContent : contains
LLMChatMessage -- LLMChatImageContent : contains
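To make the message shape concrete, here is a dataclass approximation of the three classes in the diagram. The project itself appears to use models with `model_dump`, so treat this as an illustration of the data layout only; the `__post_init__` role check is an addition for this example.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Literal, Union, get_args

Role = Literal["system", "user", "assistant"]


@dataclass
class LLMChatTextContent:
    text: str
    type: Literal["text"] = "text"


@dataclass
class LLMChatImageContent:
    media_id: str
    type: Literal["image"] = "image"


@dataclass
class LLMChatMessage:
    role: Role
    content: List[Union[LLMChatTextContent, LLMChatImageContent]] = field(
        default_factory=list
    )

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal types, so check the role by hand.
        if self.role not in get_args(Role):
            raise ValueError(f"unsupported role: {self.role!r}")
```

A user turn mixing text and an image is then `LLMChatMessage("user", [LLMChatTextContent("Describe this"), LLMChatImageContent("some-media-id")])`, and `asdict` gives a JSON-friendly dictionary in which each part carries its `type` discriminator.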
- Remove redundant file URL generation in MediaManager, focusing on data URL retrieval.
- Update OpenAI adapter to include media handling for image content in chat messages, utilizing MediaManager for URL generation.
- Introduce asynchronous processing for converting chat messages to OpenAI format, improving performance and responsiveness.
9d5dd81 to d4f9feb
Awesome! Looking forward to memory and block support for follow-up mode (assembling user and assistant messages across multiple turns).
!
- Introduce a new media management API with endpoints for uploading, retrieving, and deleting media files.
- Implement media metadata models and response structures for better data handling.
- Enhance MediaManager to support asynchronous file operations and improve error handling.
- Add Pillow as a dependency for image processing and thumbnail generation.
- Update authentication middleware to support token-based authorization from query parameters.
- Introduce MultiElementDecomposer for improved handling of multi-part messages in memory.
- Update DefaultMemoryDecomposer to return a list of strings instead of a single string for better message management.
- Modify DefaultMemoryComposer to support media message formatting, including media IDs in output.
- Refactor ChatMessageConstructor to accommodate new memory content types and improve message construction.
- Update ChatMemoryQuery to allow selection of different decomposer strategies for memory retrieval.
…mposables
- Replace direct instantiation of registries in MemoryManager with dependency injection using Inject.
- Update ScopeRegistry, ComposerRegistry, and DecomposerRegistry to utilize Inject for instance creation.
- Introduce container attribute in MemoryComposer and MemoryDecomposer for improved dependency management.
- Enhance MultiElementDecomposer to validate media resources and log warnings for invalid media IDs.
- Introduce MediaManager dependency in LLMBackendAdapter for improved media management.
- Refactor LLMChatMessage to support a unified content type for text and image messages.
- Update response handling in various LLM adapters to accommodate new message structures and media integration.
- Enhance chat memory processing to handle multi-part messages, including images, more effectively.
- Implement threading for blocking media registration in MediaMessage, allowing for smoother execution.
- Refactor _register_media method to be asynchronous, improving performance during media file registration.
- Update MediaManager to support synchronous file downloads and ensure compatibility with async operations.
- Modify LLM adapters to utilize async media registration methods, enhancing overall media handling in chat responses.
- Remove deprecated methods in Media class for cleaner code and improved maintainability.
… and SHA1 hashing
- Add SHA1 hashing for media ID generation to ensure uniqueness.
- Implement robust error handling for file reading and downloading processes.
- Refactor media registration logic to streamline data retrieval and type detection.
- Remove deprecated lazy loading tests to focus on updated media management functionality.
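One way SHA1-based media IDs could work is to hash the media bytes, so identical content always maps to the same ID. The commit message does not say what input is hashed, so hashing the raw content is an assumption here, and `media_id_from_bytes` is a hypothetical name.

```python
import hashlib


def media_id_from_bytes(data: bytes) -> str:
    """Derive a 40-character hex media ID from the media content (assumed input)."""
    return hashlib.sha1(data).hexdigest()
```

A content-derived ID also deduplicates uploads as a side effect: registering the same image twice yields the same ID instead of a second copy.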
…ndling
- Implement a function to replace dots in URLs with full stops to avoid message filtering.
- Refactor message sending logic to utilize the new URL replacement function.
- Add explanatory text when URLs are modified to inform users of changes.
- Introduce a retry mechanism in GeminiAdapter for robust API request handling.
- Update thumbnail generation to optimize image processing and support WEBP format.
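The retry mechanism is described only in one line, so the following is a generic sketch of retrying a request with exponential backoff rather than the GeminiAdapter's actual logic. The function name, attempt count, delays, and the injectable `sleep` parameter are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call request(), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

Production code would usually retry only on transient failures (timeouts, HTTP 429 or 5xx) rather than on every exception; the broad `except` keeps the sketch short.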
- Update LogBroadcaster to send all recent logs in a single callback instead of iterating through each log entry.
- Remove error handling for individual log entry callbacks to streamline the logging process.
Multimodal capability check results for each platform:
…andling
- Implement get_base64_url method in Media class to generate base64 encoded media URLs.
- Update OpenAIAdapter to utilize the new base64 URL for image content in chat messages.
- Introduce URL_PATTERN utility in QQBotAdapter for improved URL handling and replacement logic.
- Enhance MultiElementDecomposer to merge adjacent messages from the same role for better message management.
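A base64 media URL of the kind `get_base64_url` presumably returns is a `data:` URL that embeds the bytes directly. The standalone function below is a sketch of that encoding; its name and signature are assumptions and do not mirror the `Media` method.

```python
import base64


def to_base64_url(data: bytes, mime_type: str) -> str:
    """Build a data URL such as 'data:image/png;base64,...' from raw bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Sending a data URL means the model provider does not need network access to the bot's media store, at the cost of a request body roughly one third larger than the raw image.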
- Refactor OpenAIAdapter to safely extract message content and usage data from API responses.
- Update media listing logic in routes.py to handle timezone-aware date comparisons for filtering media based on creation dates.
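Python raises `TypeError` when a naive `datetime` is compared with a timezone-aware one, which is the usual reason a date filter needs this kind of fix. The sketch below normalizes both sides before comparing; treating naive values as UTC is an assumption, and `created_within` is a hypothetical helper, not the code in routes.py.

```python
from datetime import datetime, timezone
from typing import Optional


def as_utc(value: datetime) -> datetime:
    """Return an aware UTC datetime; naive values are assumed to be UTC."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def created_within(
    created_at: datetime,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> bool:
    """Check whether created_at lies in the inclusive range [start, end]."""
    created = as_utc(created_at)
    if start is not None and created < as_utc(start):
        return False
    if end is not None and created > as_utc(end):
        return False
    return True
```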
Summary by Sourcery
Implements multi-modal support by introducing a MediaManager for handling media files, including registration, storage, and retrieval. This includes changes to message handling to support media content, and updates to LLM adapters to handle multi-modal inputs for Gemini and OpenAI.
New Features: