
feat: Multi-modal support #1431

Open
lss233 wants to merge 25 commits into master from feature/media_api

Conversation

@lss233 (Owner) commented Mar 14, 2025

  • Model vendor capabilities & API confirmation
  • LLMAbility refactoring
  • Media API management endpoints (WebUI CRUD)
  • Media cache eviction mechanism (dispose of media resources that are no longer useful)
  • Transcoding
  • Assembling history messages into multi-turn conversation format Blocks


Summary by Sourcery

Implements multi-modal support by introducing a MediaManager for handling media files, including registration, storage, and retrieval. This includes changes to message handling to support media content, and updates to LLM adapters to handle multi-modal inputs for Gemini and OpenAI.

New Features:

  • Adds support for multi-modal inputs to LLM adapters, allowing the models to process images and other media types.
  • Introduces a MediaManager class to handle media file registration, storage, and retrieval.
  • Adds new message types for handling different media types, such as images, audio, and video.
  • Adds media caching and cleanup mechanisms to manage media resources effectively.
  • Adds media transcoding capabilities (details not provided in diff).
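
As a rough illustration of the message format described in the features above, a multi-modal user message might be assembled like the sketch below. The class and method names follow the reviewer's guide later in this thread; the exact constructors and sync/async behavior are assumptions, not the PR's code.

```python
# Hypothetical usage sketch (not taken from the diff): register an image with
# the MediaManager and reference it by media_id inside a chat message.
from kirara_ai.llm.format.message import (
    LLMChatImageContent,
    LLMChatMessage,
    LLMChatTextContent,
)
from kirara_ai.media.manager import MediaManager

media_manager = MediaManager(media_dir="data/media")

# register_from_path records metadata; per the PR description the file data
# itself is loaded lazily when first needed.
media_id = media_manager.register_from_path(
    "examples/cat.png",          # hypothetical local file
    description="user-sent picture",
    tags=["chat", "image"],
)

message = LLMChatMessage(
    role="user",
    content=[
        LLMChatTextContent(type="text", text="What is in this picture?"),
        LLMChatImageContent(type="image", media_id=media_id),
    ],
)
```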

黄传 and others added 13 commits March 4, 2025 23:14
2. Fix the bug in the string-concatenation block
1. Add an id attribute to Workflow and WorkflowBuilder to uniquely identify workflows
2. Enhance the openai and gemini adapters to support image understanding in single-turn conversations
1. Add an id attribute to Workflow and WorkflowBuilder to uniquely identify workflows
2. Enhance the openai and gemini adapters to support image understanding in single-turn conversations
1. Add an id attribute to Workflow and WorkflowBuilder to uniquely identify workflows
2. Enhance the openai and gemini adapters to support image understanding in single-turn conversations
# Conflicts:
#	kirara_ai/workflow/implementations/blocks/system/basic.py
#	pyproject.toml
- Implement MediaManager class for managing media files, including registration, metadata handling, and lifecycle management.
- Add support for various media types (image, audio, video, file) with automatic MIME type detection.
- Enhance MediaMessage class to integrate with MediaManager for seamless media registration and retrieval.
- Introduce lazy loading for media data, allowing on-demand fetching of media attributes.
- Create unit tests for MediaManager and MediaMessage to ensure functionality and reliability.
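
A minimal pytest-style sketch of the kind of test described in the last bullet; only the method names come from the reviewer's guide, and a synchronous `register_from_data` is assumed.

```python
# Sketch in the spirit of tests/test_media.py; the actual PR's tests and the
# sync/async nature of register_from_data may differ.
from kirara_ai.media.manager import MediaManager


def test_register_from_data_creates_metadata(tmp_path):
    manager = MediaManager(media_dir=str(tmp_path))

    media_id = manager.register_from_data(
        data=b"\x89PNG\r\n\x1a\n" + b"\x00" * 16,  # minimal PNG-like payload
        format="png",
        description="test image",
        tags=["test"],
    )

    metadata = manager.get_metadata(media_id)
    assert metadata is not None
    assert "test" in (metadata.tags or [])
```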
## Summary by Sourcery

This pull request enhances the LLM chat functionality by adding support
for image understanding with the Gemini adapter and including workflow
IDs. It also improves text processing and adds user ID to system
prompts.

New Features:
- Adds support for image understanding in the Gemini adapter by
processing image URLs and base64 encoded images within chat messages.
- Adds workflow ID to the workflow class.

- Introduce MediaManager for efficient media file registration and lifecycle management, supporting various media types (image, audio, video, file).
- Update MediaMessage class to integrate with MediaManager, allowing seamless media registration and retrieval.
- Implement lazy loading for media data and enhance message handling to support image content in chat messages.
- Add new utility functions for MIME type detection and media metadata management.
- Create unit tests to ensure the reliability of the MediaManager and related functionalities.
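
For the MIME utilities mentioned above, detection might look roughly like the following sketch. `MediaType.from_mime` comes from the class diagram in the reviewer's guide below; the use of the standard-library `mimetypes` module is an assumption.

```python
# Illustrative MIME-based media type detection; not the PR's actual code.
import mimetypes

from kirara_ai.media.types.media_type import MediaType


def guess_media_type(path: str) -> MediaType:
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        # Fall back to a generic file type when the MIME type is unknown.
        return MediaType.FILE
    return MediaType.from_mime(mime)


print(guess_media_type("photo.jpg"))  # expected: MediaType.IMAGE
```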
sourcery-ai bot (Contributor) commented Mar 14, 2025


Reviewer's Guide by Sourcery

This pull request introduces multi-modal support by adding a MediaManager class to handle media files and modifying LLM adapters to support image content. It also includes changes to the workflow core to support id and updates dispatch rules.

Sequence diagram for registering media from URL

```mermaid
sequenceDiagram
    participant Client
    participant MediaManager
    participant MediaMetadata
    participant FileSystem

    Client->>MediaManager: register_from_url(url, source, description, tags, reference_id)
    activate MediaManager
    MediaManager->>MediaManager: register_media(url, source, description, tags, reference_id)
    activate MediaManager
    MediaManager->>MediaManager: Generate media_id
    MediaManager->>MediaMetadata: Create MediaMetadata(media_id, url, source, description, tags, reference_id)
    activate MediaMetadata
    deactivate MediaMetadata
    MediaManager->>MediaManager: _save_metadata(metadata)
    MediaManager->>FileSystem: Save metadata to file
    FileSystem-->>MediaManager: OK
    deactivate MediaManager
    MediaManager-->>Client: media_id
    deactivate MediaManager
```
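
A minimal usage sketch matching the sequence above; the signature comes from the MediaManager class diagram further down, and whether the call is synchronous is an assumption.

```python
# Sketch: register media from a URL. Per the guide, only metadata is written
# here; the file itself is fetched lazily when first needed.
from kirara_ai.media.manager import MediaManager

manager = MediaManager(media_dir="data/media")

media_id = manager.register_from_url(
    "https://example.com/cat.png",   # hypothetical URL
    source="im:example",             # hypothetical source label
    description="user-sent picture",
    tags=["chat", "image"],
    reference_id="conversation-123", # hypothetical reference
)

metadata = manager.get_metadata(media_id)
print(media_id, metadata.media_type if metadata else None)
```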

Sequence diagram for OpenAIAdapter chat with image

```mermaid
sequenceDiagram
    participant OpenAIAdapter
    participant LLMChatMessage
    participant LLMChatImageContent
    participant MediaManager
    participant OpenAI

    OpenAIAdapter->>LLMChatMessage: convert_llm_chat_message_to_openai_message(msg, media_manager)
    activate LLMChatMessage
    loop for each element in msg.content
        LLMChatMessage->>LLMChatImageContent: isinstance(element, LLMChatImageContent)
        alt is ImageContent
            LLMChatMessage->>MediaManager: media = media_manager.get_media(element.media_id)
            activate MediaManager
            MediaManager-->>LLMChatMessage: media
            deactivate MediaManager
            LLMChatMessage->>media: get_url()
            activate media
            media->>MediaManager: get_url(media_id)
            activate MediaManager
            MediaManager-->>media: url
            deactivate MediaManager
            media-->>LLMChatMessage: url
            deactivate media
            LLMChatMessage->>OpenAI: {"type": "image_url", "image_url": {"url": url}}
        else is TextContent
            LLMChatMessage->>OpenAI: element.model_dump(mode="json")
        end
    end
    LLMChatMessage-->>OpenAIAdapter: contents
    deactivate LLMChatMessage
    OpenAIAdapter->>OpenAI: POST /chat
    activate OpenAI
    OpenAI-->>OpenAIAdapter: response
    deactivate OpenAI
```
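
Reconstructed from the diagram above, the conversion helper could look roughly like this. It is not the PR's actual implementation, and the `await` on `get_url` is an assumption.

```python
# Reconstruction of the message-conversion step shown in the sequence diagram.
from kirara_ai.llm.format.message import LLMChatImageContent, LLMChatMessage
from kirara_ai.media.manager import MediaManager


async def convert_llm_chat_message_to_openai_message(
    msg: LLMChatMessage, media_manager: MediaManager
) -> dict:
    contents = []
    for element in msg.content:
        if isinstance(element, LLMChatImageContent):
            # Resolve the media_id to a URL via the MediaManager.
            media = media_manager.get_media(element.media_id)
            url = await media.get_url()  # assumed async; may be sync in the PR
            contents.append({"type": "image_url", "image_url": {"url": url}})
        else:
            # Text content already serializes to the OpenAI content shape.
            contents.append(element.model_dump(mode="json"))
    return {"role": msg.role, "content": contents}
```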

Updated class diagram for MediaManager

```mermaid
classDiagram
    class MediaManager {
        - media_dir: str
        - metadata_dir: str
        - files_dir: str
        - metadata_cache: Dict[str, MediaMetadata]
        - logger: Logger
        - _pending_tasks: set
        + __init__(media_dir: str = "data/media")
        + register_media(
            url: Optional[str] = None,
            path: Optional[str] = None,
            data: Optional[bytes] = None,
            format: Optional[str] = None,
            media_type: Optional[MediaType] = None,
            size: Optional[int] = None,
            source: Optional[str] = None,
            description: Optional[str] = None,
            tags: Optional[List[str]] = None,
            reference_id: Optional[str] = None
        ) : str
        + register_from_path(
            path: str,
            source: Optional[str] = None,
            description: Optional[str] = None,
            tags: Optional[List[str]] = None,
            reference_id: Optional[str] = None
        ) : str
        + register_from_url(
            url: str,
            source: Optional[str] = None,
            description: Optional[str] = None,
            tags: Optional[List[str]] = None,
            reference_id: Optional[str] = None
        ) : str
        + register_from_data(
            data: bytes,
            format: Optional[str] = None,
            source: Optional[str] = None,
            description: Optional[str] = None,
            tags: Optional[List[str]] = None,
            reference_id: Optional[str] = None
        ) : str
        + add_reference(media_id: str, reference_id: str) : None
        + remove_reference(media_id: str, reference_id: str) : None
        - _delete_media(media_id: str) : None
        + update_metadata(
            media_id: str,
            source: Optional[str] = None,
            description: Optional[str] = None,
            tags: Optional[List[str]] = None,
            url: Optional[str] = None,
            path: Optional[str] = None
        ) : None
        + add_tags(media_id: str, tags: List[str]) : None
        + remove_tags(media_id: str, tags: List[str]) : None
        + get_metadata(media_id: str) : Optional[MediaMetadata]
        + ensure_file_exists(media_id: str) : Optional[Path]
        + get_file_path(media_id: str) : Optional[Path]
        + get_data(media_id: str) : Optional[bytes]
        + get_url(media_id: str) : Optional[str]
        + search_by_tags(tags: List[str], match_all: bool = False) : List[str]
        + search_by_description(query: str) : List[str]
        + search_by_source(source: str) : List[str]
        + search_by_type(media_type: MediaType) : List[str]
        + get_all_media_ids() : List[str]
        + cleanup_unreferenced() : int
        + create_media_message(media_id: str) : Optional[MediaMessage]
        + get_media(media_id: str) : Optional[Media]
    }
    class MediaMetadata {
        - media_id: str
        - media_type: Optional[MediaType]
        - format: Optional[str]
        - size: Optional[int]
        - created_at: Optional[datetime]
        - source: Optional[str]
        - description: Optional[str]
        - tags: Optional[List[str]]
        - references: Optional[Set[str]]
        - url: Optional[str]
        - path: Optional[str]
        + to_dict() : Dict[str, Any]
        + mime_type() : str
        + from_dict(data: Dict[str, Any]) : MediaMetadata
    }
    class MediaType {
        IMAGE
        AUDIO
        VIDEO
        FILE
        + from_mime(mime_type: str) : MediaType
    }
    MediaManager -- MediaMetadata : manages
    MediaMetadata -- MediaType : type
```
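
Based on the method list above, everyday use of the manager might look like this sketch (all arguments illustrative).

```python
# Illustrative walk-through of the MediaManager API listed above.
from kirara_ai.media.manager import MediaManager

manager = MediaManager(media_dir="data/media")

# Register a local file and tag it.
media_id = manager.register_from_path("examples/cat.png", tags=["example"])
manager.add_tags(media_id, ["pets"])

# Search by tag; match_all=True requires every tag to be present.
matches = manager.search_by_tags(["example", "pets"], match_all=True)
assert media_id in matches

# Reference counting drives cleanup of unused media.
manager.add_reference(media_id, "message-42")
manager.remove_reference(media_id, "message-42")
removed = manager.cleanup_unreferenced()
print(f"removed {removed} unreferenced media files")
```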

Updated class diagram for LLMChatMessage

```mermaid
classDiagram
    class LLMChatMessage {
        - role: Literal["system", "user", "assistant"]
        - content: List[Union[LLMChatTextContent, LLMChatImageContent]]
    }
    class LLMChatTextContent {
        - type: Literal["text"]
        - text: str
    }
    class LLMChatImageContent {
        - type: Literal["image"]
        - media_id: str
    }
    LLMChatMessage -- LLMChatTextContent : contains
    LLMChatMessage -- LLMChatImageContent : contains
```
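
A plausible Pydantic rendering of the classes in this diagram (`model_dump` in the adapter diagram suggests Pydantic v2); the PR's actual field defaults and validators may differ.

```python
# Sketch of the message content models shown above.
from typing import List, Literal, Union

from pydantic import BaseModel


class LLMChatTextContent(BaseModel):
    type: Literal["text"] = "text"
    text: str


class LLMChatImageContent(BaseModel):
    type: Literal["image"] = "image"
    media_id: str


class LLMChatMessage(BaseModel):
    role: Literal["system", "user", "assistant"]
    content: List[Union[LLMChatTextContent, LLMChatImageContent]]
```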

File-Level Changes

Introduced a MediaManager class to handle media file registration, reference counting, and lifecycle management.
  • Added MediaManager class with methods for registering media from URL, path, or data.
  • Implemented reference counting to track media usage and enable cleanup of unused files.
  • Implemented lazy loading of media files, downloading or copying them only when needed.
  • Added metadata caching to improve performance.
  • Added background tasks for file operations to avoid blocking the main thread.
  • Added methods for searching media by tags, description, source, and type.
  • Added a cleanup method to remove unreferenced media files.
  • Added a method to create MediaMessage objects from media IDs.
kirara_ai/im/message.py
kirara_ai/media/manager.py
kirara_ai/media/media_object.py
kirara_ai/media/metadata.py
kirara_ai/media/utils/mime.py
kirara_ai/media/types/media_type.py
kirara_ai/media/__init__.py
kirara_ai/media/types/__init__.py
kirara_ai/media/utils/__init__.py
tests/test_media.py
examples/media_example.py
data/media/.gitignore
kirara_ai/entry.py
Modified LLM adapters to support multi-modal content, including images.
  • Updated GeminiAdapter to handle image content in chat messages, converting images to inline data.
  • Updated OpenAIAdapter to handle image content in chat messages, converting images to URLs.
  • Modified the LLMChatMessage to support a list of content elements, including text and images.
  • Added LLMChatTextContent and LLMChatImageContent classes to represent text and image content in chat messages.
  • Modified the chat block to include image content from user messages in the LLM message.
kirara_ai/plugins/llm_preset_adapters/gemini_adapter.py
kirara_ai/plugins/llm_preset_adapters/openai_adapter.py
kirara_ai/llm/format/message.py
kirara_ai/workflow/implementations/blocks/llm/chat.py
kirara_ai/llm/format/__init__.py
Modified the workflow core to support an id.
  • Added an id field to the Workflow class.
  • Added an id field to the WorkflowBuilder class.
kirara_ai/workflow/core/workflow/base.py
kirara_ai/workflow/core/workflow/builder.py
kirara_ai/workflow/core/workflow/registry.py
Updated dispatch rules.
  • Updated the workflow id in rules.yaml.
data/dispatch_rules/rules.yaml
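
For the GeminiAdapter change listed above (images converted to inline data), the conversion might look roughly like the sketch below. `get_data` and `mime_type` come from the MediaManager/MediaMetadata APIs in the class diagram; the `inline_data` payload shape follows Gemini's generateContent format and is an assumption about this PR's code.

```python
# Hypothetical sketch: turn a registered image into a Gemini inline-data part.
import base64

from kirara_ai.media.manager import MediaManager


def image_to_gemini_part(media_manager: MediaManager, media_id: str) -> dict:
    metadata = media_manager.get_metadata(media_id)
    data = media_manager.get_data(media_id)
    if data is None:
        raise ValueError(f"media {media_id} has no data available")
    return {
        "inline_data": {
            # mime_type() is listed on MediaMetadata; it may be a property.
            "mime_type": metadata.mime_type() if metadata else "image/png",
            "data": base64.b64encode(data).decode("ascii"),
        }
    }
```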

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!
  • Generate a plan of action for an issue: Comment @sourcery-ai plan on
    an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

- Remove redundant file URL generation in MediaManager, focusing on data URL retrieval.
- Update OpenAI adapter to include media handling for image content in chat messages, utilizing MediaManager for URL generation.
- Introduce asynchronous processing for converting chat messages to OpenAI format, improving performance and responsiveness.
@lss233 force-pushed the feature/media_api branch from 9d5dd81 to d4f9feb on March 14, 2025 at 16:42
@chuanSir123 (Contributor) left a comment

Awesome! Looking forward to the memory and block adaptation for follow-up mode (assembling user and assistant messages over multiple turns).

@chuanSir123 (Contributor) left a comment

lss233 added 9 commits March 20, 2025 02:51
- Introduce a new media management API with endpoints for uploading, retrieving, and deleting media files.
- Implement media metadata models and response structures for better data handling.
- Enhance MediaManager to support asynchronous file operations and improve error handling.
- Add Pillow as a dependency for image processing and thumbnail generation.
- Update authentication middleware to support token-based authorization from query parameters.
- Introduce MultiElementDecomposer for improved handling of multi-part messages in memory.
- Update DefaultMemoryDecomposer to return a list of strings instead of a single string for better message management.
- Modify DefaultMemoryComposer to support media message formatting, including media IDs in output.
- Refactor ChatMessageConstructor to accommodate new memory content types and improve message construction.
- Update ChatMemoryQuery to allow selection of different decomposer strategies for memory retrieval.
…mposables

- Replace direct instantiation of registries in MemoryManager with dependency injection using Inject.
- Update ScopeRegistry, ComposerRegistry, and DecomposerRegistry to utilize Inject for instance creation.
- Introduce container attribute in MemoryComposer and MemoryDecomposer for improved dependency management.
- Enhance MultiElementDecomposer to validate media resources and log warnings for invalid media IDs.
- Introduce MediaManager dependency in LLMBackendAdapter for improved media management.
- Refactor LLMChatMessage to support a unified content type for text and image messages.
- Update response handling in various LLM adapters to accommodate new message structures and media integration.
- Enhance chat memory processing to handle multi-part messages, including images, more effectively.
- Implement threading for blocking media registration in MediaMessage, allowing for smoother execution.
- Refactor _register_media method to be asynchronous, improving performance during media file registration.
- Update MediaManager to support synchronous file downloads and ensure compatibility with async operations.
- Modify LLM adapters to utilize async media registration methods, enhancing overall media handling in chat responses.
- Remove deprecated methods in Media class for cleaner code and improved maintainability.
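A generic way to keep blocking registration off the event loop, as the commit above describes, is to push the call onto a worker thread; this is a sketch under that assumption, not the PR's code.

```python
# Generic sketch: run a blocking media-registration call in a worker thread.
# asyncio.to_thread is available in Python 3.9+.
import asyncio

from kirara_ai.media.manager import MediaManager


async def register_media_without_blocking(manager: MediaManager, path: str) -> str:
    # register_from_path is treated as blocking here; whether it is sync or
    # async in the actual PR is an assumption.
    return await asyncio.to_thread(manager.register_from_path, path)
```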
… and SHA1 hashing

- Add SHA1 hashing for media ID generation to ensure uniqueness.
- Implement robust error handling for file reading and downloading processes.
- Refactor media registration logic to streamline data retrieval and type detection.
- Remove deprecated lazy loading tests to focus on updated media management functionality.
…ndling

- Implement a function to replace dots in URLs with full stops to avoid message filtering.
- Refactor message sending logic to utilize the new URL replacement function.
- Add explanatory text when URLs are modified to inform users of changes.
- Introduce a retry mechanism in GeminiAdapter for robust API request handling.
- Update thumbnail generation to optimize image processing and support WEBP format.
- Update LogBroadcaster to send all recent logs in a single callback instead of iterating through each log entry.
- Remove error handling for individual log entry callbacks to streamline the logging process.
@lss233 (Owner, Author) commented Mar 25, 2025

Multi-modal capability check results for each platform:

  • Gemini
  • Ollama
  • SiliconFlow
  • TencentCloud
  • OpenAI like
  • Claude (?)

…andling

- Implement get_base64_url method in Media class to generate base64 encoded media URLs.
- Update OpenAIAdapter to utilize the new base64 URL for image content in chat messages.
- Introduce URL_PATTERN utility in QQBotAdapter for improved URL handling and replacement logic.
- Enhance MultiElementDecomposer to merge adjacent messages from the same role for better message management.
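The commits above do not show the implementation; a data-URL style sketch consistent with the get_base64_url description might look like this (the Media class internals are assumptions).

```python
# Hypothetical sketch of a base64 "data URL" for media content.
import base64


def get_base64_url(data: bytes, mime_type: str = "image/png") -> str:
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# An OpenAI-style chat API accepts such data URLs in the image_url field.
print(get_base64_url(b"\x89PNG\r\n\x1a\n", "image/png")[:40] + "...")
```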
@Haibersut marked this pull request as ready for review on March 26, 2025 at 14:06
- Refactor OpenAIAdapter to safely extract message content and usage data from API responses.
- Update media listing logic in routes.py to handle timezone-aware date comparisons for filtering media based on creation dates.