Feat : Venice audio (Text to Speech) #504

Open
wants to merge 26 commits into
base: main

Conversation

Contributor

@yornfifty yornfifty commented Apr 26, 2025

Description

Please include a summary of the changes and the related issue.

Text to speech with Venice AI, with support for multiple voice models:

af_alloy, af_aoede, af_bella, af_heart, af_jadzia, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa, bf_alice, bf_emma, bf_lily, bm_daniel, bm_fable, bm_george, bm_lewis, zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang, ff_siwis, hf_alpha, hf_beta, hm_omega, hm_psi, if_sara, im_nicola, jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo, pf_dora, pm_alex, pm_santa, ef_dora, em_alex, em_santa

Utils.s3 update

store_file_bytes can now store the most common file types, with an optional limit on the file size.

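from enum import Enum
from typing import Optional


class FileType(str, Enum):
    # File categories accepted by store_file_bytes.
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"
    PDF = "pdf"


async def store_file_bytes(
    file_bytes: bytes,
    key: str,
    file_type: FileType,
    size_limit_bytes: Optional[int] = None,  # optional cap on the file size
) -> str:
    ...  # body omitted in this description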

Type of Change

  • Bugfix
  • New Feature
  • Improvement
  • Documentation Update

Checklist

  • I have read the contributing guidelines.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Note

I have not added this to the properties in models/agent_schema.json yet, to avoid conflicts with other PRs.

Tell me if I should add it right away.

        "venice_audio": {
          "title" : "Venice Audio",
          "$ref": "../skills/venice_audio/schema.json"
        },

Related Issue

Showcase

I made a simple chat app that interacts with the agent to showcase the functionality:
https://crestal.s3.ap-southeast-1.amazonaws.com/local/intentkit/2025-04-27%2003-40-17.mp4

@bluntbrain
Contributor

Hi @yornfifty, great work here!

I see two minor issues; this PR can be merged as soon as they are fixed. Thanks!

  1. Config class incorrectly uses TypedDict instead of inheriting from SkillConfig:
# Current problematic code:
class Config(TypedDict):
    enabled: bool
    api_key: str
    states: SkillStates

# Should be changed to:
class Config(SkillConfig):
    """Configuration for Venice Audio skills."""
    enabled: bool
    api_key: str
    states: SkillStates
  2. Missing proper base.py implementation:
# Missing required base.py file with proper inheritance pattern:
from typing import Type
from pydantic import BaseModel, Field
from abstracts.skill import SkillStoreABC
from skills.base import IntentKitSkill

class VeniceAudioBaseTool(IntentKitSkill):
    """Base class for Venice Audio tools."""
    
    name: str = Field(description="The name of the tool")
    description: str = Field(description="A description of what the tool does")
    args_schema: Type[BaseModel]
    skill_store: SkillStoreABC = Field(description="The skill store for persisting data")
    
    @property
    def category(self) -> str:
        return "venice_audio"
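
To illustrate the first point, here is a small sketch of the inheritance pattern. It assumes `SkillConfig` is itself a `TypedDict` that declares the shared keys; the `SkillConfig`, `SkillState`, and `SkillStates` definitions below are stand-ins written for this example, not the project's real ones.

```python
from typing import Literal, TypedDict

# Assumed set of state values, taken from the schema discussed in this PR.
SkillState = Literal["disabled", "public", "private"]


class SkillStates(TypedDict):
    text_to_speech: SkillState


class SkillConfig(TypedDict):
    """Hypothetical stand-in for the shared base config."""

    enabled: bool


class Config(SkillConfig):
    """Configuration for Venice Audio skills."""

    api_key: str
    states: SkillStates


config: Config = {
    "enabled": True,
    "api_key": "placeholder-key",  # not a real key
    "states": {"text_to_speech": "private"},
}

# Keys declared on the base are merged into the subclass.
print(sorted(Config.__required_keys__))
```

Under that assumption, inheriting keeps the skill config compatible with any code typed against the base, and shared keys only need to be declared once.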

@bluntbrain bluntbrain requested a review from hyacinthus May 5, 2025 06:05
@yornfifty
Contributor Author

Thank you for the review. I have already fixed it. Should I also add it to agent_schema.json?



class SkillStates(TypedDict):
    af_alloy: SkillState
Collaborator

You can add an enum field to the config and read the config in _arun at runtime...

Too many opinions here.
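
The suggestion above could look roughly like the following sketch. `TextToSpeechSketch`, the `voice_model` key, and the `af_heart` fallback are assumptions for illustration; the real skill would subclass the project's base tool and call the Venice speech endpoint.

```python
import asyncio
from enum import Enum
from typing import Any, Dict


class VoiceModel(str, Enum):
    # Two of the voices listed in the PR description.
    AF_HEART = "af_heart"
    BM_LEWIS = "bm_lewis"


class TextToSpeechSketch:
    """Hypothetical skill; not the real IntentKit base class."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self._config = config

    async def _arun(self, text: str) -> Dict[str, str]:
        # Read the voice at call time, so a config change applies on the next run.
        raw = self._config.get("voice_model", VoiceModel.AF_HEART.value)
        voice = VoiceModel(raw)  # raises ValueError for unknown values
        # A real skill would send the request to the speech API here.
        return {"voice": voice.value, "input": text}


if __name__ == "__main__":
    skill = TextToSpeechSketch({"voice_model": "bm_lewis"})
    print(asyncio.run(skill._arun("hello")))
```

A single enum field replaces one state entry per voice, so the config stays the same size when voices are added.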

Contributor Author

working on it

Contributor Author

I just took a deep dive into how draft-07 JSON Schema works, and I've come up with this.
What do you think?

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Venice Audio Skills",
    "description": "Configuration for the Venice Audio skill.",
    "type": "object",
    "x-tags": [
        "AI",
        "Audio",
        "Text to Speech"
    ],
    "properties": {
        "enabled": {
            "type": "boolean",
            "description": "Enable or disable the Venice Audio skill.",
            "default": false
        },
        "voice_model": {
            "type": "string",
            "enum": [
                "af_heart",
                "bm_lewis",
                "custom"
            ],
            "x-enum-title": [
                "af_heart (default female)",
                "bm_lewis (default male)",
                "Custom"
            ],
            "description": "Voice model used for text to speech.",
            "default": "af_heart"
        },
        "states": {
            "type": "object",
            "title": "Skill States",
            "description": "Enable/disable specific voice models. Only enable one if you want a consistent characteristic for your agent. See docs for voice details and quality grades.",
            "properties": {
                "text_to_speech": {
                    "type": "string",
                    "enum": [
                        "disabled",
                        "public",
                        "private"
                    ],
                    "x-enum-title": [
                        "Disabled",
                        "Agent Owner + All Users",
                        "Agent Owner Only"
                    ],
                    "description": "Text to speech tool",
                    "default": "disabled"
                }
            }
        },
        "api_key_provider": {
            "type": "string",
            "title": "API Key Provider",
            "description": "Provider of the API key",
            "enum": [
                "agent_owner"
            ],
            "x-enum-title": [
                "Owner Provided"
            ],
            "default": "agent_owner"
        }
    },
    "required": [
        "states",
        "enabled"
    ],
    "allOf": [
        {
            "if": {
                "allOf": [
                    {
                        "properties": {
                            "enabled": {
                                "const": true
                            }
                        }
                    },
                    {
                        "properties": {
                            "api_key_provider": {
                                "const": "agent_owner"
                            }
                        }
                    }
                ]
            },
            "then": {
                "properties": {
                    "api_key": {
                        "type": "string",
                        "title": "Venice API Key",
                        "x-link": "[Get your API key](https://venice.ai/)",
                        "x-sensitive": true,
                        "description": "API Key for authenticating with the Venice AI API."
                    }
                },
                "required": [
                    "api_key"
                ]
            }
        },
        {
            "if": {
                "properties": {
                    "voice_model": {
                        "const": "custom"
                    }
                }
            },
            "then": {
                "properties": {
                    "voice_model_custom": {
                        "type": "array",
                        "items": {
                            "type": "string"
                        },
                        "title": "Voice Model (Custom)",
                        "x-link": "[Supported Voice Model](https://docs.venice.ai/api-reference/endpoint/audio/speech#body-voice)",
                        "description": "You can add one or more custom voice models.",
                        "default": [
                            "af_heart", "bm_lewis"
                        ]
                    }
                },
                "required": [
                    "voice_model_custom"
                ]
            }
        }
    ],
    "additionalProperties": true
}
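
One quick sanity check for a schema of this shape is that every property's `default` is one of its `enum` values. The sketch below is a dependency-free helper written for this purpose; the function names and the `mode` property in the example are hypothetical, and the walk only follows nested `properties`, not `allOf`/`if`/`then` branches.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_properties(
    schema: Dict[str, Any], path: str = ""
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (dotted_path, property_schema) for every nested property."""
    for name, prop in schema.get("properties", {}).items():
        full = f"{path}.{name}" if path else name
        yield full, prop
        yield from iter_properties(prop, full)


def defaults_outside_enum(schema: Dict[str, Any]) -> List[str]:
    """Return paths of properties whose default is not among their enum values."""
    return [
        path
        for path, prop in iter_properties(schema)
        if "enum" in prop and "default" in prop and prop["default"] not in prop["enum"]
    ]


example = {
    "type": "object",
    "properties": {
        "states": {
            "type": "object",
            "properties": {
                "text_to_speech": {
                    "type": "string",
                    "enum": ["disabled", "public", "private"],
                    "default": "disabled",
                }
            },
        },
        # Hypothetical property with a default that is not in its enum.
        "mode": {"type": "string", "enum": ["fast", "slow"], "default": "off"},
    },
}

print(defaults_outside_enum(example))
```

Draft-07 treats `default` as an annotation and does not validate it against `enum`, so a mismatch like this passes schema validation and only surfaces in the UI or at runtime.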

@yornfifty yornfifty requested a review from hyacinthus May 27, 2025 06:52
@yornfifty
Contributor Author

latest test screenshot
image

3 participants