Skip to content

Latest commit

 

History

History
199 lines (128 loc) · 4.36 KB

File metadata and controls

199 lines (128 loc) · 4.36 KB

CapabilityWorker API Reference

The CapabilityWorker is the core SDK class for all I/O inside an Ability. Access it via self.capability_worker after initializing in call().


Speaking (Text-to-Speech)

speak(text)

Converts text to speech using the Personality's default voice.

await self.capability_worker.speak("Hello! How can I help?")

text_to_speech(text, voice_id)

Converts text to speech using a specific Voice ID. Use when your Ability needs its own voice.

await self.capability_worker.text_to_speech("Welcome!", "pNInz6obpgDQGcFmaJgB")

See the Voice ID catalog for available voices.


Listening (User Input)

user_response()

Waits for the user's next input. Returns a string.

user_input = await self.capability_worker.user_response()

wait_for_complete_transcription()

Waits until the user has completely finished speaking. Use when you need the full utterance without premature cutoff.

full_input = await self.capability_worker.wait_for_complete_transcription()

Combined Speak + Listen

run_io_loop(text)

Speaks the text, then waits for a response. Returns the user's reply.

answer = await self.capability_worker.run_io_loop("What's your name?")

run_confirmation_loop(text)

Asks a yes/no question. Loops until the user confirms. Returns True or False.

confirmed = await self.capability_worker.run_confirmation_loop("Should I continue?")

Text Generation (LLM)

text_to_text_response(prompt_text, history=[], system_prompt="")

Generates a text response using the configured LLM. This is synchronous (no await).

response = self.capability_worker.text_to_text_response(
    "Explain quantum computing in one sentence."
)

With conversation history:

history = [
    {"role": "user", "content": "Tell me about dogs"},
    {"role": "assistant", "content": "Dogs are loyal companions..."},
]
response = self.capability_worker.text_to_text_response(
    "What breeds are best for apartments?",
    history=history,
)

With a system prompt:

response = self.capability_worker.text_to_text_response(
    "The user asked about cooking pasta",
    system_prompt="You are a professional Italian chef. Keep responses under 2 sentences.",
)

Audio Playback

play_audio(file_content)

Plays audio from bytes or a file-like object.

import requests
resp = requests.get("https://example.com/sound.mp3")
await self.capability_worker.play_audio(resp.content)

play_from_audio_file(file_name)

Plays an audio file from the Ability's folder.

await self.capability_worker.play_from_audio_file("alert.mp3")

Audio Streaming

For longer audio or real-time streaming:

await self.capability_worker.stream_init()
await self.capability_worker.send_audio_data_in_stream(audio_bytes, chunk_size=4096)
await self.capability_worker.stream_end()

WebSocket Communication

send_data_over_websocket(data_type, data)

Sends structured data over WebSocket. Used for music mode, DevKit actions, and custom events.

await self.capability_worker.send_data_over_websocket("music-mode", {"mode": "on"})

send_devkit_action(action)

Sends a hardware action to the DevKit.

await self.capability_worker.send_devkit_action("led-on")

Flow Control

resume_normal_flow()

You MUST call this when your Ability is done. Returns control to the Personality.

self.capability_worker.resume_normal_flow()

If you forget this, the Personality will be stuck and unresponsive.


AgentWorker Reference

Access via self.worker:

Logging

self.worker.editor_logging_handler.info("Something happened")
self.worker.editor_logging_handler.error("Something broke")
self.worker.editor_logging_handler.warning("Something looks off")

Never use print(). Always use the logging handler.

Session Tasks

self.worker.session_tasks.create(some_coroutine())   # Instead of asyncio.create_task()
await self.worker.session_tasks.sleep(2.0)            # Instead of asyncio.sleep()

Music Mode

self.worker.music_mode_event.set()    # Enter music mode
self.worker.music_mode_event.clear()  # Exit music mode