daytonaio · shuimaoiko · Jun 21, 2026
diff --git a/authors/shuimao.md b/authors/shuimao.md
@@ -0,0 +1,14 @@
+Author: Shuimao
+Title: AI Workflow Builder
+Description: Shuimao is an AI workflow builder and independent writer focused
+on practical automation, developer tooling, and reproducible AI workflows. He
+builds small systems that turn messy browser, code, and content tasks into
+reviewable processes that can be tested, documented, and handed off.
+Author Image: ![shuimao](https://github.com/shuimaoiko.png?size=512)
+Author LinkedIn:
+Author Twitter:
+Company Name: Independent
+Company Description: Independent AI workflow builder focused on automation and
+developer tooling.
+Company Logo Dark:
+Company Logo White:
diff --git a/definitions/20260621_definition_openai_compatible_transcription_endpoint.md b/definitions/20260621_definition_openai_compatible_transcription_endpoint.md
@@ -0,0 +1,36 @@
+---
+title: 'OpenAI-compatible transcription endpoint'
+description:
+  'An OpenAI-compatible transcription endpoint accepts audio transcription
+  requests using a familiar multipart HTTP shape.'
+date: 2026-06-21
+author: 'Shuimao'
+---
+
+# OpenAI-compatible transcription endpoint
+
+## Definition
+
+An OpenAI-compatible transcription endpoint is an HTTP API that follows the
+same broad request pattern as OpenAI's audio transcription API: a client sends
+a multipart form request with an audio file, a model name, and authentication
+headers, then receives structured text in response.
+
+The phrase does not always mean every optional OpenAI field is supported. Some
+providers accept only the common core fields, while others also accept language,
+prompt, timestamp, or response-format options. Production code should follow
+the provider's own documentation instead of assuming every OpenAI-style option
+is portable.
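+
+As a rough illustration of that request shape, the sketch below builds the
+multipart body and headers with only the Python standard library. The function
+name `build_transcription_request` and the sample values are hypothetical, and
+nothing is sent over the network:
+
+```python
+import uuid
+
+
+def build_transcription_request(audio, filename, model, api_key):
+    """Return (headers, body) for a multipart/form-data transcription call."""
+    boundary = uuid.uuid4().hex
+    # Text part carrying the model name.
+    model_part = (
+        f"--{boundary}\r\n"
+        'Content-Disposition: form-data; name="model"\r\n\r\n'
+        f"{model}\r\n"
+    ).encode()
+    # File part header; the raw audio bytes follow it.
+    file_head = (
+        f"--{boundary}\r\n"
+        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
+        "Content-Type: application/octet-stream\r\n\r\n"
+    ).encode()
+    closing = f"\r\n--{boundary}--\r\n".encode()
+    headers = {
+        "Authorization": f"Bearer {api_key}",
+        "Content-Type": f"multipart/form-data; boundary={boundary}",
+    }
+    return headers, model_part + file_head + audio + closing
+
+
+# Placeholder values only: no real key, model, or recording.
+headers, body = build_transcription_request(
+    b"fake-audio", "demo.mp3", "example-model", "test-key"
+)
+```
+
+A real client would POST `body` with these headers to the provider's documented
+transcription URL and add only the optional fields that provider lists.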
+
+## Context and Usage
+
+For speech-to-text tooling, OpenAI-compatible transcription endpoints are useful
+because one CLI can route similar audio requests to multiple providers. A tool
+such as Sapat can keep a consistent command shape while each provider module
+handles details such as API key names, endpoint URLs, supported models, file
+size limits, and provider-specific response parsing.
+
+The safest implementation pattern is to start with the documented minimum
+request body, add aliases for common model names, and cover the request shape
+with mock tests. That gives users a predictable workflow without committing API
+keys, sample recordings, or provider-specific secrets to the repository.
diff --git a/guides/20260621_run_siliconflow_asr_with_sapat_in_daytona.md b/guides/20260621_run_siliconflow_asr_with_sapat_in_daytona.md
@@ -0,0 +1,336 @@
+---
+title: 'Run SiliconFlow ASR with Sapat in Daytona'
+description:
+  'Build a reproducible Sapat workflow for SiliconFlow SenseVoice transcription
+  inside a Daytona workspace.'
+date: 2026-06-21
+author: 'Shuimao'
+tags: ['daytona', 'speech-to-text', 'python', 'siliconflow']
+---
+
+# Run SiliconFlow ASR with Sapat in Daytona
+
+## Introduction
+
+Speech-to-text experiments often start as one-off scripts. Someone exports a
+meeting clip, tries a provider, saves a transcript, and only later discovers
+that the exact setup is hard to repeat. The API key lived only in shell history,
+the audio conversion settings were not recorded, or the provider accepted a
+slightly different multipart form than the code assumed.
+
+This guide shows how to run SiliconFlow automatic speech recognition through
+Sapat inside a Daytona workspace. Sapat provides a small Python CLI for routing
+media through speech-to-text providers. Daytona gives the workflow a clean,
+reproducible development environment. The companion Sapat provider adds a
+`siliconflow` route that sends audio to SiliconFlow's
+[OpenAI-compatible transcription endpoint](/definitions/20260621_definition_openai_compatible_transcription_endpoint.md)
+while keeping the request body aligned with SiliconFlow's documented `file` and
+`model` fields.
+
+The workflow is useful for AI engineers who want a practical transcription
+path for Mandarin, Cantonese, English, Japanese, Korean, or mixed multilingual
+recordings. SiliconFlow exposes models such as `FunAudioLLM/SenseVoiceSmall`
+and `TeleAI/TeleSpeechASR` through a simple HTTP API. SenseVoice is especially
+interesting when a team needs Chinese-language or multilingual recognition but
+still wants a lightweight CLI workflow that can be tested without shipping real
+audio or secrets in the repository.
+
+## TL;DR
+
+- Use a Daytona workspace so Python, `ffmpeg`, and Sapat are installed the same
+  way for every run.
+- Store `SILICONFLOW_API_KEY` in the workspace environment or a local ignored
+  `.env` file.
+- Use `sapat --provider siliconflow --model sensevoice` for the default
+  SiliconFlow SenseVoice path.
+- Keep short sample clips for validation before running customer recordings or
+  long meetings.
+- The companion Sapat PR includes mock tests, so the request shape can be
+  validated without a real SiliconFlow key.
+
+## How the workflow fits together
+
+![SiliconFlow ASR workflow with Sapat in Daytona](/assets/20260621_run_siliconflow_asr_with_sapat_in_daytona_workflow.svg)
+
+The flow has five parts:
+
+- Daytona creates a reproducible workspace with Python and system tools.
+- Sapat converts media into the provider's preferred audio format.
+- The SiliconFlow provider sends a multipart request with the audio file and
+  model name.
+- SiliconFlow returns transcript text.
+- Sapat writes a `.txt` file next to the source media for review, handoff, or
+  downstream processing.
+
+This separation keeps provider details out of the operator's daily command.
+The person running transcription should not need to remember the endpoint URL,
+which model names are supported, or which request fields are safe to send.
+Those details belong in provider code and tests.
+
+## Prerequisites
+
+You need:
+
+- A Daytona workspace or another clean Linux development environment.
+- Python 3.8 or newer.
+- `ffmpeg`, because Sapat can convert source videos before transcription.
+- A SiliconFlow API key.
+- A short audio or video sample that you are allowed to process.
+
+If you are testing before the companion Sapat PR is merged, install from the
+fork branch:
+
+```bash
+git clone https://github.com/shuimaoiko/sapat.git
+cd sapat
+git checkout codex/siliconflow-sapat-provider
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -e '.[dev]'
+```
+
+The companion provider implementation is
+[nibzard/sapat#68](https://github.com/nibzard/sapat/pull/68). After that PR is
+merged, replace the fork checkout with the normal Sapat install path.
+
+## Step 1: Create a workspace for the transcription run
+
+Start with a dedicated workspace rather than your main project folder. The goal
+is to keep setup, inputs, outputs, and validation files easy to inspect.
+
+Inside the workspace, check the tools:
+
+```bash
+python3 --version
+ffmpeg -version
+```
+
+If `ffmpeg` is missing in a Debian or Ubuntu image, install it:
+
+```bash
+sudo apt-get update
+sudo apt-get install -y ffmpeg
+```
+
+Create folders for input samples and reviewed transcripts:
+
+```bash
+mkdir -p samples reviewed
+```
+
+Use copies of test recordings in `samples/`. Do not start with private
+customer media. A good validation set has a clean clip, a noisy clip, and one
+clip with real product names or domain terms.
+
+## Step 2: Configure SiliconFlow credentials
+
+Set the API key as an environment variable:
+
+```bash
+export SILICONFLOW_API_KEY="your-siliconflow-api-key"
+```
+
+If you prefer a `.env` file, keep it local and ignored by Git:
+
+```bash
+printf 'SILICONFLOW_API_KEY=your-siliconflow-api-key\n' > .env
+grep -qxF '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
+```
+
+Never commit `.env`, transcripts from private recordings, or generated audio
+artifacts. The provider tests use mocks, so code review does not require real
+credentials.
+
+SiliconFlow's transcription documentation lists a bearer token in the
+`Authorization` header. The provider reads that token from
+`SILICONFLOW_API_KEY`, then sends:
+
+- endpoint: `https://api.siliconflow.cn/v1/audio/transcriptions`
+- auth: `Authorization: Bearer $SILICONFLOW_API_KEY`
+- form field: `file`
+- form field: `model`
+
+The implementation intentionally avoids forwarding generic CLI fields such as
+`language`, `prompt`, or `temperature` because they are not part of the
+documented SiliconFlow transcription body. That makes the request easier to
+reason about and keeps failures focused on credentials, file limits, or model
+selection.
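+
+One way to picture that filtering is an allowlist applied before the request is
+built. This is a minimal sketch, not the provider's actual code: the names
+`DOCUMENTED_FIELDS` and `prepare_request` are hypothetical, and the `env`
+parameter exists only so the example runs without a real key.
+
+```python
+import os
+
+# Assumption: only the two documented form fields are ever forwarded.
+DOCUMENTED_FIELDS = ("file", "model")
+ENDPOINT = "https://api.siliconflow.cn/v1/audio/transcriptions"
+
+
+def prepare_request(options, env=os.environ):
+    """Return (url, headers, form) limited to the documented fields."""
+    api_key = env.get("SILICONFLOW_API_KEY")
+    if not api_key:
+        raise RuntimeError("SILICONFLOW_API_KEY is not set")
+    # Drop generic CLI options such as language, prompt, or temperature.
+    form = {key: options[key] for key in DOCUMENTED_FIELDS if key in options}
+    headers = {"Authorization": f"Bearer {api_key}"}
+    return ENDPOINT, headers, form
+
+
+url, headers, form = prepare_request(
+    {
+        "file": "demo.mp3",
+        "model": "FunAudioLLM/SenseVoiceSmall",
+        "language": "zh",
+        "temperature": 0,
+    },
+    env={"SILICONFLOW_API_KEY": "test-key"},
+)
+print(sorted(form))  # ['file', 'model']
+```
+
+An allowlist fails closed: a new generic CLI option is dropped by default
+instead of being forwarded to an endpoint that does not document it.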
+
+## Step 3: Choose the model alias
+
+The provider exposes short aliases for the documented models:
+
+| Sapat model | SiliconFlow model |
+| --- | --- |
+| `sensevoice` | `FunAudioLLM/SenseVoiceSmall` |
+| `sensevoice-small` | `FunAudioLLM/SenseVoiceSmall` |
+| `teleai` | `TeleAI/TeleSpeechASR` |
+| `telespeech` | `TeleAI/TeleSpeechASR` |
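+
+The table can be read as a plain lookup. The sketch below is illustrative only:
+`MODEL_ALIASES` and `resolve_model` are hypothetical names, and accepting full
+model names or mixed-case aliases is an assumption rather than documented
+provider behavior.
+
+```python
+# Hypothetical alias table mirroring the mapping above.
+MODEL_ALIASES = {
+    "sensevoice": "FunAudioLLM/SenseVoiceSmall",
+    "sensevoice-small": "FunAudioLLM/SenseVoiceSmall",
+    "teleai": "TeleAI/TeleSpeechASR",
+    "telespeech": "TeleAI/TeleSpeechASR",
+}
+
+
+def resolve_model(name):
+    """Map a short alias to a full model name; pass full names through."""
+    if name in MODEL_ALIASES.values():
+        return name
+    try:
+        return MODEL_ALIASES[name.lower()]
+    except KeyError:
+        raise ValueError(f"Unknown SiliconFlow model: {name!r}") from None
+
+
+print(resolve_model("sensevoice"))  # FunAudioLLM/SenseVoiceSmall
+```
+
+Raising on unknown names surfaces a typo before any audio is uploaded.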
+
+Start with SenseVoice:
+
+```bash
+sapat samples/demo.wav \
+  --provider siliconflow \
+  --model sensevoice \
+  --quality M
+```
+
+Use the `teleai` alias (`TeleAI/TeleSpeechASR`) when you specifically want to
+compare the second SiliconFlow transcription option:
+
+```bash
+sapat samples/demo.wav \
+  --provider siliconflow \
+  --model teleai \
+  --quality M
+```
+
+The input file can be audio or video. Sapat converts it to MP3 for this
+provider, sends the converted audio, writes `samples/demo.txt`, then removes
+only the temporary converted file. The companion PR also fixes a processing
+edge case so a source file that is already in the preferred audio format is not
+deleted as if it were temporary output.
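+
+The cleanup rule can be sketched as a path comparison. This is not the code
+from the PR: `cleanup_converted` is a hypothetical helper that shows one way to
+delete a converted file only when it is a different file from the source.
+
+```python
+import tempfile
+from pathlib import Path
+
+
+def cleanup_converted(source, converted):
+    """Delete the converted file unless it is the source file itself."""
+    source, converted = Path(source), Path(converted)
+    if converted.exists() and converted.resolve() != source.resolve():
+        converted.unlink()
+        return True
+    return False
+
+
+with tempfile.TemporaryDirectory() as tmp:
+    src = Path(tmp) / "demo.mp3"
+    src.write_bytes(b"audio")
+    tmp_out = Path(tmp) / "demo.converted.mp3"
+    tmp_out.write_bytes(b"audio")
+    # Source already in the preferred format: nothing is deleted.
+    kept_source = not cleanup_converted(src, src) and src.exists()
+    # A separate temporary output is removed.
+    removed_temp = cleanup_converted(src, tmp_out) and not tmp_out.exists()
+```
+
+Comparing resolved paths also covers the case where the two arguments name the
+same file through a relative path or a symlink.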
+
+## Step 4: Keep the first run small
+
+SiliconFlow's documentation currently describes a maximum transcription upload
+of 50 MB and a maximum duration of one hour. Those limits are generous enough
+for many demos and voice notes, but the first run should still be short. A
+30-second sample is easier to debug than a 45-minute recording.
+
+Run one file:
+
+```bash
+sapat samples/design-review.wav \
+  --provider siliconflow \
+  --model sensevoice \
+  --quality M
+```
+
+Then inspect the output:
+
+```bash
+ls -la samples
+sed -n '1,120p' samples/design-review.txt
+```
+
+If the transcript is empty, check these items first:
+
+- `SILICONFLOW_API_KEY` is present in the same shell that runs Sapat.
+- The selected model resolves to a SiliconFlow model name.
+- The source media can be decoded by `ffmpeg`.
+- The converted file stays under the provider's file size and duration limits.
+- The account has access to the selected SiliconFlow model.
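+
+The first and fourth checks are easy to script. The sketch below is a
+hypothetical `preflight` helper, not part of Sapat. It treats the documented
+50 MB limit as 50 MiB, which is an assumption, and it skips the duration check
+because that requires decoding the media.
+
+```python
+import os
+
+# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes.
+MAX_UPLOAD_BYTES = 50 * 1024 * 1024
+
+
+def preflight(path, env=os.environ):
+    """Return a list of problems to fix before sending the file."""
+    problems = []
+    if not env.get("SILICONFLOW_API_KEY"):
+        problems.append("SILICONFLOW_API_KEY is not set in this shell")
+    if not os.path.isfile(path):
+        problems.append(f"{path} does not exist")
+    elif os.path.getsize(path) > MAX_UPLOAD_BYTES:
+        problems.append(f"{path} is larger than the documented upload limit")
+    return problems
+
+
+for problem in preflight("samples/design-review.wav"):
+    print(problem)
+```
+
+The size check only means something for the file that is actually uploaded, so
+point it at the converted audio rather than the original video when they differ.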
+
+## Step 5: Build a repeatable review loop
+
+Raw transcription is only the first step. A useful workflow also records how
+the transcript was produced and how it was reviewed.
+
+For every provider comparison, keep a small scorecard:
+
+```markdown
+## Transcript review
+
+- Source file: samples/design-review.wav
+- Provider: siliconflow
+- Model: sensevoice
+- Audio quality flag: M
+- Strong points:
+- Weak points:
+- Product names corrected:
+- Follow-up action:
+```
+
+Review one transcript against the audio before processing a folder. Pay special
+attention to:
+
+- product names,
+- mixed Chinese-English terms,
+- speaker names,
+- numbers and dates,
+- code identifiers,
+- places where background noise hides a word.
+
+For team workflows, store reviewed transcripts separately:
+
+```bash
+cp samples/design-review.txt reviewed/design-review.siliconflow.txt
+```
+
+That gives you an audit trail without committing private audio. If the
+transcript belongs in a public repository, remove private names and internal
+details first.
+
+## Step 6: Validate the provider without secrets
+
+The companion Sapat PR includes tests for the provider and for the processing
+edge case mentioned above. They verify that:
+
+- `SILICONFLOW_API_KEY` controls provider availability.
+- The provider sends `Authorization: Bearer ...`.
+- The endpoint is SiliconFlow's transcription URL.
+- The form body contains the resolved model name.
+- Undocumented generic fields are not sent to SiliconFlow.
+- API errors raise a clear runtime error.
+- Existing source audio is not deleted during cleanup.
+
+Run the targeted checks:
+
+```bash
+python -m pytest tests/providers/test_siliconflow.py tests/test_registry.py tests/test_process.py -q
+```
+
+Run the full suite plus formatting and syntax checks before opening a PR:
+
+```bash
+python -m pytest -q
+python -m black --check sapat/providers/siliconflow.py sapat/process.py tests/providers/test_siliconflow.py tests/test_process.py
+python -m compileall sapat tests/providers/test_siliconflow.py tests/test_process.py
+git diff --check
+```
+
+Mock-based tests are important here. They verify the provider's request shape
+without exposing a real API key or uploading private audio during review.
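+
+As a hedged illustration of the idea, and not a copy of the tests in the PR, the
+sketch below checks a request shape with `unittest.mock`. The `transcribe`
+function, its `send` parameter, and the `{"text": ...}` response shape are all
+assumptions made for the example.
+
+```python
+from unittest import mock
+
+
+def transcribe(send, api_key, model, audio):
+    """Hypothetical provider call; `send` stands in for the HTTP client."""
+    response = send(
+        "https://api.siliconflow.cn/v1/audio/transcriptions",
+        headers={"Authorization": f"Bearer {api_key}"},
+        data={"model": model},
+        files={"file": audio},
+    )
+    return response["text"]
+
+
+def test_request_shape():
+    # The mock replaces the network call, so no key or audio leaves the machine.
+    send = mock.Mock(return_value={"text": "hello"})
+    text = transcribe(send, "test-key", "FunAudioLLM/SenseVoiceSmall", b"audio")
+    assert text == "hello"
+    url = send.call_args.args[0]
+    kwargs = send.call_args.kwargs
+    assert url.endswith("/v1/audio/transcriptions")
+    assert kwargs["headers"]["Authorization"] == "Bearer test-key"
+    # Only the documented text field is sent alongside the file.
+    assert set(kwargs["data"]) == {"model"}
+
+
+test_request_shape()
+```
+
+Passing the client in as a parameter keeps the example self-contained. A real
+suite would more likely patch the HTTP library that the provider imports.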
+
+## Common issues and troubleshooting
+
+**Problem:** Sapat says the provider is not available.
+
+**Solution:** Confirm `SILICONFLOW_API_KEY` is exported in the current shell.
+Provider discovery skips providers whose required environment variables are
+missing.
+
+**Problem:** The request fails with an authentication error.
+
+**Solution:** Regenerate or re-copy the SiliconFlow key, then make sure the
+stored value contains no literal quote characters or stray whitespace.
+
+**Problem:** The transcript quality changes between files.
+
+**Solution:** Compare audio quality first. Use the same `--quality` value,
+avoid clipping, and keep a validation set with known expected terms.
+
+**Problem:** A long meeting is slow or fails near the provider limits.
+
+**Solution:** Split the recording into smaller sections before transcription
+so each part stays below SiliconFlow's documented upload limits. Review section
+boundaries manually so names and decisions are not split in confusing places.
+
+## Conclusion
+
+You now have a reproducible SiliconFlow transcription path inside Daytona:
+create a workspace, install the Sapat branch, set `SILICONFLOW_API_KEY`, run
+`sapat --provider siliconflow`, and review the generated transcript. The
+provider keeps SiliconFlow-specific request details in code, while the Daytona
+workspace keeps the environment easy to recreate.
+
+This is also a good pattern for future providers. Start with the provider's
+documented minimum request, add clear model aliases, write mock tests for the
+HTTP request, and keep real recordings and credentials out of the repository.
+
+## References
+
+- [SiliconFlow Create transcription API](https://docs.siliconflow.cn/en/api-reference/audio/create-audio-transcriptions)
+- [FunAudioLLM/SenseVoiceSmall model card](https://huggingface.co/FunAudioLLM/SenseVoiceSmall)
+- [Sapat SiliconFlow provider pull request](https://github.com/nibzard/sapat/pull/68)