The original NeuTTS is built for researchers and developers.
NeuTTS Studio is built for everyone — especially mobile users.
The original NeuTTS by Neuphonic is an incredible open-source project — state-of-the-art text-to-speech that runs on-device. But it was built for developers who are comfortable with command-line flags and technical setups.
I asked myself: "Why should only developers get to use this?"
So I reverse-engineered the interface to create a user-friendly shell that anyone can use — no terminal expertise required. Just pick numbers from a menu and go.
| Original NeuTTS | NeuTTS Studio |
|---|---|
| Command-line flags | Interactive numbered menus |
| Manual model each run | Load once, use everywhere |
| No progress feedback | Animated progress bars with RTF stats |
| 30-second text limit | Unlimited text — auto-chunking |
| Manual audio encoding | Auto-encode + save voice profiles |
| Files save anywhere | Organized data/outputs/ folders |
| Requires developer knowledge | Anyone can use it |
| Hidden cache (~/.cache) | Models inside project folder |
⚠️ This project does NOT claim ownership of any AI model.
All TTS models, the NeuCodec audio codec, and the core inference engine are the intellectual property of Neuphonic.
| Component | Owner | License |
|---|---|---|
| NeuTTS-Nano models | Neuphonic | NeuTTS Open License 1.0 |
| NeuCodec audio codec | Neuphonic | NeuTTS Open License 1.0 |
| Core inference engine | neuphonic/neutts | See repo |
| espeak-ng phonemizer | espeak-ng | GPL v3 |
| Perth watermarking | resemble-ai/perth | MIT |
| llama.cpp GGUF backend | ggml-org/llama.cpp | MIT |
| NeuTTS Studio interface | This project | MIT |
💛 Huge thanks to the entire Neuphonic team for open-sourcing such high-quality on-device TTS and making it accessible to the community.
👨💻 My contribution: 20+ hours of debugging, reverse-engineering, and optimizing to make this work seamlessly on mobile devices — especially Android via Termux.
| Platform | Status | Requirements | Tested On |
|---|---|---|---|
| Android | ✅ Optimised | Termux + Ubuntu (via proot-distro) | Galaxy A25, S23, Pixel 7 |
| iOS | ✅ Optimised | iSH or a-Shell | iPhone 14, iPad Pro |
| Linux | ✅ Supported | Python 3.10+, build-essential | Ubuntu 22.04+, Debian, Arch |
| macOS | ✅ Supported | Python 3.10+, Xcode CLT | Intel & Apple Silicon |
| Windows | ✅ Supported | WSL2 with Ubuntu | Windows 10/11 |
| Raspberry Pi | ✅ Supported | Raspberry Pi OS | Pi 4, Pi 5 |
| Platform | Recommended Model | Why |
|---|---|---|
| Android (High-end), 8GB+ RAM | NeuTTS-Nano Q8 GGUF | Better quality while staying fast on devices like the S23 and Pixel 7 Pro |
| Android (Mid-range), 4–6GB RAM | NeuTTS-Nano Q4 GGUF | Optimized for most phones, fastest on ARM, streaming-ready |
| iOS (High-end), iPhone Pro Max / iPad Pro | NeuTTS-Nano Q8 GGUF | Takes advantage of extra RAM for better quality |
| iOS (Mid-range), standard iPhone/iPad | NeuTTS-Nano Q4 GGUF | Smooth performance, lowest resource usage |
| Linux (High-end), 16GB+ RAM, modern CPU | NeuTTS-Nano SafeTensors | Best quality, fine-tuning capable |
| Linux (Mid-range), 8–16GB RAM | NeuTTS-Nano Q8 GGUF | Good balance of quality and speed |
| Linux (Low-end), 4–8GB RAM, older hardware | NeuTTS-Nano Q4 GGUF | Best choice for limited resources |
| macOS (Apple Silicon), M1/M2/M3 | NeuTTS-Nano Q8 GGUF | Optimized for Apple Silicon, great performance |
| macOS (Intel) | NeuTTS-Nano SafeTensors | Runs natively on Intel Macs |
| Windows (WSL2) | NeuTTS-Nano SafeTensors | Full performance via Ubuntu WSL2 |
| Raspberry Pi 4/5 | NeuTTS-Nano Q4 GGUF | Only model that runs smoothly on ARM SBCs |
Quick Guide:
- Q4 GGUF = Fastest, lowest memory, streaming ready — Best for mid-range mobile
- Q8 GGUF = Better quality, needs more RAM — Great for high-end mobile and Apple Silicon
- SafeTensors = Best quality, requires more RAM, finetuning capable — Best for desktops
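The quick guide above boils down to a simple decision rule. Here is an illustrative sketch — the thresholds come from the tables above, and `recommend_model` is a hypothetical helper, not part of NeuTTS Studio:

```python
def recommend_model(ram_gb: float, desktop: bool = False) -> str:
    """Pick a NeuTTS-Nano variant from available RAM, per the quick guide."""
    if desktop and ram_gb >= 16:
        return "NeuTTS-Nano SafeTensors"  # best quality, fine-tuning capable
    if ram_gb >= 8:
        return "NeuTTS-Nano Q8 GGUF"      # better quality, needs more RAM
    return "NeuTTS-Nano Q4 GGUF"          # fastest, lowest memory, streaming-ready
```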
| 🗣️ Text to Speech | 🎤 Voice Cloning |
|---|---|
| Type, paste, or load text from file | Clone any voice from 3+ seconds of audio |
| No length limit — smart auto-chunking | Save as named reusable .pt profiles |
| Live progress bar per chunk with RTF stats | Test cloned voice with any phrase |
| Merge chunks OR save individually OR both | Add language & gender metadata with flags |
| Output saved to data/outputs/tts/ | Output saved to data/outputs/cloning/ |
| ⚡ Streaming Mode | 🔧 Fine Tuning |
|---|---|
| Audio plays as it generates — no waiting | Train on your own voice data |
| Live chunk stats: duration, gen time, RTF | Interactive config builder |
| Stream to speakers only | Launch training from inside the app |
| Stream + save simultaneously | Resume from checkpoints |
| Output saved to data/outputs/streaming/ | Dataset guide built in |
The NeuTTS model has a 2048-token context window (roughly 30 seconds of audio per call). NeuTTS Studio handles this automatically with a 4-tier chunking strategy:
```
Your text (any length — sentence, page, chapter, book)
                      ↓
┌─────────────────────────────────────────────┐
│ Tier 1 · Split at sentence endings          │  . ! ?
│ Tier 2 · Split at clause boundaries         │  , ; :
│ Tier 3 · Split at word boundaries           │  spaces
│ Tier 4 · Hard cut at 250 characters         │  last resort
└─────────────────────────────────────────────┘
                      ↓
[chunk 1] [chunk 2] [chunk 3] ... [chunk N]
                      ↓
Same voice applied to every single chunk
                      ↓
All chunks stitched with smooth 200ms gaps
                      ↓
✅ One seamless final .wav file
```
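The final stitching step can be sketched in plain Python — join the per-chunk waveforms with 200 ms of silence between them. This is illustrative only: the real implementation lives in core/audio.py, and the 24 kHz sample rate here is an assumption.

```python
def stitch_chunks(chunks, sample_rate=24000, gap_ms=200):
    """Concatenate chunk waveforms (lists of samples) with short silence gaps."""
    gap = [0.0] * (sample_rate * gap_ms // 1000)  # 200 ms of silence
    stitched = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            stitched.extend(gap)  # a gap between chunks, not before the first
        stitched.extend(chunk)
    return stitched
```

In practice the chunks would be NumPy arrays straight from the decoder, but the gap-insertion logic is the same.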
Example — 10,000 character input:
- Splits into ~40 chunks automatically
- Generates ~15 minutes of audio
- Zero manual intervention needed
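The tiered strategy above can be sketched in a few lines of Python. This is a simplified illustration — the real logic lives in core/chunker.py and handles more edge cases:

```python
import re

MAX_CHARS = 250  # tier-4 hard limit, as described above

def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedy 4-tier splitter: sentences, then clauses, then words, then hard cuts."""
    tiers = [r"(?<=[.!?])\s+",   # tier 1: sentence endings
             r"(?<=[,;:])\s+",   # tier 2: clause boundaries
             r"\s+"]             # tier 3: word boundaries

    def split_piece(piece: str, tier: int = 0) -> list[str]:
        if len(piece) <= max_chars:
            return [piece]
        if tier == len(tiers):   # tier 4: hard cut as a last resort
            return [piece[i:i + max_chars] for i in range(0, len(piece), max_chars)]
        chunks, buf = [], ""
        for part in re.split(tiers[tier], piece):
            if buf and len(buf) + 1 + len(part) > max_chars:
                chunks.extend(split_piece(buf, tier + 1))  # flush via the next tier
                buf = part
            else:
                buf = f"{buf} {part}".strip()
        if buf:
            chunks.extend(split_piece(buf, tier + 1))
        return chunks

    return split_piece(text.strip())
```

Each tier only kicks in when the coarser split above it cannot keep a chunk under the limit, so normal prose almost always breaks at sentence boundaries.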
```
NeuTTS-Studio/
│
├── 🚀 run.py             ← Entry point — run this to start
├── ⚙️ config.py           ← All settings, paths, model definitions
├── 📋 requirements.txt    ← Python dependencies (platform-specific)
├── 📖 README.md           ← You are here
│
├── 🧠 core/
│   ├── engine.py         ← NeuTTS wrapper (model loading & inference)
│   ├── chunker.py        ← Smart 4-tier text splitting system
│   ├── audio.py          ← Audio stitching, saving, file management
│   ├── ui.py             ← Interactive menus, colors, input prompts
│   └── progress.py       ← Animated progress bars & spinners
│
├── 📦 modules/
│   ├── tts.py            ← Text to Speech module
│   ├── cloning.py        ← Voice Cloning module
│   ├── streaming.py      ← Streaming Mode module
│   ├── finetuning.py     ← Fine Tuning module
│   ├── settings.py       ← Settings & model management
│   └── voice_selector.py ← Shared voice picker
│
└── 💾 data/
    ├── voices/           ← Your cloned voice profiles (.pt + .txt + .wav)
    ├── samples/          ← Built-in reference voices (.wav + .txt)
    ├── models/           ← Downloaded models cached here (NOT hidden)
    └── outputs/
        ├── tts/          ← Audio from Text to Speech
        ├── streaming/    ← Recordings from Streaming sessions
        └── cloning/      ← Test audio from Voice Cloning
```
All platforms require Python 3.10 or higher. The installation steps are platform-specific due to different PyTorch requirements:
| Platform | PyTorch Setup |
|---|---|
| Android / iOS / Raspberry Pi (ARM) | CPU-only PyTorch (no CUDA) |
| Linux / Windows WSL2 (x86_64) | Full PyTorch with CUDA (if GPU available) |
| macOS (Apple Silicon) | Native Metal-optimized PyTorch |
| macOS (Intel) | Standard PyTorch |
The requirements.txt file includes a commented line for ARM devices. Simply uncomment it before installation if you're on ARM.
Default Termux uses its own package system (`pkg`), which is missing many packages NeuTTS Studio requires, such as `libopenblas-dev`, `portaudio19-dev`, `pkg-config`, and `cmake`.
You MUST set up a full Ubuntu environment inside Termux first. This gives you access to the complete apt-get ecosystem.
Install Termux from F-Droid (NOT the Play Store — F-Droid version is actively maintained):
https://f-droid.org/packages/com.termux/
Open Termux and run:
```bash
# Update Termux base packages
pkg update && pkg upgrade -y

# Install proot-distro — the Ubuntu manager for Termux
pkg install proot-distro -y

# Install Ubuntu
proot-distro install ubuntu

# Enter Ubuntu environment
proot-distro login ubuntu
```

Your prompt will change to:

```
root@localhost:~#
```
You are now inside a full Ubuntu environment with complete apt-get access.
💡 Every time you open Termux, you must re-enter Ubuntu before using NeuTTS Studio:

```bash
proot-distro login ubuntu
```

Create a shortcut so you never forget:

```bash
# Run this in regular Termux (NOT inside Ubuntu)
echo "alias ubuntu='proot-distro login ubuntu'" >> ~/.bashrc
source ~/.bashrc

# Now just type this to enter Ubuntu anytime:
ubuntu
```

Inside Ubuntu, install the system dependencies:

```bash
apt-get update && apt-get upgrade -y

apt-get install python3 python3-pip python3-venv -y
python3 --version    # Must be 3.10+

apt-get install espeak-ng -y
espeak-ng --version  # Must be 1.52+

apt-get install build-essential cmake git pkg-config -y
apt-get install libopenblas-dev -y
apt-get install portaudio19-dev -y
apt-get install ffmpeg -y

python3 -m venv ai-env
source ai-env/bin/activate
```

💡 Always run `source ai-env/bin/activate` before using the app.
```bash
git clone https://github.com/fardinsabid/NeuTTS-Studio.git
cd NeuTTS-Studio

# Edit requirements.txt to uncomment the ARM PyTorch line
sed -i 's/^# --index-url/--index-url/' requirements.txt

# Install all dependencies
pip install -r requirements.txt
```

```bash
# Build llama-cpp-python against OpenBLAS
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" \
pip install "neutts[llama]" --force-reinstall
```

Download from the original NeuTTS samples and copy them into data/samples/:
```
data/samples/
├── dave.wav     + dave.txt     ← English male
├── jo.wav       + jo.txt       ← English female
├── mateo.wav    + mateo.txt    ← Spanish male
├── greta.wav    + greta.txt    ← German female
└── juliette.wav + juliette.txt ← French female
```
```bash
python run.py
```

Search: iSH Shell → Download → Open

```bash
apk update && apk upgrade
apk add python3 py3-pip cmake build-base git pkgconfig
apk add espeak-ng espeak-ng-dev
apk add portaudio-dev
apk add openblas-dev
apk add ffmpeg
```

```bash
git clone https://github.com/fardinsabid/NeuTTS-Studio.git
cd NeuTTS-Studio

# Create virtual environment
python3 -m venv ai-env
source ai-env/bin/activate

# Edit requirements.txt for ARM
sed -i 's/^# --index-url/--index-url/' requirements.txt

# Install dependencies
pip install -r requirements.txt

# Install llama-cpp-python with OpenBLAS
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" \
pip install "neutts[llama]" --force-reinstall
```

```bash
python run.py
```

```bash
# Install system dependencies
sudo apt update
sudo apt install python3 python3-pip python3-venv espeak-ng \
  build-essential cmake git pkg-config libopenblas-dev portaudio19-dev \
  ffmpeg -y

# Clone and setup
git clone https://github.com/fardinsabid/NeuTTS-Studio.git
cd NeuTTS-Studio
python3 -m venv ai-env
source ai-env/bin/activate

# Install dependencies (keep the --index-url line commented)
pip install -r requirements.txt

# Optional: for better performance on Linux with an NVIDIA GPU
pip install "neutts[llama]" --force-reinstall

# Launch
python run.py
```

```bash
# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install dependencies
brew install python3 espeak-ng cmake pkg-config openblas portaudio ffmpeg

# Clone and setup
git clone https://github.com/fardinsabid/NeuTTS-Studio.git
cd NeuTTS-Studio
python3 -m venv ai-env
source ai-env/bin/activate

# Install dependencies (keep the --index-url line commented)
pip install -r requirements.txt

# Launch
python run.py
```

```bash
# Same steps as Apple Silicon above
# PyTorch will install the standard x86_64 version
```

```powershell
# In PowerShell (Admin)
wsl --install -d Ubuntu
# Restart your computer when prompted
# Open the Ubuntu WSL terminal
# Follow the Linux installation steps above
```

```bash
sudo apt update
sudo apt install python3 python3-pip python3-venv espeak-ng \
  build-essential cmake git pkg-config libopenblas-dev portaudio19-dev \
  ffmpeg -y

git clone https://github.com/fardinsabid/NeuTTS-Studio.git
cd NeuTTS-Studio
python3 -m venv ai-env
source ai-env/bin/activate

# Edit requirements.txt for ARM
sed -i 's/^# --index-url/--index-url/' requirements.txt

# Install dependencies
pip install -r requirements.txt

# CRITICAL for Raspberry Pi ARM
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" \
pip install "neutts[llama]" --force-reinstall

# Launch
python run.py
```

```
╔══════════════════════════════════════════════════════════════╗
║  Main Menu                                                   ║
╚══════════════════════════════════════════════════════════════╝

 [1] 🗣️ Text to Speech  — convert text to audio with chunking
 [2] 🎤 Voice Cloning   — clone & manage voice profiles
 [3] ⚡ Streaming Mode  — real-time audio generation
 [4] 🔧 Fine Tuning     — train on custom voice data
 [5] ⚙️ Settings        — load model, manage outputs
 [0] Exit
 ────────────────────────────────────────────────────────
 [0] ← Back
 Select ❯
```
- Select **[1] Text to Speech**
- Choose input mode:
  - **[1] Single line** — short sentences
  - **[2] Multi-paragraph** — paste long text (press Enter twice to finish)
  - **[3] Load from .txt file** — read from a file
- Preview the chunk breakdown
- Pick a voice (a sample or your cloned voice)
- Choose output format:
  - **[1] Merged single file** — one seamless audio file
  - **[2] Individual chunk files** — one file per chunk
  - **[3] Both** — everything!
- Watch real-time progress:

```
Generating [████████████████████████████] 100.0% [1/1] 12.5s ETA: 0.0s
✓ Generated 2.44s audio in 12.5s · RTF 5.1
```

- Audio is saved to data/outputs/tts/
- Record 3–15 seconds of clear speech on your phone
- Convert to WAV if needed:

```bash
ffmpeg -i recording.m4a -ar 16000 -ac 1 -sample_fmt s16 voice.wav
```

- Select **[2] Voice Cloning** → **[1] Clone new voice**
- Provide:
  - Path to the WAV file
  - Exact transcript (word-for-word)
  - Voice name
  - Language (with flag support! 🇧🇩)
  - Gender
- Watch encoding progress:

```
Loading encoder model...
✓ Encoder loaded in 16.1s
Loading audio file...
✓ Audio loaded: 32000Hz, 8.9s in 10.1s
Encoding voice...
✓ Encoding complete in 207.9s
```

- Test immediately with **[3] Test a voice**
- Select **[3] Streaming Mode** (GGUF model required)
- Choose mode:
  - **[1] Stream to speakers** — live playback
  - **[2] Stream and save** — generate + save
  - **[3] Stream, play, and save** — both!
- Type your text
- Watch real-time chunk stats (TTFA = time to first audio):

```
[01] TTFA 512ms   audio gen 920ms   ✅ 55% RT
[02]      480ms   audio gen 460ms   ✅ 96% RT
[03]      495ms   audio gen 480ms   ✅ 97% RT
```
NeuTTS requires .wav format. Use ffmpeg for conversion:

```bash
# Universal command — works for ALL formats
ffmpeg -i input_file.m4a -ar 16000 -ac 1 -sample_fmt s16 output.wav

# Examples:
ffmpeg -i recording.mp3 -ar 16000 -ac 1 voice.wav
ffmpeg -i audio.ogg -ar 16000 -ac 1 voice.wav
ffmpeg -i sound.aac -ar 16000 -ac 1 voice.wav
ffmpeg -i music.flac -ar 16000 -ac 1 voice.wav
```

What each flag means:

| Flag | Meaning | Why |
|---|---|---|
| `-i input.m4a` | Input file | Your original recording |
| `-ar 16000` | Sample rate = 16 kHz | What NeuTTS expects |
| `-ac 1` | Mono channel | Single speaker, no stereo |
| `-sample_fmt s16` | 16-bit PCM | Standard WAV format |
| `output.wav` | Output filename | The file for NeuTTS |

Check your converted file:

```bash
ffprobe output.wav
# Should show: Audio: pcm_s16le, 16000 Hz, mono
```

Trim to the optimal length (3–15 seconds):
```bash
# Trim from 0s to 10s
ffmpeg -i output.wav -ss 0 -t 10 trimmed.wav
```

Cause: proot-distro not installed in Termux.
```bash
# In regular Termux (NOT Ubuntu)
pkg update && pkg install proot-distro -y
```

Cause: Ubuntu installation corrupted.

```bash
proot-distro remove ubuntu
proot-distro install ubuntu
proot-distro login ubuntu
```

Remember: `pkg` = Termux (outside Ubuntu); `apt-get` = Ubuntu (inside proot-distro).
```bash
# ✅ Correct — in regular Termux
pkg install proot-distro

# ✅ Correct — inside Ubuntu
apt-get install python3

# ❌ Wrong — apt-get in regular Termux
# ❌ Wrong — pkg inside Ubuntu
```

```bash
# In regular Termux (NOT Ubuntu) first
termux-setup-storage
# Grant permission when prompted

# Then inside Ubuntu, access files at:
ls /storage/emulated/0/
```

Cause: Ubuntu's home directory is separate.

```bash
# Work directly in Android storage
cd /storage/emulated/0/Repository/NeuTTS-Studio
python run.py
```

```bash
pip install resampy
pip install soundfile
```

Cause: Build tools or OpenBLAS missing.
```bash
apt-get install build-essential cmake pkg-config libopenblas-dev -y
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" \
pip install "neutts[llama]" --force-reinstall --no-cache-dir
```

```bash
source ai-env/bin/activate  # Activate the venv first!
pip install -r requirements.txt
```

```bash
# Android / Ubuntu
apt-get install pkg-config -y

# iOS / Alpine
apk add pkgconfig
```

```bash
# Android / Ubuntu
apt-get install portaudio19-dev -y

# iOS / Alpine
apk add portaudio-dev
```

Error: `OSError: libcudart.so.13: cannot open shared object file`
Cause: PyTorch 2.4+ bundles CUDA libraries by default. ARM devices don't have NVIDIA GPUs.
Fix: Edit requirements.txt and uncomment the ARM line:
```bash
# Edit requirements.txt
nano requirements.txt

# Find and uncomment this line (remove the #):
--index-url https://download.pytorch.org/whl/cpu

# Then reinstall
pip install --force-reinstall torch
pip install -r requirements.txt
```

```bash
# Increase the download timeout
export HF_HUB_DOWNLOAD_TIMEOUT=60

# Or use a mirror if blocked
export HF_ENDPOINT=https://hf-mirror.com

# Then run again
python run.py
```

- Switch to the Q4 GGUF model (smallest)
- Close all other apps
- Restart Termux/Ubuntu session
- On Android: disable background apps in system settings
Cause: SafeTensors model doesn't support streaming.
Go to: [5] Settings → [1] Load Model → Select [2] Q8 or [3] Q4
Cause: Sample voices missing.
Download from: https://github.com/neuphonic/neutts/tree/main/samples
Copy to NeuTTS-Studio/data/samples/
Checklist:
- Audio is 3-15 seconds long
- Format is WAV (not MP3/M4A)
- Sample rate 16-44kHz
- Mono channel (not stereo)
- No background noise
- Transcript matches EXACTLY word for word
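The audio items on this checklist can be verified with Python's standard `wave` module (the transcript match still needs a human eye). A minimal sketch — it assumes an uncompressed PCM WAV, and `check_reference` is an illustrative helper, not part of the app:

```python
import wave

def check_reference(path: str) -> list[str]:
    """Return a list of checklist violations for a reference clip (empty = OK)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        seconds = w.getnframes() / rate
    problems = []
    if channels != 1:
        problems.append("not mono")
    if not 16000 <= rate <= 44100:
        problems.append(f"sample rate {rate} Hz outside 16-44 kHz")
    if width != 2:
        problems.append("not 16-bit PCM")
    if not 3 <= seconds <= 15:
        problems.append(f"length {seconds:.1f}s outside 3-15s")
    return problems
```

Note that `wave.open` will simply raise an error on MP3/M4A input, which itself tells you the file still needs converting.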
Fix the audio:

```bash
ffmpeg -i input.m4a -ar 16000 -ac 1 -sample_fmt s16 output.wav
```

Normal for the first run:

- Downloads `facebook/w2v-bert-2.0` (~2–3 GB)
- Takes 3–5 minutes on mobile
- Only happens once!
Fixed in v2.0.0 — update your code.
Fixed! Now uses original transcript as reference.
```bash
# Normalize audio levels
ffmpeg -i input.m4a -ar 16000 -ac 1 -af "loudnorm" output.wav
```

```bash
# Check the actual format
ffprobe your_file.m4a

# Try forcing the format
ffmpeg -f mp4 -i your_file.m4a -ar 16000 -ac 1 output.wav
```

Fixed in v2.0.0 — now uses `>` instead of special characters.
Fixed with InputSafeSpinner class.
Fixed with custom ask_multiline() function.
```bash
export PYTHONUNBUFFERED=1
# Already set in run.py
```

```bash
# Work from the home directory instead
cd ~
git clone https://github.com/fardinsabid/NeuTTS-Studio.git
cd NeuTTS-Studio && python run.py
```

Normal for mobile:
- 50-100x real-time is expected
- 2s audio = 100-200s on mobile CPU
- Switch to Q4 GGUF for 2-3x speedup
Edit config.py:

```python
CHUNK_SILENCE_MS = 300  # Increase from 200ms
```

Cause: CPU too slow to feed the audio buffer.
- Switch to Q4 GGUF for faster generation
- This is a warning, not an error
Safe to ignore completely — just a packaging warning.
| Device | Model | Speed | RTF |
|---|---|---|---|
| Galaxy A25 (Mid-range) | Q4 GGUF | 45 tok/s | 50-60x |
| Galaxy S23 (High-end) | Q4 GGUF | 80 tok/s | 30-40x |
| Galaxy S23 | Q8 GGUF | 70 tok/s | 35-45x |
| Pixel 7 | Q4 GGUF | 70 tok/s | 35-45x |
| iPhone 14 (iSH) | Q4 GGUF | 60 tok/s | 40-50x |
| iPad Pro | Q8 GGUF | 90 tok/s | 25-35x |
| Raspberry Pi 4 | Q4 GGUF | 30 tok/s | 80-100x |
| PC (i5, no GPU) | SafeTensors | 150 tok/s | 15-20x |
| PC (with GPU) | SafeTensors | 500+ tok/s | <5x |
RTF = Real-Time Factor (lower is better)
50 tok/s ≈ 1 second of audio per second of generation
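As the progress output earlier suggests, RTF here works out to generation time divided by audio duration. A tiny illustrative helper:

```python
def rtf(gen_seconds: float, audio_seconds: float) -> float:
    """Real-Time Factor: seconds of compute per second of audio (lower is better)."""
    return gen_seconds / audio_seconds

# The TTS example earlier: 2.44s of audio generated in 12.5s
print(round(rtf(12.5, 2.44), 1))  # → 5.1
```

So an RTF of 50x means generating one minute of speech takes about fifty minutes of CPU time, and anything at or below 1.0 keeps up with real-time streaming.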
Every audio file generated includes an invisible Perth watermark that cryptographically identifies it as AI-generated.
- ❌ Do not impersonate real people without explicit consent
- ❌ Do not generate deceptive, harmful, or fraudulent audio
- ✅ Respect the privacy and dignity of all individuals
- ✅ Follow all applicable laws in your jurisdiction
- ✅ Use for creative, educational, and personal projects
| Component | License |
|---|---|
| NeuTTS Studio Interface | MIT |
| NeuTTS-Nano Models | NeuTTS Open License 1.0 |
| NeuCodec | NeuTTS Open License 1.0 |
| espeak-ng | GPL v3 |
| Perth | MIT |
| llama.cpp | MIT |
| 🏠 Original NeuTTS repo | github.com/neuphonic/neutts |
| 🌍 Neuphonic website | neuphonic.com |
| 🤗 HuggingFace models | huggingface.co/neuphonic |
| 🎮 Try online | HuggingFace Space |
| 🐛 Report issues | GitHub Issues |