TorchTS Project

Overview

TorchTS is a text-to-speech application built with Python and Vue.js. It provides an interface for converting text from various document formats into speech using the Kokoro TTS model. The project combines a FastAPI backend with a Vue.js frontend to create a practical tool for text-to-speech conversion.

Features

Text Processing: Text handling and chunking utilities
Document Support: Parse and extract text from PDF, DOCX, ODT, and Markdown files
Audio Generation: Text-to-speech conversion using Kokoro TTS
Multi-Speaker Support: Generate audio with different voices for different speakers in dialogues
Profile Management: Create and manage profiles with customizable voice and volume settings
File Management: Upload, store, and organize files within profiles
RESTful API: FastAPI backend endpoints for file processing and audio generation
Modern Interface: Vue.js frontend with Vuetify components for a responsive design
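The chunking utilities listed under Text Processing are not documented here. As a rough illustration only, a minimal sketch of sentence-aware chunking could look like the following. It assumes a hypothetical `chunk_text` helper and a 200-character default limit, neither of which is taken from the TorchTS code:

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper for illustration; TorchTS's real chunking rules
    and limits may differ.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    # Split after ., ! or ? when followed by whitespace, dropping empty pieces.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Break an over-long sentence at the last space before the limit.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no usable space: hard cut mid-word
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if not sentence:
            continue
        # Append the sentence to the current chunk if it still fits.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_text("Hello there. How are you today? I am fine.", max_chars=20)` returns `["Hello there.", "How are you today?", "I am fine."]`. Splitting on sentence boundaries first tends to keep each chunk a natural unit for synthesis; the word-boundary fallback only applies to unusually long sentences.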

Project Structure

```
torchts/
├── requirements.txt              # Python dependencies
├── src/
│   ├── backend/                  # Python backend
│   │   ├── api/                  # API endpoints and routing
│   │   ├── storage/              # Database models and storage
│   │   ├── processing/           # Text and audio processing
│   │   └── main.py               # Main entry point
│   └── frontend/                 # Frontend applications
│       └── templates/
│           └── vue/              # Vue.js application
```

Installation

Quick Start (Docker)

Clone the repository (alexykn/TorchTS on GitHub) and change into the project directory.

Run the application:
```
docker compose up -d
```

To run with CUDA support, use the CUDA compose file instead:
```
docker compose -f docker-compose.cuda.yml up -d
```

Access the web interface at http://localhost:5173

That's it! Docker Compose builds and starts the backend and frontend containers, so no local Python or Node.js setup is needed.

Development Setup

Prerequisites

Python 3.11+
Node.js 18+
npm 9+
espeak-ng (macOS only)

For local development on macOS, install espeak-ng with Homebrew:
```
brew install espeak-ng
```
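Before running the setup steps, a small script can confirm the prerequisites above. This is a hypothetical helper (`check_prerequisites` is not part of the repository). It checks the running Python version, the major versions reported by `node --version` and `npm --version`, and, on macOS only, whether `espeak-ng` is on your PATH:

```python
import shutil
import subprocess
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report whether the development prerequisites appear to be available.

    Hypothetical helper for illustration; it only inspects versions and PATH.
    """
    results = {"python>=3.11": sys.version_info >= (3, 11)}
    for tool, minimum in (("node", 18), ("npm", 9)):
        path = shutil.which(tool)
        ok = False
        if path:
            try:
                out = subprocess.run(
                    [path, "--version"],
                    capture_output=True, text=True, check=True, timeout=10,
                ).stdout
                # node prints e.g. "v18.17.0", npm prints e.g. "9.6.7".
                major = int(out.strip().lstrip("v").split(".")[0])
                ok = major >= minimum
            except (subprocess.SubprocessError, ValueError, OSError):
                ok = False
        results[f"{tool}>={minimum}"] = ok
    # espeak-ng is only listed as a prerequisite for macOS.
    if sys.platform == "darwin":
        results["espeak-ng"] = shutil.which("espeak-ng") is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

Save it as, for example, `check_prereqs.py` and run `python check_prereqs.py`; any line marked `MISSING` points to a tool to install or upgrade.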

Backend Setup (Python)

Create and activate a virtual environment (recommended):

```
python -m venv .venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate     # On Windows
```

Install dependencies:
```
pip install -r requirements.txt
```
Start the backend server:
```
python src/backend/main.py
```

Frontend Setup (Vue.js)

Navigate to the Vue directory:
```
cd src/frontend/templates/vue
```
Install dependencies:
```
npm install
```
Start development server:
```
npm run dev
```

Usage

Access the web interface at http://localhost:5173 after starting both the backend and frontend servers.
Create a profile by clicking "Create New Profile" and choosing your preferred voice and volume.
Upload text or documents to your profile using the file upload area.
Click on any uploaded file to load its content into the text editor.
Choose between Single Speaker or Multi Speaker mode:
- Single Speaker: Select one voice for the entire text
- Multi Speaker: Use multiple voices for different speakers in dialogues
Adjust the voice settings if needed, then click "Convert to Speech" to generate audio.
Use the profile settings (cogwheel icon) to manage your files and profile.

Multi-Speaker Mode

In Multi-Speaker mode, you can assign different voices to different speakers in your text. Use the following format:

```
>>> 1 Hello everyone! This is the first speaker.
>>> 2 And I'm the second speaker!
>>> 1 We can have a conversation like this.
```

Each speaker is identified by a number (>>> 1, >>> 2, etc.) and can be assigned a different voice using the voice selection dropdown menus.
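To make the format concrete, here is a minimal Python sketch of how such text could be split into per-speaker segments and paired with voices. The names `parse_dialogue`, `Segment`, and `assign_voices`, the placeholder voice IDs, and the handling of unmarked lines are all assumptions for illustration, not TorchTS's actual parser:

```python
import re
from dataclasses import dataclass

# Matches a speaker marker such as ">>> 1 Hello" at the start of a line.
SPEAKER_LINE = re.compile(r"^>>>\s*(\d+)\b\s*(.*)$")


@dataclass
class Segment:
    speaker: int
    text: str


def parse_dialogue(text: str) -> list[Segment]:
    """Split multi-speaker text into per-speaker segments.

    Assumption: lines without a marker continue the previous speaker's
    segment, and text before the first marker belongs to speaker 1.
    """
    segments: list[Segment] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            segments.append(Segment(int(match.group(1)), match.group(2).strip()))
        elif segments:
            segments[-1].text = f"{segments[-1].text} {line}".strip()
        else:
            segments.append(Segment(1, line))
    return segments


def assign_voices(segments: list[Segment], voices: dict[int, str]) -> list[tuple[str, str]]:
    """Pair each segment with its speaker's voice (KeyError if none is assigned)."""
    return [(voices[s.speaker], s.text) for s in segments]
```

Run on the example above, `parse_dialogue` yields three segments for speakers 1, 2, and 1, and `assign_voices(segments, {1: "voice_a", 2: "voice_b"})` pairs each line of dialogue with the voice picked for that speaker number.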

Keyboard Controls

Space: Play/Pause audio
←/→: Seek backward/forward 5 seconds
↑/↓: Increase/decrease volume by 5%
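The seek and volume steps above amount to simple clamped arithmetic. The real controls live in the Vue.js frontend; the Python sketch below only illustrates the assumed behavior (position clamped to the track length, volume clamped to 0 to 100%). The key names follow the browser's `KeyboardEvent.key` values, and `PlayerState` and `handle_key` are hypothetical:

```python
from dataclasses import dataclass

SEEK_STEP_S = 5.0   # seconds per ←/→ press
VOLUME_STEP = 0.05  # 5% per ↑/↓ press


@dataclass
class PlayerState:
    position: float  # seconds
    duration: float  # seconds
    volume: float    # 0.0 to 1.0
    playing: bool = False


def handle_key(state: PlayerState, key: str) -> PlayerState:
    """Apply one keyboard control in place and return the updated state.

    Assumption: clamping behavior is illustrative, not taken from TorchTS.
    """
    if key == " ":
        state.playing = not state.playing
    elif key in ("ArrowLeft", "ArrowRight"):
        delta = SEEK_STEP_S if key == "ArrowRight" else -SEEK_STEP_S
        state.position = min(max(state.position + delta, 0.0), state.duration)
    elif key in ("ArrowUp", "ArrowDown"):
        delta = VOLUME_STEP if key == "ArrowUp" else -VOLUME_STEP
        # Round to avoid drift from repeated floating-point additions.
        state.volume = round(min(max(state.volume + delta, 0.0), 1.0), 2)
    return state
```

Clamping means that pressing ← near the start of a track jumps to 0 seconds rather than to a negative position, and that volume stops at 100% instead of overshooting.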

Profile Management

Create Profile: Set up profiles with custom voice presets and volume settings
Upload Files: Each profile maintains its own collection of uploaded files
File Organization: Files are stored per profile for better organization
Profile Settings: Access profile settings via the cogwheel icon to:
- Delete all files in the profile
- Delete the entire profile and its associated files
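Conceptually, each profile owns its settings and its own file collection. The in-memory sketch below illustrates that relationship. `Profile` and `ProfileStore` are hypothetical names; the actual backend stores profiles using the database models in `src/backend/storage/`, whose schema may differ:

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical in-memory profile; not TorchTS's storage schema."""
    name: str
    voice: str
    volume: float = 1.0
    files: dict[str, str] = field(default_factory=dict)  # filename -> text

    def upload(self, filename: str, content: str) -> None:
        self.files[filename] = content

    def delete_all_files(self) -> None:
        # Mirrors "Delete all files in the profile": settings are kept.
        self.files.clear()


class ProfileStore:
    """Keeps profiles separate so each one owns its own file collection."""

    def __init__(self) -> None:
        self._profiles: dict[str, Profile] = {}

    def create(self, name: str, voice: str, volume: float = 1.0) -> Profile:
        if name in self._profiles:
            raise ValueError(f"profile {name!r} already exists")
        profile = Profile(name, voice, volume)
        self._profiles[name] = profile
        return profile

    def get(self, name: str) -> Profile:
        return self._profiles[name]

    def delete(self, name: str) -> None:
        # Mirrors "Delete the entire profile": its files go with it.
        del self._profiles[name]

    def names(self) -> list[str]:
        return sorted(self._profiles)
```

Because files hang off the profile object, deleting a profile removes its files as well, while files uploaded to one profile never appear in another.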

Contributing

Feel free to open issues or submit pull requests if you'd like to contribute to the project.

License

This project is licensed under the MIT License.

Acknowledgments

This project relies heavily on the Kokoro-82M text-to-speech model created by hexgrad. Their work on developing this high-quality TTS model made this project possible.
Built with FastAPI, Vue.js, and Vuetify
