This project extends the Multimodal Live API Web Console to implement text-to-speech functionality using Gemini's audio capabilities. The implementation is primarily in src/components/text-to-speech/TextToSpeech.tsx.
- Text-to-Speech Conversion: Convert any text to natural-sounding speech
- Multiple Voice Options: Choose from different voices:
  - Puck
  - Charon
  - Kore
  - Fenrir
  - Aoede
- Customizable Prompts: Modify how the AI processes your text
- Real-time Audio Streaming: Hear the speech as it's being generated
- Audio Download: Save generated speech as WAV files
- Error Handling: Robust error handling with retry mechanisms
- Get your Gemini API key
- Set up the project:
    # Clone the repository
    git clone https://github.com/Muhtasham/text-to-speech-gemini.git
    cd text-to-speech-gemini
    # Install dependencies
    npm install
    # Create .env file and add your API key
    echo "GEMINI_API_KEY=your_api_key_here" > .env
    # Start the development server
    npm start
- Open http://localhost:3000 in your browser
- Click "Show Settings" to access:
  - Voice selection dropdown
  - Custom prompt configuration
- Enter your text in the main textarea
- Click "Speak" to generate and play the audio
- Use "Download Audio" to save the generated speech as a WAV file
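
The voice and prompt settings above could be combined into a request roughly as follows. This is a minimal sketch, not the project's actual code: `buildSpeechRequest`, `SpeechSetup`, and `SpeechRequest` are hypothetical names, the default voice is an arbitrary choice, and the `speechConfig` shape is assumed to mirror the Live API's prebuilt voice configuration.

```typescript
// The five voices listed in this README.
const VOICES = ["Puck", "Charon", "Kore", "Fenrir", "Aoede"] as const;
type VoiceName = (typeof VOICES)[number];

// Default prompt quoted in this README.
const DEFAULT_PROMPT =
  "Please convert this text to speech and recite it verbatim do not start with sure here it is etc:";

// Hypothetical shape, assumed to mirror the Live API's speech configuration.
interface SpeechSetup {
  generationConfig: {
    responseModalities: "audio";
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: VoiceName } };
    };
  };
}

// Hypothetical container for the setup plus the text sent to the model.
interface SpeechRequest {
  setup: SpeechSetup;
  message: string;
}

// Combines the prompt and the user's text, and selects the voice.
function buildSpeechRequest(
  text: string,
  voice: VoiceName = "Puck", // assumption: any listed voice could be the default
  prompt: string = DEFAULT_PROMPT,
): SpeechRequest {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    throw new Error("Text to speak must not be empty");
  }
  return {
    setup: {
      generationConfig: {
        responseModalities: "audio",
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName: voice } },
        },
      },
    },
    // The prompt comes first so the model is told to recite what follows.
    message: `${prompt.trim()}\n\n${trimmed}`,
  };
}
```

Restricting `voice` to the `VoiceName` union means an unsupported voice name is rejected at compile time rather than at request time.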
The text-to-speech functionality is implemented in TextToSpeech.tsx. Its key components are:
- AudioStreamer for real-time audio playback
- Voice selection from available options
- Customizable prompts, with this default:
  "Please convert this text to speech and recite it verbatim do not start with sure here it is etc:"
- WAV file generation for downloads
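
To illustrate the WAV generation step, here is a minimal sketch of wrapping raw PCM samples in a standard 44-byte RIFF/WAVE header. `encodeWav` is a hypothetical helper, not necessarily how TextToSpeech.tsx does it, and the defaults assume 16-bit mono audio at 24 kHz; adjust them if the streamed audio uses a different format.

```typescript
// Hypothetical helper: wraps 16-bit PCM samples in a RIFF/WAVE container.
function encodeWav(
  pcm: Int16Array,
  sampleRate = 24000, // assumption: sample rate of the streamed audio
  numChannels = 1, // assumption: mono output
): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit samples
  const dataSize = pcm.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  // Writes an ASCII chunk identifier such as "RIFF" at the given offset.
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // WAV stores samples in little-endian order.
  pcm.forEach((sample, i) => {
    view.setInt16(44 + i * bytesPerSample, sample, true);
  });

  return buffer;
}
```

In the browser, the result can be offered as a download by wrapping it in `new Blob([encodeWav(samples)], { type: "audio/wav" })` and creating an object URL for it.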
This project is based on the Multimodal Live API Web Console by Google. The original project provides modules for streaming audio playback, recording user media, and a unified log view.
Built with:
- React + TypeScript
- Web Audio API
- Gemini's Multimodal Live API
- SCSS for styling
This project maintains the original Apache License 2.0 from the base project.
This is an extension of an experiment showcasing the Multimodal Live API, not an official Google product. The original disclaimer and terms apply. See Google's policy for more information.