feat: add article improving-pronunciation-indian-names-deepgram-tts #36

Open
wants to merge 6 commits into
base: main
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
# Improving Pronunciation of Indian Names with Deepgram TTS

Text-to-speech (TTS) models often mispronounce names, particularly names that are uncommon in the model’s primary language. For example, Indian names such as Prasanna, Mangesh, and Anil may be mispronounced by TTS models designed primarily for US English.

### Understanding the Limitation

Deepgram's TTS models currently focus on US English, which can lead to less-than-ideal pronunciation of non-English names and of names that are uncommon in the US English lexicon. Plans are in place to enhance pronunciation support so that future updates handle a much wider variety of names.

### Temporary Solutions

Until these improvements are rolled out, two strategies can help improve pronunciation with our existing features:

1. **Customized Phonetic Spelling**: You can manually adjust the spelling of words to reflect their phonetic pronunciation more accurately. This involves spelling out names phonetically in the text you provide to the TTS service.

2. **Using Tips and Tricks from Deepgram**: Deepgram provides documentation for TTS prompting that can offer guidance on how to get better pronunciation by adjusting text inputs: [Text-to-Speech Prompting](https://developers.deepgram.com/docs/text-to-speech-prompting#pronunciation).
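
The phonetic-spelling strategy can be sketched as a small substitution step that runs before the text is sent for synthesis. The Python sketch below is illustrative only: the `apply_respellings` helper and the entries in `RESPELLINGS` are assumptions made for this example, not part of any Deepgram SDK, and each respelling should be tuned by listening to the generated audio.

```python
import re

# Hypothetical respellings for illustration; adjust each one by ear.
RESPELLINGS = {
    "Prasanna": "Prah-san-na",
    "Mangesh": "Mung-aysh",
    "Anil": "Ah-neel",
}


def apply_respellings(text: str, respellings: dict[str, str]) -> str:
    """Replace each whole-word name in `text` with its phonetic respelling."""
    if not respellings:
        return text
    # Try longer names first so a short name never pre-empts a longer one.
    names = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: respellings[m.group(1)], text)


if __name__ == "__main__":
    # The returned string is what would be passed to the TTS request.
    print(apply_respellings("Please welcome Anil and Prasanna.", RESPELLINGS))
```

Keeping the respellings in one table means the display text stays correctly spelled everywhere else in an application, and only the string sent to the TTS service is altered.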

### Example Implementations in SDKs

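The snippets below sketch how phonetically respelled text might be passed to a TTS call in each language. They are schematic: `tts_client` and its `synthesize_voice` method are placeholder names, so check each SDK's documentation for the actual client and method.

#### Python SDK
```python
# Schematic only: `tts_client.synthesize_voice` is a placeholder, not a confirmed SDK call.
# Pass the phonetic respelling in place of the original name ("Anil" here).
text_to_speak = "Ah-neel will join the call shortly."
tts_client.synthesize_voice(text_to_speak)
```

#### Node.js SDK
```javascript
// Schematic only: `ttsClient.synthesizeVoice` is a placeholder, not a confirmed SDK call.
// Pass the phonetic respelling in place of the original name ("Anil" here).
const textToSpeak = "Ah-neel will join the call shortly.";
ttsClient.synthesizeVoice(textToSpeak);
```

#### .NET SDK
```csharp
// Schematic only: `ttsClient.SynthesizeVoice` is a placeholder, not a confirmed SDK call.
// Pass the phonetic respelling in place of the original name ("Anil" here).
var textToSpeak = "Ah-neel will join the call shortly.";
ttsClient.SynthesizeVoice(textToSpeak);
```

#### Rust SDK
```rust
// Schematic only: `tts_client.synthesize_voice` is a placeholder, not a confirmed SDK call.
// Pass the phonetic respelling in place of the original name ("Anil" here).
let text_to_speak = "Ah-neel will join the call shortly.";
tts_client.synthesize_voice(text_to_speak);
```

#### Go SDK
```go
// Schematic only: `ttsClient.SynthesizeVoice` is a placeholder, not a confirmed SDK call.
// Pass the phonetic respelling in place of the original name ("Anil" here).
textToSpeak := "Ah-neel will join the call shortly."
ttsClient.SynthesizeVoice(textToSpeak)
```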

### Conclusion

Getting specific names pronounced correctly can be challenging, but phonetic spellings and other adjustments to your input text often improve the result. Until future Deepgram updates add better support for diverse names, these strategies can lead to more satisfying TTS output.

### References
- [Deepgram Text-to-Speech Prompting](https://developers.deepgram.com/docs/text-to-speech-prompting#pronunciation)
@@ -0,0 +1,46 @@
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

class DeepgramTTS
{
    private static readonly HttpClient client = new HttpClient();

    public static async Task Main(string[] args)
    {
        // Read the API key from environment variables
        string apiKey = Environment.GetEnvironmentVariable("DEEPGRAM_API_KEY");

        // Set the API endpoint; the voice is selected with the `model` query parameter
        string apiEndpoint = "https://api.deepgram.com/v1/speak?model=aura-asteria-en";

        // Text with phonetic hints in parentheses; the whole string is synthesized, so the hints may be read aloud too
        string text = "Prasanna (p-r-ah-s-a-n-n-a), Mangesh (m-uh-ng-ey-sh), Anil (ah-n-ee-l) are example names.";

        // JSON payload
        var payload = new
        {
            // The request body carries only the text to synthesize
            text = text
        };

        string jsonPayload = JsonConvert.SerializeObject(payload);

        client.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Token", apiKey); // API keys use the "Token" scheme

        // Send POST request to Deepgram TTS API
        var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");
        HttpResponseMessage response = await client.PostAsync(apiEndpoint, content);

        if (response.IsSuccessStatusCode)
        {
            Console.WriteLine("Audio generated successfully.");
        }
        else
        {
            Console.WriteLine($"Failed to generate audio: {response.StatusCode}");
        }
    }
}
@@ -0,0 +1,53 @@
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Read the API key from environment variables
	apiKey := os.Getenv("DEEPGRAM_API_KEY")

	// Set the API endpoint; the voice is selected with the `model` query parameter
	apiEndpoint := "https://api.deepgram.com/v1/speak?model=aura-asteria-en"

	// Text with phonetic hints in parentheses; the whole string is synthesized, so the hints may be read aloud too
	text := "Prasanna (p-r-ah-s-a-n-n-a), Mangesh (m-uh-ng-ey-sh), Anil (ah-n-ee-l) are example names."

	// JSON payload
	payload := map[string]string{
		// The request body carries only the text to synthesize
		"text": text,
	}

	payloadBytes, err := json.Marshal(payload)
	if err != nil {
		fmt.Printf("Error marshalling JSON: %s\n", err)
		return
	}

	req, err := http.NewRequest("POST", apiEndpoint, bytes.NewReader(payloadBytes))
	if err != nil {
		fmt.Printf("Error creating request: %s\n", err)
		return
	}
	req.Header.Set("Authorization", "Token "+apiKey) // API keys use the "Token" scheme
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Printf("Error sending request: %s\n", err)
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusOK {
		fmt.Println("Audio generated successfully.")
	} else {
		fmt.Printf("Failed to generate audio: %d\n", resp.StatusCode)
	}
}
@@ -0,0 +1,37 @@
const fetch = require('node-fetch'); // CommonJS require works with node-fetch v2; v3 is ESM-only
require('dotenv').config();

// Read the API key from environment variables
const apiKey = process.env.DEEPGRAM_API_KEY;

// Set the API endpoint; the voice is selected with the `model` query parameter
const apiEndpoint = 'https://api.deepgram.com/v1/speak?model=aura-asteria-en';

// Text with phonetic hints in parentheses; the whole string is synthesized, so the hints may be read aloud too
const text = "Prasanna (p-r-ah-s-a-n-n-a), Mangesh (m-uh-ng-ey-sh), Anil (ah-n-ee-l) are example names.";

// JSON payload
const payload = {
  // The request body carries only the text to synthesize
  text: text
};

// Send POST request to Deepgram TTS API
fetch(apiEndpoint, {
  method: 'POST',
  headers: {
    'Authorization': `Token ${apiKey}`, // API keys use the "Token" scheme
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(payload)
})
  .then(response => {
    if (response.ok) {
      console.log("Audio generated successfully.");
    } else {
      console.error(`Failed to generate audio: ${response.status}`);
    }
  })
  .catch(error => {
    console.error('Error:', error);
  });
@@ -0,0 +1,34 @@
import os
import requests
import json

# Read the API key from environment variables
api_key = os.getenv('DEEPGRAM_API_KEY')

# Set the API endpoint; the voice is selected with the `model` query parameter
api_endpoint = 'https://api.deepgram.com/v1/speak?model=aura-asteria-en'

# Text with phonetic hints in parentheses; the whole string is synthesized, so the hints may be read aloud too
text = "Prasanna (p-r-ah-s-a-n-n-a), Mangesh (m-uh-ng-ey-sh), Anil (ah-n-ee-l) are example names."

# JSON payload
payload = {
    # The request body carries only the text to synthesize
    'text': text,
}

# Send POST request to Deepgram TTS API
response = requests.post(
    api_endpoint,
    headers={
        'Authorization': f'Token {api_key}',  # API keys use the "Token" scheme
        'Content-Type': 'application/json'
    },
    data=json.dumps(payload)
)

# Check if the response is successful
if response.status_code == 200:
    print("Audio generated successfully.")
else:
    print(f"Failed to generate audio: {response.status_code}")
@@ -0,0 +1,40 @@
use std::env;
use reqwest::{Client, Error};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Error> {
    // Read the API key from environment variables
    let api_key = env::var("DEEPGRAM_API_KEY").expect("DEEPGRAM_API_KEY not set");

    // Set the API endpoint; the voice is selected with the `model` query parameter
    let api_endpoint = "https://api.deepgram.com/v1/speak?model=aura-asteria-en";

    // Text with phonetic hints in parentheses; the whole string is synthesized, so the hints may be read aloud too
    let text = "Prasanna (p-r-ah-s-a-n-n-a), Mangesh (m-uh-ng-ey-sh), Anil (ah-n-ee-l) are example names.";

    // JSON payload
    let payload = json!({
        // The request body carries only the text to synthesize
        "text": text
    });

    // Create an HTTP client
    let client = Client::new();

    // Send POST request to Deepgram TTS API (API keys use the "Token" scheme)
    let response = client.post(api_endpoint)
        .header("Authorization", format!("Token {}", api_key))
        .json(&payload)
        .send()
        .await?;

    // Check if the response is successful
    if response.status().is_success() {
        println!("Audio generated successfully.");
    } else {
        println!("Failed to generate audio: {}", response.status());
    }

    Ok(())
}