Commit 40cd801

[text-to-audio-generator] Add new function to hub
1 parent 8dd33c3 commit 40cd801

7 files changed: +886 -0 lines changed

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Client: I love MLRun!
+Agent: Me too!
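
The two-line fixture above appears to follow a `Speaker: text` convention, one utterance per line. A minimal sketch of splitting such content into (speaker, text) pairs follows. The helper name `split_speaker_lines` is hypothetical, and this is not necessarily how the hub function parses its input.

```python
from typing import List, Tuple


def split_speaker_lines(content: str) -> List[Tuple[str, str]]:
    """Split 'Speaker: text' lines into (speaker, text) pairs.

    Hypothetical helper for illustration only. Blank lines are skipped and
    a line without a colon raises ValueError.
    """
    pairs = []
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so colons inside the text survive.
        speaker, separator, text = line.partition(":")
        if not separator:
            raise ValueError(f"Expected 'Speaker: text', got: {line!r}")
        pairs.append((speaker.strip(), text.strip()))
    return pairs


sample = "Client: I love MLRun!\nAgent: Me too!"
pairs = split_speaker_lines(sample)
```

Splitting on the first colon only keeps any later colons as part of the spoken text.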

text_to_audio_generator/function.yaml

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+kind: job
+metadata:
+  name: text-to-audio-generator
+  tag: ''
+  hash: f36d56d620c6a69f414c9cb90e42ec012847a607
+  project: ''
+  labels:
+    author: yonatans
+  categories:
+  - data-preparation
+  - machine-learning
+spec:
+  command: ''
+  args: []
+  image: ''
+  build:
+    functionSourceCode:
+    base_image: mlrun/mlrun
+    commands: []
+    code_origin: ''
+    origin_filename: ''
+    requirements:
+    - bark
+    - torchaudio
+  entry_points:
+    generate_multi_speakers_audio:
+      name: generate_multi_speakers_audio
+      doc: ''
+      parameters:
+      - name: data_path
+        type: str
+        doc: Path to the text file or directory containing the text files to generate
+          audio from.
+      - name: output_directory
+        type: str
+        doc: Path to the directory to save the generated audio files to.
+      - name: speakers
+        type: Union[List[str], Dict[str, int]]
+        doc: List / Dict of speakers to generate audio for. If a list is given, the
+          speakers will be assigned to channels in the order given. If dictionary,
+          the keys will be the speakers and the values will be the channels.
+      - name: available_voices
+        type: List[str]
+        doc: 'List of available voices to use for the generation. See here for the
+          available voices: https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c'
+      - name: use_gpu
+        type: bool
+        doc: Whether to use the GPU for the generation.
+        default: true
+      - name: use_small_models
+        type: bool
+        doc: Whether to use the small models for the generation.
+        default: false
+      - name: offload_cpu
+        type: bool
+        doc: 'TODO: What does this do?'
+        default: false
+      - name: sample_rate
+        type: int
+        doc: The sampling rate of the generated audio.
+        default: 16000
+      - name: file_format
+        type: str
+        doc: The format of the generated audio files.
+        default: wav
+      - name: verbose
+        type: bool
+        doc: Whether to print the progress of the generation.
+        default: true
+      - name: bits_per_sample
+        type: Optional[int]
+        doc: Changes the bit depth for the supported formats. Supported only in "wav"
+          or "flac" formats.
+        default: null
+      outputs:
+      - doc: 'A tuple of: - The output directory path. - The generated audio files
+          dataframe. - The errors dictionary.'
+        default: ''
+      lineno: 30
+  description: Generate audio file from text using different speakers
+  default_handler: generate_multi_speakers_audio
+  disable_auto_mount: false
+  clone_target_dir: ''
+  env: []
+  priority_class_name: ''
+  preemption_mode: prevent
+  affinity: null
+  tolerations: null
+  security_context: {}
+verbose: false
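
The `speakers` parameter above accepts either a list (channels assigned in the order given) or a dict (speaker to channel). A minimal sketch of normalizing both forms into one speaker-to-channel mapping follows. The helper name `speakers_to_channels` is an assumption for illustration and does not come from the committed source.

```python
from typing import Dict, List, Union


def speakers_to_channels(
    speakers: Union[List[str], Dict[str, int]]
) -> Dict[str, int]:
    """Return a speaker -> channel mapping for either accepted input form.

    Hypothetical helper: a list is numbered in order starting at channel 0,
    and a dict is copied unchanged.
    """
    if isinstance(speakers, dict):
        return dict(speakers)
    return {speaker: channel for channel, speaker in enumerate(speakers)}


from_list = speakers_to_channels(["Agent", "Client"])
from_dict = speakers_to_channels({"Agent": 0, "Client": 1})
```

Under this reading, the list `["Agent", "Client"]` and the dict `{"Agent": 0, "Client": 1}` describe the same channel layout.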

text_to_audio_generator/item.yaml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+apiVersion: v1
+categories:
+- data-preparation
+- machine-learning
+description: Generate audio file from text using different speakers
+doc: ''
+example: text_to_audio_generator.ipynb
+generationDate: 2023-12-03:15-30
+hidden: false
+icon: ''
+labels:
+  author: yonatans
+maintainers: []
+marketplaceType: ''
+mlrunVersion: 1.5.2
+name: text_to_audio_generator
+platformVersion: 3.5.3
+spec:
+  filename: text_to_audio_generator.py
+  handler: generate_multi_speakers_audio
+  image: mlrun/mlrun
+  kind: job
+  requirements:
+  - bark
+  - torchaudio
+url: ''
+version: 1.0.0
+test_valid: True
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+bark
+torchaudio>=2.1.0
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Copyright 2023 Iguazio
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import mlrun
+import tempfile
+import pytest
+
+
+@pytest.mark.parametrize("file_format,bits_per_sample", [("wav", 8), ("mp3", None)])
+def test_generate_multi_speakers_audio(file_format, bits_per_sample):
+    text_to_audio_generator_function = mlrun.import_function("function.yaml")
+    with tempfile.TemporaryDirectory() as test_directory:
+        function_run = text_to_audio_generator_function.run(
+            handler="generate_multi_speakers_audio",
+            inputs={"data_path": "data/test_data.txt"},
+            params={
+                "output_directory": test_directory,
+                "speakers": {"Agent": 0, "Client": 1},
+                "available_voices": [
+                    "v2/en_speaker_0",
+                    "v2/en_speaker_1",
+                ],
+                "use_small_models": True,
+                "use_gpu": False,
+                "offload_cpu": True,
+                "file_format": file_format,
+                "bits_per_sample": bits_per_sample,
+            },
+            local=True,
+            returns=[
+                "audio_files: path",
+                "audio_files_dataframe: dataset",
+                "text_to_speech_errors: file",
+            ],
+            artifact_path=test_directory,
+        )
+        assert function_run.error == "Run state (completed) is not in error state"
+        for key in ["audio_files", "audio_files_dataframe", "text_to_speech_errors"]:
+            assert key in function_run.outputs and function_run.outputs[key] is not None
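
The parametrization above pairs `wav` with `bits_per_sample=8` and `mp3` with `None`. That matches the note in function.yaml that bit depth is supported only for "wav" or "flac". A minimal sketch of an up-front check for that rule follows. The function name `check_bits_per_sample` and the choice to raise `ValueError` are assumptions, not behavior confirmed by the commit.

```python
from typing import Optional

# Formats that function.yaml documents as supporting a bit-depth override.
FORMATS_WITH_BIT_DEPTH = {"wav", "flac"}


def check_bits_per_sample(file_format: str, bits_per_sample: Optional[int]) -> None:
    """Reject a bit-depth override for formats documented as unsupported.

    Hypothetical validation helper: None always passes, meaning the
    format's default bit depth is used.
    """
    if bits_per_sample is None:
        return
    if file_format.lower() not in FORMATS_WITH_BIT_DEPTH:
        raise ValueError(
            f"bits_per_sample is only supported for "
            f"{sorted(FORMATS_WITH_BIT_DEPTH)}, got format {file_format!r}"
        )


# The two parametrized cases from the test pass the check.
check_bits_per_sample("wav", 8)
check_bits_per_sample("mp3", None)
```

Failing fast on an unsupported combination would surface the problem before any audio is generated, which matters when generation itself is slow.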
