
Conversation

@RobertAgee (Contributor) commented Apr 29, 2025

Title

40% Less VRAM Usage, Slightly Faster Inference, and Improved Voice Consistency – Optimized Text Autochunking

Important

Changes

  • ➕ Added smart autochunking of user text input based on effective character length (~48–64 chars); see the sketch below this list
  • Supports Tensor Core acceleration:
    • Autochunking ensures sequence lengths are hardware-aligned (~64 tokens)
    • Batching improves GPU occupancy and compute efficiency
  • 🧠 Refactored model.generate() and _prepare_generation():
    • Now accepts audio_prompt_text as a clean, separate parameter
    • Internally joins it with input text to match previous behavior but with cleaner architecture
  • ➕ Autocasts generation to float16 (uses torch.cuda.amp.autocast)
  • 🧹 Includes internal memory cleanup after generation for more stable long-running usage
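
For reviewers, here is a minimal, hypothetical sketch of how autochunking, float16 autocast, and post-generation cleanup could fit together. The `autochunk` helper, the chunk-size constant, and the exact `model.generate()` signature are illustrative assumptions based on the bullets above, not the code in this PR:

```python
import gc

import torch

# Hypothetical sketch: `autochunk`, MAX_CHARS, and the `model.generate()`
# signature are illustrative assumptions, not this PR's actual code.
MAX_CHARS = 64  # effective character budget per chunk (~48-64)


def autochunk(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole words into chunks of roughly max_chars characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def generate_chunked(model, text, audio_prompt=None, audio_prompt_text=""):
    """Generate each chunk under float16 autocast and clean up between chunks."""
    outputs = []
    for chunk in autochunk(text):
        # float16 autocast shrinks activations, cutting VRAM on T4-class GPUs
        with torch.cuda.amp.autocast(dtype=torch.float16):
            outputs.append(
                model.generate(
                    chunk,
                    audio_prompt=audio_prompt,
                    audio_prompt_text=audio_prompt_text,
                )
            )
        # release cached allocations so long-running sessions stay stable
        gc.collect()
        torch.cuda.empty_cache()
    return outputs
```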

Why

  • ~40% less VRAM usage (~4GB vs ~7GB) on T4 GPUs.
  • Slightly faster inference (~0.3 RTF vs ~0.33 RTF) due to smarter batching and Tensor Core activation.
  • Improved voice consistency when using audio prompts, even across multiple chunks.
  • Cleaner internal design (separated audio prompt vs user text, easier future upgrades).

Notes


Thanks for reviewing this PR! 🚀

@jaehong21 added the enhancement (New feature or refactor) label on Apr 29, 2025
@znraznra left a comment

This works well in my case.

@buttercrab (Contributor) left a comment

Can you fix the lint & formatting?

@journeytosilius

I get really weird results when using expressions with symbols like $140 ... is this common, and what are the guidelines to get it right?

@RobertAgee (Contributor, Author) commented May 4, 2025

Hey @buttercrab, the linting and formatting should now be fixed. I also went ahead and tested the merge, and everything seems to work.

A couple notes from testing the merge:
1. Good news: there are far fewer weird gaps of silence in the generations 🚀

2. Voice consistency still needs to be improved. 🔧

  • Currently, each chunk of audio references the uploaded audio+text, with the hope of capturing the voice. I've also tried referencing each previous chunk+text as it generates, but that causes it to gradually morph away from the speaker. It is possible to tune the settings to mitigate this to a small degree, but it's not a workable solution. I have a couple of other ideas I want to try, like inserting speaker tags, but I assume simply finetuning a model will be the best and easiest option regardless. (A small sketch of the two prompting strategies follows this list.)

3. Model is much slower now? 🐢

  • While we have preserved the VRAM savings (~4.2GB baseline, 6.9GB peak on a T4), it seems the other PR changes over the last couple of days have made the generations much slower, anywhere from 10-30%, especially on longer audio. Specifically, I'm talking about the generations after the initial compilation. Previously, the rate was consistent for all lengths of generated audio.
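
As a rough, hypothetical illustration of the two prompting strategies mentioned above (the `model.generate()` signature and helpers are assumptions, not this PR's code):

```python
# Hypothetical sketch of the two chunk-prompting strategies.
def generate_fixed_prompt(model, chunks, ref_audio, ref_text):
    """Every chunk conditions on the original uploaded reference (more stable)."""
    return [
        model.generate(c, audio_prompt=ref_audio, audio_prompt_text=ref_text)
        for c in chunks
    ]


def generate_rolling_prompt(model, chunks, ref_audio, ref_text):
    """Each chunk conditions on the previous chunk's output (tends to drift)."""
    outputs, prompt_audio, prompt_text = [], ref_audio, ref_text
    for c in chunks:
        audio = model.generate(c, audio_prompt=prompt_audio, audio_prompt_text=prompt_text)
        outputs.append(audio)
        # rolling the prompt forward gradually morphs the voice away from the speaker
        prompt_audio, prompt_text = audio, c
    return outputs
```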

@harmlessman

1.	How long did it take to generate around 10 seconds of audio from text?
2.	Did you implement streaming functionality for the TTS output?

@RobertAgee (Contributor, Author)

1.	How long did it take to generate around 10 seconds of audio from text?
2.	Did you implement streaming functionality for the TTS output?

@harmlessman Typically, PRs are not the place to ask these questions, just FYI. However, the information may be useful to @buttercrab.

  1. It depends on your GPU and setup. For reference, for a 10 s audio generation on a T4, the original Dia repo took ~35 s on average, the new compile Dia takes ~40 s on average, and my original PR took ~30 s on average.

  2. Streaming outputs are not currently supported. A much better card (or several) would be needed to generate faster than real time. We could implement buffered outputs (a rough sketch follows), but there would still be some initial gap between the prompt and the first audio, depending on chunk size.
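
As a rough illustration of what buffered output could look like (hypothetical; `autochunk` and `model.generate()` are the same assumed helpers as in the sketch in the PR description):

```python
def generate_buffered(model, text, **generate_kwargs):
    """Yield audio for each text chunk as soon as it is generated (hypothetical sketch)."""
    for chunk in autochunk(text):
        yield model.generate(chunk, **generate_kwargs)


# Playback could begin after the first chunk finishes, so the prompt-to-audio
# gap is roughly one chunk's generation time instead of the whole utterance's.
```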

@buttercrab (Contributor) left a comment

Can you fix the lint error?

@V12Hero (Contributor) commented May 13, 2025

@RobertAgee to get around the lint error, just use ruff like:
ruff check path/to/file --fix --unsafe-fixes
and
ruff format path/to/file

It should help you get around any formatting errors.
