Commit 965d0ed

fix: remove normalization of audio in LTX Mel spectrogram creation (Comfy-Org#11990)
For the LTX Audio VAE, remove normalization of audio during mel spectrogram creation. This aligns inference with training and prevents loud audio from being attenuated.
1 parent ddc541f commit 965d0ed

1 file changed: +0 -10 lines changed

comfy/ldm/lightricks/vae/audio_vae.py

Lines changed: 0 additions & 10 deletions
@@ -103,20 +103,10 @@ def resample(self, waveform: torch.Tensor, source_rate: int) -> torch.Tensor:
             return waveform
         return torchaudio.functional.resample(waveform, source_rate, self.target_sample_rate)
 
-    @staticmethod
-    def normalize_amplitude(
-        waveform: torch.Tensor, max_amplitude: float = 0.5, eps: float = 1e-5
-    ) -> torch.Tensor:
-        waveform = waveform - waveform.mean(dim=2, keepdim=True)
-        peak = torch.max(torch.abs(waveform)) + eps
-        scale = peak.clamp(max=max_amplitude) / peak
-        return waveform * scale
-
     def waveform_to_mel(
         self, waveform: torch.Tensor, waveform_sample_rate: int, device
     ) -> torch.Tensor:
         waveform = self.resample(waveform, waveform_sample_rate)
-        waveform = self.normalize_amplitude(waveform)
 
         mel_transform = torchaudio.transforms.MelSpectrogram(
             sample_rate=self.target_sample_rate,
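
To illustrate the attenuation the commit message describes, here is a minimal sketch that restates the arithmetic of the removed `normalize_amplitude` in pure Python for a single mono channel. It is not the code from `audio_vae.py`: the original works on torch tensors, subtracts the mean along `dim=2`, and takes one peak over the whole tensor. The helpers `sine` and `peak_of` and the test signals are made up for this example.

```python
import math


def normalize_amplitude(samples, max_amplitude=0.5, eps=1e-5):
    """Single-channel restatement of the removed method (illustrative only)."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]        # remove the DC offset
    peak = max(abs(s) for s in centered) + eps    # peak magnitude, kept non-zero by eps
    scale = min(peak, max_amplitude) / peak       # 1.0 unless peak exceeds max_amplitude
    return [s * scale for s in centered]


def sine(peak, n=1000, cycles=10):
    """Hypothetical test signal: whole number of sine cycles, so the mean is ~0."""
    return [peak * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


def peak_of(samples):
    return max(abs(s) for s in samples)


quiet = sine(0.25)   # peak below max_amplitude
loud = sine(0.9)     # peak above max_amplitude

quiet_out = peak_of(normalize_amplitude(quiet))
loud_out = peak_of(normalize_amplitude(loud))

print(f"quiet: {peak_of(quiet):.3f} -> {quiet_out:.3f}")
print(f"loud:  {peak_of(loud):.3f} -> {loud_out:.3f}")
```

Under these assumptions the quiet signal passes through with its peak unchanged, while the loud signal is scaled down so its peak lands just under `max_amplitude` (0.5). With the call removed from `waveform_to_mel`, the mel spectrogram is computed from the resampled waveform at its original level.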
