Handle non-UTF-8 metadata in audio file stream info by Mr-Neutr0n · Pull Request #4174 · pytorch/audio

Mr-Neutr0n · 2026-02-09T18:37:23Z

Summary

Some audio files — particularly Opus files from the MLS dataset — contain
metadata with non-UTF-8 bytes (e.g. Latin-1 encoded artist/title tags with
byte 0xe9). This causes an unhandled UnicodeDecodeError when
torchaudio.load() attempts to parse the stream metadata.
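
The failure mode described above can be reproduced without any audio file. The following is a minimal sketch, assuming a tag value of "Café" stored as Latin-1; the tag text and variable names are illustrative and are not taken from the MLS dataset or from torchaudio.

```python
# Latin-1 encodes "é" as the single byte 0xe9. On its own, that byte is the
# start of an incomplete multi-byte sequence in UTF-8, so strict decoding fails.
raw_tag = "Café".encode("latin-1")  # hypothetical tag value: b'Caf\xe9'

try:
    raw_tag.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the encoding the tag was actually written in recovers the text.
latin1_text = raw_tag.decode("latin-1")

# A lossy decode keeps going and substitutes U+FFFD for the undecodable byte.
lossy_text = raw_tag.decode("utf-8", errors="replace")
```

A strict UTF-8 decode is what surfaces as the UnicodeDecodeError reported in the linked issue; the lossy decode is shown only for contrast and is not what this PR does.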

This PR adds try/except UnicodeDecodeError guards around:

- AudioDecoder(uri) construction: catches the error if the decoder
  fails during initialization due to non-decodable metadata bytes, and
  re-raises it as a clear RuntimeError.
- decoder.metadata.sample_rate access: catches the error if metadata
  property access triggers the decode failure, falling back to None so the
  existing "unable to determine sample rate" error path is used.
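
The two guards can be sketched roughly as follows. This is not the PR's diff: `FakeDecoder`, `open_with_guards`, the file names, and the error message wording are all hypothetical stand-ins, used so the pattern runs without torchaudio or a real Opus file.

```python
class _FakeMetadata:
    """Hypothetical metadata holder exposing only sample_rate."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate


class FakeDecoder:
    """Hypothetical stand-in for AudioDecoder that simulates decode failures."""

    def __init__(self, uri):
        if uri.endswith("bad_init.opus"):
            b"\xe9".decode("utf-8")  # simulates failure during construction
        self._uri = uri

    @property
    def metadata(self):
        if self._uri.endswith("bad_meta.opus"):
            b"\xe9".decode("utf-8")  # simulates failure on metadata access
        return _FakeMetadata(48000)


def open_with_guards(uri, decoder_cls=FakeDecoder):
    # Guard 1: construction failure is re-raised as a RuntimeError.
    try:
        decoder = decoder_cls(uri)
    except UnicodeDecodeError as exc:
        raise RuntimeError(
            f"Failed to open {uri!r}: metadata contains non-UTF-8 bytes"
        ) from exc

    # Guard 2: metadata access failure falls back to None.
    try:
        sample_rate = decoder.metadata.sample_rate
    except UnicodeDecodeError:
        sample_rate = None

    # Existing error path for an undetermined sample rate (wording assumed).
    if sample_rate is None:
        raise RuntimeError(f"Unable to determine sample rate for {uri!r}")
    return decoder, sample_rate
```

In this sketch both failure modes end in a RuntimeError, and the first keeps the original UnicodeDecodeError attached as `__cause__` so the underlying byte offset is still visible in the traceback.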

Test plan

- Verify that loading a standard WAV/FLAC/MP3 file still works as before
- Verify that loading an Opus file with ASCII-only metadata works
- Verify that loading an Opus file with non-UTF-8 metadata bytes (e.g.
  from MLS dataset) no longer raises UnicodeDecodeError and instead
  produces a clear RuntimeError

Some audio files (e.g. Opus files from the MLS dataset) contain metadata with non-UTF-8 bytes (such as Latin-1 encoded artist names), which causes a UnicodeDecodeError when the metadata is parsed during audio loading. This adds try/except guards around the AudioDecoder construction and the metadata.sample_rate access so that files with non-decodable metadata bytes produce a clear RuntimeError instead of an unhandled UnicodeDecodeError. Fixes pytorch#3821

pytorch-bot · 2026-02-09T18:37:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/audio/4174

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Merge Blocking SEVs

There is 1 active merge-blocking SEV. Please view it below:

(merge blocking) CI is down due to a github accident

If you must merge, use @pytorchbot merge -f.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Mr-Neutr0n requested a review from a team as a code owner February 9, 2026 18:37

meta-cla bot added the CLA Signed label Feb 9, 2026

Mr-Neutr0n wants to merge 1 commit into pytorch:main from Mr-Neutr0n:fix-opus-metadata-decode
