Handle non-UTF-8 metadata in audio file stream info #4174

Open
Mr-Neutr0n wants to merge 1 commit into pytorch:main from Mr-Neutr0n:fix-opus-metadata-decode

@Mr-Neutr0n

Summary

Fixes #3821

Some audio files — particularly Opus files from the MLS dataset — contain
metadata with non-UTF-8 bytes (e.g. Latin-1 encoded artist/title tags with
byte 0xe9). This causes an unhandled UnicodeDecodeError when
torchaudio.load() attempts to parse the stream metadata.
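
The failure mode can be reproduced without any audio file. The sketch below uses a hypothetical tag value (not taken from the MLS files) to show why a Latin-1 encoded byte 0xe9 cannot be decoded as UTF-8, while the same bytes decode cleanly as Latin-1:

```python
# Hypothetical artist tag: "André" stored as Latin-1, so "é" is the single byte 0xe9.
raw_tag = "Andr\u00e9".encode("latin-1")


def try_decode(data: bytes, encoding: str):
    """Return the decoded string, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = try_decode(raw_tag, "utf-8")      # 0xe9 starts a multi-byte sequence that never completes
as_latin1 = try_decode(raw_tag, "latin-1")  # every byte maps to a code point in Latin-1
```

In UTF-8, 0xe9 is the lead byte of a three-byte sequence, so a lone 0xe9 is rejected by a strict decoder.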

This PR adds try/except UnicodeDecodeError guards around:

  1. AudioDecoder(uri) construction — catches the error if the decoder
    fails during initialization due to non-decodable metadata bytes, and
    re-raises as a clear RuntimeError.
  2. decoder.metadata.sample_rate access — catches the error if metadata
    property access triggers the decode failure, falling back to None so the
    existing "unable to determine sample rate" error path is used.
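
The two guards can be sketched roughly as follows. This is an illustration of the pattern described above, not the code in the PR: `load_sample_rate`, `decoder_factory`, and the fake decoder classes are hypothetical stand-ins so the example runs without torchaudio, and the error messages are assumed wording.

```python
from typing import Callable


class _FakeMetadata:
    """Hypothetical stand-in: decoding the tag happens lazily on attribute access."""

    def __init__(self, raw_tag: bytes, sample_rate: int):
        self._raw_tag = raw_tag
        self._sample_rate = sample_rate

    @property
    def sample_rate(self) -> int:
        self._raw_tag.decode("utf-8")  # raises UnicodeDecodeError on bad bytes
        return self._sample_rate


class LazyFakeDecoder:
    """Hypothetical decoder that only fails when metadata is accessed."""

    def __init__(self, raw_tag: bytes, sample_rate: int = 16000):
        self.metadata = _FakeMetadata(raw_tag, sample_rate)


class EagerFakeDecoder(LazyFakeDecoder):
    """Hypothetical decoder that fails already during construction."""

    def __init__(self, raw_tag: bytes, sample_rate: int = 16000):
        raw_tag.decode("utf-8")  # raises UnicodeDecodeError on bad bytes
        super().__init__(raw_tag, sample_rate)


def load_sample_rate(uri: str, decoder_factory: Callable[[str], object]) -> int:
    # Guard 1: construction. Re-raise as RuntimeError, keeping the original cause.
    try:
        decoder = decoder_factory(uri)
    except UnicodeDecodeError as exc:
        raise RuntimeError(
            f"Failed to open {uri!r}: metadata contains non-UTF-8 bytes"
        ) from exc

    # Guard 2: metadata access. Fall back to None and reuse the existing error path.
    try:
        sample_rate = decoder.metadata.sample_rate
    except UnicodeDecodeError:
        sample_rate = None

    if sample_rate is None:
        raise RuntimeError(f"Unable to determine sample rate for {uri!r}")
    return sample_rate
```

With these stand-ins, a decoder built from ASCII-only tag bytes returns its sample rate, while either fake decoder built from `b"Andr\xe9"` ends in a `RuntimeError` rather than a `UnicodeDecodeError`. Chaining with `from exc` in the first guard keeps the original decode error visible in the traceback.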

Test plan

  • Verify that loading a standard WAV/FLAC/MP3 file still works as before
  • Verify that loading an Opus file with ASCII-only metadata works
  • Verify that loading an Opus file with non-UTF-8 metadata bytes (e.g.
    from MLS dataset) no longer raises UnicodeDecodeError and instead
    produces a clear RuntimeError

Some audio files (e.g. Opus files from the MLS dataset) contain
metadata with non-UTF-8 bytes (such as Latin-1 encoded artist names),
which causes a UnicodeDecodeError when the metadata is parsed during
audio loading.

This adds try/except guards around the AudioDecoder construction and
the metadata.sample_rate access so that files with non-decodable
metadata bytes produce a clear RuntimeError instead of an unhandled
UnicodeDecodeError.

Fixes pytorch#3821
@Mr-Neutr0n Mr-Neutr0n requested a review from a team as a code owner February 9, 2026 18:37
@pytorch-bot

pytorch-bot bot commented Feb 9, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/audio/4174

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Merge Blocking SEVs

There is 1 active merge blocking SEVs. Please view them below:

If you must merge, use @pytorchbot merge -f.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Feb 9, 2026

Development

Successfully merging this pull request may close these issues.

Loading Opus files from MLS dataset fails because of file metadata
