TParcollet

This is a proof of concept for #7310. The idea is to give access to the other columns of a dataset row when an audio file is loaded into a table, which allows sliced reading. As stated in the issue, many people have very long audio files and use start and stop values to slice them.
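
To make "sliced reading" concrete, here is a minimal sketch of the behaviour, not the code in this PR. It assumes a row with hypothetical `audio` (path to a WAV file), `start` and `stop` (in seconds) columns, and it uses Python's standard `wave` module rather than the `datasets` decoding path:

```python
import wave


def read_audio_slice(row):
    """Read only the frames between row["start"] and row["stop"] (seconds).

    The column names "audio", "start" and "stop" are hypothetical and only
    serve this illustration.
    """
    with wave.open(row["audio"], "rb") as wav:
        rate = wav.getframerate()
        total = wav.getnframes()
        # Convert the time bounds to frame indices, clamped to the file length.
        first = min(int(row["start"] * rate), total)
        last = min(int(row["stop"] * rate), total)
        # Seek to the first frame, then read only the requested span.
        wav.setpos(first)
        frames = wav.readframes(max(last - first, 0))
    return frames, rate
```

The point is that only the requested span is read as raw PCM bytes, instead of the whole file being decoded and then cut.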

Right now, this code works as a PoC on my dataset, but it is only meant to illustrate the idea. Many things are messed up, the first being that the shards have wildly varying sizes.

This could be of interest to @lhoestq and @sanchit-gandhi.

Happy to test better ideas locally.

Titouan Parcollet/Embedded AI /SRUK/Engineer/Samsung Electronics added 2 commits December 8, 2024 10:21