Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom tag mappings when reading and writing metadata #2621

Open
elfalem opened this issue Jul 4, 2017 · 20 comments
Open

Custom tag mappings when reading and writing metadata #2621

elfalem opened this issue Jul 4, 2017 · 20 comments
Labels
feature features we would like to implement mediafile Relates to MediaFile and should be migrated to beetbox/mediafile
Milestone

Comments

@elfalem
Copy link

elfalem commented Jul 4, 2017

Beets currently has a predefined mapping between fields in the database and the tags they're written to for the different file formats. In some cases a field is read from or written to multiple tags (e.g. tracktotal is written to TRACKTOTAL, TRACKC, and TOTALTRACKS Vorbis comment fields which I understand is to be compatible with a variety of music software.

Many users would like to have more control over which fields are written to files and what tag names are used when doing so. This can be achieved by introducing a new configuration where the user can specify their desired mapping.

I think the biggest concern if this is to be implemented is the difference between the various audio file formats. While Vorbis comments allow free form tags, ASF, MP3 and MP4 storage styles have a more strict structure. My recommendation would be to support this custom mapping only for Vorbis comments and follow the default mapping for the other storage styles. While this will introduce a disparity between the different formats, I don't think the more restrictive formats should hold beets back from supporting a very useful customization.

While the plan for this feature request is to customize the mappings for the standard fields beets knows about, it can serve as the foundation for allowing writing (and reading) flexible attributes to files. That would make it easier for implementing issues such as #122.

I have done a small prototype where I modified the storage styles used by the MediaField descriptor assigned to the lyrics field mediafile.py based on a configuration. It seems to work as far as I can tell. I was able to write the lyrics field to a custom tag of my choosing (one, multiple or zero tags) for a FLAC file.

@sampsyo sampsyo added the feature features we would like to implement label Jul 4, 2017
@sampsyo
Copy link
Member

sampsyo commented Jul 4, 2017

Thanks for the suggestion! This would certainly be an interesting addition for advanced users. Currently, some users even have trouble understanding that there is a level of indirection between the "beets names" for fields and how they appear in files' tags. Making that indirection customizable could help clarify what's going on internally in beets.

Here are a few concerns I foresee in implementing this:

  • It will require a nontrivial architectural change in MediaFile. The design currently starts from the assumption that there's a built-in list of fields with fixed mappings that are declared when the class is constructed; we may want to rethink how this all works.
  • We'll want to avoid messy global modifications to the class itself. Some time ago, we actually added the ability to declare new MediaFields in plugins, but this is actually very fragile the way it's currently implemented: it changes the MediaFile class itself. That makes it hard to test and creates an opportunity for mysterious bugs. A dependency-injection-style approach would be more maintainable.
  • The mapping from beets fields to Vorbis comments is not always as straightforward as a name-to-name lookup. For example, year is a composite field that actually maps to part of the DATE field under the hood. The ReplayGain fields have more storage-style parameters than just the name they use. Whatever approach we use may need to exclude "special" fields for some definition of "special."

FWIW, this is also relevant to #350, which is a specific case of wanting different mappings from built-in fields to Vorbis/etc. tag names.

@elfalem
Copy link
Author

elfalem commented Jul 4, 2017

I was thinking of making the change internal to the MediaFile class kind of like:

def map_tags(tag):
     # read if mapping exists in config
     # if so, create a tuple of storage styles from the config and add the MP3/MP4/ASF styles
     # if not, create a tuple of the default storage styles
     # return a MediaField instantiated with the above styles

# Field definitions.
title = map_tags('title')
lyrics = map_tags('lyrics')

My knowledge of beets (and python in general) is limited but I think that would be a way of implementing this that is contained to the class itself and not requiring large architectural changes. I agree dependency injection or other major changes would be needed for dealing with flex attrs and plugins but the above might work just for the standard fields.

Regarding fields like date, year and ReplayGain, they are more complex but I think it's still possible to use a custom name for them while maintaining the additional parameters supplied to the storage style such as float_places and suffix.

@sampsyo
Copy link
Member

sampsyo commented Jul 5, 2017

Yes! I do agree that changes shouldn't contaminate too much of beets itself, and shouldn't affect the rest of beets—but I'd still like to see them end up being provided to the constructor (or something like that) as opposed to modifying the base class in place.

Anyway, if your prototype change is pushed to GitHub somewhere, I'd be happy to take a look!

@elfalem
Copy link
Author

elfalem commented Jul 5, 2017

If I understand what you're suggesting, there would be a new class, say TagMap where the fields live. It would be responsible for constructing the MediaField objects for each field with correct storage styles and also handling new fields declared by plugins. TagMap is then passed to MediaFile and the current properties in that class perhaps would serve as wrappers.

I've pushed my prototype here: master...elfalem:2621-custom-tag-map

@sampsyo
Copy link
Member

sampsyo commented Jul 5, 2017

I was thinking of an even simpler distinction: just the difference between modifying the class with a preferred mapping and modifying the instances. Fore example, it would be nice if you had to say:

mf = MediaFile('/path', { 'artist': 'my_artist' })

every time you construct a MediaFile. That's in contrast to modifying the MediaFile class itself, which (silently) affects the behavior of all MediaFile instances. (The latter is how the current tag-adding functionality works.)

@elfalem
Copy link
Author

elfalem commented Jul 5, 2017

I think I'm missing why we want to modify specific instances as opposed to all. What would be the cases where MediaFile instances will be constructed with different mappings? Do we want to limit tag visibility to specific plugins? I see how it can make testing easier but wouldn't it be cumbersome in normal use to specify the same mapping everywhere?

@sampsyo
Copy link
Member

sampsyo commented Jul 5, 2017

The problem isn't changing all the instances per se, but the effect on code maintainability. Having plugins modify the class structure has been a big headache for testing and debugging in the past. For example, it makes it really easy to make a change in one test and forget to clean it up—which affects all the other tests, potentially making them fail in mysterious ways. At the core, the problem is that the extra fields are implicit rather than explicit—so I'm just advocating that we err on the side of explicit. Passing the configuration to the constructor is one way to make things explicit.

@GuilhermeHideki
Copy link
Member

usually, the "global state" is "evil"

@elfalem
Copy link
Author

elfalem commented Jul 6, 2017

I'm looking into passing the configuration in the constructor and it looks it's going to require changing all the fields in the MediaFile class from class attributes to instance attribute. Is that correct? If so, I think it's going to require larger changes in various places where the fields are accessed/enumerated.

@elfalem
Copy link
Author

elfalem commented Jul 6, 2017

I've pushed my latest changes where I changed the lyrics field to an instance attribute. The issue now is when writing tags to a file, the lyrics field is totally ignored.
master...elfalem:2621-custom-tag-map

@sampsyo
Copy link
Member

sampsyo commented Jul 7, 2017

Hmm, yes, Python descriptors only work when they're on the class (if I understand correctly). To avoid global state while still accomplishing this will require some creativity: for example, the descriptors could look up an indirection table that is itself stored on the instance.

@elfalem
Copy link
Author

elfalem commented Jul 8, 2017

Thanks for your guidance! Looks like all problems can be solved by another level of indirection :). I think I have an implementation that works with all the different tags (master...elfalem:2621-custom-tag-map).

Basically, I added a new field_type parameter to MediaField which is used to search the indirection table stored on MediaFile. The current method of passing a tuple of storage styles still works but I anticipate it will only be needed for plugins declaring new media fields.

The new configuration will look like:

map:
    albumartist: [ALBUM ARTIST, ALBUMARTIST]

I noticed that running tests locally might fail based on the contents of the user's configuration file. Is there a way in Confuse to only fetch configurations from the default layer?

I've modified a couple of tests to work with the changes. I still need to modify the rest, and add new ones as well.

@sampsyo
Copy link
Member

sampsyo commented Jul 8, 2017

Cool! I don't 100% understand how the new MediaFile architecture works, so I may need a more detailed introduction sometime… but for now, regarding the tests, you can get a "clean" configuration environment by using the infrastructure in _common.py that clears out the beets configuration in setUp and restores it in tearDown.

@elfalem
Copy link
Author

elfalem commented Jul 13, 2017

I've added a couple of tests for the custom mapping making use of the _common.py infrastructure (master...elfalem:2621-custom-tag-map).

With regards to the MediaFile architecture, in the present system, MediaField takes a tuple of StorageStyles when the field descriptors are declared in MediaFile class such as:

artist = MediaField(
        MP3StorageStyle('TPE1'),
        MP4StorageStyle('\xa9ART'),
        StorageStyle('ARTIST'),
        ASFStorageStyle('Author'),
    )

With my changes, there is a new field_name parameter for MediaField which takes a string such as:

artist = MediaField(field_name='artist')

Inside MediaField class, if field_name is provided it invokes the new get_storage_styles() method of the current MediaFile instance. get_storage_styles() returns a list of storage styles by combining the non-vorbis styles (hardcoded into the class) and the vorbis styles (injected through the MediaFile constructor).

@sampsyo
Copy link
Member

sampsyo commented Jul 13, 2017

I see! Thanks for clarifying.

I have two high-level suggestions for the new architecture:

  • Just for readability, did you consider leaving the non-Vorbis storage styles attached to each MediaField? Instead of using a string index to look up each one, it might be nice to keep them all together in one place.
  • Part of the design goal for MediaFile is that it should be useful "out of the box," as a mostly separate component from the rest of beets. With the current design, all the free-form tag names have been moved out of MediaFile and into beets's default configuration. Can we consider a version where the MediaFields keep their list of default tag names, so a MediaFile constructed without a mapping is still useful? Then, the beets configuration can consist of overrides instead of the complete information.

Finally, one technique that might help reduce redundancy would be to drop the descriptor style for defining MediaFields. We could instead use __getattr__ to look up the MediaField for a given name. I'm not sure if this will end up being more or less complex in the end, but it might be worth considering.

@elfalem
Copy link
Author

elfalem commented Jul 15, 2017

Thanks for the suggestions.

  • We can leave the non-Vorbis styles attached to the MediaField. However, I think the program flow would be more clear if all the styles are obtained the same way. Using a lookup could also make it easier to perhaps extend the custom mapping to the other more restrictive storage styles in the future (at least for a subset of the tags). I don't know how likely that is though.
  • I agree about putting the default mapping inside the class instead of the configuration.
  • Regarding the use of __getattr__, I don't know if it will be less complex. If all the MediaFields were constructed by just using a field name, it would be better. But I think the additional parameters needed for some of the fields (such as out_type) will require additional logic or some sort of data structure so I'm not sure if we will gain much from it.

@elfalem
Copy link
Author

elfalem commented Jul 20, 2017

Also, is it possible to get a default value instead of an exception when obtaining a view of a configuration? For example, with config['map'].get(), if map doesn't exist, return a default value similar to the python dictionary get() function.

@sampsyo
Copy link
Member

sampsyo commented Jul 20, 2017

There are a few common strategies:

  • Use config.add() to provide a default value (see the inline plugin).
  • Use map in config to check whether the value exists (see the filefilter plugin).
  • Catch a confit.NotFoundError to provide a default action (see the zero plugin).

@elfalem
Copy link
Author

elfalem commented Jul 22, 2017

Thanks for all the options. I'm learning a lot about beets. I thought of another option. What about having an empty map: key in the default configuration file? If that's acceptable, it might be preferable to checking if the key exists in all the places a MediaFile is constructed.

@sampsyo
Copy link
Member

sampsyo commented Jul 22, 2017

Sure! Let’s start there.

@arcresu arcresu added the mediafile Relates to MediaFile and should be migrated to beetbox/mediafile label Apr 29, 2019
@arcresu arcresu added this to the MediaFile++ milestone Jun 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature features we would like to implement mediafile Relates to MediaFile and should be migrated to beetbox/mediafile
Projects
None yet
Development

No branches or pull requests

4 participants