Reconciling various metadata inputs #77

rickj · 2021-09-29T17:10:53Z

A given title can have three (potentially conflicting) metadata feeds:

- What is in the file itself
- What is sent via ONIX
- What is sent from the publisher independently of either of the above (typically via a spreadsheet)

We would like guidance on how to handle this.

Currently, when you view the metadata, a11y data appears in three "buckets": source file, publisher, and ONIX. These sets of data are independent of each other and can contain duplicate or conflicting information. We provide all the data we have about an asset, unaltered, so that purchasers can make informed decisions.

Our goal is to collect as much information about a title as possible, so collecting a11y data via these three buckets is additive. Updating a source file will not empty the other two buckets, and sending a spreadsheet full of ONIX data will not cause the publisher information to disappear.

Whenever a bucket is updated, it contains only the most recent information sent for that bucket. Sending a spreadsheet with abc data, then another with xyz data, will display only xyz, not abcxyz.
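The bucket behavior described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `TitleMetadata`, the bucket keys, and the sample field values are hypothetical and not taken from any actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical bucket names mirroring the three feeds described above.
BUCKETS = ("source_file", "publisher", "onix")


@dataclass
class TitleMetadata:
    # One independent dict per bucket; buckets are never merged with each other.
    buckets: dict = field(default_factory=lambda: {name: {} for name in BUCKETS})

    def update(self, bucket: str, data: dict) -> None:
        """Replace the contents of one bucket; the other buckets are untouched."""
        if bucket not in self.buckets:
            raise KeyError(f"unknown bucket: {bucket}")
        # Replace rather than merge: only the most recent submission is kept.
        self.buckets[bucket] = dict(data)

    def view(self) -> dict:
        """Return a copy of every bucket, unaltered, for display."""
        return {name: dict(values) for name, values in self.buckets.items()}


title = TitleMetadata()
title.update("source_file", {"accessMode": "textual"})
title.update("publisher", {"note": "abc"})
title.update("publisher", {"note": "xyz"})  # replaces abc, does not append
```

After these calls the publisher bucket holds only the xyz data, the source file bucket still holds its own data, and the ONIX bucket stays empty.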

dauwhe · 2021-09-30T16:17:23Z

The EPUB spec itself does provide some guidance for conflicting metadata, but it does require that the EPUB contain links to the external metadata.

> When it comes to resolving discrepancies and conflicts between metadata expressed in the Package Document and in linked metadata records, Reading Systems MUST use the document order of link elements in the Package Document to establish precedence (i.e., metadata in the first linked record encountered has the highest precedence and metadata in the Package Document the lowest, regardless of whether the link elements occur before, within or after the package metadata elements).

rickj · 2021-09-30T18:32:08Z

Excellent point, @dauwhe. However, in this case we are talking about metadata presented to a learner considering a purchase (store page, catalog, etc.), not metadata in a reading system. Precedence order makes sense within a reading system for title-related metadata. For display to a user outside a reading system, I would imagine the desired precedence order would be reversed, as an ONIX feed or a publisher's direct feed/spreadsheet would have more current metadata than a content file previously distributed through a channel.
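If a single resolved value per field were wanted for a store page, one way to picture it is a resolver that takes the precedence order as a parameter. This is a hedged sketch, not an agreed approach: `resolve`, `STORE_PRECEDENCE`, and the sample values are hypothetical, and the relative order of the ONIX and publisher feeds is an assumption, since nothing above settles it.

```python
# Assumed store-page order, highest precedence first (ONIX vs. publisher is a guess).
STORE_PRECEDENCE = ("onix", "publisher", "source_file")


def resolve(buckets, precedence=STORE_PRECEDENCE):
    """Pick one value per field, remembering which bucket supplied it."""
    resolved = {}
    # Walk from lowest to highest precedence so later sources overwrite earlier ones.
    for source in reversed(precedence):
        for key, value in buckets.get(source, {}).items():
            resolved[key] = (value, source)
    return resolved


# Illustrative data only.
buckets = {
    "source_file": {"accessibilityHazard": "unknown", "accessMode": "textual"},
    "publisher": {"accessibilityHazard": "none"},
    "onix": {},
}

store_view = resolve(buckets)
file_first_view = resolve(buckets, ("source_file", "publisher", "onix"))
```

With the assumed store order, the publisher's hazard value wins over the one in the file, while a file-first order keeps the file's value. Keeping the source next to each value means a conflict can still be shown to the purchaser rather than hidden.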

gautierchomel · 2023-11-15T15:05:51Z

See #189 & #191

gautierchomel closed this as completed Nov 15, 2023
