DanCoughlin edited this page Feb 18, 2021 · 6 revisions

This page lists frequently asked questions about the Metadata Database.
How frequently is data refreshed?
Are duplicate publications sent emails for open access?
What is the difference between authors and contributors?

What is the process for merging and tracking duplicate publications?
11/8/2019
Whenever we import a new publication, we create a record in two different tables in the database. We create a record for the publication itself which contains the publication's metadata, and we create a record of the import which contains information about the source of the import (the name of the source, and the unique identifier for the publication within that source). The record of the import has a foreign key to the record of the publication.

Then, whenever we run a publication import, for each publication in the input data, the first thing that we do is look for an import record that has the same source name and unique identifier. If one exists, then we know that we already imported the publication from this source, and we may or may not update the existing publication record depending on its import source and whether or not the record has been updated by an administrator. If no import record exists, we create a new record as described above.
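The lookup described above can be sketched roughly as follows. This is only an in-memory illustration, not the application's actual code: the `Store`, `Publication`, and `ImportRecord` classes, the `updated_by_admin` field name, and the use of a title as the only metadata are all assumptions made for the example. The source-precedence part of the update decision is left out here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Publication:
    id: int
    title: str
    updated_by_admin: bool = False  # hypothetical name for the admin flag


@dataclass
class ImportRecord:
    source: str          # e.g. "Pure" or "Activity Insight"
    source_id: str       # unique identifier within that source
    publication_id: int  # foreign key to Publication.id


class Store:
    """In-memory stand-in for the two database tables (illustration only)."""

    def __init__(self) -> None:
        self.publications: Dict[int, Publication] = {}
        self.imports: Dict[Tuple[str, str], ImportRecord] = {}
        self._next_id = 1

    def import_publication(self, source: str, source_id: str, title: str) -> Publication:
        key = (source, source_id)
        existing = self.imports.get(key)
        if existing is None:
            # Never imported from this source: create both records.
            pub = Publication(id=self._next_id, title=title)
            self._next_id += 1
            self.publications[pub.id] = pub
            self.imports[key] = ImportRecord(source, source_id, pub.id)
            return pub
        # Already imported from this source: update unless an admin edited it.
        pub = self.publications[existing.publication_id]
        if not pub.updated_by_admin:
            pub.title = title
        return pub
```

Importing the same source name and identifier twice leaves one publication record in the store; the second import only refreshes its metadata.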

When we discover two publication records that are duplicates, each of those will have its own import record as well. For example, Publication Record 1 has an associated import record with the source being "Activity Insight" and the source_id being "123". Publication Record 2 is a duplicate of publication record 1, and it has an associated import record with the source being "Pure" and the source_id being "789". Whenever we merge these two publication records, we do the following:

  • We pick one publication record to keep (probably the one with the most complete/accurate metadata).
  • We take the import record from the publication record that we're not keeping and reassign it to the publication record that we are keeping.
  • We delete the publication record that we're not keeping.

So in the example, if we merge the duplicate publications and decide to keep Publication Record 2, then it will now have both import records - the one from Pure and the one from Activity Insight, and Publication Record 1 will be deleted. Then when we reimport publications from Pure and we come to this publication in the data, we'll find the import record for Pure attached to Publication Record 2, and we won't create a new record (but may update the existing record). Likewise when we reimport publications from Activity Insight.
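As a rough sketch of the merge steps, using the example records above: the dictionary layout and the `merge_publications` function name are assumptions made for illustration and are not taken from the application.

```python
def merge_publications(publications, imports, keep_id, drop_id):
    """Reassign import records from drop_id to keep_id, then delete drop_id."""
    if keep_id == drop_id:
        raise ValueError("cannot merge a publication with itself")
    for record in imports:
        if record["publication_id"] == drop_id:
            record["publication_id"] = keep_id
    del publications[drop_id]


# Publication Record 1 came from Activity Insight, Record 2 from Pure.
publications = {
    1: {"title": "Example article"},
    2: {"title": "Example Article: A Study"},
}
imports = [
    {"source": "Activity Insight", "source_id": "123", "publication_id": 1},
    {"source": "Pure", "source_id": "789", "publication_id": 2},
]

# Keep Publication Record 2; it ends up with both import records.
merge_publications(publications, imports, keep_id=2, drop_id=1)
```

After the call, only Publication Record 2 remains, and both import records point at it, so a later import from either source finds the existing record.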

Now, if an admin user of the Metadata app has modified a publication record via the admin interface, that record will be flagged, and when we come across that publication in future imports, we won't update the data that we already have with data from the import because we don't want to overwrite any changes (presumably corrections or additions) that an admin has made to the record manually. This is also the case when an admin manually merges duplicate publications. The publication record that remains after the merge will be flagged as having been modified by the admin since we assume that the admin has picked the record with the best data (and may have also manually added or corrected data after the actual merge).

Can we do auto-merging?
11/8/2019
The process would mostly be the same as the manual merging that admins can do currently, except that we would only merge duplicates that include a Pure import, and we would always blindly pick the Pure version from any group of duplicate publications. In the rare case where there is more than one copy of the same publication in Pure, I suppose we would just automatically pick the first one in the list. We would then perform the same merge process, except that we would not flag the publication that remains after the merge, since it was not manually modified. When we ran future imports, the records that resulted from the merge would still be updated by the new imports, assuming that they hadn't been updated by an admin in the meantime. However, this would still follow our existing rules for import precedence: a publication record that has a Pure import record will not be updated by an Activity Insight import of the same publication data, but will be updated by a new Pure import, and a publication record with no Pure import record will be updated by a new Activity Insight import.
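The selection and precedence rules for auto-merging might look something like the sketch below. The function names and data shapes are hypothetical. One assumption goes beyond what is stated above: a record with no Pure import is treated as updatable by any incoming source, not only Activity Insight.

```python
PURE = "Pure"


def pick_auto_merge_target(group):
    """Return the id of the first publication in the group with a Pure import.

    `group` is a list of dicts with "id" and "sources" keys. Returns None when
    no member has a Pure import, in which case nothing is auto-merged.
    """
    for pub in group:
        if PURE in pub["sources"]:
            return pub["id"]
    return None


def should_update(existing_sources, incoming_source, updated_by_admin):
    """Decide whether an incoming import may overwrite an existing record."""
    if updated_by_admin:
        return False  # never overwrite manual admin changes
    if PURE in existing_sources:
        return incoming_source == PURE  # only Pure may update a Pure record
    return True  # assumption: no Pure import, so any source may update
```

For example, `should_update({"Pure", "Activity Insight"}, "Activity Insight", False)` is `False`, while the same call with `"Pure"` as the incoming source is `True`.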

If we do auto-merge, could we back it out?
11/08/2019
We could find all of the publications that have an associated import record from Pure and at least one other import record from any source and have not been flagged as having been modified by an admin user. We could then simply delete those publication records along with their associated import records and rerun the publication import for each data source (here we'd be assuming that the publications still exist in all of the original import sources).
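A minimal sketch of that selection, assuming the same illustrative in-memory shapes (a dict of publications with a hypothetical `updated_by_admin` flag and a list of import records); the real query would run against the database.

```python
from collections import defaultdict


def find_back_out_candidates(publications, imports):
    """Return ids of publications that look like the result of an auto-merge.

    A candidate has a Pure import record, at least one other import record,
    and has not been flagged as modified by an admin.
    """
    sources_by_pub = defaultdict(list)
    for record in imports:
        sources_by_pub[record["publication_id"]].append(record["source"])

    candidates = []
    for pub_id, pub in publications.items():
        sources = sources_by_pub.get(pub_id, [])
        if "Pure" in sources and len(sources) >= 2 and not pub["updated_by_admin"]:
            candidates.append(pub_id)
    return sorted(candidates)
```

The returned publications, together with their import records, would then be deleted before rerunning the import for each data source.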

Where should metadata in MDDB be updated? If you update data in MDDB, how does that affect future data imports?
11/08/2019
If you (or an admin) update a record in MDDB (which you really shouldn't do; it should be done in the system of record), then that user or publication record will no longer accept updates from data refreshes. For example, if I update my name in MDDB from Daniel to Danny, my User record will no longer be updated. If I update a publication of mine in MDDB, then that publication will no longer be updated in MDDB.

How can I add users to an organization?
11/08/2019
Currently you can add a single user to a bunch of organizations when you're editing that user. You cannot, as of 11/08/2019, add multiple users to an organization.

Is there an endpoint to return all faculty for a given organization?
11/08/2019
We don't have an API endpoint that returns all the faculty for a given organization. We do currently have an endpoint that returns all of the publications for an organization, but the publications aren't grouped by user.

How do we provide grant data via the API?
Currently there are 2 ways:

  1. An API endpoint that lists grants that are associated with a given user (GET /v1/users/{webaccess_id}/grants).
  2. An endpoint that lists detailed info about each grant associated with a given publication, as we're currently doing for users (i.e. GET /v1/publications/{id}/grants).
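As a small illustration, the two request URLs could be built like this. The base URL is a placeholder, not the real host, and the helper names are hypothetical; only the two paths come from the list above. Authentication and the response format are not covered here.

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://metadata.example.edu/"  # placeholder host, not the real one


def user_grants_url(webaccess_id, base_url=BASE_URL):
    """URL for GET /v1/users/{webaccess_id}/grants."""
    return urljoin(base_url, f"v1/users/{quote(str(webaccess_id), safe='')}/grants")


def publication_grants_url(publication_id, base_url=BASE_URL):
    """URL for GET /v1/publications/{id}/grants."""
    return urljoin(base_url, f"v1/publications/{quote(str(publication_id), safe='')}/grants")
```

Either URL can then be requested with any HTTP client.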