Background
In the postgres database, there is a table principal_investigator, which stores PIs as rows. These are populated during study ingest. This information is extracted from the principal_investigator slot. Note that in mongo, PI information is stored inline. This means that if multiple studies share a PI, that information is duplicated. On ingest, we do our best to represent each PI only once:

nmdc-server/nmdc_server/ingest/study.py, lines 20 to 23 in 8eb94dd
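The referenced snippet isn't reproduced here, but the dedup-on-ingest idea amounts to a name-keyed get-or-create. Below is a minimal, hypothetical sketch of that pattern; the model, column names, and session handling are illustrative assumptions, not the actual nmdc_server code:

```python
# Illustrative sketch only -- the real model/session code lives in nmdc_server.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class PrincipalInvestigator(Base):
    __tablename__ = "principal_investigator"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    orcid = Column(String, nullable=True)

def get_or_create_pi(session: Session, name: str) -> PrincipalInvestigator:
    """Reuse an existing PI row with the same name, or insert a new one."""
    pi = session.query(PrincipalInvestigator).filter_by(name=name).first()
    if pi is None:
        # Two spellings of the same person ("Mike" vs "Michael") end up as
        # two separate rows, because name is the only identity key.
        pi = PrincipalInvestigator(name=name)
        session.add(pi)
    return pi
```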
Problem
name is not a good equality check for PIs. If a PI is expressed differently in two different projects (e.g. "Mike" vs "Michael"), they will be represented twice in our database. The biggest consequence is that there may be two entries in the PI facet for the same person.

Potential Solution
On our end, we could prevent duplicate PIs by checking ORCID iD instead of name. Each PI should really only have one ORCID iD.
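A hedged sketch of what ORCID-keyed deduplication could look like, reusing the illustrative model from the sketch above. How to handle PIs that have no ORCID iD is an open question; the name fallback below is an assumption, not a decided behavior:

```python
from typing import Optional

def get_or_create_pi_by_orcid(
    session: Session, name: str, orcid: Optional[str]
) -> PrincipalInvestigator:
    """Key PI identity on ORCID iD when present, falling back to name."""
    if orcid:
        # An ORCID iD is unique per person, so two studies that spell the
        # name differently still resolve to the same row.
        pi = session.query(PrincipalInvestigator).filter_by(orcid=orcid).first()
    else:
        # Without an ORCID iD we can only fall back to the name check
        # (assumed behavior -- not specified in the issue).
        pi = session.query(PrincipalInvestigator).filter_by(name=name).first()
    if pi is None:
        pi = PrincipalInvestigator(name=name, orcid=orcid)
        session.add(pi)
    return pi
```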