
Match authors across papers with different forms of name #24

Description

@anjackson

Some authors get treated as separate people because of small differences in the form of their name.

  • e.g. I turn up as 'Andrew Jackson' and 'Andrew N. Jackson'
  • Jack O'Sullivan reports three different forms

This is quite difficult to fix in general, but it can be fixed manually, given a slightly richer data model and some clarity over where the 'master' copy of this data should reside.

One alternative measure would be to have a simple 'authority file' that maps specific names to a canonical form. This doesn't scale very well with the number of authors (as it can't handle different people having the same name) and, unless the data model is modified, it would also force the name itself into canonical form and away from what is recorded on the publication. The advantage is that it can be deployed as a 'patch' over the source data, and so chained into the analysis process as an overlay rather than a fork.
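As a rough sketch of the overlay idea, something like the following could work. It assumes the data model gains a separate canonical field so the recorded name is preserved. The `AUTHORITY` mapping, the `Author` class, the paper IDs and the function names are all hypothetical and are not taken from the existing code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical authority file: recorded form of name -> canonical form.
# In practice this might be loaded from a small CSV or JSON file kept
# alongside the source data.
AUTHORITY: Dict[str, str] = {
    "Andrew Jackson": "Andrew N. Jackson",
}


@dataclass(frozen=True)
class Author:
    recorded_name: str   # exactly as it appears on the publication
    canonical_name: str  # form used for matching across papers


def canonicalise(name: str, authority: Dict[str, str]) -> Author:
    """Look up a name in the authority file, falling back to the name itself."""
    return Author(recorded_name=name, canonical_name=authority.get(name, name))


def papers_by_author(
    papers: Dict[str, List[str]], authority: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group paper IDs by canonical author name without altering the source data."""
    index: Dict[str, List[str]] = defaultdict(list)
    for paper_id, names in papers.items():
        for name in names:
            index[canonicalise(name, authority).canonical_name].append(paper_id)
    return dict(index)


if __name__ == "__main__":
    # Made-up paper IDs, for illustration only.
    papers = {
        "paper-1": ["Andrew Jackson"],
        "paper-2": ["Andrew N. Jackson"],
    }
    print(papers_by_author(papers, AUTHORITY))
```

Because the lookup is keyed only on the name string, this sketch still cannot tell apart two different people who share a name, which is the scaling limit noted above.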
