Some authors are treated as separate people because of small differences in the form of their name.
- e.g. I turn up as 'Andrew Jackson' and 'Andrew N. Jackson'
- Jack O'Sullivan reports three different forms
This is quite difficult to fix in general, but can be fixed manually, given a slightly richer data model and some clarity over where the 'master' copy of this data should reside.
One alternative measure would be a simple 'authority file' that maps specific names to a canonical form. This doesn't scale very well with the number of authors (a name-only lookup can't handle different people having the same name), and unless the data model is modified, it would also force the name itself into canonical form and away from what is recorded on the publication. The advantage is that it can be deployed as a 'patch' over the source data, and so chained into the analysis process as an overlay rather than a fork.
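
As a rough illustration of the overlay idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the record layout, the field names `author` and `author_canonical`, the helper names, and the choice of which name form counts as canonical. It adds the canonical form as an extra field (i.e. a small data-model extension) so that the name as recorded on the publication is preserved.

```python
# Hypothetical authority file: maps a name variant to its canonical form.
# Which form counts as canonical is an arbitrary choice for illustration.
AUTHORITY = {
    "Andrew Jackson": "Andrew N. Jackson",
}


def canonical_name(name, authority=AUTHORITY):
    """Return the canonical form of a name, or the name itself if unlisted."""
    return authority.get(name, name)


def apply_overlay(records, authority=AUTHORITY):
    """Yield copies of records with an added 'author_canonical' field.

    The source records are not modified, and the 'author' field keeps
    the name as recorded on the publication.
    """
    for record in records:
        patched = dict(record)  # shallow copy, so the source stays untouched
        patched["author_canonical"] = canonical_name(record["author"], authority)
        yield patched


# Made-up sample records, for illustration only.
records = [
    {"title": "Paper A", "author": "Andrew Jackson"},
    {"title": "Paper B", "author": "Andrew N. Jackson"},
]
patched = list(apply_overlay(records))
distinct_authors = {r["author_canonical"] for r in patched}
```

Because the lookup is keyed on the name string alone, it still can't tell apart two different people who share a name, which is the scaling limit noted above.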