As the developer responsible for the data, I want a robust data cleaning process in place so that all data is clean when it is uploaded and no records go missing because of insufficient cleaning.
More details:
Up to 200k activity records are missing from the hub because they failed to upload to Postgres. They were therefore also left out of the search engine, so that search results would not redirect to pages that don't exist. Instead of leaving them out, we should run a robust data cleaning pass so that those 200k records upload correctly, ideally in bulk from a single CSV rather than through the current process, which uploads records line by line and is particularly slow against the DigitalOcean-hosted database.
Deliverable: A fully cleaned & uploadable CSV of all 565k activities
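A minimal sketch of such a cleaning pass, assuming the export is a CSV with the hypothetical columns `activity_id`, `charity_id` and `description` (the real column names and the real reasons the 200k records failed are not stated here, so the rules below are illustrative only). It normalises every cell, rejects rows with missing required fields, surplus fields or duplicate IDs, and records a reason for each reject so no record disappears silently. The cleaned file can then be bulk-loaded in one statement, for example with Postgres `COPY activities FROM STDIN WITH (FORMAT csv, HEADER true)` or psql's `\copy`, where the table name `activities` is also an assumption.

```python
import csv
import io

# Hypothetical column names: adjust to match the real activities export.
REQUIRED_FIELDS = ("activity_id", "charity_id", "description")


def clean_value(value):
    """Normalise one cell: drop NUL bytes (Postgres text columns reject them)
    and collapse runs of whitespace, including stray newlines."""
    if value is None:  # DictReader fills short rows with None
        return ""
    return " ".join(value.replace("\x00", "").split())


def clean_rows(rows):
    """Split rows into cleaned records and (record_no, reason) rejects."""
    clean, rejects = [], []
    seen_ids = set()
    for record_no, row in enumerate(rows, start=1):
        if None in row:  # DictReader stores surplus fields under the key None
            rejects.append((record_no, "too many fields"))
            continue
        cleaned = {key: clean_value(val) for key, val in row.items()}
        missing = [f for f in REQUIRED_FIELDS if not cleaned.get(f)]
        if missing:
            rejects.append((record_no, "missing " + ", ".join(missing)))
            continue
        if cleaned["activity_id"] in seen_ids:
            rejects.append((record_no, "duplicate activity_id"))
            continue
        seen_ids.add(cleaned["activity_id"])
        clean.append(cleaned)
    return clean, rejects


def write_clean_csv(rows, fieldnames, out):
    """Write one header plus all cleaned rows, ready for a single bulk load."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Small in-memory demo; the real input would be the full activities export.
raw = io.StringIO(
    "activity_id,charity_id,description\n"
    "1,100,  Food   bank \n"
    "2,,Youth club\n"
    "1,100,Duplicate row\n"
)
reader = csv.DictReader(raw)
clean, rejects = clean_rows(reader)
out = io.StringIO()
write_clean_csv(clean, reader.fieldnames, out)
print(out.getvalue())
print(rejects)
```

For real files, open both input and output with `newline=""` as the `csv` module expects. The reject list doubles as a checklist: every one of the 565k activities should end up either in the clean CSV or in the rejects with a reason.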
Another minor consideration related to data cleaning:
In one particular case, the "all programs" data point contains "Charity provided description when other program areas are not applicable". This value should be changed to "Not Available" (unless we can find the actual description somewhere else?)
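A minimal sketch of that replacement, assuming the "all programs" data point lives in a column named `all_programs` (a hypothetical name). It only replaces cells whose whole value is the placeholder text; if the placeholder can appear inside a longer multi-value cell, the matching rule would need to change.

```python
# Placeholder text quoted in the ticket; the column name is hypothetical.
PLACEHOLDER = "Charity provided description when other program areas are not applicable"
REPLACEMENT = "Not Available"
PROGRAMS_COLUMN = "all_programs"


def replace_placeholder(value):
    """Return "Not Available" when the cell holds only the placeholder text."""
    return REPLACEMENT if value.strip() == PLACEHOLDER else value


def fix_program_column(rows):
    """Apply the replacement in place to every row that has the column."""
    for row in rows:
        if PROGRAMS_COLUMN in row:
            row[PROGRAMS_COLUMN] = replace_placeholder(row[PROGRAMS_COLUMN])
    return rows


rows = [
    {"activity_id": "1", PROGRAMS_COLUMN: PLACEHOLDER},
    {"activity_id": "2", PROGRAMS_COLUMN: "Education"},
]
print(fix_program_column(rows))
```

Keeping the rule in one function means that if the real description turns up elsewhere, `replace_placeholder` is the single place to swap in a lookup instead of the fixed "Not Available".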