
Sample Data: document, update scripts to validate, update tests #162

Merged
SorenSpicknall merged 75 commits into main from sample-data-doc
Aug 30, 2023



@e-lo (Contributor) commented Jul 22, 2023

Pull Request

This pull request adds the documentation, validation scripts, and tests needed to develop, validate, and navigate TIDES data samples.

Documentation:

  1. Documents the sample data structure in samples/README.md (which is then added to the /samples page of the documentation)
  2. Documents available samples in samples.md (which becomes the /samples page of the documentation)
  3. Documents the structure of a TIDES data package using a macro that reads tides-datapackage-profile
  4. Updates CONTRIBUTING.md documentation

Code:

  1. Adds a very lightweight script, samples/template/scripts/create_template_files.py, which updates the template files based on the fields in the various table schemas. Note: this isn't run automatically anywhere because I don't think it was a universally desired feature.
  2. Adds validation script /bin/validate-datapackage: validates a datapackage and its contents against the profile/spec specified in the datapackage, a local spec, or a remote spec identified by its GitHub reference.
  3. Adds validation script /bin/validate-datapackage-to-profile: validates a datapackage file against the TIDES datapackage profile, using the profile/spec specified in the datapackage, a local spec, or a remote spec identified by its GitHub reference.
  4. Splits tests/test_all into three scripts that we may want to run with different behaviors:
  • test_local_spec validates the local spec, documentation, and code
  • test_samples_to_canonical validates the local samples against the canonical spec
  • test_samples_to_local validates the local samples against the local spec
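
A minimal sketch of the kind of step create_template_files.py performs: read a Frictionless Table Schema and write a CSV template whose header row is the schema's field names. The function names, the `<table>.schema.json` → `<table>.csv` naming rule, and the header-only output are assumptions for illustration, not the script's actual behavior.

```python
import csv
import json
from pathlib import Path


def field_names(table_schema: dict) -> list[str]:
    """Return the column names declared in a Frictionless Table Schema's `fields`."""
    return [field["name"] for field in table_schema.get("fields", [])]


def write_template_csv(schema_path: Path, out_dir: Path) -> Path:
    """Write a header-only CSV template for one table schema and return its path."""
    schema = json.loads(schema_path.read_text(encoding="utf-8"))
    # Hypothetical naming rule: "trips_performed.schema.json" -> "trips_performed.csv"
    table_name = schema_path.name.split(".")[0]
    out_path = out_dir / f"{table_name}.csv"
    # newline="" lets the csv module control line endings itself
    with out_path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(field_names(schema))
    return out_path
```

To regenerate every template, one could loop over something like `Path("spec").glob("*.schema.json")` (a hypothetical location) and call `write_template_csv` for each file; keeping it a manual call matches the note that the script is not run automatically.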

Spec:

  1. Changes the table schemas in tides-datapackage-profile from objects to string locations, consistent with most other profiles.
  2. Updates the tides-datapackage-profile examples to make them easier to use in documentation.
  3. Removes table-schema.json because I don't think we needed it.
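
To illustrate the schema change, here is a hypothetical check (not part of this PR) that lists the resources in a parsed datapackage whose `schema` is not a string path or URL, assuming the updated profile expects a string location for each table schema.

```python
def resources_without_schema_location(datapackage: dict) -> list[str]:
    """Return names of resources whose `schema` is missing or is an inline object.

    Assumes the updated profile wants each table schema referenced by a
    string location (relative path or URL) rather than embedded as an object.
    """
    return [
        resource.get("name", "<unnamed>")
        for resource in datapackage.get("resources", [])
        if not isinstance(resource.get("schema"), str)
    ]
```

For example, a resource such as `{"name": "trips_performed", "path": "trips_performed.csv", "schema": "trips_performed.schema.json"}` passes, while one carrying `"schema": {"fields": [...]}` is reported.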

Replaces #100, which was mistakenly opened from a fork.

  • Pull requests must address an existing issue
  • Update relevant documentation.

By contributing to this project, all contributors certify to the Developer Certificate of Origin in CONTRIBUTING.md.

e-lo added 3 commits July 14, 2023 14:52
Also: fix errors and improve clarity in the datapackage.json template
- rename tides-data-package.json --> tides-datapackage-profile.json to be consistent with how other frictionless profiles are named and reduce confusion with an actual data package.
- Add 4 shell scripts to /bin: (1) utilities to check presence of packages, files etc. and provide good help messages (2) update a datapackage file temporarily to point to another spec location (3) validate a datapackage file to the tides datapackage profile (4) validate a datapackage and its contents
- Update contributing.md documentation
- Removes table-schema.json b/c I don't think we needed it
- Updates tests/ files to have three primary scripts to run: (1) test_local_spec, which validates the local spec; (2) test_samples_to_canonical, which validates the local samples against the canonical spec; and (3) test_samples_to_local, which validates the local samples against the local spec.
- Updates table schemas to be a string location to be consistent with most other profiles rather than an object.
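
Both validation scripts can check against the profile named in the datapackage, a local spec, or a remote spec at a GitHub reference. The scripts themselves are shell; below is a Python sketch of that selection logic only. The precedence order (explicit local spec, then GitHub ref, then the datapackage's own `profile`), the function name, and the raw-URL layout under `spec/` are all assumptions.

```python
from pathlib import Path
from typing import Optional

# Hypothetical URL template; the org/repo come from the docs URL, the path is assumed.
RAW_PROFILE_URL = (
    "https://raw.githubusercontent.com/TIDES-transit/TIDES/{ref}"
    "/spec/tides-datapackage-profile.json"
)


def resolve_profile(
    datapackage: dict,
    local_spec: Optional[Path] = None,
    github_ref: Optional[str] = None,
) -> str:
    """Pick the profile location to validate against (assumed precedence)."""
    if local_spec is not None:
        # An explicit local spec wins, e.g. when testing uncommitted spec changes.
        return str(local_spec)
    if github_ref is not None:
        # A branch, tag, or commit of the canonical repository.
        return RAW_PROFILE_URL.format(ref=github_ref)
    # Otherwise fall back to whatever the datapackage declares.
    profile = datapackage.get("profile")
    if not profile:
        raise ValueError("datapackage declares no profile and no override was given")
    return profile
```

Making overrides explicit arguments keeps the datapackage file itself untouched, which is one way to avoid temporarily rewriting it to point at another spec location.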
@e-lo requested review from SorenSpicknall and botanize July 22, 2023 01:09
@github-actions (Contributor)

Documentation available at: http://tides-transit.github.io/TIDES/sample-data-doc


@e-lo added 📙 docs Elaborating or updating the documentation – inline or otherwise 💻 code Pertains to the infrastructure code labels Jul 24, 2023
@e-lo self-assigned this Jul 24, 2023
@e-lo added this to the v1.0 milestone Jul 24, 2023
Superseded.

@github-actions (Contributor)

Data Validation Report

Sample Status
./samples/template/TIDES ✔️

@github-actions (Contributor)

Data Validation Report

Sample Status
./samples/template/TIDES ✔️

@github-actions (Contributor)

Data Validation Report

Sample Status
./samples/template/TIDES ⚠️

@github-actions (Contributor)

Data Validation Report

Sample Status
./samples/template/TIDES ✔️

@SorenSpicknall (Contributor) commented Aug 25, 2023

@e-lo, this PR includes the first commit of these samples, tides-datapackage-profile.json, and validation of those samples using the new profile. Since the validation process relies on the contents of samples/template/TIDES/datapackage.json, which points to raw files on the main branch, validation was failing on tides-datapackage-profile.json, which doesn't yet exist on that branch. During testing, I pointed the data package file to this branch rather than the main branch to work through other issues and ensure that validation will work once merged.

After working around a number of issues related to JSON parsing and transport between steps in the validation workflow, I kicked off two validation runs (one with errors and one without) to show that the GitHub workflow handles both. Before the JSON parsing changes, the validation step, the comment publishing step, or both would fail to complete on any non-passing data (for multiple reasons that probably aren't worth going into here).

I changed the ref in samples/template/TIDES/datapackage.json to point to this branch to show that validation could pass. Directly before or after we merge this, we'll need to point the URLs referenced in that file back to main instead of the branch references used to test this new validation step.
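
Repointing those URLs by hand is error-prone. Below is a minimal sketch of a helper (hypothetical, not part of this PR) that rewrites the branch segment of `raw.githubusercontent.com` URLs for the TIDES-transit/TIDES repository anywhere in a parsed datapackage.json. The org/repo names are inferred from the documentation URL, and it assumes branch names without slashes, since a ref like `feature/x` cannot be separated from the file path in a raw URL.

```python
import re

# Captures: (1) the repo prefix, (2) the ref segment, (3) the slash that follows it.
# Owner/repo inferred from the docs URL; matched case-insensitively.
RAW_URL = re.compile(
    r"(https://raw\.githubusercontent\.com/TIDES-transit/TIDES/)([^/]+)(/)",
    re.IGNORECASE,
)


def repoint_refs(obj, new_ref: str):
    """Recursively rewrite the ref segment of TIDES raw URLs in parsed JSON."""
    if isinstance(obj, dict):
        return {key: repoint_refs(value, new_ref) for key, value in obj.items()}
    if isinstance(obj, list):
        return [repoint_refs(value, new_ref) for value in obj]
    if isinstance(obj, str):
        # A function replacement avoids escaping issues if new_ref has backslashes.
        return RAW_URL.sub(lambda m: m.group(1) + new_ref + m.group(3), obj)
    # Numbers, booleans, and None pass through unchanged.
    return obj
```

In practice one would `json.load` samples/template/TIDES/datapackage.json, call `repoint_refs(pkg, "main")`, and write the result back with `json.dump`, leaving relative paths such as `"path": "trips_performed.csv"` untouched.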

Other than that follow-up task at merge time, this looks ready to me!


Labels

💻 code Pertains to the infrastructure code 📙 docs Elaborating or updating the documentation – inline or otherwise


2 participants