Issue 40 validate example data #75
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,24 @@ | ||
| name: Validate-Samples | ||
|
|
||
| on: | ||
| push: | ||
| paths: | ||
| - 'data/*/TIDES/*' | ||
| - 'spec/*' | ||
| pull_request: | ||
| paths: | ||
| - 'data/*/TIDES/*' | ||
| - 'spec/*' | ||
| workflow_dispatch: | ||
| create: | ||
|
|
||
| jobs: | ||
| validate: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - name: Checkout repository | ||
| uses: actions/checkout@v2 | ||
| - name: Validate data | ||
| uses: frictionlessdata/repository@v2 | ||
botanize marked this conversation as resolved.
|
||
| with: | ||
| packages: "data/*/TIDES/datapackage.json" | ||
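The workflow above validates every package matching `data/*/TIDES/datapackage.json` on CI. A rough local equivalent could look like the sketch below. It assumes the frictionless CLI is installed and on `PATH`. The function name `validate_all_packages` and the `VALIDATE_CMD` override are hypothetical helpers introduced only for illustration; they are not part of this PR.

```shell
#!/usr/bin/env bash
# Sketch: validate every TIDES data package matched by the workflow's glob.
# VALIDATE_CMD is a hypothetical override hook (not part of the PR) so the
# loop can be exercised without the frictionless CLI installed.
validate_all_packages() {
  local root="${1:-data}"
  local cmd="${VALIDATE_CMD:-frictionless validate}"
  local failures=0
  local pkg
  for pkg in "$root"/*/TIDES/datapackage.json; do
    # Skip the literal pattern when the glob matches nothing
    [ -e "$pkg" ] || continue
    echo "Validating $pkg"
    # $cmd is intentionally unquoted so "frictionless validate" splits into words
    $cmd "$pkg" || failures=$((failures + 1))
  done
  echo "Packages that failed validation: $failures"
  [ "$failures" -eq 0 ]
}

# Usage (from the repository root):
#   validate_all_packages data
```

Counting failures instead of stopping at the first one means a single run reports every broken package.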
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,13 @@ | ||
| .DS_Store* | ||
| .vscode* | ||
| /__pycache__/* | ||
| .env | ||
| /venv* | ||
| /site | ||
| /__pycache__ | ||
| /site/* | ||
| # pages that are copied in from main repo | ||
| /docs/CONTRIBUTING.md | ||
| /docs/CODE_OF_CONDUCT.md | ||
| /docs/README.md | ||
| # pages that are generated from templates | ||
| /docs/tables.md | ||
| /docs/architecture.md |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,14 +30,6 @@ Directories with TIDES data must contain metadata in a [`datapackage.json`][tide | |
|
|
||
| [`/samples/template/datapackage.json`][template-datapackage] has a template datapackage which can be used. | ||
|
|
||
| ## Sample Data | ||
|
|
||
| [Sample data][samples] can be found in the `/samples` directory, with one directory for each sample. | ||
|
|
||
| ### Template | ||
|
|
||
| Templates of `datapackage.json` and each TIDES file type are located in the `/samples/template` directory. | ||
|
|
||
| ## Validating TIDES data | ||
|
|
||
| TIDES data with a valid [`datapackage.json`](#data-package) can be easily validated using the [frictionless framework], which can be installed and invoked as follows: | ||
|
|
@@ -53,6 +45,22 @@ Several other validation scripts and tools with more flexibility such as validat | |
| bin/validate-datapackage [-v remote_spec_ref | -l local_spec_path] [-d dataset_path] | ||
|
Contributor
There are 5 scripts in bin/ and the usage information isn't always clear about why you'd use one script or another. In addition, the names are all very similar. I think it would help a lot to document what the purpose of each script is here.
||
| ``` | ||
|
|
||
| ### Specific files | ||
|
|
||
| Specific files can be validated by running the frictionless framework against them and their corresponding schemas as follows: | ||
|
|
||
| ```sh | ||
| frictionless validate vehicles.csv --schema https://raw.githubusercontent.com/TIDES-transit/TIDES/main/spec/schema.vehicles.json --schema-sync | ||
| ``` | ||
|
|
||
| ## Sample Data | ||
|
|
||
| [Sample data](https://tides-transit.github.io/TIDES/main/samples) can be found in the `/samples` directory, with one directory for each sample. | ||
|
|
||
| ### Template | ||
|
|
||
| Templates of `datapackage.json` and each TIDES file type are located in the `/samples/template` directory. They can be used to build out TIDES data, particularly samples. Most TIDES data in practice will be directly produced as an output from software or scripts. | ||
|
|
||
| ## Contributing to TIDES | ||
|
|
||
| Those who want to help with the development of the TIDES specification should review the guidance in [contributing]. | ||
|
|
||
|
|
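The "Specific files" section of the README diff validates one CSV against one schema URL. A minimal sketch of generalizing that command follows. It assumes each table file is named after its table and that schemas follow the `schema.<table>.json` pattern seen in the README example. `schema_url_for` and `validate_table` are hypothetical helper names, not scripts from this PR.

```shell
#!/usr/bin/env bash
# Sketch: derive the schema URL for a TIDES table file and validate it.
# Assumption: schema files are named schema.<table>.json under /spec,
# as in the README example for vehicles.csv.

# Print the schema URL for a given CSV file and git ref (default: main).
schema_url_for() {
  local file="$1" ref="${2:-main}"
  local table
  table=$(basename "$file" .csv)
  echo "https://raw.githubusercontent.com/TIDES-transit/TIDES/$ref/spec/schema.$table.json"
}

# Run the same frictionless command the README shows, for any table file.
validate_table() {
  frictionless validate "$1" --schema "$(schema_url_for "$1" "${2:-main}")" --schema-sync
}

# Usage:
#   validate_table vehicles.csv        # validates against the schema on main
#   validate_table vehicles.csv develop
```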
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| # Script: validate_data_package | ||
| # Description: Bash script to validate a Frictionless Data Package using the Frictionless CLI. | ||
| # Usage: validate_data_package [-v tides_version | -l local_schema_location] [-d dataset_location] | ||
| # -v tides_version: Optional. Specify the version of the TIDES specification or 'local' to | ||
|
Contributor
Is this parameter necessary? Shouldn't this always use the schema specified in the datapackage or treat
Contributor
In fact, it seems like
Contributor
Also, the samples/template/TIDES doesn't validate, should it?
||
| # use a local schema. Default is to use the schema specified in the datapackage. | ||
| # -l local_schema_location: Optional. Specify the location of the local schema directory. | ||
| # Default is '../spec'. Is only used if tides_version = local. | ||
| # -d dataset_location: Optional. Specify the location of the TIDES datapackage.json. | ||
| # Default is the current directory. | ||
|
|
||
| # Set default values | ||
| tides_version="" | ||
| local_schema_location="../spec" | ||
| dataset_location="." | ||
|
|
||
| # Parse command-line arguments | ||
| while getopts ":v:l:d:" opt; do | ||
| case $opt in | ||
| v) | ||
| tides_version=$OPTARG | ||
| ;; | ||
| l) | ||
| local_schema_location=$OPTARG | ||
| ;; | ||
| d) | ||
| dataset_location=$OPTARG | ||
| ;; | ||
| \?) | ||
| echo "Invalid option: -$OPTARG" >&2 | ||
| exit 1 | ||
| ;; | ||
| esac | ||
| done | ||
|
|
||
| # Create a temporary data package if using a different schema reference or a local schema | ||
| tmp_datapackage="" | ||
| if [ "$tides_version" != "" ]; then | ||
| tmp_datapackage=$(mktemp) | ||
| cp "$dataset_location/datapackage.json" "$tmp_datapackage" | ||
| fi | ||
|
|
||
| # Set the schema URL based on the option chosen | ||
| schema_url="" | ||
| if [ "$tides_version" == "local" ]; then | ||
| schema_path_prefix="$local_schema_location" | ||
| else | ||
| schema_path_prefix="https://raw.githubusercontent.com/TIDES-transit/TIDES/$tides_version/spec" | ||
| fi | ||
|
|
||
| # Update the 'schema' property in the temporary copy of the datapackage.json file, if applicable | ||
| if [ "$tmp_datapackage" != "" ]; then | ||
| schema_file=$(echo "$tmp_datapackage" | sed 's/\//\\\//g') | ||
| sed -E -i "s/\"schema\": \"[^\/]+\.schema\.json\"/\"schema\": \"$schema_path_prefix\/\${schema_file##*\/}\"/g" "$tmp_datapackage" | ||
| dataset_location="$tmp_datapackage" | ||
| fi | ||
|
|
||
| # Validate the data package JSON against the TIDES schema | ||
| ./validate-data-package-json -v "$tides_version" -f "$dataset_location" -l "$local_schema_location" | ||
|
|
||
| # Validate the Frictionless Data Package using the Frictionless CLI | ||
| frictionless validate "$dataset_location" --schema-sync | ||
|
|
||
| # Remove the temporary data package file, if applicable | ||
| if [ "$tmp_datapackage" != "" ]; then | ||
| rm "$tmp_datapackage" | ||
| fi | ||
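The step in the script above that updates the `schema` property escapes `\${schema_file##*\/}`, so the shell never expands it and the replacement is literal text. One possible alternative, sketched here under assumptions, keeps the original file name through a sed capture group. `rewrite_schema_refs` is a hypothetical name. The sketch assumes the schema prefix contains no `#` or `&` characters and that schema references look like `"schema": "<name>.schema.json"`, as the script's own pattern expects.

```shell
#!/usr/bin/env bash
# Sketch: prefix every bare "<name>.schema.json" reference read from stdin
# with a schema location, keeping the file name via a capture group.
# Assumption: the prefix contains neither '#' (used as the sed delimiter)
# nor '&' (special in sed replacements).
rewrite_schema_refs() {
  local prefix="$1"
  sed -E "s#\"schema\": \"([^\"/]+\.schema\.json)\"#\"schema\": \"${prefix}/\1\"#g"
}

# Usage:
#   rewrite_schema_refs "../spec" < datapackage.json > "$tmp_datapackage"
```

Reading from stdin and writing to stdout sidesteps `sed -i`, whose argument handling differs between GNU and BSD sed.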
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| # Script to validate a local JSON file against a schema specified in a GitHub repository. | ||
| # Usage: validate-data-package-json [-r ref | -l local_schema_location] [-f datapackage_file] | ||
| # -r ref: Optional. Specify the ref name of the GitHub repository. Default is 'main'. | ||
| # -l local_schema_location: Optional. Specify the location of the local schema directory. | ||
| # -f datapackage_file: Optional. Specify the location of the datapackage.json file. Default is 'datapackage.json' in the execution directory. | ||
|
|
||
| # Check if jsonschema-cli is installed | ||
| command -v jsonschema-cli >/dev/null 2>&1 || { | ||
| echo >&2 "jsonschema-cli is required but not found. You can install it using 'pip install jsonschema-cli'. Aborting." | ||
| exit 1 | ||
| } | ||
|
|
||
| # Set default values | ||
| ref="main" | ||
| local_schema_location="" | ||
| datapackage_file="datapackage.json" | ||
|
|
||
| # Parse command-line arguments | ||
| while getopts ":r:l:f:" opt; do | ||
| case $opt in | ||
| r) | ||
| ref=$OPTARG | ||
| ;; | ||
| l) | ||
| local_schema_location=$OPTARG | ||
| ;; | ||
| f) | ||
| datapackage_file=$OPTARG | ||
| ;; | ||
| \?) | ||
| echo "Invalid option: -$OPTARG" >&2 | ||
| exit 1 | ||
| ;; | ||
| esac | ||
| done | ||
|
|
||
| echo "Validating data package file $datapackage_file" | ||
|
|
||
| # Set the temporary directory path | ||
| temp_dir=$(mktemp -d) | ||
|
|
||
| # Set the schema file path based on the option chosen | ||
| schema_file="" | ||
| if [ "$local_schema_location" != "" ]; then | ||
| schema_file="$local_schema_location/tides-data-package.json" | ||
| else | ||
| # Download the schema file to the temporary directory | ||
| schema_url="https://raw.githubusercontent.com/TIDES-transit/TIDES/$ref/spec/tides-data-package.json" | ||
| schema_file="$temp_dir/data-package.json" | ||
|
|
||
| # Abort if the schema URL cannot be fetched (--fail turns HTTP errors into a non-zero exit) | ||
| if ! curl -s --fail --head "$schema_url" >/dev/null; then | ||
| echo "Schema file not found on GitHub for the specified ref: $ref" >&2 | ||
| exit 1 | ||
| fi | ||
| curl -o "$schema_file" "$schema_url" | ||
| fi | ||
|
|
||
| # Validate datapackage against the downloaded schema | ||
| jsonschema-cli validate "$schema_file" "$datapackage_file" | ||
|
|
||
| # Clean up the temporary directory | ||
| rm -rf "$temp_dir" |
|
Contributor
This script and validate-data-package-json use
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,16 +1,16 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| description="Script to validate a local JSON file against a profile for tides-data-package specified in | ||
| description="Script to validate a local JSON file against a profile for tides-data-package specified in | ||
| profile field or optionally against a remote or local profile." | ||
|
|
||
| usage=" | ||
| Usage: validate-datapackage-to-profile [-r remote_spec_ref | -l local_spec_path] [-f datapackage_file] | ||
| -r remote_spec_ref: Optional. Specify the ref name of the GitHub repository for validating against | ||
| a remote profile where the profile is in the sub-path /spec/tides-data-package.json. | ||
| a remote profile where the profile is in the sub-path /spec/tides-data-package.json. | ||
| Should not be used with -l option. Example: -r main | ||
| -l local_spec_path: Optional. Specify the location of the local tides-data-package-json to use. | ||
| Should not be used with -r option. Example: -l spec | ||
| -d dataset_path: Optional. Specify the path of the datapackage.json file. | ||
| -d dataset_path: Optional. Specify the path of the datapackage.json file. | ||
|
Comment on lines 7 to +13
Contributor
Usage line says -f, but help and parsing use -d
||
| Default is datapackage.json. Example: -d samples/template/TIDES/datapackage.json | ||
| " | ||
|
|
||
|
|
@@ -97,7 +97,7 @@ fi | |
| if [ -f "$dataset_path" ]; then | ||
| datapackage_file="$dataset_path" | ||
| dataset_path=$(dirname "$dataset_path") | ||
| else | ||
| else | ||
| datapackage_file="$dataset_path/datapackage.json" | ||
| fi | ||
| check_valid_path "$datapackage_file" | ||
|
|
||
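The help text above says `-r` and `-l` should not be combined, and the hunk at line 97 accepts either a `datapackage.json` file or a directory containing one. The sketch below shows one way to express both rules as small functions. `check_exclusive` and `resolve_datapackage` are hypothetical names and are not taken from the script itself.

```shell
#!/usr/bin/env bash
# Sketch: two small checks mirroring the documented option rules.

# Fail when both a remote ref (-r) and a local spec path (-l) are given.
check_exclusive() {
  local remote_ref="$1" local_path="$2"
  if [ -n "$remote_ref" ] && [ -n "$local_path" ]; then
    echo "Options -r and -l cannot be combined" >&2
    return 1
  fi
}

# Print the datapackage.json path for a -d argument that may be either
# the file itself or the directory that contains it.
resolve_datapackage() {
  local dataset_path="${1:-datapackage.json}"
  if [ -f "$dataset_path" ]; then
    echo "$dataset_path"
  else
    # Strip one trailing slash so "dir/" and "dir" give the same result
    echo "${dataset_path%/}/datapackage.json"
  fi
}

# Usage:
#   check_exclusive "$remote_spec_ref" "$local_spec_path" || exit 1
#   datapackage_file=$(resolve_datapackage "$dataset_path")
```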
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,3 @@ | ||
| {{ include_file('README.md', downshift_h1= False) }} | ||
| # TIDES Transit Specification Suite | ||
|
|
||
| {{ include_file('README.md', start_line = 2, downshift_h1= False) }} |